Introduction to PyTorch

PyTorch
16 Apr 2021 · 23:32

Summary

TL;DR: In this video, Brad Heinz, a partner engineer with Facebook's PyTorch team, introduces PyTorch, an open-source machine learning framework that streamlines the process from research to production. Key features include tensors, autograd for dynamic computation, and tools for model building, data loading, and deployment. The tutorial covers installing PyTorch, tensor operations, building neural networks, training loops, and deploying models using TorchScript. It also highlights PyTorch's community, its use in enterprises, and associated projects like AllenNLP and ClassyVision.

Takeaways

  • 😀 Brad Heinz, a partner engineer at Facebook, introduces PyTorch, an open-source machine learning framework.
  • 💻 PyTorch is designed for rapid prototyping and deployment, supporting the full ML application lifecycle from research to production.
  • 📊 Tensors are the core data structure in PyTorch, functioning as multi-dimensional arrays with built-in operations for computation.
  • 🔄 Autograd is PyTorch's automatic differentiation engine that facilitates the computation of derivatives, crucial for model training.
  • 🏗️ PyTorch allows building models using modules, which can be composed into complex architectures like convolutional neural networks.
  • 🔁 The training loop in PyTorch involves feeding data through the model, computing loss, performing backpropagation, and updating model weights.
  • 📚 Torchvision and Torchaudio are associated libraries that provide datasets, pre-trained models, and other utilities for computer vision and audio applications.
  • 🚀 PyTorch supports hardware acceleration, particularly on NVIDIA GPUs, to enhance model training and inference performance.
  • 🔗 TorchScript is a way to serialize and optimize PyTorch models for deployment, enabling high-performance model serving.
  • 🌐 The PyTorch community is extensive, with contributions from around the world, supporting a diverse range of ML projects and applications.

Q & A

  • What is the primary role of Brad Heinz in the video?

    -Brad Heinz is a Partner Engineer working with the PyTorch team at Facebook.

  • What does the video aim to provide for viewers who are new to machine learning with PyTorch?

    -The video provides an introduction to PyTorch, covering its features, key concepts, and associated tools and libraries.

  • Why is it important to match the CUDA toolkit version with the CUDA drivers on a machine?

    -Matching the CUDA toolkit version with the CUDA drivers ensures compatibility for GPU acceleration with PyTorch on Linux or Windows machines with NVIDIA CUDA compatible GPUs.

  • What is PyTorch and what does it accelerate according to the video?

    -PyTorch is an open-source machine learning framework that accelerates the path from research prototyping to production deployment.

  • What are some of the associated libraries for PyTorch mentioned in the video?

    -Some associated libraries for PyTorch include TorchVision for computer vision applications, and there are also libraries for text, natural language, and audio applications.

  • How does PyTorch enable fast iteration on ML models and applications?

    -PyTorch enables fast iteration by allowing work in regular idiomatic Python without a new domain-specific language, and by using Autograd, PyTorch's automatic differentiation engine.

  • What is the significance of the backward method in PyTorch's Autograd?

    -The backward method in Autograd is significant as it rapidly calculates the gradients needed for learning by utilizing the metadata of each tensor, which tracks the computation history.

  • How is a neural network model typically structured in PyTorch code?

    -A neural network model in PyTorch typically inherits from torch.nn.Module, has an __init__ method for constructing layers, and a forward method for the actual computation.

  • What is the purpose of the data loader in PyTorch?

    -The data loader in PyTorch is used to efficiently feed data to the model in batches, with options to shuffle the data and to parallelize data loading.

  • How does the training loop in PyTorch work as described in the video?

    -The training loop in PyTorch involves iterating over batches of training data, computing predictions and loss, performing a backward pass to calculate gradients, and updating model weights using an optimizer.

  • What is TorchScript and how does it relate to model deployment in PyTorch?

    -TorchScript is a static, high-performance subset of Python that allows for the serialization and optimization of PyTorch models for deployment, enabling them to run without the Python interpreter for better performance.

Outlines

00:00

🤖 Introduction to PyTorch

Brad Heinz, a partner engineer with the PyTorch team at Facebook, introduces PyTorch as an open-source machine learning framework designed to streamline the process from research prototyping to production deployment. The video targets newcomers to machine learning with PyTorch, covering its features, key concepts, and associated tools and libraries. It emphasizes the importance of installing PyTorch and TorchVision for following along with the demonstrations and exercises. PyTorch is highlighted for its hardware acceleration on NVIDIA GPUs, its rich ecosystem of community projects, and its support for a variety of applications including computer vision, text, and natural language processing. The video also touches on the importance of CUDA compatibility for GPU acceleration on non-Mac platforms.

05:00

🔢 Deep Dive into PyTorch Tensors and Autograd

The script delves into the fundamentals of PyTorch tensors, which are the core data abstraction and central to all operations in PyTorch. It explains how tensors are multi-dimensional arrays with additional functionalities and come with over 300 mathematical and logical operations. The video demonstrates tensor creation, data type specification, and random tensor generation with seeding for reproducibility. It also covers basic arithmetic operations between tensors and the importance of tensor shape compatibility for element-wise operations. The Autograd feature of PyTorch is introduced as the automated differentiation engine that facilitates rapid model iteration by computing derivatives with a single function call, crucial for model training.

10:01

🏗 Building Neural Networks with PyTorch

This section of the script guides viewers through constructing a neural network using PyTorch, specifically the LeNet-5 model, which is a convolutional neural network designed for image classification. The video outlines the process of importing necessary PyTorch modules, defining the network architecture, and utilizing layers like convolutional and linear layers. It emphasizes the model structure, including the initialization method where layers are constructed, and the forward method where the computation happens. The script also demonstrates how to instantiate the model, pass input through it, and examine the output, highlighting the batch dimension in tensor shapes which is essential for processing multiple data points simultaneously.

15:02

📚 Efficient Data Handling with PyTorch

The script discusses the efficient handling of data in PyTorch, necessary for training machine learning models. It introduces the use of the CIFAR-10 dataset and the transformation of images into tensors with normalization to enhance the learning process. The video explains the creation of a dataset instance and the application of transformations to prepare data for model training. It also covers the use of data loaders to organize data into batches, shuffle the order for stochastic training, and parallelize data loading with multiple workers. The script emphasizes the importance of visualizing data batches to ensure correctness before training.

20:02

🏋️‍♂️ Training Neural Networks and Preventing Overfitting

The script describes the process of training a neural network in PyTorch, focusing on the setup of training and test datasets, the choice of a loss function, and the use of an optimizer for updating model weights. It outlines the training loop, which includes gradient zeroing, forward pass for predictions, loss computation, backward pass for gradient calculation, and optimizer updates. The video also addresses the issue of overfitting, suggesting the use of a separate test dataset to evaluate the model's generalization capabilities. The script concludes with a brief mention of deploying trained models using TorchScript, which allows for converting dynamic PyTorch models into a high-performance, serializable format suitable for production environments.

Keywords

💡PyTorch

PyTorch is an open-source machine learning framework used for applications such as computer vision and natural language processing. In the video, PyTorch is introduced as a tool that accelerates the process from research prototyping to production deployment. It is highlighted for its flexibility and ease of use, allowing for rapid iteration in model development, which is a core theme of the video.

💡Tensor

A tensor in PyTorch is a multi-dimensional array, serving as the fundamental data structure for all operations within the framework. The video emphasizes that tensors are central to PyTorch, as they represent the inputs, outputs, and weights of a model. Examples in the script include creating tensors and performing operations like addition and multiplication.

💡Autograd

Autograd is PyTorch's automatic differentiation engine, which facilitates the computation of derivatives in neural networks. It is crucial for the training process, as it automates the backward pass through the model to compute gradients. The video explains how Autograd enables rapid iteration and model flexibility, which are key to efficient machine learning workflows.

💡TorchScript

TorchScript is a way to create serializable and optimizable models from PyTorch code, allowing for deployment without dependency on the Python interpreter. The video discusses how TorchScript can be used to deploy models for production use, emphasizing its role in converting dynamic Python models into a high-performance format suitable for various hardware.

💡Neural Network Layers

Neural network layers are the building blocks of neural networks, used to create models in PyTorch. The video introduces these layers through the example of LeNet, a convolutional neural network, where layers such as convolutional layers and fully connected layers are used to process and classify images. These layers are essential for constructing the model's architecture.

💡Dataset

A dataset in PyTorch refers to a collection of data used for training and testing machine learning models. The video mentions the CIFAR-10 dataset, which consists of 32x32 color images representing 10 classes of objects. Datasets are critical for providing the necessary data for model training and evaluation, as showcased by the script's discussion on data loading and preprocessing.

💡Optimizer

An optimizer in the context of the video is an algorithm used to update the weights of a neural network during training. The video uses stochastic gradient descent as an example of an optimizer, which adjusts the model's weights based on the computed gradients to minimize the loss function. Optimizers are key to the learning process and improving model performance.

💡Loss Function

A loss function measures the difference between the predicted output of a model and the true output. In the video, cross-entropy loss is mentioned as a typical choice for classification problems. The loss function guides the training process by quantifying the error, which is essential for adjusting the model's parameters to improve predictions.

💡Training Loop

The training loop is the process by which a machine learning model is iteratively trained on a dataset. The video describes a basic training loop, which includes feeding batches of data to the model, computing predictions, calculating loss, and updating the model's weights. This loop is central to the model training process and is a key concept in the video's demonstration of how models learn from data.

💡Model Deployment

Model deployment refers to the process of integrating a trained machine learning model into a production environment. The video touches on deployment with TorchScript, indicating how models can be converted and optimized for use in systems that may not have a Python interpreter. This is an important aspect of transitioning from model development to real-world application.

Highlights

Introduction to PyTorch, its features, key concepts, and associated tools and libraries.

Assumption that the viewer is new to machine learning with PyTorch.

Overview of PyTorch and related projects, including core data abstraction and autograd.

Instructions on installing PyTorch and TorchVision for demos and exercises.

Details on CUDA compatibility and GPU acceleration for PyTorch on different operating systems.

Definition of PyTorch as an open-source machine learning framework.

Description of PyTorch's full toolkit for building and deploying ML applications.

Explanation of hardware acceleration and associated libraries for computer vision, text, and audio applications.

Discussion on PyTorch's ability to enable fast iteration on ML models and applications.

Introduction to PyTorch's automatic differentiation engine, Autograd.

Overview of building a model with PyTorch modules.

Demonstration of a basic training loop for a model in PyTorch.

Introduction to deployment with TorchScript.

Explanation of tensors as the core data structure in PyTorch.

Tutorial on creating and manipulating tensors in PyTorch.

Discussion on the mathematical operations available on PyTorch tensors.

Introduction to autograd and its role in the training process.

Explanation of how to build and run a simple neural network model in PyTorch.

Discussion on data loading and preprocessing for model training.

Overview of the training loop and its components in PyTorch.

Introduction to testing a model for general learning and avoiding overfitting.

Explanation of TorchScript for model deployment and its advantages.

Conclusion and invitation to further explore PyTorch through additional resources.

Transcripts

[00:00] Hello, my name is Brad Heinz. I'm a partner engineer working with the PyTorch team at Facebook. In this video I'll be giving you an introduction to PyTorch: its features, key concepts, and associated tools and libraries. This overview assumes that you are new to doing machine learning with PyTorch.

[00:19] In this video we're going to cover: an overview of PyTorch and related projects; tensors, which are the core data abstraction of PyTorch; autograd, which drives the eager-mode computation that makes rapid iteration on your model possible; building a model with PyTorch modules; how to load your data efficiently to train your model; a basic training loop; and finally, deployment with TorchScript.

[00:52] Before we get started, you'll want to install PyTorch and torchvision so you can follow along with the demos and exercises. If you haven't installed the latest version of PyTorch yet, visit pytorch.org; the front page has an install wizard, shown here. There are two important things to note here. First, CUDA drivers are not available for the Mac, so GPU acceleration is not available via PyTorch on the Mac. Second, if you're working on a Linux or Windows machine with one or more NVIDIA CUDA-compatible GPUs attached, make sure the version of the CUDA toolkit you install matches the CUDA drivers on your machine.
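
If you want to confirm your setup, a minimal check like the following (assuming a standard install) prints the installed PyTorch version and whether a usable CUDA device is visible:

    import torch

    print(torch.__version__)          # installed PyTorch version
    print(torch.cuda.is_available())  # True only if a CUDA GPU is usable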

[01:29] So what is PyTorch? pytorch.org tells us that PyTorch is an open-source machine learning framework that accelerates the path from research prototyping to production deployment. Let's unpack that. First, PyTorch is software for machine learning. It contains a full toolkit for building and deploying ML applications, including deep learning primitives such as neural network layer types, activation functions, and gradient-based optimizers. It has hardware acceleration on NVIDIA GPUs, and it has associated libraries for computer vision, text and natural language, and audio applications. torchvision, the PyTorch library for computer vision applications, also includes pre-trained models and packaged datasets that you can use to train your own models.

[02:14] PyTorch is built to enable fast iteration on your ML models and applications. You can work in regular, idiomatic Python; there's no new domain-specific language to learn to build your computation graph. With autograd, PyTorch's automatic differentiation engine, the backward pass over your model is done with a single function call, and done correctly no matter which path through the code a computation took, offering you unparalleled flexibility in model design.

[02:39] PyTorch has the tooling to work at enterprise scale, with tools like TorchScript, which is a way to create serializable and optimizable models from your PyTorch code; TorchServe, PyTorch's model serving solution; and multiple options for quantizing your model for performance.

[02:55] And finally, PyTorch is free and open-source software: free to use, and open to contributions from the community. Its open-source nature fosters a rich ecosystem of community projects as well, supporting use cases from stochastic processes to graph-based neural networks. The PyTorch community is large and growing, with over 1,200 contributors to the project from around the world and over 50 percent year-on-year growth in research paper citations. PyTorch is in use at top-tier companies like these, and provides the foundations for projects like AllenNLP, the open-source research library for deep learning with natural language; fastai, which simplifies training fast and accurate neural nets using modern best practices; ClassyVision, an end-to-end framework for image and video classification; and Captum, an open-source, extensible library that helps you understand and interpret your model's behavior.

[03:47] Now that you've been introduced to PyTorch, let's look under the hood. Tensors will be at the center of everything you do in PyTorch: your model's inputs, outputs, and learning weights are all in the form of tensors. Now, if "tensor" is not a part of your normal mathematical vocabulary, just know that in this context we're talking about a multi-dimensional array with a lot of extra bells and whistles. PyTorch tensors come bundled with over 300 mathematical and logical operations that can be performed on them. Though you access tensors through a Python API, the computation actually happens in compiled C++ code, optimized for CPU and GPU.

[04:25] Let's look at some typical tensor manipulations in PyTorch. The first thing we'll need to do is import PyTorch, with the import torch call. Then we'll go ahead and create our first tensor. Here I'm going to create a two-dimensional tensor with five rows and three columns and fill it with zeros, and I'm going to query it for the data type of those zeros. And here you can see I got my requested matrix of 15 zeros, and the data type is 32-bit floating point; by default, PyTorch creates all tensors as 32-bit floating point. What if you wanted integers instead? You can always override the default. Here, in the next cell, I create a tensor full of ones and request that they be 16-bit integers. And note that when I print it, PyTorch tells me, without being asked, that these are 16-bit integers: because it's not the default, that might not be what I expect.
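
A minimal sketch of the cells being described (the notebook code itself isn't reproduced in this transcript, so the variable names are illustrative):

    import torch

    z = torch.zeros(5, 3)        # five rows, three columns of zeros
    print(z)
    print(z.dtype)               # torch.float32, the default

    i = torch.ones((5, 3), dtype=torch.int16)  # override the default type
    print(i)                     # the printout notes dtype=torch.int16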

[05:20] It's common to initialize learning weights randomly, often with a specific seed for the random number generator, so that you can reproduce your results on subsequent runs. Here we demonstrate: seeding the PyTorch random number generator with a specific number; generating a random tensor; generating a second random tensor, which we expect to be different from the first; re-seeding the random number generator with the same input; and then finally creating another random tensor, which we expect to match the first, since it was the first thing created after seeding the RNG. And sure enough, those are the results we get: the first tensor and the third tensor do match, and the second one does not.
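
A sketch of that seeding demo (the seed value is arbitrary):

    torch.manual_seed(1729)      # seed the random number generator
    r1 = torch.rand(2, 2)
    r2 = torch.rand(2, 2)        # expected to differ from r1

    torch.manual_seed(1729)      # re-seed with the same value
    r3 = torch.rand(2, 2)        # first tensor created after re-seeding
    print(torch.equal(r1, r3))   # True: r3 matches r1
    print(torch.equal(r1, r2))   # False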

[06:01] Arithmetic with PyTorch tensors is intuitive. Tensors of similar shapes may be added, multiplied, and so on, and operations between a scalar and a tensor will distribute over all the cells of the tensor. Let's look at a couple of examples. First, I'm just going to create a tensor full of ones. Then I'm going to create another tensor full of ones, but multiply it by the scalar 2; what happens is that all of those ones become twos, because the multiplication is distributed over every element of the tensor. Then I'll add the two tensors. I can do this because they're of the same shape: the operation happens element-wise between the two of them, and we get out a tensor full of threes. When I query that tensor for its shape, it's the same shape as the two input tensors to the addition operation. Finally, I create two random tensors of different shapes and attempt to add them. I get a runtime error, because there's no clean way to do element-wise arithmetic operations between two tensors of different shapes.
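
A sketch of those arithmetic cells, with illustrative shapes:

    ones = torch.ones(2, 3)
    twos = torch.ones(2, 3) * 2    # the scalar distributes over every element
    threes = ones + twos           # element-wise addition of same-shaped tensors
    print(threes)                  # a tensor full of threes
    print(threes.shape)            # torch.Size([2, 3]), same as both inputs

    r1 = torch.rand(2, 3)
    r2 = torch.rand(3, 2)
    # r1 + r2                      # uncommenting this raises a RuntimeError:
                                   # no element-wise arithmetic across shapes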

[07:04] Here's a small sample of the mathematical operations available on PyTorch tensors. I'm going to create a random tensor and adjust it so that its values are between -1 and 1. I can take the absolute value of it and see all the values turn positive. I can take the inverse sine of it, because the values are between -1 and 1, and get an angle back. I can do linear algebra operations, like taking the determinant or doing singular value decomposition. And there are statistical and aggregate operations as well: means and standard deviations, minimums and maximums, etc. There's a good deal more to know about the power of PyTorch tensors, including how to set them up for parallel computation on the GPU; we'll be going into more depth in another video.
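
A sketch of that sampler of operations:

    r = torch.rand(2, 2) * 2 - 1   # random values, shifted into [-1, 1)
    print(torch.abs(r))            # absolute value: everything turns positive
    print(torch.asin(r))           # inverse sine works because values are in [-1, 1]
    print(torch.det(r))            # linear algebra: determinant
    print(torch.svd(r))            # singular value decomposition
    print(torch.std_mean(r))       # statistics: standard deviation and mean
    print(torch.max(r))            # aggregate: maximum value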

[07:51] As an introduction to autograd, PyTorch's automatic differentiation engine, let's consider the basic mechanics of a single training pass. For this example we'll use a simple recurrent neural network, or RNN. We start with four tensors: x, the input; h, the hidden state of the RNN, which gives it its memory; and two sets of learning weights, one each for the input and the hidden state. Next we'll multiply the weights by their respective tensors; mm here stands for matrix multiplication. After that, we add the outputs of the two matrix multiplications and pass the result through an activation function, here hyperbolic tangent. And finally, we compute the loss for this output: the loss is the difference between the correct output and the actual prediction of our model.

[08:41] So we've taken a training input, run it through a model, gotten an output, and determined the loss. This is the point in the training loop where we have to compute the derivatives of that loss with respect to every parameter of the model, and use the gradients over the learning weights to decide how to adjust those weights in a way that reduces the loss. Even for a small model like this, that's a bunch of parameters and a lot of derivatives to compute. But here's the good news: you can do it in one line of code.
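
The slide's code isn't in the transcript, so here is a hedged reconstruction of that single pass, using the names the narration mentions (x, h, the two weight tensors, i2h); the target and the squared-error loss are illustrative stand-ins:

    import torch

    x = torch.randn(1, 10)                          # input
    h = torch.randn(1, 20)                          # hidden state
    w_x = torch.randn(10, 20, requires_grad=True)   # input weights
    w_h = torch.randn(20, 20, requires_grad=True)   # hidden-state weights

    i2h = torch.mm(x, w_x)                          # mm = matrix multiplication
    h2h = torch.mm(h, w_h)
    next_h = torch.tanh(i2h + h2h)                  # activation: hyperbolic tangent

    target = torch.zeros(1, 20)                     # stand-in "correct output"
    loss = (next_h - target).pow(2).sum()           # difference from the correct output
    loss.backward()                                 # the one line: fills w_x.grad, w_h.grad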

[09:09] Each tensor generated by this computation knows how it came to be. For example, i2h carries metadata indicating that it came from the matrix multiplication of wx and x, and so it continues down the rest of the graph. This history tracking enables the backward method to rapidly calculate the gradients your model needs for learning. It is also one of the things that enables flexibility and rapid iteration in your models: even in a complex model with decision branches and loops, the computation history will track the particular path through the model that a particular input took, and compute the backward derivatives correctly. In a later video we'll show you how to do more tricks with autograd, like using the autograd profiler, taking second derivatives, and how to turn off autograd when you don't need it.

[09:54] We've talked so far about tensors and automatic differentiation, and some of the ways they interact with your PyTorch model. But what does that model look like in code? Let's build and run a simple one to get a feel for it. First we're going to import PyTorch. We're also going to import torch.nn, which contains the neural network layers that we're going to compose into our model, as well as the parent class of the model itself. And we're going to import torch.nn.functional, to give us the activation functions and max pooling functions that we'll use to connect the layers.

[10:27] So here we have a diagram of LeNet-5, one of the earliest convolutional neural networks and one of the drivers of the explosion in deep learning. It was built to read small images of handwritten numbers (the MNIST dataset) and correctly classify which digit was represented in the image. Here's the abridged version of how it works. Layer C1 is a convolutional layer, meaning that it scans the input image for features that it learns during training. It outputs a map of where it saw each of its learned features in this image. This activation map is downsampled in layer S2. Layer C3 is another convolutional layer, this time scanning C1's activation map for combinations of features. It also puts out an activation map describing the spatial locations of these feature combinations, which is downsampled in layer S4. Finally, the fully connected layers at the end, F5, F6, and OUTPUT, are a classifier that takes the final activation map and classifies it into one of ten bins representing the ten digits. So how do we express this simple neural network in code?
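
The notebook cell isn't reproduced in the transcript, but a LeNet-style module matching this description would look something like the following sketch (layer sizes assume one-channel 32x32 inputs):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LeNet(nn.Module):

        def __init__(self):
            super(LeNet, self).__init__()
            self.conv1 = nn.Conv2d(1, 6, 3)        # 1 input channel, 6 features
            self.conv2 = nn.Conv2d(6, 16, 3)       # 6 channels in, 16 features out
            self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 16 maps of 6x6 after pooling
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)           # ten bins, one per digit

        def forward(self, x):
            x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))  # convolve, downsample
            x = F.max_pool2d(F.relu(self.conv2(x)), 2)       # convolve, downsample
            x = x.view(-1, 16 * 6 * 6)                       # flatten for the classifier
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            return self.fc3(x)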

[11:32] Looking over this code, you should be able to spot some structural similarities with the diagram above. It demonstrates the structure of a typical PyTorch model. It inherits from torch.nn.Module, and modules may be nested; in fact, even the Conv2d and Linear layers here are subclasses of torch.nn.Module. Every model will have an __init__ method, where it constructs the layers that it will compose into its computation graph and loads any data artifacts it might need (an NLP model, for example, might load a vocabulary). And a model will have a forward method; this is where the actual computation happens. An input is passed through the network layers and various functions to generate an output: a prediction. Other than that, you can build out your model class like any other Python class, adding whatever properties and methods you need to support your model's computation. So let's instantiate this

[12:27] and run an input through it. There are a few important things happening here. We're creating an instance of LeNet, and we are printing the net object. Now, a subclass of torch.nn.Module will report the layers it has created, with their shapes and parameters; this can provide a handy overview of a model if you want to get the gist of its processing. Below that, we create a dummy input representing a 32x32 image with one color channel. Normally you would load an image tile and convert it to a tensor of this shape. You may have noticed an extra dimension to our tensor: this is the batch dimension. PyTorch models assume they are working on batches of data; for example, a batch of 16 of our image tiles would have the shape 16x1x32x32. Since we're only using one image, we create a batch of one, with shape 1x1x32x32. We ask the model for an inference by calling it like a function: net(input). The output of this call represents the model's confidence that the input represents a particular digit; since this instance of the model hasn't been trained, we shouldn't expect to see any signal in the output. Looking at the shape of the output, we can see that it also has a batch dimension, the size of which should always match the input batch dimension. Had we passed in an input batch of 16 instances, the output would have a shape of 16x10.

[13:46] You've seen how a model is built, and how to give it a batch of input and examine the output. The model didn't do much, though, because it hasn't been trained yet. For that, we'll need to feed it a bunch of data.
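
A sketch of those cells, assuming the LeNet module sketched above:

    net = LeNet()
    print(net)                         # reports the layers, shapes, and parameters

    input = torch.rand(1, 1, 32, 32)   # a batch of one single-channel 32x32 image
    output = net(input)                # inference: call the model like a function
    print(output)                      # untrained, so no meaningful signal yet
    print(output.shape)                # torch.Size([1, 10]): batch dim matches input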

[14:01] In order to train our model, we're going to need a way to feed it data in bulk. This is where the PyTorch Dataset and DataLoader classes come into play. Let's see them in action. Here I'm declaring %matplotlib inline, because we'll be rendering some images in the notebook. I'm importing PyTorch, and I'm also importing torchvision and torchvision.transforms. These are going to give us our datasets, and some transforms that we need to apply to the images to make them digestible by our PyTorch model.

[14:31] The first thing we need to do is transform our incoming images into PyTorch tensors. Here we specify two transformations for our input. transforms.ToTensor takes images loaded by the Pillow library and converts them into PyTorch tensors. transforms.Normalize adjusts the values of the tensor so that their average is zero and their standard deviation is 0.5. Most activation functions have their strongest gradients around the zero point, so centering our data there can speed learning. There are many more transforms available, including cropping, centering, rotation, reflection, and most of the other things you might do to an image.
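
A sketch of that transform pipeline (the normalization constants follow the narration; the notebook's exact values aren't shown in this transcript):

    import torch
    import torchvision
    import torchvision.transforms as transforms

    transform = transforms.Compose([
        transforms.ToTensor(),                   # PIL image -> PyTorch tensor
        transforms.Normalize((0.5, 0.5, 0.5),    # per-channel mean
                             (0.5, 0.5, 0.5))])  # per-channel standard deviation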

[15:10] Next we're going to create an instance of the CIFAR-10 dataset. This is a set of 32x32 color image tiles representing ten classes of objects: six of animals and four of vehicles. When you run the cell above, it may take a minute or two for the dataset to finish downloading, so be aware of that. This is an example of creating a dataset in PyTorch. Downloadable datasets like CIFAR-10 above are subclasses of torch.utils.data.Dataset. Dataset classes in PyTorch include the downloadable datasets in torchvision, torchtext, and torchaudio, as well as utility dataset classes such as torchvision.datasets.ImageFolder, which will read a folder of labeled images. You can also create your own subclasses of Dataset. When we instantiate our dataset, we need to tell it a few things: the file system path where we want the data to go; whether or not we're using this set for training (most datasets will be split between training and test subsets); whether we would like to download the dataset if we haven't already; and the transformations that we want to apply to the images.
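
A sketch of that instantiation, assuming the transform defined above:

    trainset = torchvision.datasets.CIFAR10(
        root='./data',        # file system path where the data should go
        train=True,           # use the training split
        download=True,        # download the set if it isn't there already
        transform=transform)  # transformations to apply to the images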

[16:14] Once you have your dataset ready, you can give it to the DataLoader. Now, a Dataset subclass wraps access to the data, and is specialized to the type of data it's serving. The DataLoader knows nothing about the data, but organizes the input tensors served by the Dataset into batches, with the parameters you specify. In the example above, we've asked the DataLoader to give us batches of four images from trainset, randomizing their order with shuffle=True, and we told it to spin up two workers to load data from disk. It's good practice to visualize the batches your DataLoader serves. Running the cell should show you a strip of four images, and you should see a correct label for each one. And so here are four images, which do in fact look like a cat, a deer, and two trucks.
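
A sketch of that DataLoader, with the parameters the narration describes:

    trainloader = torch.utils.data.DataLoader(
        trainset,
        batch_size=4,     # batches of four images
        shuffle=True,     # randomize the order
        num_workers=2)    # two workers loading data from disk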

[17:03] We've looked under the hood at tensors and autograd, and we've seen how PyTorch models are constructed and how to efficiently feed them data. It's time to put all the pieces together and see how a model gets trained. So here we are, back in our notebook. You'll see the imports here; all of these should look familiar from earlier in the video, except for torch.optim, which I'll be talking about soon. The first thing we'll need is training and test datasets. If you haven't already, run the cell below and make sure the dataset is downloaded; it may take a minute if you haven't done so already. We'll run our check on the output from the DataLoader, and again we should see a strip of four images: a plane, a plane, a plane, and a ship. That looks correct, so our data loader is good. This is the model we'll train. If this model looks familiar, it's because it's a variant of LeNet, which we discussed earlier in this video, adapted to take three-channel color images.

[18:09] The final ingredients we need are a loss function and an optimizer. The loss function, as discussed earlier in this video, is a measure of how far from our ideal output the model's prediction was; cross-entropy loss is a typical loss function for classification models like ours. The optimizer is what drives the learning. Here we've created an optimizer that implements stochastic gradient descent, one of the more straightforward optimization algorithms. Besides parameters of the algorithm, like the learning rate and momentum, we also pass in net.parameters(), a collection of all the learning weights in the model, which is what the optimizer adjusts.
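
A sketch of those two ingredients (the learning rate and momentum values are illustrative):

    import torch.nn as nn
    import torch.optim as optim

    criterion = nn.CrossEntropyLoss()        # typical loss for classification
    optimizer = optim.SGD(net.parameters(),  # the learning weights to adjust
                          lr=0.001,          # learning rate
                          momentum=0.9)      # momentum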

[18:46] Finally, all this is assembled into the training loop. Go ahead and run this cell, as it'll take a couple of minutes to execute. Here we're only doing two training epochs (as you can see from line 1), that is, two complete passes over the training dataset. Each pass has an inner loop that iterates over the training data, serving batches of transformed images and their correct labels. Zeroing the gradients, in line 9, is a very important step: when you run a batch, gradients are accumulated over that batch, and if we don't reset the gradients for every batch, they will keep accumulating, provide incorrect values, and learning will stop. In line 12, we ask the model for its actual prediction on the batch. In the following line, line 13, we compute the loss: the difference between the outputs and the labels. In line 14, we do our backward pass and calculate the gradients that will direct the learning. In line 15, the optimizer performs one learning step: it uses the gradients from the backward call to nudge the learning weights in the direction it thinks will reduce the loss. The remainder of the loop just does some light reporting on the epoch number, how many training instances have been completed, and what the collected loss is over the training epoch.
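
A sketch of such a loop (the line numbers in the narration refer to the notebook cell, not to this reconstruction; the reporting interval is illustrative):

    for epoch in range(2):                        # two passes over the training set
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data                 # a batch of images and labels
            optimizer.zero_grad()                 # reset the accumulated gradients
            outputs = net(inputs)                 # forward pass: predictions
            loss = criterion(outputs, labels)     # difference from the labels
            loss.backward()                       # backward pass: compute gradients
            optimizer.step()                      # one learning step
            running_loss += loss.item()
            if i % 2000 == 1999:                  # light reporting
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 2000))
                running_loss = 0.0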

[20:09] Note that the loss is monotonically descending, indicating that our model is continuing to improve its performance on the training dataset. As a final step, we should check that the model is actually doing general learning, and not simply memorizing the dataset. Memorization is called overfitting, and it will often indicate that either your dataset is too small and doesn't have enough examples, or that your model is too large, overspecified for modeling the data you're feeding it. So our training is done. The way we check for overfitting, and guard against it, is to test the model on data it hasn't trained on; that's why we have a test dataset. So here I'm just going to run the test data through, and we'll get an accuracy measure out: 55 percent. Okay, that's not exactly state of the art, but it's much better than the 10 percent we'd expect to see from random output. This demonstrates that some general learning did happen in the model.
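
A sketch of such an accuracy check, assuming a testloader built over the test split the same way trainloader was built over the training split:

    correct = 0
    total = 0
    with torch.no_grad():                         # no gradients needed to evaluate
        for data in testloader:
            images, labels = data
            outputs = net(images)
            _, predicted = torch.max(outputs, 1)  # highest-scoring class per image
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print('Accuracy on the test set: %d %%' % (100 * correct / total))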

[21:10] Now, when you go to the trouble of building and training a non-trivial model, it's usually because you want to use it for something. You need to connect it to a system that feeds it inputs and processes the model's predictions, and if you're keen on optimizing performance, you may want to do this without a dependency on the Python interpreter. The good news is that PyTorch accommodates you with TorchScript. TorchScript is a static, high-performance subset of Python. When you convert a model to TorchScript, the dynamic and Pythonic nature of your model is fully preserved: control flow is preserved when converting to TorchScript, and you can still use convenient Python data structures like lists and dictionaries. Looking at the code on the right, you'll see a PyTorch model defined in Python. Below that, an instance of the model is created, and then we call torch.jit.script(my_module). That one line of code is all it takes to convert your Python model to TorchScript. The serialized version of this gets saved in the final line, and it contains all the information about your model's computation graph and its learning weights.
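
A minimal sketch of that conversion (MyModule stands in for whatever torch.nn.Module subclass you've defined):

    my_module = MyModule()                   # an ordinary PyTorch model instance
    scripted = torch.jit.script(my_module)   # one line converts it to TorchScript
    scripted.save('my_module.pt')            # serialize the graph and the weights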

[22:13] The TorchScript rendering of the model is shown at the right. TorchScript is meant to be consumed by the PyTorch just-in-time compiler, or JIT. The JIT seeks runtime optimizations, such as operation reordering and layer fusion, to maximize your model's performance on CPU or GPU hardware. So how do you load and execute a TorchScript model? You start by loading the serialized package with torch.jit.load, and then you can call it just like any other model. What's more, you can do this in Python, or you can load it into the PyTorch C++ runtime to remove the interpreted-language dependency.
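
And a sketch of the loading side in Python (the file name follows the saving sketch above; the input shape depends on your module):

    loaded = torch.jit.load('my_module.pt')  # restore the graph and weights
    output = loaded(torch.rand(1, 10))       # then call it like any other model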

[22:52] In subsequent videos we'll go into more detail about TorchScript, best practices for deployment, and we'll cover TorchServe, PyTorch's model serving solution.

[23:01] So that's our lightning-fast overview of PyTorch. The models and datasets we used here were quite simple, but PyTorch is used in production at large enterprises for powerful, real-world use cases like translating between human languages, describing the content of video scenes, or generating realistic human voices. In the videos to follow, we'll give you access to that power: we'll go deeper on all the topics covered here, with more complex use cases like the ones you'll see in the real world. Thank you for your time and attention, and I hope to see you around the PyTorch forums.


Related Tags

Machine Learning, PyTorch Tutorial, Deep Learning, Neural Networks, Tensor Operations, Autograd Engine, Model Training, Data Loading, Torch Script, AI Deployment