What is Recurrent Neural Network (RNN)? Deep Learning Tutorial 33 (Tensorflow, Keras & Python)

codebasics
12 Jan 2021 · 15:59

Summary

TL;DR: This video tutorial covers the basics of Recurrent Neural Networks (RNNs) and their applications in natural language processing (NLP). It explains why RNNs are suitable for sequence modeling tasks, such as language translation, sentiment analysis, and named entity recognition, due to their ability to handle sequential data and remember context. The tutorial also highlights the limitations of using traditional neural networks for these tasks, emphasizing the importance of sequence in language. Practical examples, including Google autocomplete and translation, illustrate RNNs' effectiveness in real-life scenarios.

Takeaways

  • 🧠 Recurrent Neural Networks (RNNs) are primarily used for natural language processing (NLP) tasks, in contrast to Convolutional Neural Networks (CNNs) which are mainly used for image processing.
  • 🔄 RNNs are designed to handle sequence data where the order of elements matters, as opposed to traditional Artificial Neural Networks (ANNs) which do not consider sequence order.
  • 💡 Google's Gmail auto-complete feature is an example of an application of RNNs, demonstrating their ability to predict and complete sentences based on the context of previous words.
  • 🌐 Google Translate is another application of RNNs, showcasing their use in translating sentences from one language to another, highlighting the importance of sequence in language translation.
  • 🔍 Named Entity Recognition (NER) is a use case for RNNs where the network identifies and categorizes entities in a text, such as names of people, companies, and places.
  • 📊 Sentiment Analysis is a task where RNNs can determine the sentiment of a product review, classifying it into categories like one star to five stars.
  • 📈 One of the limitations of using ANNs for sequence modeling is the fixed size of input and output layers, which does not accommodate variable sentence lengths.
  • 🔒 The computational inefficiency of ANNs in sequence tasks is due to the need for one-hot encoding of words, which can lead to a very large input layer with many neurons.
  • 🔗 The lack of parameter sharing in ANNs when dealing with different sentence structures that convey the same meaning is another drawback, as RNNs can share parameters through their recurrent connections.
  • 🔄 RNNs maintain a 'memory' of previous inputs through their recurrent structure, allowing them to understand and process language with context, unlike ANNs.
  • 🔧 Training an RNN involves passing all training samples through the network multiple times (epochs), adjusting weights based on loss to minimize errors and improve predictions.

Q & A

  • What is the main focus of the video script?

    -The video script focuses on explaining Recurrent Neural Networks (RNNs), their applications in natural language processing (NLP), and how they differ from other types of neural networks like CNNs and ANNs.

  • What are the primary use cases for Recurrent Neural Networks mentioned in the script?

    -The primary use cases for RNNs mentioned are auto-completion in Gmail, language translation using Google Translate, Named Entity Recognition (NER), and Sentiment Analysis.

  • Why are Recurrent Neural Networks particularly suited for natural language processing tasks?

    -RNNs are suited for NLP tasks because they can handle sequence data where the order of elements is important, unlike traditional neural networks which do not consider the sequence of inputs.

  • How does the script describe the process of auto-completion in Gmail using RNNs?

    -The script describes auto-completion as follows: as the user types a sentence, the RNN embedded in Gmail predicts and completes it based on the context provided by the words typed so far.

  • What are the challenges with using a simple neural network for sequence modeling problems?

    -The challenges include the need for a fixed sentence size, high computational cost due to one-hot encoding, and the inability to share parameters across different sentences with the same meaning.

  • How does the script explain the concept of one-hot encoding in the context of neural networks?

    -One-hot encoding is explained as a method of converting words into vectors where each word is represented by a vector with a '1' at its corresponding position in the vocabulary and '0's elsewhere.

  • What is the significance of the sequence in language translation according to the script?

    -The sequence is significant because changing the order of words in a sentence can completely alter its meaning, which is a key aspect that RNNs can handle but simple neural networks cannot.

  • How does the script describe the architecture of an RNN?

    -The script describes an RNN as having a single hidden layer that processes words one by one, carrying the context from the previous word to the next, thus providing a memory of the sequence.

  • What is the purpose of Named Entity Recognition (NER) as explained in the script?

    -The purpose of NER is to identify and classify entities in text, such as recognizing 'Dhaval' and 'baby yoda' as person names in the given example.

  • How does the script explain the training process of an RNN for Named Entity Recognition?

    -The script explains the training process as initializing the network weights, processing each word in the training samples, calculating the predicted output, comparing it with the actual output, and adjusting the weights to minimize the loss through backpropagation.

  • What are the components of the language translation process using RNNs as described in the script?

    -The components include an encoder that processes the input sentence and a decoder that translates it into the target language, with the process requiring all words to be supplied before translation can occur.
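A minimal Keras sketch of that encoder-decoder split, purely illustrative: the vocabulary sizes, layer widths, and use of SimpleRNN cells are assumptions (the video shows no code, and production translation systems typically use LSTM/GRU cells with attention).

```python
import tensorflow as tf
from tensorflow.keras import layers

src_vocab, tgt_vocab, units = 5000, 5000, 64  # made-up sizes

# Encoder: reads the whole source sentence and hands its final state to the decoder.
enc_in = layers.Input(shape=(None,), dtype="int32")
enc_emb = layers.Embedding(src_vocab, units)(enc_in)
_, enc_state = layers.SimpleRNN(units, return_state=True)(enc_emb)

# Decoder: starts from the encoder state and predicts target-language words step by step.
dec_in = layers.Input(shape=(None,), dtype="int32")
dec_emb = layers.Embedding(tgt_vocab, units)(dec_in)
dec_seq = layers.SimpleRNN(units, return_sequences=True)(dec_emb, initial_state=enc_state)
dec_probs = layers.Dense(tgt_vocab, activation="softmax")(dec_seq)

translator = tf.keras.Model([enc_in, dec_in], dec_probs)
translator.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```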

Outlines

00:00

🧠 Introduction to Recurrent Neural Networks (RNNs) for NLP

This paragraph introduces the concept of Recurrent Neural Networks (RNNs) and their primary application in natural language processing (NLP) tasks. It contrasts RNNs with Convolutional Neural Networks (CNNs), which are more suited for image processing. The speaker outlines various real-life applications of RNNs, such as Gmail's auto-complete feature, Google Translate, Named Entity Recognition (NER), and Sentiment Analysis. The importance of sequence in language is emphasized, and the limitations of using simple neural networks for sequence modeling problems are discussed, including issues with sentence size variability, neuron count, and computational efficiency.
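To see the fixed-size limitation concretely, here is a small hedged example: a plain dense network needs every input to be the same length, so variable-length sentences must be padded to some arbitrary maximum, which is exactly the "fill the remaining neurons with zeros" workaround the video calls not ideal. The sentences and word ids below are made up.

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Two sentences of different lengths, already mapped to (toy) word ids.
sentences = [
    [12, 7, 45],            # e.g. "did you eat"
    [12, 7, 45, 301, 9],    # e.g. "did you eat biryani today"
]

# Pick an arbitrary fixed size (say 100) and fill unused positions with zeros.
padded = pad_sequences(sentences, maxlen=100, padding="post", value=0)
print(padded.shape)  # (2, 100) - every sentence forced to the same length
```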

05:01

🔄 Understanding the Working of RNNs in Named Entity Recognition

This section delves deeper into how RNNs work, using Named Entity Recognition (NER) as an example. It explains the process of converting words into vectors, typically through one-hot encoding, and then feeding these vectors into a neural network layer. The unique aspect of RNNs is their ability to maintain a memory of previous outputs, which is crucial for understanding the context in language. The paragraph illustrates how RNNs process words sequentially, carrying the context from previous words to inform the interpretation of subsequent words, thus effectively recognizing named entities within a text.
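A hedged Keras sketch of the per-word tagging setup described above: one embedding step to vectorize words, a single SimpleRNN hidden layer that is reused at every time step, and a sigmoid output per word for person / not-person. Vocabulary size, layer widths, and word ids are placeholders, not values from the video.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

vocab_size = 1000  # assumed toy vocabulary

ner_model = tf.keras.Sequential([
    layers.Embedding(vocab_size, 16),              # word id -> word vector
    layers.SimpleRNN(32, return_sequences=True),   # one hidden layer, applied word by word
    layers.Dense(1, activation="sigmoid"),         # per-word score: person name or not
])

# "Dhaval loves baby yoda" as made-up word ids; the expected labels would be 1 0 1 1.
x = np.array([[5, 17, 42, 43]])
print(ner_model.predict(x).shape)  # (1, 4, 1): one score for each of the 4 words
```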

10:04

🔄 Deep Dive into RNN's Sequential Processing and Training

The paragraph further explains the sequential processing nature of RNNs, emphasizing the 'time travel' concept where the same layer of neurons is used repeatedly at different time steps to process a sequence of words. It clarifies that despite the appearance of multiple layers in the visual representation, there is essentially only one hidden layer that is being used over time. The training process of RNNs is also discussed, detailing how the network is trained on samples for tasks like NER, using backpropagation and gradient descent to minimize loss and adjust weights over multiple epochs.
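Training follows the recipe in this paragraph almost literally in Keras: weights are randomly initialized when the model is built, and model.fit runs the forward pass, loss computation, backpropagation, and weight update for every sample, repeating the whole dataset once per epoch. The model mirrors the hypothetical NER sketch above and the data is random, just to make the call runnable.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Embedding(1000, 16),
    layers.SimpleRNN(32, return_sequences=True),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# 100 fake 4-word sentences with random 0/1 per-word labels.
x_train = np.random.randint(0, 1000, size=(100, 4))
y_train = np.random.randint(0, 2, size=(100, 4, 1)).astype("float32")

# One epoch = one pass over all 100 samples; 20 epochs as in the walkthrough.
history = model.fit(x_train, y_train, epochs=20, verbose=0)
print(history.history["loss"][0], "->", history.history["loss"][-1])
```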

15:06

🌐 RNNs in Language Translation and Deep RNN Architecture

The final paragraph discusses the application of RNNs in language translation, describing the process of encoding and decoding where the network uses the context of all input words to produce an accurate translation. It also touches upon the possibility of having deep RNNs with multiple hidden layers to capture more complex patterns in data. The paragraph concludes by reinforcing the importance of RNNs in handling sequence-dependent tasks where the order of elements significantly impacts the outcome, and invites viewers to look forward to more tutorials on RNNs and NLP.
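A "deep" RNN here just means stacking more than one recurrent layer. A hedged sketch (the number of layers and their widths are arbitrary): every recurrent layer except the last must return the full sequence so that the next layer receives one vector per time step.

```python
import tensorflow as tf
from tensorflow.keras import layers

deep_rnn = tf.keras.Sequential([
    layers.Embedding(1000, 16),
    layers.SimpleRNN(32, return_sequences=True),   # hidden layer 1
    layers.SimpleRNN(32, return_sequences=True),   # hidden layer 2
    layers.Dense(1, activation="sigmoid"),         # per-word output, as in the NER example
])
```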

Keywords

💡Recurrent Neural Network (RNN)

Recurrent Neural Network, or RNN, is a type of artificial neural network used in various fields, particularly in natural language processing (NLP). It is designed to recognize patterns in sequences of data, such as text or speech. In the video, RNN is highlighted as the main topic, with examples of its applications in Gmail's auto-complete feature and Google Translate, showcasing its ability to handle sequential data effectively.
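To make the "recurrent" part concrete, here is a toy NumPy version of the single step an RNN repeats for every word: the new hidden state depends on the current word vector and on the previous state, which is what carries the memory forward. The dimensions, random weights, and tanh activation are generic illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 4

# One set of weights, reused at every time step (this is the parameter sharing).
W_x = rng.normal(size=(hidden_dim, input_dim))
W_h = rng.normal(size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

words = rng.normal(size=(3, input_dim))   # 3 word vectors, e.g. "Dhaval loves baby"
h = np.zeros(hidden_dim)                  # a0: initial state of all zeros

for x_t in words:
    # The new state depends on the current word AND the previous state (the context).
    h = np.tanh(W_x @ x_t + W_h @ h + b)
    print(h)
```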

💡Natural Language Processing (NLP)

Natural Language Processing refers to the interaction between computers and human language. It involves understanding, interpreting, and generating human language in a way that computers can process and analyze. The video emphasizes the use of RNNs in NLP tasks, such as translation and sentiment analysis, where the sequence of words is crucial for meaning.

💡Sequence Modeling

Sequence Modeling is a concept in machine learning where the model is trained to understand the order of the data points. In the context of the video, sequence modeling is essential for tasks like translation and Named Entity Recognition (NER), where the arrangement of words affects the overall meaning, and RNNs are particularly adept at handling such sequences.

💡Auto-complete

Auto-complete is a feature in many text input systems that predicts and suggests the completion of a word or sentence based on the initial characters or words typed by the user. The video script mentions Gmail's auto-complete as an example of RNNs in action, where the network predicts the next word in a sequence after the user types a few characters.
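Under the hood, autocomplete is next-word prediction: the network reads the words typed so far and outputs a probability for every word in the vocabulary, and the most likely continuation is suggested. A hedged Keras sketch with an invented vocabulary size:

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab_size = 10_000  # assumed vocabulary size

next_word_model = tf.keras.Sequential([
    layers.Embedding(vocab_size, 64),
    layers.SimpleRNN(128),                            # summarizes the words typed so far
    layers.Dense(vocab_size, activation="softmax"),   # probability of each possible next word
])
next_word_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```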

💡Google Translate

Google Translate is an online service that automatically translates text from one language to another. The script uses Google Translate as an example of how RNNs can be applied to language translation, where the network learns to map sequences of words from one language to another while preserving their meaning.

💡Named Entity Recognition (NER)

Named Entity Recognition is a subfield of NLP that involves identifying and classifying named entities in text into predefined categories such as person names, organizations, locations, etc. The video script explains how RNNs can be used for NER, where the network learns to recognize and categorize entities within a given text.

💡Sentiment Analysis

Sentiment Analysis is the process of determining the sentiment or emotional tone behind a series of words, used to gain an understanding of the attitudes, opinions, and emotions of a speaker or writer. The video mentions sentiment analysis as a use case for RNNs, where the network evaluates the sentiment of a product review or any given text.
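A sentiment model differs from the tagging setup mainly at the output: the recurrent layer reads the whole review, and a single softmax head predicts one of five star ratings. A hedged sketch with placeholder sizes:

```python
import tensorflow as tf
from tensorflow.keras import layers

sentiment_model = tf.keras.Sequential([
    layers.Embedding(10_000, 64),
    layers.SimpleRNN(64),                      # no return_sequences: one vector per review
    layers.Dense(5, activation="softmax"),     # classes for 1-star through 5-star
])
sentiment_model.compile(optimizer="adam",
                        loss="sparse_categorical_crossentropy",
                        metrics=["accuracy"])
```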

💡One-hot Encoding

One-hot Encoding is a representation of categorical variables as a binary vector, where each category is represented by a unique vector with a single '1' and the rest '0's. In the video, one-hot encoding is discussed as a method for converting words into vectors for input into neural networks, which is crucial for handling text data in machine learning models.
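A tiny worked example of the idea, using a made-up three-word vocabulary; real vocabularies run to tens of thousands of words, which is why these vectors make the input layer so large:

```python
import numpy as np

vocab = {"how": 0, "are": 1, "you": 2}   # toy vocabulary

def one_hot(word, vocab):
    vec = np.zeros(len(vocab))
    vec[vocab[word]] = 1.0               # 1 at the word's position, 0 everywhere else
    return vec

print(one_hot("are", vocab))             # [0. 1. 0.]
```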

💡Artificial Neural Network (ANN)

An Artificial Neural Network is a computational model inspired by the way biological neural networks in the human brain process information. The video script contrasts ANNs with RNNs, explaining that while ANNs are good for certain tasks, they do not handle sequential data as effectively as RNNs, which is why RNNs are preferred for tasks like translation and NER.

💡Training Samples

Training samples are the input-output pairs used to train a machine learning model. In the context of the video, training samples are used to teach the RNN to perform tasks like NER, where the input is a statement and the output is the classification of each word as a person's name or not.

💡Backpropagation

Backpropagation is a method used in artificial neural networks to calculate the gradient of the loss function with respect to the weights. The video script explains that after calculating the loss between the predicted and actual outputs, backpropagation is used to adjust the weights of the network, which is a fundamental part of the training process for RNNs.
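What "calculate the loss, then adjust the weights" looks like as one hand-written TensorFlow training step, using a gradient tape. This is a generic sketch of gradient descent on a small recurrent model; the word ids and labels echo the toy NER example and are not code from the video.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Embedding(100, 8),
    layers.SimpleRNN(16, return_sequences=True),
    layers.Dense(1, activation="sigmoid"),
])
loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

x = np.array([[5, 17, 42, 43]])                  # "Dhaval loves baby yoda" as toy ids
y = np.array([[[1.0], [0.0], [1.0], [1.0]]])     # person / not-person labels

with tf.GradientTape() as tape:
    y_hat = model(x, training=True)              # forward pass: predicted y
    loss = loss_fn(y, y_hat)                     # compare with the real y

grads = tape.gradient(loss, model.trainable_variables)            # backpropagation
optimizer.apply_gradients(zip(grads, model.trainable_variables))  # weight update
```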

Highlights

Introduction to Recurrent Neural Networks (RNNs) and their primary use in natural language processing (NLP) tasks.

Comparison of CNNs and RNNs, with CNNs being mainly for image processing and RNNs for NLP.

Explanation of RNNs' ability to handle sequence modeling problems where the order of data matters.

Demonstration of RNNs in Gmail's auto-complete feature, showcasing its practical application.

Use of RNNs in Google Translate for efficient language translation.

Application of RNNs in Named Entity Recognition (NER) to identify entities like person names in text.

RNNs in Sentiment Analysis to determine the sentiment of product reviews.

Challenges with using simple neural networks for sequence problems due to their inability to handle variable sentence lengths.

Issues with high computational costs when converting words to vectors using one-hot encoding for neural networks.

Problem of parameter non-sharing in translation when different English sentences result in the same Hindi translation.

The importance of sequence in language translation and how it affects the meaning of sentences.

Introduction to the concept of Named Entity Recognition (NER) and its goal to identify entities within text.

Description of how RNNs process text word by word while carrying the context from previous words.

Explanation of the training process for RNNs, including initialization, loss calculation, and weight adjustment.

RNN architecture in the context of language translation, emphasizing the encoder and decoder parts.

Potential for deep RNNs with multiple hidden layers for more complex processing.

Summary of RNN's architecture and its necessity for sequence-dependent tasks in language.

Transcripts

00:00

So far in our deep learning tutorial series we looked at Artificial Neural Networks and Convolutional Neural Networks, which are mainly used for image processing. In this video we will talk about Recurrent Neural Networks, which are used mainly for natural language processing tasks. So if you think about deep learning overall, CNNs are mainly for images and RNNs are mainly for NLP. There are other use cases as well, so we'll understand how a Recurrent Neural Network works and we'll look at different applications of RNNs in the field of NLP as well as some other domains.

00:35

We will be looking at some real-life use cases where sequence models are useful. You must have used Google Mail - Gmail. Here, when you type in a sentence it will auto-complete it. So just see: when I type, "not interested at this time" is something it auto-completed. So Google has this RNN, or Recurrent Neural Network, embedded into it where, when you type in a sentence "not interested at", it will auto-complete with "this time". If you say "we'll let you know if it changes", it will also say "in the future". So this saves you time; it will write the sentence for you.

01:15

Another use case is translation. You must have used Google Translate, where you can translate a sentence from one language to another easily.

01:26

The third use case is Named Entity Recognition, where in the X, you know, you give the neural network a statement, and in the Y the neural network will tell you the person name, the company, and the time: "Rudolph Smith must be a millionaire with Tesla's prices skyrocketing." So these are various use cases where using sequence models, or RNNs - Recurrent Neural Networks - helps.

01:56

The fourth use case is Sentiment Analysis, where you have a paragraph and it will tell you the sentiment - whether this product review is one star, two stars, and so on.

02:08

Now you would think: why can't we use a simple neural network to solve this problem? See, all these problems are called sequence modeling problems, because the sequence is important. When it comes to human language, sequence is very important. For example, "how are you?" versus "you are how" - the second one doesn't make sense, right? So the sequence is important here, and you would think: why don't we use a simple neural network for that? Well, let's try it. For language translation, how about we build this kind of neural network, where the input is the English statement and the output could be the Hindi statement?

02:51

Once I build this network, what if my sentence size changes? I might be inputting different sentence sizes, and with a fixed neural network architecture it's not going to work, because you have to decide how many neurons are there in the input and output layers. So with language translation, the number of neurons becomes a problem - what do you decide as the size? Now one would argue, okay, I will decide on a huge size, let's say 100 neurons, and if I am saying "did you eat biryani?" it will occupy 4 neurons; the remaining 96 I will just set to 0, or, you know, a blank statement. That might work, but still it's not ideal.

03:38

The second issue is too much computation. You all know neural networks work on numbers, they don't work on strings, so you have to convert each word into a vector. One of the ways of converting a word into a vector is this: let's say there are 25,000 words in your vocabulary and you do one-hot encoding, where "how", let's say, is at the 46th position, "are" is, let's say, at the second position, and "you" is, let's say, at the 17,000th position. At that word's position you put 1 and at the remaining positions you put 0, and that's called one-hot encoding. You have to do a similar thing for the output as well. But you realize this increases the computation too much: when you convert each word into a vector, you can see how many neurons you need in the input layer itself - it's humongous.

04:32

The third issue is this: sometimes when you translate language, for two different English statements you might have the same Hindi statement. So in this case, when I say "On Sunday I ate golgappa", let's say I train the network on this statement, and then for "On Sunday" it will adjust the weights of all the edges which I have highlighted in yellow color. The same statement I can say differently: I can say "I ate golgappa on Sunday". Now the meaning of "on Sunday" is the same, but here the neural network has to learn a different set of edges - you see all these other edges in yellow color. So the parameters are not shared. We saw in our Convolutional Neural Network tutorial as well that by using the convolution operation we can share parameters; the use of an ANN, or Artificial Neural Network, doesn't allow you to do that.

05:39

Also, the most important part in all this discussion is the sequence. See, when you have structured data - for example you're trying to figure out if a transaction is fraud or not, and let's say your features are the transaction amount, whether the transaction was made out of country, and whether the SSN the customer provided is correct or not - then if you change the order of these features, let's say I supply "SSN correct?" to my first neuron, it's not going to affect anything, because the sequence in which you supply the input doesn't matter. Whereas with English to Hindi translation, instead of saying "I ate golgappa on Sunday", if I say "I ate Sunday on golgappa", the meaning becomes totally different. So now you cannot say that the Hindi translation is "ravivar ko mene golgappe khaye", because it becomes invalid. So sequence is very, very important, and that's why an Artificial Neural Network doesn't work in this case. Just to summarize, these are the three major problems with using an ANN for sequence problems.

06:54

Let's once again talk about Named Entity Recognition. Let's say the statement is "Dhaval loves baby yoda". I love my baby Grogu, I love the Mandalorian series, and we have got this nice baby Grogu at our home which actually talks with us. In this statement, "Dhaval" and "baby yoda" are person names. So the whole purpose of Named Entity Recognition is to find the entities - "Dhaval" as an entity is a person, "baby yoda" as an entity is a person - that's the whole goal of NER.

07:34

Now you can represent this as ones and zeros: if the word is a person's name you mark it as one, and if it is not a person's name you mark it as zero. So let's see how an RNN, a Recurrent Neural Network, works here. First of all you have to convert "Dhaval" into some vector. It doesn't matter how you convert it - you can take a vocabulary and use one-hot encoding, and there are other ways of vectorizing a word as well.

08:05

Then you have a layer of neurons - these are all individual neurons, and let's say this is one layer, a hidden layer. You supply the vector and you get one output. Each neuron, as you know, has a sigma (weighted sum) function and an activation function. Now, while processing the statement "Dhaval loves baby yoda", I will process it word by word. I supply "Dhaval", get the output, and then I go back again. Now I supply "loves" converted into a vector, along with the previous output which I got, which was y_Dhaval, as an input to this same layer. So you see, the input of the layer is not only the next word but also the previous output, because language needs to carry the context: if I have just the word "loves" and I don't have "Dhaval" in front of it, it might mean a different thing. So there is a context that you need, and this kind of architecture provides that context, or a memory.

09:13

For the third word, again you supply "baby" to the same network. Our network has only one layer: there is an input layer, an output layer, and the hidden layer is just one, with a bunch of neurons, and in it we are repeatedly processing the words one by one, and you keep on doing this. The benefit of this is that when I'm processing "baby" and I pass in y_loves, that y_loves carries the state - the previous memory of "Dhaval loves", you know, the statement so far.

09:53

Now I'm presenting this in a different way, but make sure you understand: these are not four different hidden layers. This is time travel - the actual hidden layer is only one, I am just doing a time travel. First, when I supplied the word "Dhaval", I got this output, and the output is nothing but the activation, which I am denoting with a1; you also need some previous activation a0, let's say a vector of all zeros. Then you supply the second word, "loves", and use the previous output, which was y_Dhaval - so y_Dhaval and a1 are both the same here - and then you get another output, a2, which you supply along with the third word, "baby", to the same network. These four neurons are the same single layer; I am just showing its status at different times. So you have to be very clear on this: these are not four different layers. It's just one layer - because I am showing different time steps, it looks almost like time travel. And once the network is trained, of course, it will output "Dhaval" is one, "loves" is zero, "baby" is one, and so on. So you get your NER output for each word individually here.

11:18

One other way of representing the same network, just to avoid confusion and to make the presentation a little more clear: many times in the literature you will see a presentation like this, where each word, which is an input, comes from the bottom, and there is the activation. These two diagrams are exactly the same - I'm just putting the word at the bottom.

11:48

The generic representation of an RNN is this. This is the real representation: you have only one layer and you are almost in a loop - you are supplying the output of the previous word as an input along with the next word.

12:05

So now let's talk about training. Again, the problem we are talking about is NER, where these are my training samples, x and y: x is a statement and y is whether a given word is a person name or not. We process the first training sample, "Dhaval loves baby yoda". First I initialize my neural network weights with some random values, then I supply each word and calculate y hat, which is the predicted y, then I compare it with the real y - the real y here is 1 0 1 1, so I compare 1 0 1 1 with y hat - and I find the loss, and then I sum the losses. That will be my total loss. You all know about gradient descent, right? So we compute the loss, then we backpropagate the loss and adjust the weights. Now I take the second statement, "Iron Man punched Hulk's face" - he was very angry with Hulk. Again I calculate all the losses, find the total loss, and then I do gradient descent to reduce the loss. I keep on doing this for all my training samples. Let's say I have 100 training samples: passing all hundred training samples through this network will be one epoch. We might do, let's say, 20 epochs, and at the end of the 20th epoch my loss might become very small. At that point we can say my neural network is trained.

13:51

Let's take a look at language translation. In language translation, you supply the first word to your network and get the output; then to the same network you supply the second word along with the output from the previous step as an input - and of course, when you supply the first word you have to pass in some initial activation values, let's say a vector of all zeros. Then you supply the third word, the fourth word, and so on, and only when you're done with all the words does the network start to translate, because you cannot translate word by word: after the statement I can push in maybe one more word and that will totally change my translation. That's why for language translation you have to supply all the words, and only then can the network translate for you. So the network will translate it like this, and the first part is called the encoder, the second part is called the decoder. We will go more in depth into all this, but I wanted to quickly demonstrate how the neural network looks in the case of language translation.

15:11

Now, this layer doesn't have to be just a single layer. It can be a deep RNN as well, where the actual network might have multiple hidden layers. So I hope that clarifies the architecture behind RNNs and you understand why you can't use a simple neural network here - you have to use a specialized neural network called an RNN, which can memorize for you, which can remember the previous state, because language is all about sequence: if you change the sequence, the meaning changes. If you like this video, please give it a thumbs up, and we'll be having more Recurrent Neural Network and NLP tutorials in future videos. Thank you.

Related Tags
Deep Learning, RNN, NLP, Translation, Sentiment Analysis, Named Entity Recognition, Google, Neural Networks, Sequence Models, Language Processing