State-of-the-Art Neural Networks - Neural Architecture Search (NAS)

Google Cloud
18 Oct 2022 · 22:10

Summary

TL;DR: In this talk, Jelena and Chris from Google Cloud discuss Neural Architecture Search (NAS), a technology for automating the design of artificial neural networks. They explain the motivation behind NAS, its building blocks, and its power to outperform hand-designed models. Real-world use cases, such as autonomous vehicles and mobile applications, demonstrate NAS's potential to improve efficiency and accuracy. The talk highlights how NAS can revolutionize machine learning by simplifying the process of designing and training neural networks.

Takeaways

  • 😀 Neural Architecture Search (NAS) is a technology for automating the design of artificial neural networks, aiming to outperform hand-designed architectures.
  • 🌟 NAS is a subfield of AutoML, focusing on finding optimal neural network architectures and hyperparameters, which is particularly useful for complex use cases like autonomous driving.
  • 📈 The Google Brain team initiated NAS research in 2017 to improve machine learning model scaling and design, leading to algorithms that could design better neural network architectures.
  • 🔍 NAS has shown significant improvements in benchmarks like image classification, where NAS-designed architectures have achieved higher accuracy than hand-designed ones.
  • 🛠️ The NAS process involves four key building blocks: search spaces, search strategies or model generators, search algorithms, and model evaluation.
  • 🧩 Search spaces in NAS define the types of neural networks to be designed and optimized, with prebuilt and custom options available to cater to specific use cases.
  • 🔧 The search strategy or model generator samples proposed network architectures without constructing them, while the search algorithm optimizes these architectures based on performance metrics like accuracy or latency.
  • 🚀 NAS can significantly reduce the time and effort required to design neural networks, as it automates the trial and optimization process, which is traditionally manual and resource-intensive.
  • 🌐 NAS has real-world applications in various industries, including autonomous vehicles, medical imaging, and smartphone technology, where it has demonstrated improved performance and efficiency.
  • ⏱️ Companies can leverage NAS to accelerate machine learning development, reducing the need for large ML teams and the time spent on retraining networks, as NAS algorithms can automate these tasks.

Q & A

  • What is the main topic of the talk presented by Jelena and Chris?

    -The main topic of the talk is Neural Architecture Search (NAS), which is a technique for automating the design of artificial neural networks.

  • Why did the Google Brain team start researching NAS?

    -The Google Brain team started researching NAS in 2017 because they recognized the need for a better approach in terms of scaling and designing machine learning models more efficiently.

  • What is the significance of NAS in machine learning development?

    -NAS is significant because it automates the process of designing neural network architectures, which can be time-consuming and requires expert knowledge. It aims to find optimal architectures and hyperparameters based on selected metrics.

  • How does NAS relate to AutoML?

    -NAS is a subfield of AutoML. While AutoML focuses on automating the process of applying machine learning, NAS specifically focuses on automating the design of neural network architectures within that process.

  • What are the four building blocks of NAS mentioned in the talk?

    -The four building blocks of NAS are search spaces, search strategy or model generator, search algorithm, and model evaluation.

  • What is a search space in the context of NAS?

    -A search space in NAS defines the type of neural networks that will be designed and optimized. It is essentially the pool of possible architectures from which the NAS algorithm can select.
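
To make the idea concrete, a search space can be thought of as a set of options per architectural decision. The sketch below is a minimal, hypothetical illustration in plain Python; the names (SEARCH_SPACE, sample_architecture) and the particular choices are ours, not part of any Google API.

    import random

    # A toy search space: each key is an architectural decision, each value
    # is the pool of options the NAS algorithm is allowed to pick from.
    SEARCH_SPACE = {
        "num_layers":       [4, 6, 8, 10],
        "layer_type":       ["conv3x3", "conv5x5", "depthwise_conv", "attention"],
        "num_filters":      [32, 64, 128, 256],
        "activation":       ["relu", "swish"],
        "skip_connections": [True, False],
    }

    def sample_architecture(space):
        """Draw one candidate architecture from the search space at random."""
        return {decision: random.choice(options) for decision, options in space.items()}

    print(sample_architecture(SEARCH_SPACE))
    # e.g. {'num_layers': 8, 'layer_type': 'conv5x5', 'num_filters': 64, ...}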

  • How does the search algorithm in NAS work?

    -The search algorithm in NAS receives performance metrics as rewards for different trialed model architectures and uses these to optimize the performance of the architecture candidates.

  • What is the role of the controller in the NAS process?

    -The controller in NAS uses the search space to define an architecture for the child network. It iteratively improves the architecture based on the reward metrics, such as accuracy, latency, or memory usage.
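
Stripped of the reinforcement-learning machinery, the controller/reward loop reduces to a propose-train-score cycle. The sketch below is our own schematic, not Vertex AI code: here the "controller" is simply whatever propose callable you pass in, whereas a real controller would learn from the accumulated rewards instead of proposing blindly.

    def nas_search(propose, train_and_evaluate, num_trials=100):
        """Schematic NAS loop: propose a child architecture, train and score it,
        keep the best candidate seen so far.

        propose            -- callable returning one candidate architecture
        train_and_evaluate -- callable that trains the child network and returns
                              a reward (accuracy, negative latency, a blend, ...)
        """
        best_arch, best_reward, history = None, float("-inf"), []
        for _ in range(num_trials):
            arch = propose()                    # controller proposes from the search space
            reward = train_and_evaluate(arch)   # child network is trained and measured
            history.append((arch, reward))      # a learned controller would update itself here
            if reward > best_reward:
                best_arch, best_reward = arch, reward
        return best_arch, best_reward, history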

  • How does NAS contribute to efficiency in machine learning?

    -NAS contributes to efficiency by automating the search for optimal neural network architectures, reducing the need for manual tuning by human experts, and allowing for the exploration of a vast number of configurations that would be impractical for humans to evaluate.

  • What are some real-world use cases of NAS mentioned in the talk?

    -Some real-world use cases of NAS include applications in autonomous vehicles, medical imaging, satellite hardware, and mobile devices, where NAS has been used to improve performance metrics such as accuracy, latency, and energy efficiency.

  • How does NAS impact the deployment process of machine learning models?

    -NAS can significantly reduce the deployment process time by automating the design and optimization of machine learning models, thus eliminating the need for a large ML engineering team and the lengthy retraining cycles.

Outlines

00:00

🌟 Introduction to Neural Architecture Search (NAS)

The video begins with Jelena Mijušković and Chris Mittendorf introducing themselves and the topic of Neural Architecture Search (NAS). They explain that NAS is a method for automating the design of artificial neural networks, which is a subfield of AutoML. The goal of NAS is to find optimal architectures and hyperparameters for machine learning models. The talk will cover the basics of NAS, its benefits, real-world use cases, and conclude with a summary. The speakers aim to provide an understanding of NAS technology and its potential to improve machine learning model design.

05:03

🧩 Building Blocks of NAS

Jelena outlines the four key building blocks of NAS: search spaces, search strategies or model generators, search algorithms, and model evaluation. The search space defines the type of neural networks that will be designed and optimized. The search strategy samples proposed network architectures without constructing them. The search algorithm uses performance metrics as rewards to optimize the architecture candidates. Model evaluation assesses the NAS model against validation data. Jelena also mentions that Google offers prebuilt search spaces as well as a custom option built with PyGlove, a lightweight Python library for designing your own search spaces.
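
To show what a custom search space might look like in practice, here is a minimal sketch using PyGlove's hyper-primitives. pg.oneof and pg.floatv are the library's documented building blocks; the surrounding names and the commented search loop are paraphrased from PyGlove's published examples and should be treated as assumptions rather than verified API usage.

    import pyglove as pg

    # Hypothetical custom search space for a small vision model.
    search_space = pg.Dict(
        num_layers=pg.oneof([4, 6, 8]),                         # discrete structural choice
        block_type=pg.oneof(["conv3x3", "conv5x5", "attention"]),
        dropout=pg.floatv(0.0, 0.5),                            # continuous hyperparameter
    )

    # In PyGlove's tutorials, a search is then driven roughly like this
    # (paraphrased from memory; check the PyGlove repo for the exact API):
    #
    #   for arch, feedback in pg.sample(search_space, algorithm, num_examples=100):
    #       reward = train_and_evaluate(arch)   # train the child network
    #       feedback(reward)                    # report the reward to the search algorithm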

10:05

🔄 How NAS Works

Chris Mittendorf explains the process of how NAS works, starting with the theory and moving on to practical examples. He discusses the use of a controller and child network in NAS, where the controller defines an architecture based on the search space, and the child network is trained for a specific metric like accuracy. The results are fed back to the controller using reinforcement learning. Chris emphasizes the simplicity and revolutionary nature of NAS, which can lead to better performance than hand-designed models and requires less expert knowledge.
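
The following toy script sketches that feedback loop with a REINFORCE-style update. It is deliberately simplified and entirely hypothetical: the controller is a set of independent softmax distributions rather than an RNN, and the reward is faked instead of coming from actually training a child network.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy controller: one softmax distribution per architectural decision.
    CHOICES = {
        "layer_type": ["conv3x3", "conv5x5", "attention"],
        "width":      [32, 64, 128],
    }
    logits = {k: np.zeros(len(v)) for k, v in CHOICES.items()}

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def sample_child():
        """Controller samples one child architecture and records its choices."""
        arch, picks = {}, {}
        for k, options in CHOICES.items():
            i = rng.choice(len(options), p=softmax(logits[k]))
            arch[k], picks[k] = options[i], i
        return arch, picks

    def fake_reward(arch):
        """Stand-in for training the child network; favours attention + width 128."""
        return 0.6 + 0.2 * (arch["layer_type"] == "attention") + 0.2 * (arch["width"] == 128)

    baseline, lr = 0.0, 0.5
    for step in range(200):
        arch, picks = sample_child()
        reward = fake_reward(arch)                 # in real NAS: validation accuracy
        advantage = reward - baseline              # baseline reduces gradient variance
        baseline = 0.9 * baseline + 0.1 * reward
        for k, i in picks.items():                 # REINFORCE: (one-hot - p) * advantage
            p = softmax(logits[k])
            grad = -p
            grad[i] += 1.0
            logits[k] += lr * advantage * grad

    # After enough iterations, the controller strongly prefers the high-reward choices.
    print({k: CHOICES[k][int(np.argmax(logits[k]))] for k in CHOICES})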

15:07

🚀 The Power of NAS

Chris highlights the power of NAS, emphasizing its ability to outperform handwritten models and reduce the need for expert ML engineering teams. He discusses how NAS can be used to improve not just accuracy but also other metrics like latency and memory usage. Chris also mentions that NAS is still somewhat dependent on human bias since the search space must be defined by humans, but future developments could see NAS designing its own search spaces. He provides examples of how NAS has been used to improve models for image detection, reducing computational requirements and increasing efficiency.
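
When the reward blends accuracy with latency, a common trick is to fold a soft latency constraint into a single scalar, as popularised by Google's MnasNet work. The function below is a small illustration of that idea; the numbers are made up.

    def combined_reward(accuracy, latency_ms, target_ms=50.0, w=-0.07):
        """MnasNet-style reward: accuracy * (latency / target) ** w, with w < 0
        so that models slower than the latency target are penalised."""
        return accuracy * (latency_ms / target_ms) ** w

    # Two hypothetical child networks: once latency enters the reward,
    # the slightly less accurate but much faster model wins.
    print(combined_reward(accuracy=0.76, latency_ms=90))   # ~0.73, accurate but slow
    print(combined_reward(accuracy=0.74, latency_ms=40))   # ~0.75, a bit less accurate, fast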

20:09

📱 Real-world Applications and Summary

The final paragraph discusses real-world applications of NAS, including autonomous vehicles, mobile devices, and natural language processing. Chris mentions that NAS has been used to reduce latency and error rates in Waymo's autonomous vehicle technology and to create a smaller, more efficient version of BERT for mobile devices. He summarizes the benefits of NAS, including the ability to design better models faster and with less human effort. The talk concludes with a call to action to use NAS for its numerous advantages in machine learning model development.


Keywords

💡Neural Architecture Search (NAS)

Neural Architecture Search (NAS) is a technique for automating the design of artificial neural networks. It aims to find the optimal architecture and hyperparameters for a given machine learning task. In the video, NAS is presented as a revolutionary approach that can outperform traditional, hand-designed neural networks. The script mentions how NAS can be used to improve various metrics such as accuracy, latency, and memory usage, which are crucial for applications like autonomous driving and image processing.

💡Machine Learning Models

Machine learning models are algorithms that improve their performance on a task with experience, by learning from data. They are central to the discussion in the video, where NAS is described as a method to automate and optimize the design of these models. The script emphasizes the time and expertise traditionally required to build effective machine learning models, and how NAS can streamline this process.

💡AutoML

AutoML (Automated Machine Learning) refers to the development of systems that automate the process of applying machine learning models to real-world problems. NAS is described as a subfield of AutoML in the script, focusing specifically on the automated design of neural network architectures. The video highlights how AutoML and NAS can simplify the machine learning workflow, making it more accessible to those without extensive expertise.

💡Hyperparameter Tuning

Hyperparameter tuning is the process of selecting the best model hyperparameters to improve the model's performance. It is mentioned in the script as a subset of NAS. The video explains how NAS goes beyond traditional hyperparameter tuning by not only optimizing the values of hyperparameters but also the structure of the neural network itself.

💡Search Spaces

In the context of NAS, search spaces define the range of possible neural network architectures that can be explored. The script explains that search spaces can be predefined or custom, allowing users to specify the types of neural networks to be designed and optimized. The video emphasizes the importance of search spaces in guiding the NAS algorithm to explore the most relevant architectures for a given task.

💡Reinforcement Learning

Reinforcement learning is a type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize some notion of cumulative reward. The script describes how reinforcement learning is used in NAS to train the controller network, which in turn designs child networks. The video highlights the efficiency of this approach, as it allows the NAS algorithm to learn from the performance of different architectures and iteratively improve the design process.

💡Controller Network

The controller network is a key component in the NAS process described in the video. It is responsible for generating the architecture of the child networks within the defined search space. The script explains how the controller network uses reinforcement learning to improve its decisions based on the performance feedback from the child networks, which is a critical part of the NAS methodology.

💡Child Network

A child network in NAS is a specific neural network architecture generated by the controller network. The script mentions that the controller defines the architecture of the child network based on the search space, and this child network is then trained and evaluated to provide feedback to the controller. The iterative process of generating and evaluating child networks is central to the NAS approach.

💡Policy Gradient

Policy gradient is a method used in reinforcement learning to optimize the parameters of a policy with respect to a reward signal. In the video, policy gradient is mentioned as the technique used to update the controller network in NAS, based on the performance of the child networks. The script illustrates how policy gradient allows the NAS algorithm to learn and improve the design of neural network architectures over time.
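
For reference, the update rule is the standard REINFORCE estimator. Paraphrased from the 2017 NAS paper (Zoph & Le), with theta_c the controller parameters, a_t the architectural decisions it emits, R the child network's reward, and b a baseline that reduces variance:

    \nabla_{\theta_c} J(\theta_c)
      = \sum_{t=1}^{T} \mathbb{E}_{P(a_{1:T};\,\theta_c)}
        \left[ \nabla_{\theta_c} \log P(a_t \mid a_{(t-1):1};\, \theta_c)\,(R - b) \right]

In practice the expectation is approximated by averaging over the batch of child networks sampled in each round.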

💡Parallelism and Asynchronous Updates

Parallelism and asynchronous updates are computational strategies mentioned in the script that enhance the efficiency of the NAS process. By training multiple child networks in parallel and updating the controller network asynchronously, NAS can explore a vast search space quickly. The video emphasizes how these strategies allow NAS to potentially outperform human-designed models by leveraging the power of modern computational resources.
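
The sketch below illustrates the idea with Python's standard concurrency tools: many child networks are launched at once and results are consumed as soon as each one finishes, so the controller never waits for the slowest trial. Everything here is illustrative (the "training" is a random sleep), not the actual Vertex AI orchestration.

    import random
    import time
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def train_child(architecture):
        """Stand-in for training one child network on its own worker/container."""
        time.sleep(random.uniform(0.1, 0.5))   # pretend training takes a while
        return architecture, random.random()   # pretend validation accuracy

    def evaluate_in_parallel(candidates, max_workers=8):
        """Launch many child networks concurrently; yield results asynchronously."""
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(train_child, arch) for arch in candidates]
            for future in as_completed(futures):
                yield future.result()          # an asynchronous controller update would go here

    candidates = [{"id": i, "num_layers": random.choice([4, 8, 12])} for i in range(16)]
    for arch, reward in evaluate_in_parallel(candidates):
        print(arch["id"], round(reward, 3))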

Highlights

Introduction to Neural Architecture Search (NAS) and its significance in machine learning.

Jelena and Chris discuss the evolution of NAS and its role in automating the design of neural networks.

The motivation behind NAS: the need for a better approach to scale and design machine learning models.

Google Brain's contribution to NAS in 2017 and the development of algorithms to design neural networks.

The flexibility and power of neural networks and the challenges in designing them.

NAS as a technique for automating the design of artificial neural networks.

The concept of AutoML and how NAS is a subfield of it, with hyperparameter tuning as a subset of NAS.

The historical development of NAS, starting with Google Brain's paper in 2017.

The impact of NAS on benchmarks and its ability to outperform hand-designed architectures.

The integration of NAS in the machine learning development flow, particularly in model training.

The four building blocks of NAS: search spaces, search strategy, search algorithm, and model evaluation.

The role of search spaces in defining the type of neural networks to be designed and optimized.

The function of the search strategy or model generator in proposing network architectures.

How the search algorithm optimizes the performance of architecture candidates based on reward metrics.

Model evaluation in NAS and its importance in assessing the performance of the NAS model.

The practical application of NAS within Google's Vertex AI platform for machine learning.

Chris Mittendorf's explanation of how NAS works, including the controller and child network dynamics.

The simplicity and revolutionary nature of NAS as a feedback loop for designing neural networks.

The potential of NAS to outperform machine learning teams and its implications for efficiency.

Real-world use cases of NAS, including autonomous vehicles, medical imaging, and smartphone applications.

The summary of NAS benefits, including its ability to redesign models, improve accuracy, and reduce deployment time.

Encouragement for the audience to adopt NAS for its potential to streamline machine learning processes.

Transcripts

play00:04

JELENA MIJUSKOVIC: All right.

play00:07

Good afternoon, everybody.

play00:09

We're so glad to see that so many of you

play00:11

decided to listen to our talk.

play00:15

Just a quick heads-up that there has been a change in the title.

play00:21

So just that you know, today we'll

play00:23

be talking about neural architecture search.

play00:26

My name is Jelena.

play00:27

I'm a Customer Engineer at Google Cloud,

play00:29

and I'm focusing on data analytics and machine learning,

play00:32

and I have Chris with me.

play00:34

Chris.

play00:34

CHRIS MITTENDORF: Hi, I'm Chris.

play00:36

I'm based in Munich.

play00:37

I'm a Cloud Space Architect focusing on machine learning,

play00:40

and that's why we brought this beautiful new title

play00:42

and the beautiful new agenda to you

play00:44

today because it's way more interesting than just data.

play00:50

JELENA MIJUSKOVIC: All right, let's

play00:51

look quickly about the content.

play00:52

So, our talk will be short and crisp.

play00:55

Nevertheless, we want to explain what NAS is to explain you

play01:00

a little bit the building blocks so you can understand

play01:02

how the technology works.

play01:04

And Chris is going to talk more about the power of NAS,

play01:08

so why is it really a good technology to use?

play01:12

And we will tackle some real-world use

play01:14

cases some of our customers and ourselves have been using NAS.

play01:19

And we're going to wrap up with a short summary.

play01:23

Before we start, I would like to know are you familiar with NAS?

play01:29

Just raise your hand if you are.

play01:32

If not, I'm not going to bring you to the stage.

play01:34

No worries.

play01:35

That's all good.

play01:36

No?

play01:37

All right.

play01:37

OK, you will have a lot of things

play01:40

to learn in the next 20 minutes.

play01:43

So this talk will actually explain a little bit

play01:47

about the motivation why NAS technology.

play01:50

So our Google Brain team back in 2017

play01:53

recognized that we need a better approach in terms of scaling

play01:58

and designing machine learning models,

play02:01

and they thought like, can we build an algorithm that

play02:04

can design a neural network architecture that

play02:09

can outperform the handwritten ones that will help us--

play02:14

Google internally-- as we use a lot of machine

play02:17

learning models in production, and also

play02:20

to the broader community to build

play02:22

faster and better neural networks based on the metrics

play02:27

that you select?

play02:31

Why?

play02:32

So we all know that neural networks

play02:35

are really flexible and powerful for many different use cases.

play02:41

But still even for us-- so in production--

play02:45

we have a lot of experienced machine learning engineers

play02:47

and data scientists that build these models,

play02:49

and nevertheless, it takes a lot of time

play02:52

and takes a lot of tuning, a lot of effort and, of course,

play02:59

expert knowledge to build them.

play03:03

So what is it?

play03:04

So it's basically a technique for automating the design

play03:07

of artificial neural networks.

play03:11

So as mentioned, they can go from very simple architectures,

play03:15

but more often, especially for some complex use

play03:19

cases such as autonomous driving, for example,

play03:23

that can really be, let's say, challenging in terms

play03:26

of that 10-layer network.

play03:28

You can imagine how many different

play03:29

architectures we're talking here that you can select from.

play03:36

Where does it belong?

play03:37

So for those who you are familiar with machine learning

play03:40

in general, you probably know that there

play03:42

is a concept of AutoML basically where on GCP, you can build--

play03:50

or actually, you can train your models

play03:52

on already prebuilt-- you can train

play03:54

your models with your data with prebuilt models.

play03:57

And NAS is actually subfield of AutoML.

play04:02

And if we look at the hyperparameter tuning,

play04:09

hyperparameter optimization is a subset of NAS.

play04:13

So with NAS, the whole idea is to find optimal architecture

play04:18

and also to find best hyperparameters.

play04:24

Talking about how it all started,

play04:26

so I mentioned Google Brain team back in 2017.

play04:30

So they published a paper, and they

play04:33

have been using these two data sets that, in both cases,

play04:38

they developed NAS, actually, as an algorithm.

play04:41

Developed basically a network architecture

play04:45

that outperformed handcoded architectures.

play04:48

And you can see here in terms of test error

play04:51

rate, the improvement of 0.09%, and also

play04:58

1.05 times faster than the previous one that

play05:03

used a similar architecture.

play05:05

And the same was with the Penn Treebank data set as well.

play05:12

And from there, if you look what NAS has contributed

play05:15

in terms of different benchmarks,

play05:18

here we see image classification benchmarks.

play05:21

So we can see some of the state-of-the-art neural network

play05:25

architectures that kind of performed much better--

play05:30

here, the metric was accuracy-- than the hand-designed ones.

play05:36

Where does NAS sit in your machine

play05:39

learning development flow?

play05:40

So if you see the whole flow from generating the data,

play05:43

labeling the data, and building and training the models,

play05:48

that's where NAS sits--

play05:51

in the actual training part.

play05:54

In the training part, Chris will explain how the process works

play05:59

in more detail, but the whole idea

play06:01

is to search for the optimal architecture

play06:05

either for the particular metric such as latency, memory,

play06:10

or power, or even combination of them.

play06:15

And here you see how we think as Google about this AutoML

play06:21

and NAS in general.

play06:23

So obviously we work with a lot of smart people,

play06:27

with smart customers, and the idea

play06:29

is everybody has a lot of experience.

play06:33

But in reality, we all spend a lot of time

play06:36

in bringing our machine learning models,

play06:38

finding the optimal architectures

play06:40

and hyperparameters.

play06:42

But you can do this also from scratch with NAS.

play06:48

The idea is that you don't have to build them from scratch.

play06:51

Of course there are some, let's say, human interaction,

play06:55

some things that you can select, but obviously that process

play06:58

is rather simplified.

play07:01

And we have a NAS within our Vertex AI platform,

play07:06

so our end-to-end machine learning platform,

play07:10

as one of the offerings that we have for deploying and building

play07:15

your machine learning models on GCP.

play07:19

Let's look at the building blocks.

play07:21

That's very important to understand

play07:23

before Chris goes into how the technology actually works.

play07:27

So we have four important building blocks for you

play07:30

to understand in order to better understand how

play07:33

actually the technology works.

play07:35

So we have search spaces, we have search strategy or model

play07:42

generator, we have search algorithm, and model

play07:45

evaluation that we are all familiar with.

play07:49

Search space, in a simple language,

play07:52

is actually your use case.

play07:54

So search space defines the type of neural networks

play07:59

that will be designed and optimized.

play08:01

And of course, as I mentioned before and you will see later,

play08:04

we offer prebuilt search spaces and custom ones

play08:07

so that you can basically select based on your use case.

play08:13

Search strategy, or as I prefer to call it model generator--

play08:17

so what it actually does is samples the number

play08:22

of, let's say, proposed network architectures without actually

play08:29

constructing and [INAUDIBLE] in it.

play08:31

From there, we have a search algorithm

play08:34

that receives these different trialed model performance

play08:40

metrics as rewards.

play08:42

So as I was mentioning before, you

play08:44

can choose accuracy, latency, memory, cost metric, too,

play08:48

as well.

play08:49

And then the idea is to optimize the performance

play08:52

of these architecture candidates.

play08:55

And finally, we have model evaluation,

play08:58

where we evaluate our NAS model against validation

play09:03

data.

play09:06

So, within search spaces.

play09:07

We have prebuilt ones.

play09:09

And here you can see the example ones

play09:12

that you can select already.

play09:14

But nevertheless, if your use case

play09:16

is a little bit more custom than this,

play09:19

then you can use the custom search spaces.

play09:23

And for that, we have PyGlove that

play09:26

is lightweight Python library that

play09:28

can be used for designing your custom search spaces.

play09:35

And that we published already on GitHub.

play09:41

And with that, I'll let Chris to explain you how it works.

play09:46

CHRIS MITTENDORF: Thank you very much, Jelena.

play09:47

So how does it work?

play09:50

You've just seen the theory of NAS.

play09:52

You've seen the building blocks, and ideally you

play09:54

know or have an idea how this all should come together.

play09:58

So actually, it feels fairly simple.

play10:01

And I think it's so simple that it's actually revolutionary.

play10:05

And we are kind of stupid how we did it in the past,

play10:07

to be really honest with you.

play10:09

So back in the day, it was always

play10:11

whenever you took a machine learning course

play10:14

or you got to your professor, it was always the question

play10:16

how do I tune my hyperparameters?

play10:17

That was stage one.

play10:18

And they were always like yeah, you're not an expert.

play10:21

You should try and figure it out by yourself

play10:23

and maybe iterate through it, and it seemed fun to do that.

play10:27

Obviously you're going to improve on that.

play10:30

You build a really nice model by the end of the day that you

play10:33

think is state-of-the-art for your knowledge.

play10:36

However, your knowledge is fairly limited, or at least

play10:38

my knowledge in the machine learning space,

play10:40

so that I'm not able to outperform an entire machine

play10:43

learning team.

play10:45

However, having a simple feedback loop

play10:47

might solve that problem.

play10:49

So the paper that Jelena mentioned is from 2017.

play10:52

You can look it up.

play10:52

It's free to access for everyone.

play10:55

And it has a really simple architecture.

play10:57

So we use a recurrent neural network

play10:59

where a controller sits on the left side.

play11:02

We have a so-called child network on the right side.

play11:05

And the idea is pretty simple.

play11:07

So basically, controller is using search space,

play11:10

whether it's a custom search space or a predefined search

play11:13

space.

play11:14

The search spaces--

play11:15

I'll tell you in a bit what they are actually capable of.

play11:17

But however it starts, you pick a search space,

play11:20

you let the controller define an architecture

play11:25

based on the search space.

play11:27

This defined architecture is called the child network.

play11:31

This child network in the paper was trained for accuracy

play11:35

so we got a result out of it.

play11:37

We used reinforcement learning to basically

play11:39

feed in the result of the accuracy back to the controller

play11:42

and saw how we did.

play11:44

So based on the parameters that we

play11:46

used in order to design the child network,

play11:49

we can basically then--

play11:51

or we did use for accuracy, which is unfortunately not

play11:54

differentiable--

play11:55

a policy gradient to update the parameters

play11:58

and reiterated the process.

play11:59

So just building a network out of the search space

play12:02

until it becomes better and better.

play12:04

So this is the theory.

play12:05

So in words, it's actually fairly simple how it started.

play12:08

So we have a controller.

play12:10

It uses hyperparameters, and hyperparameters,

play12:12

as you might know, are like, parameters within the networks.

play12:14

But we see the hyperparameters as the networks themselves,

play12:18

so also their layers and their individual building blocks.

play12:20

So the child network is generated,

play12:23

and that's the beautiful part-- the reward metric is not

play12:26

only accuracy.

play12:26

It was in the paper because it performed really good.

play12:29

However, you are capable of choosing

play12:32

a reward metric for the reinforcement algorithm

play12:34

by yourself.

play12:35

This can be accuracy.

play12:37

This can be latency, memory, or combination of those.

play12:40

And this is really important because

play12:41

on every other industry, you do have a different focus.

play12:44

And we're going to see real use cases

play12:47

by the end of the presentation.

play12:49

Then we have the policy optimization,

play12:50

which I mentioned is under uncertainty

play12:52

because we used accuracy here.

play12:54

It's non-differentiable, but you can adjust the policy gradient

play12:57

then.

play12:58

And we do basically do three computations.

play13:00

Prior, we do an estimation of the child network

play13:04

to see how does it perform?

play13:06

How do we think it performs?

play13:08

Then we build it.

play13:10

We evaluate how it really performs.

play13:12

You get the accuracy out of it.

play13:13

And then we adjust the policy gradient.

play13:15

So that's the theory behind it.

play13:16

So how does it work?

play13:18

And it's fairly simple as you might think.

play13:21

And this is what I really like about NAS-- the power of it

play13:24

goes more in the direction of artificial general intelligence

play13:27

because now we don't build a neural network to solve

play13:31

a problem, we ideally build a neural network that

play13:34

can build neural networks to solve problems.

play13:37

So one step prior to that.

play13:39

So this is like a nice animation that we

play13:42

published in the blog post.

play13:43

You can basically see this is the search

play13:46

space that we have been using.

play13:48

So the convolutional layers, deconvolution layers,

play13:50

multi-head attention layers, some activation functions.

play13:54

And this is like the search space

play13:56

where the algorithm can pick from.

play13:59

And we designed neural networks based on the search space.

play14:02

And as you can see, when you iterate through it,

play14:04

it basically has a lot of possible layers or segment

play14:08

building blocks-- however you want to call them--

play14:10

to pick from in order to improve on a certain parameter.

play14:14

And by a number of iterations that you

play14:15

are able to define because it's a custom metric,

play14:18

you can basically see how it improves over time.

play14:21

And when you're satisfied with the accuracy,

play14:23

you can basically export the model and use it from there.

play14:27

So these are the hyperparameters that you might know.

play14:30

We've just seen the building blocks,

play14:31

like convolutional layers.

play14:32

Within those convolutional layers,

play14:34

obviously they are like the hyperparameters

play14:36

that you know from hyperparameter optimization,

play14:38

and these will be automatically adjusted by the NAS algorithm

play14:41

as well, like number of filters, filter height,

play14:45

filter width, stride height, et cetera, et cetera.

play14:47

And you can feel that outperforming an ML expert

play14:51

team is fairly more easy with an algorithm

play14:54

because there's so many different parameters you

play14:56

can tune and might tweak, and having

play14:59

an algorithm to do that is better than just

play15:02

a number of human beings.

play15:03

So this is when you deploy it, basically, how it looks like.

play15:06

Defining the search space--

play15:09

you have the search strategy that I was mentioning,

play15:11

or the model generator.

play15:12

That generates the child network.

play15:13

And you have the child, which you evaluate then,

play15:16

and then it goes back.

play15:17

So this is like OK, if we can do that for one child network,

play15:21

it's actually fairly simple to do that with multiple child

play15:25

networks, ideally at the same time so we don't need to wait.

play15:29

And you know machine learning problems

play15:31

that can run for weeks and weeks of training.

play15:33

Ideally we have better compute power, but the power of it--

play15:36

you can now basically build hundreds of child networks

play15:41

at the same time and let them train,

play15:43

define the number of epochs-- how often

play15:45

do you want to iterate through them--

play15:47

and get the feedback back of the best model,

play15:49

and just build on this one.

play15:51

So we're having lots and lots of child networks,

play15:53

and we like those best, and basically we feed from there

play15:56

and improve for the model metric that we are actually

play16:00

looking for.

play16:02

So some theory behind it is obviously from the paper here,

play16:06

we work with probabilities.

play16:08

It's an autoregressive controller

play16:10

so we basically predict the hyperparameter one at a time.

play16:13

And the beautiful part that I mentioned

play16:15

is the training speed because you

play16:17

can accelerate your training with parallelism

play16:19

and asynchronous updates.

play16:20

So basically what you do-- you spin up a container,

play16:23

put the child network in, let it train.

play16:25

When it's finished, basically, you

play16:27

can use the container for either another child network

play16:30

or get rid of it.

play16:31

So there's a lot of compute involved obviously,

play16:33

but the main idea here is OK, we spend

play16:36

a lot of time in training, and obviously a lot of resources

play16:39

because a lot of iterations, but the end model itself

play16:42

needs to be a little bit better than what

play16:45

we could do by handwritten code or handwritten engineers

play16:48

so that we can save later on-- whether it's energy,

play16:51

or whether it's, for example, the car example,

play16:54

that we can avoid crashes.

play16:56

So the power of NAS-- and I only have like,

play16:58

four minutes left so I need to hurry up,

play17:00

but there's so much to speak about.

play17:02

Anyway, the power of NAS--

play17:03

in a nutshell, if we try to sell it,

play17:05

I think it's nothing that we actually

play17:08

want to sell in terms of I think it's a no-brainer,

play17:11

and there's no idea why no one should use it.

play17:14

So NAS can obviously outperform handwritten models,

play17:18

or at least it is on par so you don't need the ML engineering

play17:22

team with the expert knowledge to design it.

play17:26

So there's no preparation needed.

play17:28

And as I was mentioning or Jelena was mentioning,

play17:31

it's of 2017 so it's fairly new research.

play17:34

So research grows quickly, but as of right now,

play17:37

it's not totally independent of human bias

play17:40

because you need to pick the search space, right?

play17:42

Going a little bit further on that idea

play17:44

would be actually getting rid of all the search space

play17:47

and let another neural network design the search space for you

play17:50

that the NAS algorithm can pick from.

play17:53

So the width of the search as of right

play17:55

now is the hyperparameter optimization,

play17:56

the structure itself, and the building blocks.

play17:59

The reward signals we have seen.

play18:01

This is just one example that I found really beautiful.

play18:04

So we have R50, which is a hand-coded image detection

play18:09

algorithm for 2D images.

play18:10

And this is really good.

play18:12

It performed with an average precision of 37.

play18:15

And for one iteration, it needed almost 100 billion flops,

play18:20

so 100 billion floating point operations,

play18:22

to do a fairly good in the data set and a fairly good

play18:25

average precision.

play18:26

So this is R50, as I was mentioning.

play18:29

We did the same thing with the NAS-coded model.

play18:33

We found obviously we did a really good or better precision

play18:37

here--

play18:38

over 5%-- and at the same time, decreased the number

play18:42

of floating point operations so we are more energy efficient

play18:45

and more precise.

play18:46

And now look at that--

play18:48

it's fairly difficult to build this from scratch, right?

play18:52

Because you see not only the different layers

play18:54

that are structured in a different order

play18:56

than to the traditional convolutional networks

play18:58

that you know, but also we have a lot of skip connections.

play19:02

And you see that there are so many possibilities

play19:04

you could choose from, and NAS does that for you.

play19:07

So that's the example.

play19:09

And we can not only do this with 2D image detection,

play19:11

so especially in vision we are on the forefront

play19:14

of using NAS networks, so there's

play19:16

a lot of classification models.

play19:17

There are image detection models,

play19:20

so basically the bounding boxes around the image,

play19:22

and even image segmentation where NAS outperformed

play19:25

handwritten models today.

play19:27

Real-world use cases in one minute--

play19:29

OK, that's going to be exciting.

play19:30

So we have autonomous vehicles, and you can think of Waymo,

play19:34

for example, which you're going to see in a second.

play19:36

And for example, there's accuracy,

play19:38

and latency is really important because you

play19:39

want to avoid the crash, right?

play19:41

And you want to avoid evaluating whether this

play19:44

is a car or a human being on the street.

play19:45

So we have medical imaging, satellite, hardware.

play19:50

Waymo-- it did really good.

play19:52

This seems fairly simple, this model,

play19:54

but you can also see the skip connections

play19:56

that we used in the different layer types.

play19:57

So we reached like, 20% to 30% lower latency with the NAS

play20:02

network and 8% to 10% lower error rates.

play20:05

And this is revolutionary.

play20:06

So this is our beautiful case from the automotive industry

play20:09

that Waymo uses.

play20:10

Same thing in your phones for the Pixel 1.

play20:14

So we can basically do, with the TPU

play20:18

on the tensor chip on pixels, we have also

play20:21

prebuilt a NAS model that basically uses machine learning

play20:24

to unblur faces, for example, or to refocus the image

play20:28

afterwards.

play20:29

That's already in production, and maybe you use it every day,

play20:32

and you didn't even recognize that it was built by NAS.

play20:35

Another example is BERT, our famous natural language

play20:38

processing model.

play20:39

There is now a mobile version of BERT out there, built by NAS,

play20:43

that's way, way smaller than the previous build

play20:46

version on edge devices.

play20:48

Why does it need to be smaller?

play20:50

Why is it cool that it's smaller?

play20:52

Because you have a lower latency when you get the response back.

play20:56

Especially when you do translations and stuff,

play20:58

that really comes in handy.

play21:00

Summary-- oh, I'm over time.

play21:02

Summary anyway-- redesign new models from scratch,

play21:06

basically just giving the search space.

play21:07

We can do better than handwritten coders.

play21:09

We are faster with it.

play21:10

And obviously you still can tweak parameters.

play21:13

How many child networks do I want to deploy, for example?

play21:16

With that-- and that's the key, the two send-offs I want

play21:19

to give you away with--

play21:21

use NAS.

play21:22

You can use it, actually, not only to improve accuracy,

play21:26

but the major part of it-- what we see at a lot of digital

play21:29

native companies, for example--

play21:30

the deployment process time decreases because you

play21:33

don't need to staff an ML team.

play21:36

And you don't need to spend weeks in retraining the network

play21:38

because the algorithm does it for you.

play21:40

With that, 21 minutes I think.

play21:44

We're going to the next call.

play21:46

It's Jai.

play21:46

So thank you for being here, and I hope

play21:48

NAS was kind of interesting.

play21:49

And stay forward looking good.

play21:53

Thank you.


Related Tags
Neural Architecture, Google Cloud, AI Design, Machine Learning, AutoML, NAS Technology, Efficiency, Innovation, Data Analytics, ML Models