State of the Art Neural Networks - Neural architecture search (NAS)
Summary
TL;DR: In this talk, Jelena and Chris from Google Cloud discuss Neural Architecture Search (NAS), a technology for automating the design of artificial neural networks. They explain the motivation behind NAS, its building blocks, and its power to outperform hand-designed models. Real-world use cases, such as autonomous vehicles and mobile applications, demonstrate NAS's potential to improve efficiency and accuracy. The talk highlights how NAS can revolutionize machine learning by simplifying the process of designing and training neural networks.
Takeaways
- 😀 Neural Architecture Search (NAS) is a technology for automating the design of artificial neural networks, aiming to outperform hand-designed architectures.
- 🌟 NAS is a subfield of AutoML, focusing on finding optimal neural network architectures and hyperparameters, which is particularly useful for complex use cases like autonomous driving.
- 📈 The Google Brain team initiated NAS research, publishing its landmark paper in 2017, to improve machine learning model scaling and design, leading to algorithms that could design better neural network architectures.
- 🔍 NAS has shown significant improvements in benchmarks like image classification, where NAS-designed architectures have achieved higher accuracy than hand-designed ones.
- 🛠️ The NAS process involves four key building blocks: search spaces, search strategies or model generators, search algorithms, and model evaluation.
- 🧩 Search spaces in NAS define the types of neural networks to be designed and optimized, with prebuilt and custom options available to cater to specific use cases.
- 🔧 The search strategy or model generator samples proposed network architectures without constructing them, while the search algorithm optimizes these architectures based on performance metrics like accuracy or latency.
- 🚀 NAS can significantly reduce the time and effort required to design neural networks, as it automates the trial and optimization process, which is traditionally manual and resource-intensive.
- 🌐 NAS has real-world applications in various industries, including autonomous vehicles, medical imaging, and smartphone technology, where it has demonstrated improved performance and efficiency.
- ⏱️ Companies can leverage NAS to accelerate machine learning development, reducing the need for large ML teams and the time spent on retraining networks, as NAS algorithms can automate these tasks.
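The feedback loop described in these takeaways can be sketched as a minimal random-search NAS in plain Python. All names and the toy scoring function below are illustrative assumptions, not the Vertex AI NAS API; a real run would train each candidate and measure accuracy or latency:

```python
import random

# Toy search space: each key is one architectural choice.
# (Names are illustrative, not the Vertex AI NAS search spaces.)
SEARCH_SPACE = {
    "num_layers": [2, 4, 8],
    "layer_type": ["conv3x3", "conv5x5", "depthwise"],
    "num_filters": [16, 32, 64],
    "activation": ["relu", "swish"],
}

def sample_architecture(rng):
    """Model generator: propose one candidate without building it."""
    return {name: rng.choice(opts) for name, opts in SEARCH_SPACE.items()}

def evaluate(arch):
    """Stand-in for training the child network and measuring a reward.

    A real NAS trial trains the model and reports accuracy, latency,
    or a combination; this deterministic toy score only keeps the
    loop runnable.
    """
    score = float(arch["num_layers"] * arch["num_filters"])
    if arch["activation"] == "swish":
        score *= 1.1
    return score

def random_search(num_trials=50, seed=0):
    """Search strategy + algorithm in their simplest form: random search."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)     # propose a candidate
        score = evaluate(arch)              # model evaluation
        if score > best_score:              # keep the best so far
            best_arch, best_score = arch, score
    return best_arch, best_score

best, score = random_search()
```

Real NAS replaces random sampling with a learned controller, but the four building blocks (search space, generator, algorithm, evaluation) map onto exactly these pieces.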
Q & A
What is the main topic of the talk presented by Jelena and Chris?
-The main topic of the talk is Neural Architecture Search (NAS), which is a technique for automating the design of artificial neural networks.
Why did Google Brain team start researching NAS?
-The Google Brain team started researching NAS (publishing their first paper on it in 2017) because they recognized the need for a better approach in terms of scaling and designing machine learning models more efficiently.
What is the significance of NAS in machine learning development?
-NAS is significant because it automates the process of designing neural network architectures, which can be time-consuming and requires expert knowledge. It aims to find optimal architectures and hyperparameters based on selected metrics.
How does NAS relate to AutoML?
-NAS is a subfield of AutoML. While AutoML focuses on automating the process of applying machine learning, NAS specifically focuses on automating the design of neural network architectures within that process.
What are the four building blocks of NAS mentioned in the talk?
-The four building blocks of NAS are search spaces, search strategy or model generator, search algorithm, and model evaluation.
What is a search space in the context of NAS?
-A search space in NAS defines the type of neural networks that will be designed and optimized. It is essentially the pool of possible architectures from which the NAS algorithm can select.
How does the search algorithm in NAS work?
-The search algorithm in NAS receives performance metrics as rewards for different trialed model architectures and uses these to optimize the performance of the architecture candidates.
What is the role of the controller in the NAS process?
-The controller in NAS uses the search space to define an architecture for the child network. It iteratively improves the architecture based on the reward metrics, such as accuracy, latency, or memory usage.
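For intuition, the controller's reward-driven iteration can be approximated by a simple accept-if-better mutation loop. This is a deliberate simplification (the real controller is an RNN trained with reinforcement learning), and every name and number below is a made-up illustration:

```python
import random

rng = random.Random(42)

# Illustrative per-decision choices the "controller" can make.
CHOICES = {
    "kernel": [3, 5, 7],
    "width": [16, 32, 64, 128],
    "skip": [False, True],
}

def reward(arch):
    """Toy reward: a stand-in for accuracy minus a latency penalty."""
    acc = arch["width"] / 128 + (0.1 if arch["skip"] else 0.0)
    latency_penalty = 0.001 * arch["width"] * arch["kernel"]
    return acc - latency_penalty

def mutate(arch):
    """Resample one decision at random -- a crude controller update."""
    key = rng.choice(list(CHOICES))
    new_arch = dict(arch)
    new_arch[key] = rng.choice(CHOICES[key])
    return new_arch

# Iteratively improve: keep a mutation only when the reward rises,
# mimicking the reward feedback the controller receives per trial.
arch = {name: opts[0] for name, opts in CHOICES.items()}
best_reward = reward(arch)
for _ in range(200):
    candidate = mutate(arch)
    r = reward(candidate)
    if r > best_reward:
        arch, best_reward = candidate, r
```

Note how the reward already mixes accuracy and latency, matching the answer above: the controller optimizes whatever metric combination you hand it.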
How does NAS contribute to efficiency in machine learning?
-NAS contributes to efficiency by automating the search for optimal neural network architectures, reducing the need for manual tuning by human experts, and allowing for the exploration of a vast number of configurations that would be impractical for humans to evaluate.
What are some real-world use cases of NAS mentioned in the talk?
-Some real-world use cases of NAS include applications in autonomous vehicles, medical imaging, satellite hardware, and mobile devices, where NAS has been used to improve performance metrics such as accuracy, latency, and energy efficiency.
How does NAS impact the deployment process of machine learning models?
-NAS can significantly reduce the deployment process time by automating the design and optimization of machine learning models, thus eliminating the need for a large ML engineering team and the lengthy retraining cycles.
Outlines
🌟 Introduction to Neural Architecture Search (NAS)
The video begins with Jelena Mijušković and Chris Mittendorf introducing themselves and the topic of Neural Architecture Search (NAS). They explain that NAS is a method for automating the design of artificial neural networks, which is a subfield of AutoML. The goal of NAS is to find optimal architectures and hyperparameters for machine learning models. The talk will cover the basics of NAS, its benefits, real-world use cases, and conclude with a summary. The speakers aim to provide an understanding of NAS technology and its potential to improve machine learning model design.
🧩 Building Blocks of NAS
Jelena outlines the four key building blocks of NAS: search spaces, search strategies or model generators, search algorithms, and model evaluation. The search space defines the type of neural networks that will be designed and optimized. The search strategy samples proposed network architectures without constructing them. The search algorithm uses performance metrics as rewards to optimize the architecture candidates. Model evaluation assesses the NAS model against validation data. Jelena also mentions that Google offers prebuilt search spaces and a custom search space option using PyGlove, a lightweight Python library for designing custom search spaces.
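To make the search-space idea concrete, here is a stdlib-only sketch of a custom search space as named option lists, with full enumeration. Note this is not PyGlove's actual API (PyGlove uses symbolic combinators for the same purpose); all names here are illustrative:

```python
from itertools import product

# A custom search space written as named option lists. PyGlove expresses
# the same idea with symbolic combinators; this stdlib version only
# illustrates the combinatorics, not PyGlove's API.
search_space = {
    "stem": ["conv3x3", "conv5x5"],
    "num_blocks": [2, 3, 4],
    "head": ["avgpool", "attention"],
}

def enumerate_space(space):
    """Yield every concrete architecture the space can express."""
    names = list(space)
    for values in product(*(space[n] for n in names)):
        yield dict(zip(names, values))

candidates = list(enumerate_space(search_space))
print(len(candidates))  # 2 stems * 3 block counts * 2 heads = 12
```

Even this tiny space has 12 candidates; realistic spaces are far too large to enumerate, which is why a search strategy samples from them instead.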
🔄 How NAS Works
Chris Mittendorf explains the process of how NAS works, starting with the theory and moving on to practical examples. He discusses the use of a controller and child network in NAS, where the controller defines an architecture based on the search space, and the child network is trained for a specific metric like accuracy. The results are fed back to the controller using reinforcement learning. Chris emphasizes the simplicity and revolutionary nature of NAS, which can lead to better performance than hand-designed models and requires less expert knowledge.
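The controller-child loop described above can be illustrated with a heavily simplified REINFORCE sketch: a categorical distribution per architectural decision stands in for the RNN controller, and a toy function stands in for training the child network. All specifics below are assumptions for illustration:

```python
import math
import random

rng = random.Random(7)

# Two architectural decisions; the "controller" keeps a logit per option.
# (A drastic simplification of the RNN controller in the 2017 paper.)
options = {"depth": [2, 4, 8], "width": [16, 64]}
logits = {k: [0.0] * len(v) for k, v in options.items()}

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sample_child():
    """Sample one child architecture, recording the chosen indices."""
    idx = {}
    for k in options:
        probs = softmax(logits[k])
        r, acc, chosen = rng.random(), 0.0, len(probs) - 1
        for j, p in enumerate(probs):
            acc += p
            if r <= acc:
                chosen = j
                break
        idx[k] = chosen
    return idx

def reward(idx):
    """Toy stand-in for the trained child network's accuracy."""
    return options["depth"][idx["depth"]] / 8 + options["width"][idx["width"]] / 64

baseline, lr = 0.0, 0.5
for _ in range(300):
    idx = sample_child()
    r = reward(idx)
    baseline = 0.9 * baseline + 0.1 * r       # moving-average baseline
    advantage = r - baseline
    # REINFORCE update: raise the log-probability of the sampled
    # choices in proportion to the advantage.
    for k, i in idx.items():
        probs = softmax(logits[k])
        for j in range(len(probs)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[k][j] += lr * advantage * grad

best = {k: opts[max(range(len(opts)), key=lambda j: logits[k][j])]
        for k, opts in options.items()}
```

The policy gradient is exactly the trick Chris mentions: accuracy itself is not differentiable with respect to the controller's parameters, so the controller is updated through the log-probabilities of its sampled decisions instead.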
🚀 The Power of NAS
Chris highlights the power of NAS, emphasizing its ability to outperform handwritten models and reduce the need for expert ML engineering teams. He discusses how NAS can be used to improve not just accuracy but also other metrics like latency and memory usage. Chris also mentions that NAS is still somewhat dependent on human bias since the search space must be defined by humans, but future developments could see NAS designing its own search spaces. He provides examples of how NAS has been used to improve models for image detection, reducing computational requirements and increasing efficiency.
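Optimizing for several metrics at once is typically done by folding them into one reward. One common form, popularized by MnasNet-style latency-aware search, scales accuracy by a latency ratio raised to a negative exponent; the constants below are illustrative:

```python
def combined_reward(accuracy: float, latency_ms: float,
                    target_ms: float = 50.0, beta: float = -0.07) -> float:
    """Multi-objective NAS reward: accuracy * (latency / target) ** beta.

    With beta < 0, models slower than the target are penalized and
    faster ones get a mild bonus, so the search trades accuracy
    against latency instead of maximizing accuracy alone.
    """
    return accuracy * (latency_ms / target_ms) ** beta
```

A model that exactly hits the latency target keeps its raw accuracy as reward, while an equally accurate but slower model scores lower, steering the search toward architectures that are both precise and fast.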
📱 Real-world Applications and Summary
The final paragraph discusses real-world applications of NAS, including autonomous vehicles, mobile devices, and natural language processing. Chris mentions that NAS has been used to reduce latency and error rates in Waymo's autonomous vehicle technology and to create a smaller, more efficient version of BERT for mobile devices. He summarizes the benefits of NAS, including the ability to design better models faster and with less human effort. The talk concludes with a call to action to use NAS for its numerous advantages in machine learning model development.
Keywords
💡Neural Architecture Search (NAS)
💡Machine Learning Models
💡AutoML
💡Hyperparameter Tuning
💡Search Spaces
💡Reinforcement Learning
💡Controller Network
💡Child Network
💡Policy Gradient
💡Parallelism and Asynchronous Updates
Highlights
Introduction to Neural Architecture Search (NAS) and its significance in machine learning.
Jelena and Chris discuss the evolution of NAS and its role in automating the design of neural networks.
The motivation behind NAS: the need for a better approach to scale and design machine learning models.
Google Brain's contribution to NAS in 2017 and the development of algorithms to design neural networks.
The flexibility and power of neural networks and the challenges in designing them.
NAS as a technique for automating the design of artificial neural networks.
The concept of AutoML and how NAS is a subfield of it, with hyperparameter optimization as a subset of NAS.
The historical development of NAS, starting with Google Brain's paper in 2017.
The impact of NAS on benchmarks and its ability to outperform hand-designed architectures.
The integration of NAS in the machine learning development flow, particularly in model training.
The four building blocks of NAS: search spaces, search strategy, search algorithm, and model evaluation.
The role of search spaces in defining the type of neural networks to be designed and optimized.
The function of the search strategy or model generator in proposing network architectures.
How the search algorithm optimizes the performance of architecture candidates based on reward metrics.
Model evaluation in NAS and its importance in assessing the performance of the NAS model.
The practical application of NAS within Google's Vertex AI platform for machine learning.
Chris Mittendorf's explanation of how NAS works, including the controller and child network dynamics.
The simplicity and revolutionary nature of NAS as a feedback loop for designing neural networks.
The potential of NAS to outperform machine learning teams and its implications for efficiency.
Real-world use cases of NAS, including autonomous vehicles, medical imaging, and smartphone applications.
The summary of NAS benefits, including its ability to redesign models, improve accuracy, and reduce deployment time.
Encouragement for the audience to adopt NAS for its potential to streamline machine learning processes.
Transcripts
JELENA MIJUSKOVIC: All right.
Good afternoon, everybody.
We're so glad to see that so many of you
decided to listen to our talk.
Just a quick heads-up that there has been a change in the title.
So just that you know, today we'll
be talking about neural architecture search.
My name is Jelena.
I'm a Customer Engineer at Google Cloud,
and I'm focusing on data analytics and machine learning,
and I have Chris with me.
Chris.
CHRIS MITTENDORF: Hi, I'm Chris.
I'm based in Munich.
I'm a Cloud Space Architect focusing on machine learning,
and that's why we brought this beautiful new title
and the beautiful new agenda to you
today because it's way more interesting than just data.
JELENA MIJUSKOVIC: All right, let's
take a quick look at the content.
So, our talk will be short and crisp.
Nevertheless, we want to explain what NAS is and walk you
through the building blocks so you can understand
how the technology works.
And Chris is going to talk more about the power of NAS,
so why is it really a good technology to use?
And we will tackle some real-world use
cases where our customers and we ourselves have been using NAS.
And we're going to wrap up with a short summary.
Before we start, I would like to know are you familiar with NAS?
Just raise your hand if you are.
If not, I'm not going to bring you to the stage.
No worries.
That's all good.
No?
All right.
OK, you will have a lot of things
to learn in the next 20 minutes.
So this talk will actually explain a little bit
about the motivation behind NAS technology.
So our Google Brain team back in 2017
recognized that we need a better approach in terms of scaling
and designing machine learning models,
and they thought like, can we build an algorithm that
can design a neural network architecture that
can outperform the handwritten ones that will help us--
Google internally-- as we use a lot of machine
learning models in production, and also
to the broader community to build
faster and better neural networks based on the metrics
that you select?
Why?
So we all know that neural networks
are really flexible and powerful for many different use cases.
But still even for us-- so in production--
we have a lot of experienced machine learning engineers
and data scientists that build these models,
and nevertheless, it takes a lot of time
and takes a lot of tuning, a lot of effort and, of course,
expert knowledge to build them.
So what is it?
So it's basically a technique for automating the design
of artificial neural networks.
So as mentioned, these can go from very simple architectures
to, more often, especially for some complex use
cases such as autonomous driving, for example,
really challenging ones-- say, a 10-layer network.
You can imagine how many different
architectures we're talking about here that you can select from.
Where does it belong?
So for those of you who are familiar with machine learning
in general, you probably know that there
is a concept of AutoML basically where on GCP, you can build--
or actually, you can train your models
on already prebuilt-- you can train
your models with your data with prebuilt models.
And NAS is actually a subfield of AutoML.
And if we look at the hyperparameter tuning,
hyperparameter optimization is a subset of NAS.
So with NAS, the whole idea is to find optimal architecture
and also to find best hyperparameters.
Talking about how it all started,
so I mentioned Google Brain team back in 2017.
So they published a paper using two data sets,
and in both cases, the NAS algorithm
developed a network architecture
that outperformed hand-coded architectures.
And you can see here, in terms of test error
rate, the improvement-- 0.09 better--
and also 1.05 times faster than the previous model that
used a similar architecture.
And the same was true on the Penn Treebank dataset as well.
And from there, if you look what NAS has contributed
in terms of different benchmarks,
here we see image classification benchmarks.
So we can see some of the state-of-the-art neural network
architectures that kind of performed much better--
here, the metric was accuracy-- than the hand-designed ones.
Where does NAS sit in your machine
learning development flow?
So if you see the whole flow from generating the data,
labeling the data, and building and training the models,
that's where NAS sits--
in the actual training part.
In the training part, Chris will explain how the process works
in more detail, but the whole idea
is to search for the optimal architecture
either for the particular metric such as latency, memory,
or power, or even combination of them.
And here you see how we think as Google about this AutoML
and NAS in general.
So obviously we work with a lot of smart people,
with smart customers, and the idea
is everybody has a lot of experience.
But in reality, we all spend a lot of time
in bringing our machine learning models,
finding the optimal architectures
and hyperparameters.
But you can also do this with NAS.
The idea is that you don't have to build them from scratch.
Of course there are some, let's say, human interaction,
some things that you can select, but obviously that process
is rather simplified.
And we have a NAS within our Vertex AI platform,
so our end-to-end machine learning platform,
as one of the offerings that we have for deploying and building
your machine learning models on GCP.
Let's look at the building blocks.
That's very important to understand
before Chris goes into how the technology actually works.
So we have four important building blocks for you
to grasp in order to better understand how
the technology actually works.
So we have search spaces, we have search strategy or model
generator, we have search algorithm, and model
evaluation, which we are all familiar with.
Search space, in a simple language,
is actually your use case.
So search space defines the type of neural networks
that will be designed and optimized.
And of course, as I mentioned before and you will see later,
we offer prebuilt search spaces and custom ones
so that you can basically select based on your use case.
Search strategy, or as I prefer to call it, model generator--
so what it actually does is sample a number
of, let's say, proposed network architectures without actually
constructing and training them.
From there, we have a search algorithm
that receives these different trialed model performance
metrics as rewards.
So as I was mentioning before, you
can choose accuracy, latency, memory, cost metric, too,
as well.
And then the idea is to optimize the performance
of these architecture candidates.
And finally, we have model evaluation,
where we evaluate our NAS model against validation
data.
So, within search spaces,
we have prebuilt ones.
And here you can see the example ones
that you can select already.
But nevertheless, if your use case
is a little bit more custom than this,
then you can use the custom search spaces.
And for that, we have PyGlove, which
is a lightweight Python library that
can be used for designing your custom search spaces.
And that we published already on GitHub.
And with that, I'll let Chris explain how it works.
CHRIS MITTENDORF: Thank you very much, Jelena.
So how does it work?
You've just seen the theory of NAS.
You've seen the building blocks, and ideally you
know or have an idea how this all should come together.
So actually, it feels fairly simple.
And I think it's so simple that it's actually revolutionary.
And we are kind of stupid how we did it in the past,
to be really honest with you.
So back in the day, it was always
whenever you took a machine learning course
or you went to your professor, it was always the question
how do I tune my hyperparameters?
That was stage one.
And they were always like yeah, you're not an expert.
You should try and figure it out by yourself
and maybe iterate through it, and it seemed fun to do that.
Obviously you're going to improve on that.
You build a really nice model by the end of the day that you
think is state-of-the-art for your knowledge.
However, your knowledge is fairly limited, or at least
my knowledge in the machine learning space,
so that I'm not able to outperform an entire machine
learning team.
However, having a simple feedback loop
might solve that problem.
So the paper that Jelena mentioned is from 2017.
You can look it up.
It's free to access for everyone.
And it has a really simple architecture.
So we use a recurrent neural network
where a controller sits on the left side.
We have a so-called child network on the right side.
And the idea is pretty simple.
So basically, controller is using search space,
whether it's a custom search space or a predefined search
space.
The search spaces--
I'll tell you in a bit what they are actually capable of.
But however it starts, you pick a search space,
you let the controller define an architecture
based on the search space.
This defined architecture is called the child network.
This child network in the paper was trained for accuracy
so we got a result out of it.
We used reinforcement learning to basically
feed in the result of the accuracy back to the controller
and saw how we did.
So based on the parameters that we
used in order to design the child network,
we can basically then--
or we did use for accuracy, which is unfortunately not
differentiable--
a policy gradient to update the parameters
and reiterated the process.
So just building a network out of the search space
until it becomes better and better.
So this is the theory.
So in words, it's actually fairly simple how it started.
So we have a controller.
It uses hyperparameters, and hyperparameters,
as you might know, are like, parameters within the networks.
But we see the hyperparameters as the networks themselves,
so also their layers and their individual building blocks.
So the child network is generated,
and that's the beautiful part-- the reward metric is not
only accuracy.
It was used in the paper because it performed really well.
However, you are capable of choosing
a reward metric for the reinforcement algorithm
by yourself.
This can be accuracy.
This can be latency, memory, or combination of those.
And this is really important because
on every other industry, you do have a different focus.
And we're going to see real use cases
by the end of the presentation.
Then we have the policy optimization,
which I mentioned is under uncertainty
because we used accuracy here.
It's non-differentiable, but you can adjust the policy gradient
then.
And we basically do three computations.
Prior, we do an estimation of the child network
to see how does it perform?
How do we think it performs?
Then we build it.
We evaluate how it really performs.
You get the accuracy out of it.
And then we adjust the policy gradient.
So that's the theory behind it.
So how does it work?
And it's fairly simple as you might think.
And this is what I really like about NAS-- the power of it
goes more in the direction of artificial general intelligence
because now we don't build a neural network to solve
a problem, we ideally build a neural network that
can build neural networks to solve problems.
So one step prior to that.
So this is like a nice animation that we
published in the blog post.
You can basically see this is the search
space that we have been using.
So the convolutional layers, deconvolution layers,
multi-head attention layers, some activation functions.
And this is like the search space
where the algorithm can pick from.
And we design neural networks based on the search space.
And as you can see, when you iterate through it,
it basically has a lot of possible layers or segment
building blocks-- however you want to call them--
to pick from in order to improve on a certain parameter.
And by a number of iterations that you
are able to define because it's a custom metric,
you can basically see how it improves over time.
And when you're satisfied with the accuracy,
you can basically export the model and use it from there.
So these are the hyperparameters that you might know.
We've just seen the building blocks,
like convolutional layers.
Within those convolutional layers,
obviously they are like the hyperparameters
that you know from hyperparameter optimization,
and these will be automatically adjusted by the NAS algorithm
as well, like number of filters, filter height,
filter width, stride height, et cetera, et cetera.
And you can see that outperforming an ML expert
team is much easier with an algorithm
because there's so many different parameters you
can tune and might tweak, and having
an algorithm to do that is better than just
a number of human beings.
So this is when you deploy it, basically, how it looks like.
Defining the search space--
you have the search strategy that I was mentioning,
or the model generator.
That generates the child network.
And you have the child, which you evaluate then,
and then it goes back.
So this is like OK, if we can do that for one child network,
it's actually fairly simple to do that with multiple child
networks, ideally at the same time so we don't need to wait.
And you know machine learning problems
that can run for weeks and weeks of training.
Ideally we have better compute power, but the power of it--
you can now basically build hundreds of child networks
at the same time and let them train,
define the number of epochs-- how often
do you want to iterate through them--
and get the feedback back of the best model,
and just build on this one.
So we're having lots and lots of child networks,
and we like those best, and basically we feed from there
and improve for the model metric that we are actually
looking for.
So some theory behind it is obviously from the paper here,
we work with probabilities.
It's an autoregressive controller
so we basically predict the hyperparameter one at a time.
And the beautiful part that I mentioned
is the training speed because you
can accelerate your training with parallelism
and asynchronous updates.
So basically what you do-- you spin up a container,
put the child network in, let it train.
When it's finished, basically, you
can use the container for either another child network
or get rid of it.
So there's a lot of compute involved obviously,
but the main idea here is OK, we spend
a lot of time in training, and obviously a lot of resources
because a lot of iterations, but the end model itself
needs to be a little bit better than what
we could do by handwritten code or handwritten engineers
so that we can save later on-- whether it's energy,
or whether it's, for example, the car example,
that we can avoid crashes.
So the power of NAS-- and I only have like,
four minutes left so I need to hurry up,
but there's so much to speak about.
Anyway, the power of NAS--
in a nutshell, if we try to sell it,
I think it's nothing that we actually
want to sell in terms of I think it's a no-brainer,
and there's no idea why no one should use it.
So NAS can obviously outperform handwritten models,
or at least it is on par so you don't need the ML engineering
team with the expert knowledge to design it.
So there's no preparation needed.
And as I was mentioning or Jelena was mentioning,
it's of 2017 so it's fairly new research.
So research grows quickly, but as of right now,
it's not totally independent of human bias
because you need to pick the search space, right?
Going a little bit further on that idea
would be actually getting rid of all the search space
and let another neural network design the search space for you
that the NAS algorithm can pick from.
So the search width as of right
now covers the hyperparameter optimization,
the structure itself, and the building blocks.
The reward signals we have seen.
This is just one example that I found really beautiful.
So we have R50, which is a hand-coded image detection
algorithm for 2D images.
And this is really good.
It performed with an average precision of 37.
And for one iteration, it needed almost 100 billion flops,
so 100 billion floating point operations,
to do fairly well on the data set with a fairly good
average precision.
So this is R50, as I was mentioning.
We did the same thing with the NAS-coded model.
We achieved an even better average precision
here--
over 5% better-- and at the same time, decreased the number
of floating point operations so we are more energy efficient
and more precise.
And now look at that--
it's fairly difficult to build this from scratch, right?
Because you see not only the different layers
that are structured in a different order
than to the traditional convolutional networks
that you know, but also we have a lot of skip connections.
And you see that there are so many possibilities
you could choose from, and NAS does that for you.
So that's the example.
And we can not only do this with 2D image detection,
so especially in vision we are on the forefront
of using NAS networks, so there's
a lot of classification models.
There are image detection models,
so basically the boundary boxes around the image,
and even image segmentation where NAS outperformed
handwritten models today.
Real-world use cases in one minute--
OK, that's going to be exciting.
So we have autonomous vehicles, and you can think of Waymo,
for example, which you're going to see in a second.
And for example, there's accuracy,
and latency is really important because you
want to avoid the crash, right?
And you want to avoid evaluating whether this
is a car or a human being on the street.
So we have medical imaging, satellite, hardware.
Waymo-- it did really well.
This seems fairly simple, this model,
but you can also see the skip connections
that we used in the different layer types.
So we reached like, 20% to 30% lower latency with the NAS
network and 8% to 10% lower error rates.
And this is revolutionary.
So this is our beautiful case from the automotive industry
that Waymo uses.
Same thing in your phones with the Pixel.
So with the TPU
on the Tensor chip on Pixel phones, we have also
prebuilt a NAS model that basically uses machine learning
to unblur faces, for example, or to refocus the image
afterwards.
That's already in production, and maybe you use it every day,
and you didn't even recognize that it was built by NAS.
Another example is BERT, our famous natural language
processing model.
There is now a mobile version of BERT out there, built by NAS,
that's way, way smaller than the previous build
version on edge devices.
Why does it need to be smaller?
Why is it cool that it's smaller?
Because you have a lower latency when you get the response back.
Especially when you do translations and stuff,
that really comes in handy.
Summary-- oh, I'm over time.
Summary anyway-- redesign new models from scratch,
basically just giving the search space.
We can do better than handwritten coders.
We are faster with it.
And obviously you still can tweak parameters.
How many child networks do I want to deploy, for example?
With that-- and that's the key, the two send-offs I want
to leave you with--
use NAS.
You can use it, actually, not only to improve accuracy,
but the major part of it-- what we see at a lot of digital
native companies, for example--
the deployment process time decreases because you
don't need to staff an ML team.
And you don't need to spend weeks in retraining the network
because the algorithm does it for you.
With that, 21 minutes I think.
We're going to the next call.
It's Jai.
So thank you for being here, and I hope
NAS was kind of interesting.
And stay forward-looking.
Thank you.