Energy-Efficient AI Hardware: Neuromorphic Circuits and Tools
Summary
TL;DR: In this presentation, Loretta Matia from the Fraunhofer Institute discusses the need for energy-efficient AI hardware for edge devices. She emphasizes the advantages of on-device AI, such as reduced latency and enhanced privacy, and the challenges of designing low-energy, low-latency ASICs for neural networks. Matia outlines the importance of neuromorphic computing and the six key areas of expertise required for its success, including system architecture, circuits, algorithms, and software tools. She also addresses the importance of accurate, energy-efficient computation at the circuit level and the need for use-case-based benchmarking.
Takeaways
- 🌐 Loretta is a department head at the Fraunhofer Institute for Integrated Circuits, focusing on energy-efficient AI hardware and neuromorphic circuits.
- 💡 The motivation for developing edge AI hardware is to reduce latency, increase energy efficiency, and enhance privacy by processing data locally rather than in the cloud.
- 🔋 Key advantages of edge AI include low latency due to local processing, higher energy efficiency by avoiding wireless data transmission, and improved privacy as data stays where it's generated.
- 📉 The push for edge AI is also driven by the end of Moore's Law, necessitating new computing paradigms like neuromorphic architectures to overcome limitations of traditional von Neumann architectures.
- 🛠️ Six critical areas of expertise for neuromorphic computing include system architecture, circuit design, algorithms, software tools, physical devices, and embedded non-volatile memories.
- 🔄 The presentation discusses an inference accelerator ASIC designed for neural networks with objectives like energy efficiency, speed, small area, and scalability to cover a broad range of use cases.
- 🔄 Analog in-memory computing is highlighted as a method to achieve high computation speed and energy efficiency by performing operations in memory, leveraging the inherent parallelism of analog circuits.
- 📏 Hardware-aware training is crucial to deal with the non-idealities of analog computing, such as weight distribution mismatches, to ensure accurate and robust neural network models.
- 🔧 The importance of careful mapping of neural networks onto hardware is emphasized, as it significantly impacts performance metrics like latency and energy consumption.
- 📊 Benchmarking should be use-case based, focusing on energy per inference and inference latency rather than just top performance metrics, which may not reflect real-world application performance.
Q & A
What is the primary motivation for developing energy-efficient AI hardware?
-The primary motivation is to bring AI to edge devices, IoT devices, and sensors where data is generated and collected. This allows for low latency, higher energy efficiency, and improved privacy since data does not need to be sent to the cloud.
What are the three main advantages of bringing AI to the edge?
-The three main advantages are low latency, higher energy efficiency, and enhanced privacy. Low latency is achieved by processing data locally, energy efficiency is improved by not sending raw data wirelessly to the cloud, and privacy is enhanced because data remains where it's generated.
Why is it necessary to move away from conventional von Neumann architectures for AI hardware?
-As Moore's Law is reaching its limits, conventional von Neumann architectures, which have a bottleneck between memory and computation, are no longer sufficient. Instead, architectures that compute close to or in the memory, like neuromorphic computing, are needed to achieve the required low latency and high energy efficiency.
What are the key objectives for an inference accelerator ASIC to be successful in the market?
-The key objectives include being designed in established semiconductor processes, having ultra-low energy consumption per inference, being fast, having a smaller area for lower cost, and being configurable and scalable for a range of use cases.
Why is in-memory computing considered advantageous for AI hardware?
-In-memory computing is advantageous because it allows for high parallelism and computation speed, and since there is no data movement as the computation happens in the memory, it significantly improves energy efficiency.
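To make the physics concrete, here is a minimal numpy sketch of the principle described above: weights stored as conductances, inputs applied as voltages, Ohm's law giving per-cell currents, and Kirchhoff's current law summing them on a shared bitline into one multiply-accumulate. The numbers and names are illustrative, not taken from the presented ASIC.

```python
import numpy as np

def analog_mac(voltages, conductances):
    """Model one bitline of an analog in-memory MAC.

    Ohm's law: each cell contributes I_i = V_i * G_i.
    Kirchhoff's current law: currents on the shared bitline add up,
    so the summed current is the dot product of inputs and weights.
    """
    currents = voltages * conductances   # per-cell Ohm's law
    return currents.sum()                # KCL on the bitline

# Illustrative numbers: 4 inputs (volts) and 4 weights (siemens).
v = np.array([0.1, 0.3, 0.0, 0.2])
g = np.array([2e-6, 5e-6, 1e-6, 4e-6])
print(analog_mac(v, g))  # one multiply-accumulate done "for free" in analog
```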
How does the presenter's team address the issue of inaccuracy in analog in-memory computing?
-The team addresses inaccuracy through hardware-aware training: the weights are quantized to reduce the memory footprint, and the model is trained with injected weight variance so that it becomes robust against hardware variations.
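A toy numpy sketch of the two training steps the answer mentions, assuming a simple multiplicative Gaussian noise model for the device spread; the seven quantization levels match the talk, but the noise magnitude, network shape, and function names are placeholders rather than Fraunhofer's actual flow.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, levels=7):
    """Quantization-aware step: snap weights to the 7 target levels
    mentioned in the talk for the analog cells."""
    w_max = np.abs(w).max() + 1e-12
    step = 2 * w_max / (levels - 1)
    return np.round(w / step) * step

def noisy_forward(x, w, sigma=0.05):
    """Hardware-aware step: perturb each quantized weight with an
    assumed device spread before computing the MAC, so training
    sees (and learns to tolerate) the analog variance."""
    w_hw = quantize(w) * (1 + rng.normal(0.0, sigma, size=w.shape))
    return x @ w_hw

x = rng.normal(size=(8, 16))   # a toy batch
w = rng.normal(size=(16, 4))   # one toy layer's weights
y = noisy_forward(x, w)        # feed this output into the training loss
```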
What is the importance of a mapping tool in the design of AI hardware?
-A mapping tool is crucial as it determines how a neural network model is mapped onto the hardware. The strategy used for mapping can significantly impact performance indicators such as latency and energy consumption.
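As an illustration only, a hypothetical greedy mapper: it packs consecutive layers into cores under a weight-memory budget, which keeps adjacent layers together and thereby reduces inter-core data movement. Real mapping tools weigh several strategies (data movement versus hardware utilization), as the answer notes; all names and numbers below are invented.

```python
def map_layers_to_cores(layer_sizes, core_capacity, n_cores=6):
    """Greedy mapping: fill one core with consecutive layers until its
    weight memory is exhausted, then move to the next core. Keeping
    adjacent layers together minimizes inter-core data movement."""
    mapping, core, used = {}, 0, 0
    for layer, size in enumerate(layer_sizes):
        if used + size > core_capacity:   # current core is full
            core, used = core + 1, 0
            if core >= n_cores:
                raise ValueError("model does not fit; compress it first")
        mapping[layer] = core
        used += size
    return mapping

# Toy voice-activity-detection-like net: 7 layers onto a 6-core ASIC.
print(map_layers_to_cores([40, 80, 120, 120, 80, 40, 10], core_capacity=200))
```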
Why should benchmarking for AI hardware focus on use cases rather than just top performance metrics?
-Benchmarking should focus on use cases because top performance metrics like TOPS per Watt can be misleading and do not reflect real-world performance. Use case-based benchmarking provides a more accurate comparison of how well the hardware performs for specific applications.
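A few lines of arithmetic, with made-up numbers, showing why the headline figure can mislead: a chip with a higher TOPS-per-watt rating can still spend more energy per inference on a real workload if its utilization is low.

```python
def energy_per_inference_nj(ops_per_inference, tops_per_watt, utilization):
    """Energy per inference derived from a TOPS/W headline figure.
    1 TOPS/W equals 1 operation per picojoule, but only at the
    advertised utilization; real layers rarely keep every MAC busy."""
    effective_ops_per_pj = tops_per_watt * utilization
    return ops_per_inference / effective_ops_per_pj / 1000.0  # pJ -> nJ

# Hypothetical 1-MOP keyword-spotting inference on two chips:
print(energy_per_inference_nj(1e6, tops_per_watt=50, utilization=0.10))  # "better" headline: 200 nJ
print(energy_per_inference_nj(1e6, tops_per_watt=20, utilization=0.60))  # better in practice: ~83 nJ
```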
How does the presenter's team ensure their AI hardware is robust against environmental changes, especially temperature variations?
-The team ensures robustness against environmental changes by performing corner simulations that cover a wide range of temperatures. They define temperature corners and simulate to ensure the hardware meets specifications across these extremes.
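Schematically, the corner check is a loop like the following. In practice each inference is a circuit-level (e.g. Cadence) simulation, so `simulate_inference` here is only a stand-in; the corner temperatures follow the talk, while the accuracy spec is an invented example.

```python
def check_temperature_corners(simulate_inference, test_set,
                              corners=(-40, 25, 85), spec_accuracy=0.90):
    """Run the accuracy check at each temperature corner.

    `simulate_inference(sample, temp_c)` is a placeholder for the
    actual circuit-level simulation of one inference.
    """
    for temp_c in corners:
        correct = sum(simulate_inference(x, temp_c) == y for x, y in test_set)
        accuracy = correct / len(test_set)
        print(f"{temp_c:>4} C: accuracy {accuracy:.3f}")
        if accuracy < spec_accuracy:
            return False   # design violates spec at this corner
    return True
```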
What is the significance of the tinyML community in the context of AI hardware benchmarking?
-The tinyML community provides a framework for benchmarking AI hardware using neural network reference models. This allows for a standardized comparison of different hardware's performance on the same neural network models.
Outlines
🙌 Introducing Loretta Matia and Energy-Efficient AI Hardware
The speaker, Loretta Matia, department head at the Fraunhofer Institute, introduces her work on energy-efficient AI hardware. She explains the need for hardware solutions to support AI applications in edge devices, emphasizing the importance of low latency, high energy efficiency, and data privacy. By processing AI algorithms locally rather than sending data to the cloud, these solutions offer faster processing, better energy use, and enhanced privacy.
⚡ AI Hardware Design Challenges and Requirements
Loretta discusses the challenges in designing application-specific integrated circuits (ASICs) for neural networks that are both energy-efficient and low-latency. She outlines the key requirements: circuits must be designed in qualified semiconductor processes, minimize energy consumption per inference, compute quickly, and occupy a small area to reduce costs. Additionally, the architecture should be configurable and scalable to meet different use cases. Loretta also highlights the need for system architectures that include multicore setups and digital, analog, or mixed-signal approaches.
🧠 Importance of System Architecture and Algorithms in AI ASICs
The discussion shifts to the six key enablers for neuromorphic computing: system architecture, circuit design, algorithms, software tools, physical devices, and embedded memories. Each of these elements must work together to achieve the required energy efficiency and performance in AI ASICs. The speaker also stresses the importance of compression techniques for neural networks to fit within the ASIC and the role of software tools like mappers and compilers to ensure efficient operation.
🔄 Analog vs Digital Approaches in In-Memory Computing
Loretta explains the concept of in-memory computing, where computations are done within the memory itself to minimize data movement and enhance efficiency. She compares analog and digital methods, advocating for the analog approach due to its potential for high parallelism and computation speed. However, she also acknowledges challenges such as weight distribution inaccuracies, which must be addressed through techniques like hardware-aware training to ensure accurate performance.
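A quick numeric illustration of that mismatch problem: drawing each stored weight from a distribution around its target perturbs the MAC result, and this is exactly the error that hardware-aware training has to absorb. The 5% spread below is an assumed figure, not a measured one.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=128)          # toy neuron inputs
w_target = rng.normal(size=128)   # ideal (programmed) weights

# Each analog cell realizes its weight only up to a spread, here 5%.
w_measured = w_target * (1 + rng.normal(0.0, 0.05, size=128))

ideal = x @ w_target
actual = x @ w_measured
print(f"relative MAC error: {abs(actual - ideal) / abs(ideal):.1%}")
```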
🌍 Mapping Neural Networks and Benchmarking ASICs
The focus shifts to the importance of effective mapping tools in determining how neural networks are mapped onto multicore ASIC architectures. Poor mapping can lead to suboptimal performance even if the hardware itself is well-designed. Loretta also critiques common benchmarking methods, which focus on TOPS per watt or operations per second, and argues for benchmarking based on specific use cases for more accurate comparisons.
🏁 Conclusions: Accurate Hardware and Software Co-Design
Loretta wraps up her presentation by emphasizing the importance of co-design between hardware and software in developing AI systems. She calls for a comprehensive toolchain that integrates architecture, circuits, and software tools. Loretta also advises caution in benchmarking and stresses the need for accurate, robust designs that perform well under real-world conditions. Her final remarks focus on ensuring energy efficiency and robustness at the circuit level.
Keywords
💡AI Hardware
💡Low Latency
💡Energy Efficiency
💡In-Memory Computing
💡Neuromorphic Circuits
💡Application-Specific Integrated Circuits (ASICs)
💡Quantization-Aware Training
💡Non-Volatile Memory
💡Multi-Core Architecture
💡Use Case-Based Benchmarking
Highlights
Introduction of Loretta, head of the Integrated Circuits and Systems department at the Fraunhofer Institute for Integrated Circuits.
The motivation behind the need for energy-efficient AI hardware.
Advantages of bringing AI to edge devices, such as low latency, higher energy efficiency, and improved privacy.
The significance of microseconds and nanojoules per inference in the context of AI hardware.
The end of Moore's Law and the need for neuromorphic architectures to overcome bottlenecks in conventional computing.
Objectives for a successful neural network inference accelerator ASIC, including ultra-low energy consumption and fast computation.
The importance of system architecture in designing AI hardware, including options for single-core or multi-core architectures.
The role of circuit design in creating low-power circuits for synapses, neurons, and activation functions.
The necessity of algorithm expertise for fitting neural network models into ASICs and the challenge of model compression.
The need for software tools in neuromorphic computing, such as quantization-aware training tools and mappers for the hardware.
The choice of technology node and embedded non-volatile memories for achieving high energy efficiency.
Explanation of a mixed-signal ANN ASIC inference accelerator with analog in-memory computing.
The concept of in-memory computing and its benefits for energy efficiency and speed.
Challenges with analog in-memory computing, such as weight distribution spread and the need for hardware-aware training.
The importance of mapping tools in efficiently utilizing hardware resources and reducing data movement.
The need for accurate and robust computation at the circuit level for reliable AI hardware performance.
The significance of use-case-based benchmarking over generic performance metrics like TOPS per Watt.
Discussion on the energy efficiency of analog computing and the integration of ADCs for interfacing with the digital world.
The potential of the presented AI hardware for edge computing and handling environmental changes.
Transcripts
so I'd like to welcome up... no, no, no, sorry, Peggy, pardon,
Loretta, Loretta, I seem to have the order
mixed up, my apologies. I have Loretta
Matia, she's the department head
of the Integrated Circuits and
Systems department at the Fraunhofer
Institute for Integrated Circuits, and
her title is Energy Efficient AI
Hardware: neuromorphic circuits and tools
thank you
[Music]
okay so
um, I will first, in my talk, give you a
motivation why we need AI hardware,
and then I will jump into the
challenges of designing energy-efficient
and low-latency application-specific
integrated circuits for neural networks.
So if we want to bring
AI to the edge,
to devices, to IoT devices, to sensors, so
really where the data
is generated and collected, we need
also to bring there the AI hardware that
can compute the AI algorithms, and not do
it anymore on the cloud. So what are
the advantages of bringing AI to the
edge? We will have low latency,
because we are not sending all the raw
data that we are generating to the cloud
anymore; the AI algorithm
will run locally. Then we will have a
higher energy efficiency, right, because
we are computing, again, locally, and we
are not sending all this raw data to the
cloud, and this is done mainly in a
wireless way that consumes even more.
And then we have as a third advantage
the privacy, because the
data remains where it's generated and we
don't send it to the cloud.
So these are the three main advantages,
and there are really use cases and
applications that really need this
low latency and high energy
efficiency.
And when I'm talking here about low
latency and high energy efficiency, I'm
talking about microseconds,
and I'm talking about nanojoules per
inference. So we heard before, for cloud
computing and high-performance computing,
about hundreds of watts, thousands of
watts; I'm really talking here about
milliwatts or microwatts, right. So this
also opens new
possibilities, new use cases. And
really, to master this low latency and
high energy efficiency, since we are
dealing with the end of Moore's Law and
we cannot rely anymore on
conventional von Neumann architectures, we
really need here to go for
brain-inspired architectures, so non-von
Neumann architectures, right, where we
don't have this bottleneck,
this von Neumann bottleneck, between the
memory block
and the computation engine. So we really
need to compute close to the memory, or
in the memory, to achieve these
numbers.
So from now on I will be talking to
you about an inference accelerator ASIC
for neural networks, and what are the
objectives that this inference
accelerator ASIC needs to achieve in
order to be successful in the market, and
also in order to cover a broad range of
use cases. First, they need to be
designed and fabricated in established,
qualified semiconductor processes,
because no company, right, is going
to buy an ASIC on a
non-qualified CMOS process. Then we
really need this ultra-low energy
consumption per inference: while we
run one inference we need to be really
energy efficient, and we need to be
really fast.
Then another objective to
achieve is a smaller area. Why? Because
this will lead to a lower price of
your ASIC. And then the fifth objective
is to make a configurable and scalable
multi-core architecture, because this
way you can cover several use cases with
the same architecture.
Okay, so how can we achieve this?
In my opinion there are six enablers for
neuromorphic computing, six areas of
expertise that you need
to master in order to reach the
objectives I talked to you about before. The
first one is system architecture; we
heard about this before and explained
this. So
there are architectures for
artificial neural networks, there are
architectures for spiking neural
networks;
you can do, for example, a single-core
architecture, a multi-core architecture,
you can have a multi-core architecture
with a network-on-chip for neural
networks or for spiking neural networks,
you can also have a multi-core
architecture with mesh routing, and you
can do this in a digital way, in an
analog way, in a mixed-signal way. And then you
have the circuits, right, because you have
an architecture, but you also need the
circuit designers, and you need low-power
circuits for the synapses, for the
neurons, for the activation functions, and
so on.
And then you have another expertise
that's related with algorithms.
So we need to fit this neural network
model into an ASIC,
so we are often faced with the
challenge of compressing the algorithm:
sometimes the neural
network model that is given to us is
too big to be fitted on the ASIC,
and we need to compress it.
And then we also need software tools:
we need quantization-aware
training tools, hardware-aware training
tools, we need a mapper and a compiler to
be able to map this neural network onto
our hardware. And we also have the
physical devices, right: we do the
integrated circuit design with process
design kits, with the CMOS transistors; we
need to choose the right technology node
so that we really have a high energy
efficiency.
And then another topic that's very broad
is embedded non-volatile memories, in
order to decrease the leakage that this
ASIC can have. This can help, of course, in
some use cases, and there are many
flavors of embedded non-volatile memories:
ReRAMs, MRAMs, PCMs, ferroelectric FETs.
From now on I will focus on the
architectures, the circuits, and the
software tools, but just so you are
aware: you need all these expertises,
and you really need to combine them, so
you cannot focus on one and forget the
others.
So,
at Fraunhofer IIS
we are doing a mixed-signal
ANN inference accelerator ASIC
with analog in-memory computing.
A lot of companies and research
institutes and universities are doing
this in a digital way; I would like to
explain to you why we are doing this in an
analog way, and it's not only because
we want to give nice challenges to our
analog IC designers, it also has other
reasons.
So here you can see the
equation of the output of a neuron,
and you see here it's mainly a
multiply-and-accumulate operation.
So if we can accelerate this, and do it
in a very energy-efficient way,
the whole ASIC will be energy efficient
and it will also be fast.
So how can we do this
in an analog way? It's
called in-memory computing. It means
the weights are
stored there: this transistor
with the resistor there, these are
the weights, and the X are the inputs,
right, of your neuron. So you multiply
each input with the resistance, and
with Ohm's law you get a current flowing
in there, and you do the same for the
second input X2, and by Kirchhoff's law you
are adding the currents, right.
So,
this way you can really also do
the non-linear activation function in an
analog way, or do it in a digital
way, but what's important is to do the
MAC operation in an analog way. Why? You
see here you can profit from
high parallelism and from high
computation speed, and
since there is no data movement,
because you are computing on the memory,
you get the energy efficiency on
top.
Okay, but we saw the advantages, so
let's see which are the
disadvantages of this analog in-memory
computing. So here you can see the
measured distribution
of the weights on these analog
circuits: we target seven
different values for the weights,
and you can see here, okay, some of them
match, but there is a distribution, so
there is no exact matching between
our measured value and the target value.
So how do we deal with this? Because
we need an accurate ANN
computation, right; no one is going to
buy our ASIC if we have a very bad
accuracy, right.
So,
for dealing with this we profit from
what we call hardware-aware training.
So we don't only do quantization-aware
training, right, where we quantize the
weights so that we have a low memory
footprint, and then again better energy
efficiency; we also do what we
call hardware-aware
quantization training: we give a
variance to the weights, and we train with
these variances in order to get a robust
neural network model that can deal with
the variance of the hardware.
And then, the first
box here related with hardware-aware
training: we have
also seen that we need to
export the model, in order to
exchange the model also with the end
users.
So the second tool that is very
important is the mapping tool.
You have a neural network model, and
you have, for example like here, a
multi-core ASIC with six different cores,
and here we see an example of a voice
activity detection network mapping, where
we have seven different layers that are
mapped onto these six different cores.
The mapping for in-memory computing
can follow different strategies:
reduction of data movement, maximum
utilization of hardware resources; and
depending on the strategy you follow,
this will have a huge impact on the key
performance indicators of your ASIC.
So it will have a huge impact on
throughput, on latency, on energy
consumption. So you need to be very
careful, and it can happen sometimes: you
have a very good architecture, very good
circuits, but if your mapper is not so
good, you will not get such great results,
right.
And then I just want to
comment about benchmarking.
There are a lot of inference
accelerator ASICs that compare themselves
with others based on the TOPS per watt,
or on the operations per second and watt
that they can perform. Be very careful
with this, because it's not so easy to
find out how this number came up, yeah,
and I really prefer to not use this
number and to focus on the use case, right.
So,
for this, of course, we need a
benchmarking framework, and it will be
necessary to have a neural network
architecture search engine that really
explores all hardware capabilities and
at the end gives us the model that
can perform best on a certain hardware,
right. And when we get the different models
that are the best models, that
give you the best KPIs on a certain
hardware, only then can you do a fair
comparison. But please, always do this
comparison based on the use cases.
And then I go to the conclusions.
So,
we need a toolchain for doing
really hardware-software co-design. We
talked about these six enablers: we
need architecture, circuits, we need
software tools, and we need this
hardware-software co-design and this
toolchain for the design, and we need it
also for the end users. And then
another point that I want to bring
attention to is that we need
accurate ANN computation at circuit
level. There are many papers that talk
about TOPS per watt and how they
outperform, but they never say: is this
a typical simulation, did they do corner
simulations, mismatch simulations, is this
circuit really robust or not, right?
And, as I said before,
be careful with benchmarking, and do
use-case-based benchmarking. Thank you very
much; here you have my contact
information.
thank you Loretta, very interesting
um do we have any questions from the
audience
yeah come to the mic if you can yeah
A question about energy efficiency
again:
so you had the TOPS per watt, and there
are metrics that we're trying to use in
data centers, for example, one that's called
the ITEE,
for example, which is an ISO metric, but
when you actually look at the units it's
actually TOPS per joule, yes, so it's the
work that needs to be done. We always get
stuck in this problem of the
time it takes to compute. So is
time critical in a lot of
applications? Because then, you know, when
you go out and buy an automobile, a
vehicle, you say, how efficient is the
vehicle; the person selling you the
vehicle should say, well, how do you drive?
Because that's the important thing, yeah.
So it was more of an issue around
how we deal with these energy
efficiency metrics, or how we get to a
good energy efficiency. Yeah, so,
that's why I brought up the use-case
benchmarking: so we think the energy per
inference,
yeah, and the latency per inference, these
are good metrics here, yeah,
to compare use cases, but not the
TOPS per watt, because you don't know.
Normally they just give you the
maximum number of operations that
theoretically this hardware can perform,
and then they divide by some average
value of energy from some use case, yeah,
and,
right, so that's
not good for comparison. So I recommend
the energy per inference and the latency
per inference.
any other questions
No, I have one. So yeah, this is a very
interesting area. I've got
actually a couple of questions, but the
first one is: in this analog world,
you went through all the
details, and you can just see the
tremendous energy benefits from the
potential of that, but unfortunately we
have to interface with the digital world,
so do you take into account the ADCs,
that power as well? We take this
into account. So we develop an
architecture that doesn't need the DACs,
they are integrated in this in-memory
computing part, but we need the ADCs, so
we design, for example, SAR ADCs, because
they are low energy, right; so a nine-bit
SAR ADC, that's the interface at the
moment between the analog and the
digital world, yeah, and we take that
energy into account, yeah. And I think the
other thing that you basically covered
at the beginning of the presentation is
you're focused on the really low-energy
edge, yes,
and the way I like to look at
this is actually we need to start
thinking of the whole as the
computer, so from the cloud to the near
edge to the intelligent IoT edge,
which I guess is really where you're working,
in those two, yes. But when you partition
your application, you want to decide: can
I push it to the edge, so I'm not burning
all that power taking it to the cloud? So
it's very, very linked, you know. These
are, yeah. So we have even seen some
applications in which the bottleneck,
you know, is not the energy but it's
the latency, yes, where the FPGAs and
CPUs can really not manage the
computation
within the small amount of time that is
needed, and so there are two
aspects here that kind of fit
this architecture: one is the
energy, of course, but the other aspect is
the latency. So there are some
radar and lidar applications in which you
have such an amount of
data, and you need to compute the
next one really close, right, so you
need to be really fast, right. I tend to
find that if you reduce latency you tend
to reduce energy. Yes, of course, you
get one with the other. Any more
questions?
yeah thanks
Okay, so do you already have preliminary
data about the energy efficiency
compared, let's say, to traditional Nvidia
systems? That is one question, and the
other question
is,
yeah, I like the approach to go on the
application question, right, so how
efficiently do you solve this application;
are there already standards out for how to
do this in the community, where are we
there at this point? Yeah,
so for the first question: we once
compared a use case, a neural network, on
an Nvidia FPGA, I don't remember the
model, and on our hardware, but it was a
theoretical comparison, because for
this multi-core
architecture we received the ASICs
yesterday, so we didn't measure them
yet; it's based on simulation, on
Cadence simulations, and we were like 350
times better on the energy efficiency
side, right. And,
for the second question, if there are some
standards and so on: there is
this tinyML community, where
they have neural network
reference models,
and they try to
benchmark different hardwares while you
use the same neural network model. But I
think this is also a little bit
misleading, right, because
you really need to do this for a huge
amount, or for several different
neural network models, because you can
have a hardware accelerator, for
example, for ResNet: it will
outperform when you run the ResNet
model, and it will not do anything when
you run another NN model there, right.
But you can look at tinyML;
there is a community there trying to do
this benchmarking. Okay, thank you. We
are good on time, so we can take one more
question, maybe.
Okay,
hello, uh,
it doesn't work, I think this one, yeah. Uh,
so this is also for
edge computing, as I understand correctly?
Or, sorry, again: these
accelerators are also for edge computing,
or mainly for edge computing? So they
are meant, I would say, more for
IoT sensors, but AI, yeah, on the edge, yes.
How do you handle the environmental
changes on the edge? Because when you
are using the analog part, I think
there is a big role of environmental
properties, mainly temperature, changing
temperature; how do you handle this
problem? We do corner simulations, of
course, and we define our temperature
corners, so with minus 40 up to 85, so we
simulate and we see if the
design is inside the corners,
if the specifications are inside
the corners. So we run inferences, for
example 1000 inferences, on Cadence,
and we see: okay, do we get the
same accuracy as with our hardware-aware
trained model?
And if that's the case, we know the
circuit is good. Oh, okay, thank you.
Thank you, Loretta, that was very good,
thank you.
[Applause]