Energy-Efficient AI Hardware: Neuromorphic Circuits and Tools

Open Compute Project
1 May 2023 · 25:38

Summary

TLDR: In this presentation, Loretta Matia from the Fraunhofer Institute for Integrated Circuits discusses the need for energy-efficient AI hardware for edge devices. She emphasizes the advantages of on-device AI, such as reduced latency and enhanced privacy, and the challenges of designing low-energy, low-latency ASICs for neural networks. Matia outlines the importance of neuromorphic computing and the six key areas of expertise required for its success, including system architecture, circuits, algorithms, and software tools. She also addresses the importance of accurate, energy-efficient computation at the circuit level and the need for use-case-based benchmarking.

Takeaways

  • 🌐 Loretta is a department head at the Fraunhofer Institute for Integrated Circuits, focusing on energy-efficient AI hardware and neuromorphic circuits.
  • 💡 The motivation for developing edge AI hardware is to reduce latency, increase energy efficiency, and enhance privacy by processing data locally rather than in the cloud.
  • 🔋 Key advantages of edge AI include low latency due to local processing, higher energy efficiency by avoiding wireless data transmission, and improved privacy as data stays where it's generated.
  • 📉 The push for edge AI is also driven by the end of Moore's Law, necessitating new computing paradigms like neuromorphic architectures to overcome limitations of traditional von Neumann architectures.
  • 🛠️ Six critical areas of expertise for neuromorphic computing include system architecture, circuit design, algorithms, software tools, physical devices, and embedded non-volatile memories.
  • 🔄 The presentation discusses an inference accelerator ASIC designed for neural networks with objectives like energy efficiency, speed, small area, and scalability to cover a broad range of use cases.
  • 🔄 Analog in-memory computing is highlighted as a method to achieve high computation speed and energy efficiency by performing operations in memory, leveraging the inherent parallelism of analog circuits.
  • 📏 Hardware-aware training is crucial to deal with the non-idealities of analog computing, such as weight distribution mismatches, to ensure accurate and robust neural network models.
  • 🔧 The importance of careful mapping of neural networks onto hardware is emphasized, as it significantly impacts performance metrics like latency and energy consumption.
  • 📊 Benchmarking should be use-case based, focusing on energy per inference and inference latency rather than just top performance metrics, which may not reflect real-world application performance.

Q & A

  • What is the primary motivation for developing energy-efficient AI hardware?

    -The primary motivation is to bring AI to edge devices, IoT devices, and sensors where data is generated and collected. This allows for low latency, higher energy efficiency, and improved privacy since data does not need to be sent to the cloud.

  • What are the three main advantages of bringing AI to the edge?

    -The three main advantages are low latency, higher energy efficiency, and enhanced privacy. Low latency is achieved by processing data locally, energy efficiency is improved by not sending raw data wirelessly to the cloud, and privacy is enhanced because data remains where it's generated.

  • Why is it necessary to move away from conventional von Neumann architectures for AI hardware?

    -As Moore's Law is reaching its limits, conventional von Neumann architectures, which have a bottleneck between memory and computation, are no longer sufficient. Instead, architectures that compute close to or in the memory, like neuromorphic computing, are needed to achieve the required low latency and high energy efficiency.
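    As a rough back-of-envelope illustration of why that bottleneck matters, the Python sketch below compares an inference whose weights must be fetched off-chip with one whose weights stay in the compute array. All per-operation energies are invented placeholders, not figures from the talk.

```python
# Back-of-envelope comparison of data-movement vs. compute energy for one
# inference. All per-operation energies are illustrative placeholders.

MAC_ENERGY_PJ = 1.0        # hypothetical energy of one multiply-accumulate
OFFCHIP_FETCH_PJ = 100.0   # hypothetical energy to fetch one weight from off-chip memory
INMEM_FETCH_PJ = 1.0       # hypothetical cost when the weight stays in the compute array

def inference_energy_nj(n_weights: int, fetch_pj: float) -> float:
    """Total energy in nanojoules: one MAC plus one weight access per weight."""
    return n_weights * (MAC_ENERGY_PJ + fetch_pj) / 1000.0

n = 100_000  # weights in a small edge model
print(f"von Neumann style: {inference_energy_nj(n, OFFCHIP_FETCH_PJ):9.1f} nJ per inference")
print(f"in-memory style  : {inference_energy_nj(n, INMEM_FETCH_PJ):9.1f} nJ per inference")
```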

  • What are the key objectives for an inference accelerator ASIC to be successful in the market?

    -The key objectives include being designed in established semiconductor processes, having ultra-low energy consumption per inference, being fast, having a smaller area for lower cost, and being configurable and scalable for a range of use cases.

  • Why is in-memory computing considered advantageous for AI hardware?

    -In-memory computing is advantageous because it allows for high parallelism and computation speed, and since there is no data movement as the computation happens in the memory, it significantly improves energy efficiency.
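    A minimal Python sketch of the analog in-memory MAC idea described here, assuming weights are stored as conductances and inputs are applied as voltages (the values are illustrative, not taken from the presented ASIC): Ohm's law gives each cell's current as G * V, and Kirchhoff's current law sums the contributions on the shared column wire.

```python
import numpy as np

def analog_mac(voltages: np.ndarray, conductances: np.ndarray) -> float:
    """Column output current = sum_i G_i * V_i (one dot product, computed 'in memory')."""
    currents = conductances * voltages   # Ohm's law per memory cell
    return float(currents.sum())         # Kirchhoff's current law on the shared column

v = np.array([0.2, 0.5, 0.1])            # neuron inputs encoded as voltages (V)
g = np.array([1e-6, 2e-6, 4e-6])         # weights encoded as conductances (S)
print(analog_mac(v, g))                  # ~1.6e-6 A: the weighted sum appears as a current
```

    The weighted sum thus appears directly as a column current, which is what makes the operation both highly parallel and energy efficient.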

  • How does the presenter's team address the issue of inaccuracy in analog in-memory computing?

    -The team addresses inaccuracy through hardware-aware training: the weights are quantized to reduce the memory footprint, and the neural network model is trained with added weight variance so that it is robust against hardware variations.
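    A minimal sketch of this idea, assuming a simple quantizer and a multiplicative weight-noise model; the number of levels, noise magnitude, and tensor shapes are illustrative assumptions, not values used by the team.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(w: np.ndarray, levels: int = 7) -> np.ndarray:
    """Map weights onto a few discrete levels (low memory footprint)."""
    lo, hi = w.min(), w.max()
    step = (hi - lo) / (levels - 1)
    return np.round((w - lo) / step) * step + lo

def hw_aware_forward(x: np.ndarray, w: np.ndarray, rel_sigma: float = 0.05) -> np.ndarray:
    """Forward pass with quantized weights plus a multiplicative device-variance term."""
    w_q = quantize(w)
    w_seen_by_hardware = w_q * (1.0 + rng.normal(0.0, rel_sigma, size=w_q.shape))
    return x @ w_seen_by_hardware

x = rng.normal(size=(4, 8))
w = rng.normal(size=(8, 3))
print(hw_aware_forward(x, w).shape)   # (4, 3); training would backpropagate through this pass
```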

  • What is the importance of a mapping tool in the design of AI hardware?

    -A mapping tool is crucial as it determines how a neural network model is mapped onto the hardware. The strategy used for mapping can significantly impact performance indicators such as latency and energy consumption.
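    To make the mapping question concrete, here is a toy sketch that assigns the layers of a network to cores under a weight-memory budget and then scores the result with a crude data-movement cost. The layer sizes, core count, capacity, and cost model are all hypothetical; a real mapper would also model latency, throughput, and utilization.

```python
# Toy layer-to-core mapping for a multi-core inference ASIC (all sizes invented).

LAYERS = {"conv1": 20, "conv2": 40, "conv3": 60, "fc1": 80, "fc2": 30}  # kB of weights
CORE_CAPACITY_KB = 96
NUM_CORES = 4

def greedy_map(layers: dict, num_cores: int, capacity: int) -> list[list[str]]:
    cores = [[] for _ in range(num_cores)]
    used = [0] * num_cores
    for name, size in sorted(layers.items(), key=lambda kv: -kv[1]):
        # place each layer on the least-loaded core that still has room
        best = min((c for c in range(num_cores) if used[c] + size <= capacity),
                   key=lambda c: used[c])
        cores[best].append(name)
        used[best] += size
    return cores

def data_movement_cost(mapping: list[list[str]], order: list[str]) -> int:
    core_of = {layer: i for i, core in enumerate(mapping) for layer in core}
    # one unit of cost each time consecutive layers sit on different cores
    return sum(core_of[a] != core_of[b] for a, b in zip(order, order[1:]))

m = greedy_map(LAYERS, NUM_CORES, CORE_CAPACITY_KB)
print(m, "inter-core hops:", data_movement_cost(m, list(LAYERS)))
```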

  • Why should benchmarking for AI hardware focus on use cases rather than just top performance metrics?

    -Benchmarking should focus on use cases because top performance metrics like TOPS per Watt can be misleading and do not reflect real-world performance. Use case-based benchmarking provides a more accurate comparison of how well the hardware performs for specific applications.
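    The toy calculation below shows how the two metrics can rank accelerators differently: a chip with the better TOPS/W figure can still cost more energy per inference if the workload uses it poorly. All figures are invented for illustration.

```python
# Datasheet metric (TOPS/W) vs. use-case metric (energy per inference); numbers invented.

def energy_per_inference_mj(ops: float, peak_ops_per_s: float,
                            utilization: float, power_w: float) -> float:
    """Energy the use case actually pays: power * time for one inference."""
    latency_s = ops / (peak_ops_per_s * utilization)
    return power_w * latency_s * 1e3

WORKLOAD_OPS = 2e9  # operations in one inference of the chosen network

# Accelerator A: impressive peak number, but only 10% utilization on this network.
print("A: 10.0 TOPS/W,", energy_per_inference_mj(WORKLOAD_OPS, 10e12, 0.10, 1.0), "mJ/inference")
# Accelerator B: lower peak, but 80% utilization on the same network.
print("B:  4.0 TOPS/W,", energy_per_inference_mj(WORKLOAD_OPS, 4e12, 0.80, 1.0), "mJ/inference")
```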

  • How does the presenter's team ensure their AI hardware is robust against environmental changes, especially temperature variations?

    -The team ensures robustness against environmental changes by performing corner simulations that cover a wide range of temperatures. They define temperature corners (for example, -40 °C to +85 °C) and simulate to ensure the hardware meets its specifications across these extremes.
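    A schematic version of that corner-check flow, with a placeholder accuracy-vs-temperature model standing in for the actual circuit-level (e.g., Cadence) simulations; the spec threshold and drift numbers are illustrative only.

```python
import numpy as np

TEMPERATURE_CORNERS_C = [-40, 25, 85]   # corners mentioned in the talk: -40 up to +85 degC
ACCURACY_SPEC = 0.90                    # hypothetical pass/fail threshold

def accuracy_at(temp_c: float, runs: int = 1000) -> float:
    # placeholder drift model: accuracy degrades slightly away from room temperature
    rng = np.random.default_rng(1)
    error_rate = 0.05 + 0.0004 * abs(temp_c - 25)
    return float(np.mean(rng.random(runs) > error_rate))

for t in TEMPERATURE_CORNERS_C:
    acc = accuracy_at(t)
    print(f"{t:>4} degC: accuracy {acc:.3f} ->", "PASS" if acc >= ACCURACY_SPEC else "FAIL")
```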

  • What is the significance of the tinyML community in the context of AI hardware benchmarking?

    -The tinyML community provides a framework for benchmarking AI hardware using neural network reference models. This allows for a standardized comparison of different hardware's performance on the same neural network models.

Outlines

00:00

🙌 Introducing Loretta Matia and Energy-Efficient AI Hardware

The speaker, Loretta Matia, department head at the Fraunhofer Institute for Integrated Circuits, introduces her work on energy-efficient AI hardware. She explains the need for hardware solutions to support AI applications in edge devices, emphasizing the importance of low latency, high energy efficiency, and data privacy. By processing AI algorithms locally rather than sending data to the cloud, these solutions offer faster processing, better energy use, and enhanced privacy.

05:03

⚡ AI Hardware Design Challenges and Requirements

Loretta discusses the challenges in designing application-specific integrated circuits (ASICs) for neural networks that are both energy-efficient and low-latency. She outlines the key objectives: circuits must be designed in established, qualified semiconductor processes, must minimize energy consumption per inference, must compute quickly, and must occupy a small area to reduce cost. Additionally, the architecture should be configurable and scalable to cover different use cases. Loretta also notes that the system architecture can be single-core or multi-core and realized with digital, analog, or mixed-signal approaches.

10:05

🧠 Importance of System Architecture and Algorithms in AI ASICs

The discussion shifts to the six key enablers for neuromorphic computing: system architecture, circuit design, algorithms, software tools, physical devices, and embedded non-volatile memories. Each of these elements must work together to achieve the required energy efficiency and performance in AI ASICs. The speaker also stresses the importance of compression techniques for neural networks to fit within the ASIC and the role of software tools like mappers and compilers to ensure efficient operation.
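As an illustration of the compression step, the sketch below uses magnitude pruning with 8-bit storage as a stand-in technique (the talk does not say which compression method the team uses) to shrink a weight matrix until it fits a hypothetical on-chip budget.

```python
import numpy as np

ON_CHIP_BUDGET_BYTES = 64 * 1024       # hypothetical weight-memory budget of the ASIC

def size_bytes(w: np.ndarray, bits: int = 8) -> int:
    """Storage for the non-zero weights at the target bit width."""
    return int(np.count_nonzero(w)) * bits // 8

def prune_to_budget(w: np.ndarray, budget: int) -> np.ndarray:
    """Zero out the smallest-magnitude weights until the model fits the budget."""
    magnitudes = np.abs(w).ravel()
    for sparsity in np.arange(0.0, 1.0, 0.05):
        threshold = np.quantile(magnitudes, sparsity)
        pruned = np.where(np.abs(w) >= threshold, w, 0.0)
        if size_bytes(pruned) <= budget:
            return pruned
    raise ValueError("model cannot be compressed enough for this budget")

w = np.random.default_rng(2).normal(size=(512, 256)).astype(np.float32)
small = prune_to_budget(w, ON_CHIP_BUDGET_BYTES)
print(size_bytes(w), "bytes ->", size_bytes(small), "bytes")
```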

15:07

🔄 Analog vs Digital Approaches in In-Memory Computing

Loretta explains the concept of in-memory computing, where computations are done within the memory itself to minimize data movement and enhance efficiency. She compares analog and digital methods, advocating for the analog approach due to its potential for high parallelism and computation speed. However, she also acknowledges challenges such as weight distribution inaccuracies, which must be addressed through techniques like hardware-aware training to ensure accurate performance.

20:07

🌍 Mapping Neural Networks and Benchmarking ASICs

The focus shifts to the importance of effective mapping tools in determining how neural networks are mapped onto multi-core ASIC architectures. Poor mapping can lead to suboptimal performance even if the hardware itself is well-designed. Loretta also critiques common benchmarking methods, which focus on TOPS per watt or operations per second, and argues for benchmarking based on specific use cases for more accurate comparisons.

25:08

🏁 Conclusions: Accurate Hardware and Software Co-Design

Loretta wraps up her presentation by emphasizing the importance of co-design between hardware and software in developing AI systems. She calls for a comprehensive toolchain that integrates architecture, circuits, and software tools. Loretta also advises caution in benchmarking and stresses the need for accurate, robust designs that perform well under real-world conditions. Her final remarks focus on ensuring energy efficiency and robustness at the circuit level.

Keywords

💡AI Hardware

AI Hardware refers to specialized computing systems designed to accelerate artificial intelligence algorithms, particularly neural networks, at the edge of networks or IoT devices. In the video, the presenter discusses the need to bring AI Hardware closer to data generation points to reduce latency, improve energy efficiency, and ensure data privacy by avoiding the transmission of raw data to the cloud.

💡Low Latency

Low latency refers to the short time delay between the input to a system and the corresponding output. In the context of AI hardware, low latency ensures that AI algorithms running on edge devices can process data in real-time without the delays associated with sending data to the cloud. The video emphasizes the importance of low latency in applications that require immediate responses, such as real-time IoT sensors.

💡Energy Efficiency

Energy efficiency in AI Hardware design is critical for reducing the power consumption of devices running AI algorithms. The presenter explains how moving computation from the cloud to the edge not only improves latency but also results in greater energy efficiency, as the data is processed locally, reducing the need for wireless data transmission and power-hungry cloud computations.

💡In-Memory Computing

In-memory computing is a design approach where computations, such as multiply and accumulate operations in neural networks, are performed directly in the memory where the data is stored, rather than transferring data to a separate processing unit. The video describes how this technique enhances energy efficiency and computational speed by eliminating the bottleneck of data movement between memory and processors, a problem inherent in traditional von Neumann architectures.

💡Neuromorphic Circuits

Neuromorphic circuits are specialized hardware designed to mimic the functionality of biological neural systems. In the video, these circuits are discussed as part of the next generation of AI hardware, where energy-efficient, application-specific integrated circuits (ASICs) are developed to handle neural network computations with high speed and low energy consumption. These circuits are especially useful for edge computing applications.

💡Application-Specific Integrated Circuits (ASICs)

ASICs are custom-built chips optimized for specific applications. In the video, the presenter focuses on inference accelerator ASICs that are designed to run neural network models with ultra-low energy consumption and high speed, making them ideal for AI tasks at the edge of networks. These ASICs must meet stringent energy, latency, and cost requirements to be successful in the market.

💡Quantization-Aware Training

Quantization-aware training is a technique used in neural network training where the model is trained with knowledge of the limitations of hardware, such as reduced precision in weight representation. The video explains how this technique helps ensure that AI models maintain high accuracy even when deployed on energy-efficient, low-power ASICs, where hardware variations can cause discrepancies in computation.
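A minimal quantization-aware training sketch: the forward pass sees weights rounded to a low bit width while the gradient updates the underlying full-precision weights (the straight-through estimator). The bit width, shapes, and learning rate are illustrative assumptions.

```python
import numpy as np

def fake_quant(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Round weights to a symmetric grid with 2**(bits-1) - 1 positive levels."""
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    if scale == 0.0:
        return w
    return np.round(w / scale) * scale

rng = np.random.default_rng(3)
x = rng.normal(size=(64, 16))
y = x @ rng.normal(size=(16, 1))       # synthetic regression target
w = rng.normal(size=(16, 1)) * 0.1     # full-precision "shadow" weights
lr = 0.1

for _ in range(300):
    w_q = fake_quant(w)                # the hardware only ever sees quantized weights
    err = x @ w_q - y
    grad = x.T @ err / len(x)
    w -= lr * grad                     # straight-through: gradient applied to the shadow weights

print("MSE with 4-bit weights:", float(np.mean((x @ fake_quant(w) - y) ** 2)))
```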

💡Non-Volatile Memory

Non-volatile memory (NVM) retains stored data even when the device is powered off. The video highlights the use of embedded NVM in AI hardware to reduce energy leakage, which is crucial for maintaining energy efficiency in AI accelerators at the edge. Various types of NVM, such as ReRAM, MRAM, and ferroelectric FETs, are mentioned as solutions for improving hardware performance in edge devices.

💡Multi-Core Architecture

A multi-core architecture refers to a hardware design that incorporates multiple processing cores on a single chip, allowing for parallel processing of tasks. In the video, the presenter discusses the importance of creating scalable, configurable multi-core architectures for AI hardware, which enables different cores to handle different layers of a neural network, thereby optimizing energy consumption and performance for a range of use cases.

💡Use Case-Based Benchmarking

Use case-based benchmarking is the process of evaluating AI hardware performance based on specific, real-world applications rather than generic metrics like TOPS per Watt. The video emphasizes the importance of this approach for comparing different AI accelerators since performance can vary significantly depending on the type of neural network or task being executed. This method ensures that the chosen hardware is well-suited for the intended application.

Highlights

Introduction of Loretta, the department head of integrated circuits at the Fraunhofer Institute for Integrated Circuits.

The motivation behind the need for energy-efficient AI hardware.

Advantages of bringing AI to edge devices, such as low latency, higher energy efficiency, and improved privacy.

The significance of microseconds and nanojoules per inference in the context of AI hardware.

The end of Moore's Law and the need for neuromorphic architectures to overcome bottlenecks in conventional computing.

Objectives for a successful neural network inference accelerator ASIC, including ultra-low energy consumption and fast computation.

The importance of system architecture in designing AI hardware, including options for single-core or multi-core architectures.

The role of circuit design in creating low-power circuits for synapses, neurons, and activation functions.

The necessity of algorithm expertise for fitting neural network models into ASICs and the challenge of model compression.

The need for software tools in neuromorphic computing, such as quantization-aware training tools and mappers and compilers for the hardware.

The choice of technology node and embedded non-volatile memories for achieving high energy efficiency.

Explanation of a mixed-signal ANN inference accelerator ASIC with analog in-memory computing.

The concept of in-memory computing and its benefits for energy efficiency and speed.

Challenges with analog in-memory computing, such as weight distribution spread, and the need for hardware-aware training.

The importance of mapping tools in efficiently utilizing hardware resources and reducing data movement.

The need for accurate and robust computation at the circuit level for reliable AI hardware performance.

The significance of use-case-based benchmarking over generic performance metrics like TOPS per Watt.

Discussion on the energy efficiency of analog computing and the integration of ADCs for interfacing with the digital world.

The potential of the presented AI hardware for edge computing and handling environmental changes.

Transcripts

play00:00

current so I'd like to welcome up who

play00:02

lead to karabalam no no no sorry Peggy

play00:06

pardon

play00:07

Loretta Loretto I seem to have the order

play00:10

amongst out my apologies I have Loretta

play00:13

Mattie Matia she's the department head

play00:16

of integrated circuits insist in the

play00:18

systems department for Fraunhofer

play00:20

institute for integrated circuits and

play00:23

her title is energy efficient AI

play00:25

Hardware neuromorphic circuits and tools

play00:28

thank you

play00:35

[Music]

play00:47

okay so

play00:48

um I will first in my talk give you a

play00:52

motivation why we need AI hardware

play00:55

and then I will just jump in into which

play00:59

are the challenges of Designing energy

play01:02

efficient and a low latency application

play01:06

specific integrated circuits for neural

play01:10

networks

play01:11

so if we want to bring

play01:15

AI to the edge

play01:18

devices to IoT devices to sensors so

play01:24

really where and the data

play01:27

um is generated and collected we need

play01:30

also to bring their the AI Hardware that

play01:34

can compute the AI algorithms and not do

play01:37

it anymore on on the cloud so which are

play01:42

the advantages of bringing AI to to the

play01:46

edge so we will have a low latency

play01:50

because we are not sending all the raw

play01:53

data that we are generating to the cloud

play01:56

anymore so it um the the AI algorithm

play02:01

will run locally then we will have a

play02:05

higher Energy Efficiency right because

play02:08

we are Computing again locally and we

play02:10

are not sending all this raw data to the

play02:13

cloud and this is done mainly in a

play02:16

wireless way that even consumes more and

play02:20

then we have as a third Advantage

play02:22

Advantage the Privacy so because the

play02:25

data reminds where it's generated and we

play02:28

don't send it to the cloud

play02:31

so these are the three main advantages

play02:34

and um there are really use cases and

play02:37

applications that really need this

play02:40

low latency and and High Energy

play02:42

Efficiency

play02:43

and um

play02:46

um and when I'm talking here about low

play02:48

latency and High Energy Efficiency I'm

play02:52

talking about microseconds

play02:55

and I'm talking about nanojoules per

play02:58

inference so we heard before cloud

play03:00

computing high performance Computing

play03:03

about hundreds of watts thousands of

play03:05

watts so I'm really talking here about

play03:08

milliwatts or microwatts right so

play03:12

this this open also the new

play03:14

possibilities new use cases and and

play03:18

really to master this low latency and

play03:22

High Energy Efficiency since we are

play03:24

dealing with the end of Moore's Law and

play03:27

we cannot rely anymore in um

play03:31

conventional von Neumann architectures we

play03:35

really need here to go for

play03:39

brain-inspired architectures so non-von

play03:43

Neumann architectures right where we

play03:47

don't

play03:48

um have this bottleneck

play03:50

this von Neumann bottleneck between the

play03:53

memory block

play03:55

and the computation engine so we really

play03:59

need to compute close to the memory or

play04:02

in the memory to achieve this um these

play04:06

numbers

play04:07

so for now on I will be talking about to

play04:12

you about an inference accelerator ASIC for

play04:16

neural networks and what are the

play04:19

objectives that um this inference

play04:22

accelerator Asics need to achieve in

play04:25

order to be successful in the market and

play04:29

also in order to cover a broad range of

play04:32

use cases so first they need to be

play04:36

designed and fabricated in established

play04:38

qualified semiconductor processes

play04:40

because no company right it's it's going

play04:45

um to to will to to to buy an Asic of a

play04:51

non-qualified CMOS process then we

play04:54

really need this ultra low energy

play04:56

consumption per inference so while we

play04:59

run one inference we need to be really

play05:03

energy efficient and we need to be

play05:06

really fast

play05:08

and then we um another objective to

play05:11

achieve is a smaller area why because

play05:13

this will lead to us a low price of of

play05:17

your Asic and then the fifth objective

play05:20

is to make a configurable and scalable

play05:24

multi-core architecture because on this

play05:27

way you can cover several use cases with

play05:30

the same architecture

play05:32

okay so um

play05:34

how we can achieve this so I in my

play05:39

opinion there are six enablers for

play05:42

neuromorphic computing six areas of

play05:45

expertise that you have and need to

play05:48

master in order to reach the the

play05:52

objectives I talked to you before the

play05:56

first one is system architecture we

play05:59

heard about this before an unexplained

play06:01

this so

play06:04

um there are architectures for

play06:06

artificial neural networks there are

play06:09

architectures for spiking neural

play06:12

networks

play06:13

um you can do for example a single core

play06:16

architecture a multi-core architecture

play06:18

you can have a multi-core architecture

play06:21

with a network on chip for neural

play06:24

networks or for spiking neural networks

play06:26

you can also have a multi-core

play06:29

architecture with mesh routing and you

play06:31

can do this in a digital way in an

play06:33

analog way in a mixed-signal way and then you

play06:36

have the circuits right because you have

play06:38

an architecture but you also need the

play06:42

circuit designers and you need low power

play06:44

circuits and for the synapses for the

play06:47

neurons for the activation functions and

play06:50

so on

play06:51

and then you have another expertise

play06:54

that's related with algorithms

play06:57

so we need to fit this neural network

play07:00

model into an Asic

play07:03

so we have faced often with the

play07:06

challenge of compressing the algorithm

play07:08

sometimes the algorithm the neural

play07:11

network model that it's giving to us is

play07:13

too big to be fitted in the on the Asic

play07:16

and we need to compress it

play07:19

and then we need also software tools

play07:23

we need quantization-aware

play07:27

training tools hardware-aware training

play07:30

tools we need a mapper and a compiler to

play07:33

be able to map this neural network into

play07:36

our hardware and we have also the

play07:39

physical devices right we we do the

play07:42

integrated circuit design with process

play07:45

design kits with the CMOS transistors we

play07:49

need to choose the right technology node

play07:52

so that we really are have a high energy

play07:55

efficiency

play07:57

and then another topic that's very broad

play08:00

are embedded non-volatile memories in

play08:03

order to decrease the leakage that this

play08:06

ASIC can have this can help of course in

play08:09

some use cases and there are many

play08:11

flavors of embedded non-volatile memories

play08:15

ReRAMs MRAMs PCMs ferroelectric FETs

play08:20

so for now on I will focus on the

play08:23

architectures the circuits and and the

play08:26

software tools but just that you are

play08:29

aware this you need all these expertises

play08:32

and you need really to combine them so

play08:35

you cannot focus in one and forget the

play08:38

others

play08:39

so

play08:41

um

play08:42

at IIS

play08:43

we are doing a mixed signal

play08:48

ANN inference accelerator ASIC

play08:51

with analog in-memory computing

play08:54

so a lot of companies and research

play08:58

institutes and universities are doing

play09:01

this in a digital way I would like to

play09:04

explain you why we are doing this in an

play09:06

analogue way and it's not only because

play09:09

we want to give nice challenges to our

play09:13

analog IC designers it has also other

play09:16

reasons

play09:18

um so here you can see the uh the

play09:22

equation of the output of a neuron

play09:26

and you see here it's a multiply and

play09:30

accumulate operation mainly

play09:33

so if we can accelerate this and do it

play09:36

in a very energy efficient way

play09:39

the whole Asic will be energy efficient

play09:42

and it will be also fast

play09:45

so how can we do this

play09:48

um can we do this in an analog way so

play09:51

it's called in memory Computing it means

play09:54

the weights are

play09:57

stored there so our the this transistor

play10:02

with the the resistor there these are

play10:05

the weights and um the X are the inputs

play10:09

right of your neuron so you multiply

play10:13

each input with the the resistance and

play10:18

um with Ohm's law you get a current flow

play10:21

in there and you do the same for the

play10:24

second input X2 and by Kirchhoff's law you

play10:28

are adding the currents right

play10:30

so

play10:31

um

play10:32

on this way you um can really also do

play10:37

the non-linear activation function on

play10:41

analog analog way or do it in a digital

play10:44

way but what's important is to do the

play10:47

MAC operation in an analog way why you

play10:51

see here you can get you can profit from

play10:54

high parallelism and from high

play10:56

computation speed and

play10:59

um since there is no data movement

play11:01

because you are Computing on the memory

play11:03

then you get the Energy Efficiency on

play11:06

top

play11:08

okay but uh we saw the advantages so

play11:12

let's see that which are the

play11:13

disadvantages of this analog in-memory

play11:16

Computing so here you can see the

play11:18

distribution so a measured distribution

play11:20

of the weights on this uh analog

play11:23

circuits so we target seven and

play11:26

different values for the for the weights

play11:29

and you can see here okay some of them

play11:32

match but there is a distribution so

play11:34

there is no exact uh matching between

play11:38

our measure value and the target value

play11:41

so how we deal with this because anyway

play11:44

we need an accurate ANN

play11:47

computation right so no one is going to

play11:50

buy our Asic if we have a very bad

play11:53

accuracy right

play11:55

so

play11:56

um for dealing with this we profit for

play12:00

um what we call hardware-aware training

play12:02

so we don't do only quantization-aware

play12:07

training right where we quantize the

play12:10

weight so that we have a low memory

play12:12

footprint and then again better Energy

play12:15

Efficiency we do what also what we

play12:18

called fall our

play12:20

quantization training so we give a

play12:24

variance to the weight and we train with

play12:27

this variance in order to get a robust

play12:30

neural network model that can deal with

play12:33

the variance of the hardware

play12:36

and then the the first

play12:40

um box here related with hardware-aware

play12:43

training is we have

play12:45

um uh also seen that we need uh to

play12:48

explore the model so in order to

play12:51

exchange the model also with the end

play12:54

users

play12:55

so the second tool that it's very

play12:59

important is the mapping tool

play13:03

so you have a neural network model and

play13:06

you have for example like here a

play13:09

multi-core Asic with six different cores

play13:13

and here we see an example of a voice

play13:16

activity detection Network mapping where

play13:19

we have seven different layers that are

play13:22

mapped in these six different cores so

play13:25

um the mapping for a memory Computing

play13:28

can follow different strategies or

play13:30

reduction of data movement maximum

play13:32

utilization of Hardware resources and

play13:35

depending on the strategy you follow

play13:39

this will have a huge impact on the key

play13:42

performance indicators of your Asic

play13:46

and so it will have a huge impact on

play13:49

throughput on latency on energy

play13:51

consumption so you need to be very

play13:54

careful and it can happen sometimes you

play13:57

have a very good architecture very good

play13:59

circuits but if your mapper is not so

play14:01

good you will get not such great results

play14:04

right

play14:05

so and then I won and just um maybe to

play14:10

comment about benchmarking

play14:13

so there are a lot of inference

play14:16

accelerator ASICs that compare themselves

play14:20

with others based on the TOPS per watt

play14:23

or on the operations per second and watt

play14:26

that they can perform be very careful

play14:29

with this because it's not so easy to

play14:33

find out how this number came up yeah

play14:36

and um I prefer really to not use this

play14:42

number and focus on the use case right

play14:46

so

play14:48

um for this of course we need a friend a

play14:51

benchmarking framework and um it will be

play14:55

necessary to have a new network

play14:58

architecture search engine that really

play15:02

um

play15:03

explores all Hardware capabilities and

play15:07

at the end keeps us the model that

play15:10

better can perform on a certain Hardware

play15:13

right and when we get different models

play15:16

that are the best model that

play15:19

gives you the best kpis on a certain

play15:22

Hardware only then can you do a fair

play15:26

comparison but please do always this

play15:29

comparison based on the use cases

play15:32

and then I go to the conclusions so

play15:35

um

play15:36

we need a toolchain for doing a

play15:41

really hardware-software co-design so

play15:43

we talk about these six enablers so we

play15:47

need architecture circuits we need

play15:49

software tools and we need these

play15:52

hardware-software co-design and this

play15:55

tool chain for the design and we need it

play15:58

also for the end users and and then

play16:02

another point that I want to bring the

play16:04

attention to is that we need

play16:08

accurate ANN computation at circuit

play16:11

level so there are many papers that talk

play16:13

about tops per watt and that they

play16:15

outperform but they never say is this

play16:19

a typical simulation did they do corner

play16:23

simulation mismatch simulation it's this

play16:26

circuit really robust or not right

play16:29

and as I said before

play16:32

be careful with benchmarking and to use

play16:36

case-based benchmarking thank you very

play16:39

much so here you have my contact

play16:41

information

play16:46

thank you Loretto very interesting

play16:48

um do we have any questions from the

play16:50

audience

play16:54

yeah come to the mic if you can yeah

play17:00

the question about Energy Efficiency

play17:02

again

play17:03

um

play17:04

so you had the tops per what and there

play17:08

are metrics that we're trying to use in

play17:09

data centers for example that's called

play17:11

the it

play17:12

ee for example which is an isometric but

play17:16

when you actually look at the units it's

play17:19

actually top per Joule yes so it's the

play17:22

work that needs to be done we always get

play17:25

stuck stuck in this problem of uh of the

play17:28

time it takes to compute so is it

play17:30

critical is time critical in a lot of

play17:33

applications because then you know when

play17:36

you go out and buy an automobile of a

play17:39

vehicle you say how is efficient is the

play17:41

vehicle the person selling you the

play17:43

vehicle should say well how do you drive

play17:46

because that's the important thing yeah

play17:48

so it was more of an issue around the

play17:50

How We Do deal with this Energy

play17:52

Efficiency metrics or how we get to a

play17:54

good Energy Efficiency yeah so

play17:57

um that's why I brought the the use case

play18:00

benchmarking so we think the energy per

play18:03

inference

play18:04

yeah and the latency for inference these

play18:09

are good metrics here yeah

play18:12

to compare use cases but but not the

play18:16

tops for but because you don't know

play18:19

so normally they they just give you the

play18:22

the maximum number of operations that

play18:24

theoretically this Hardware can perform

play18:27

and then they divide by some middle

play18:30

value of energy from some use case yeah

play18:34

and

play18:35

right so that's

play18:39

not good for comparison so I recommend

play18:41

the energy per inference and the latency

play18:45

per inference

play18:47

any other questions

play18:51

no I have one so yeah this is a very

play18:55

interesting area I don't know I've got

play18:57

actually a couple of questions but the

play18:59

first one is in this analog world

play19:03

you've got you went through all the

play19:05

details and you can just see the

play19:06

tremendous energy benefits from from the

play19:08

potential of that but unfortunately we

play19:11

have to interface with the digital world

play19:12

so do you take into account the adcs in

play19:15

the that power as well we we take this

play19:17

into account so we we develop an

play19:21

architecture that doesn't need the Ducks

play19:25

so they are integrated in this in memory

play19:28

Computing part but we need the adcs so

play19:31

we design for example SAR adcs because

play19:36

they are low energy right so a nine-bit

play19:39

SAR ADC that's the interface at the

play19:41

moment between the analog and the

play19:43

digital world yeah and we take the

play19:46

energy into account yeah and I think the

play19:50

other thing that you basically covered

play19:51

at the beginning of the presentation is

play19:53

you're focused on the really low energy

play19:55

Edge yes

play19:57

um and and the way I like to look at

play19:59

this is actually we need to start

play20:01

thinking of the the whole as the

play20:04

computer so from the cloud to the near

play20:07

Edge uh to the the intelligent iot Edge

play20:11

which I guess is really you're working

play20:12

in those two yes but when you partition

play20:14

your application you want to decide can

play20:17

I push it to the edge so I'm not burning

play20:19

all that power taking it to the cloud so

play20:21

it's very very linked to you know these

play20:24

are yeah so so we we even have seen some

play20:28

applications in which uh the bottleneck

play20:31

you know it's not the energy but it's

play20:33

the latency yes so where the fpgas and

play20:37

and CPUs can really not manage the

play20:40

computation

play20:41

um with a small amount of time that it's

play20:44

needed and and so that there are two

play20:47

aspects here not that kind of fit for

play20:50

from this architecture so one is the

play20:53

energy of course but the other aspect is

play20:55

it's the latency so there are some

play20:59

radar and lidar applications in which you

play21:01

have such amount of

play21:04

data and and you need to compute the

play21:07

next one really close right so you you

play21:10

need to be really fast right I I tend to

play21:13

find that if you reduce latency you tend

play21:15

to reduce energy yes of course you do

play21:19

get one with the other any more

play21:21

questions

play21:31

yeah thanks

play21:36

okay so do you already have preliminary

play21:39

data about the Energy Efficiency

play21:41

compared let's say to traditional Nvidia

play21:44

systems that is one question and the

play21:46

other question

play21:47

and the other question is

play21:50

um yeah I like the approach to go on the

play21:53

application question right so how

play21:54

efficient do you solve this application

play21:56

are there already standards out how to

play21:59

do this in the community uh where are we

play22:02

there at this point yeah

play22:04

so for the first question so once we

play22:08

compare a use case a neural network on

play22:11

an Nvidia fpga I I don't remember any

play22:14

model and on our Hardware but it was a

play22:18

theoretical comparison because we for

play22:22

this architecture for this multi-core

play22:24

architecture we received the ASICs

play22:25

yesterday so we we didn't measure them

play22:29

so it's based on the simulation on

play22:33

Cadence simulations and we were like 350

play22:36

times uh on on the energy efficient side

play22:39

better right and um

play22:43

the second question if there are some

play22:46

standards and so on so there there are

play22:50

um there is this tiny ml Community where

play22:55

um they have like a neural network

play22:59

reference models

play23:01

and they try to

play23:04

benchmark different hardware while you

play23:07

use the same neural network model but I

play23:10

think this is also a little bit

play23:12

misleading right because if um or you

play23:16

you need to really do this for a huge

play23:22

amount or or some neural different

play23:24

neural network models because you can

play23:26

have a hardware accelerator for for

play23:30

example for the resnet no it will

play23:34

outperform when you run the ResNet

play23:36

model and it will not do anything when

play23:39

you run there another NN model right

play23:43

so but but you can look on a tiny ml so

play23:47

there is a community there trying to do

play23:49

this this benchmarking okay thank you we

play23:53

are good on time so we can take one more

play23:55

question Maybe

play23:59

okay

play24:03

hello uh

play24:05

it doesn't work I think this one yeah uh

play24:10

this uh this Association is also for

play24:12

Edge Computing as I understand correctly

play24:14

or sorry again uh is this your so these

play24:17

accelerators are also for Edge Computing

play24:19

or mainly for Edge Computing so they

play24:21

they are meant for I will say more for

play24:24

IoT sensors but AI yeah on the edge yes

play24:30

how do you handle the environmental

play24:32

changes uh on the edge because when you

play24:35

are using analog analog part uh I think

play24:38

there is a big role of environmental

play24:42

properties many temperature changing the

play24:44

temperature how do you handle this this

play24:45

problem we do corner simulations of

play24:49

course and and we Define our temperature

play24:52

corner so with minus 40 up to 85 so we

play24:58

simulate and and we see if the the

play25:02

design is inside the the corners notes

play25:05

of the if the specifications are inside

play25:08

the corner so we run inferences so for

play25:11

example 1000 inferences on on Cadence

play25:15

and and we see okay do we get the the

play25:19

same accuracy as done with our hardware-aware

play25:23

training model

play25:26

and if it's the case we we know the

play25:28

circuit is good oh okay thank you

play25:32

thank you Loretta that was very good

play25:34

thank you

play25:38

[Applause]


Related Tags
Neuromorphic Hardware, Energy Efficiency, Edge Computing, AI Accelerators, Analog Computing, Hardware Design, Machine Learning, IoT Devices, Inference ASICs, Low Latency