Stanford "Octopus v2" SUPER AGENT beats GPT-4 | Runs on Google Tech | Tiny Agent Function Calls

AI Unleashed - The Coming Artificial Intelligence Revolution and Race to AGI
7 Apr 2024 · 10:17

Summary

TLDR: Stanford University's breakthrough on-device language model, Octopus v2, outperforms GPT-4 in accuracy and speed. This compact model runs on a range of devices, enhancing privacy and reducing costs. It excels in automatic workflows and function calling, with potential applications in smartphones, cars, and more. The research demonstrates that smaller AI agents can maintain high accuracy and efficiency without relying on large-scale cloud models.

Takeaways

  • 🌟 Stanford University has developed Octopus v2, an on-device language model that outperforms GPT-4 in accuracy and latency.
  • 📱 On-device models like Octopus v2 and Apple's ReALM can run on personal devices, addressing the privacy and cost concerns associated with cloud-based AI models.
  • 🚀 Octopus v2 is a small model with two billion parameters, offering fast function-calling capabilities and high accuracy.
  • 🔍 The model reduces context length by 95%, making it efficient to deploy across a variety of edge devices.
  • 📱 Examples of edge devices include smartphones, cars, thermostats, and VR headsets, where the AI can perform tasks like setting reminders, providing weather updates, and messaging.
  • 📊 The research compares the performance of the Octopus models with GPT-4, showing that smaller models can surpass larger ones on specific tasks.
  • 🌐 The AI industry is moving toward on-device AI agents that are private, cost-effective, and deployable on personal devices.
  • 🔧 The study uses Google's Gemma 2B model as a base and compares the result with state-of-the-art models like GPT-4.
  • 🏆 The Octopus models demonstrated superior accuracy and latency in tests, with Octopus 2 being particularly notable.
  • 📉 The research also explores low-rank adaptation (LoRA) to reduce the number of trained parameters without significantly impacting performance.
  • 🌐 Advancements in AI agents are rapid, and the industry is focused on creating dependable software that empowers users through function calling and reasoning abilities.

Q & A

  • What is the significance of the development of the Octopus v2 model by Stanford University?

    -The Octopus v2 model is significant because it is an on-device language model that surpasses the performance of GPT-4 in both accuracy and latency. This means it can run efficiently on personal devices like computers and phones, offering faster and more accurate function calling without the need for cloud-based services, with their associated privacy and cost concerns.

  • How does an on-device model like Octopus v2 differ from cloud-based models in terms of privacy and cost?

    -On-device models like Octopus v2 process data locally, which enhances privacy because the data doesn't have to be transmitted to external servers. They also eliminate the costs of cloud-based models, where users are charged per-token or per-million-token usage fees.

  • What is the importance of reducing the context length by 95% in the development of on-device AI agents?

    -Reducing the context length by 95% is crucial as it allows for the creation of more efficient and lightweight AI agents that can operate on a wider range of devices, from smartphones to smart home appliances. This reduction in data requirements makes the AI agents faster and more adaptable to various edge devices without compromising on performance.
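As a rough illustration of why cutting prompt length matters (this sketch is not the paper's exact mechanism; the function names and the `<fn_i>` tokens are hypothetical), compare a prompt that inlines every function description against one that maps each function to a single reserved token:

```python
# Illustrative sketch only: why shrinking the function-calling prompt
# matters. A conventional prompt inlines every candidate function's full
# description on every request; a compact scheme maps each function to a
# single reserved token the model was fine-tuned to understand. All
# function names and the <fn_i> tokens here are hypothetical.

FUNCTION_DESCRIPTIONS = {
    "take_photo": "take_photo(camera: str = 'back', resolution: str = '1080p') "
                  "captures a photo with the chosen camera and resolution",
    "get_weather_forecast": "get_weather_forecast(city: str, days: int = 1) "
                            "returns the weather forecast for a city",
    "create_reminder": "create_reminder(title: str, time: str) "
                       "adds an entry to the user's calendar",
}

# One reserved token per function, learned during fine-tuning.
FUNCTION_TOKENS = {name: f"<fn_{i}>" for i, name in enumerate(FUNCTION_DESCRIPTIONS)}

def naive_token_count(text: str) -> int:
    """Crude whitespace tokenizer, just for comparing prompt sizes."""
    return len(text.split())

query = "What's the weather in Paris tomorrow?"

# Conventional prompt: every description travels with every request.
verbose_prompt = "\n".join(FUNCTION_DESCRIPTIONS.values()) + "\n" + query

# Compact prompt: only the reserved tokens travel with the request.
compact_prompt = " ".join(FUNCTION_TOKENS.values()) + "\n" + query

print(naive_token_count(verbose_prompt), naive_token_count(compact_prompt))
```

Even with this toy tokenizer the compact prompt is a small fraction of the verbose one, and the gap grows with every function added to the registry.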

  • How does the Octopus v2 model compare to Apple's on-device vision model in terms of size and functionality?

    -While both Octopus v2 and Apple's on-device model are designed for efficient on-device processing, Octopus v2 is somewhat larger and is optimized for language processing and function calling. Apple's model is tinier and focuses on visual tasks, such as understanding text on screens.

  • What are some of the specific tasks that the Octopus v2 model can perform effectively?

    -The Octopus v2 model can perform tasks such as creating calendar reminders, retrieving weather information, sending text messages about the weather, and searching YouTube for specific content, like a Taylor Swift concert. These tasks demonstrate its capability in understanding and executing function calls for personal assistance and information retrieval.

  • How does the performance of the Octopus v2 model compare to GPT-4 in terms of accuracy and latency?

    -The Octopus v2 model outperforms GPT-4 in both accuracy and latency. It demonstrated higher accuracy on the benchmarked function-calling tasks and significantly lower latency, making it faster and more efficient for on-device applications.

  • What is the role of the RAG (Retrieval-Augmented Generation) technique in improving AI models?

    -The RAG technique enhances AI models by providing them with a sort of 'cheat sheet' or database to reference when generating responses. This reduces the likelihood of 'hallucinations' or incorrect information being generated, thereby improving the accuracy and reliability of the AI's responses.
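A minimal sketch of this "cheat sheet" idea, assuming a toy function registry and simple word-overlap scoring in place of the embedding search a real RAG system would use:

```python
# Minimal sketch of the "cheat sheet" behind retrieval-augmented generation
# (RAG) for function calling: before the model answers, retrieve the most
# relevant function description and put it in the prompt, so the model picks
# from real functions instead of hallucinating one. The registry and the
# word-overlap scoring are illustrative stand-ins for a real vector search.

FUNCTIONS = {
    "get_weather_forecast": "get the weather forecast for a city",
    "send_text_message": "send a text message to a contact",
    "search_youtube": "search youtube for videos matching a query",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k function names whose descriptions best overlap the query."""
    q = set(query.lower().split())
    scored = sorted(
        FUNCTIONS,
        key=lambda name: len(q & set(FUNCTIONS[name].split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Prepend only the retrieved cheat-sheet entries to the user query."""
    sheet = "\n".join(f"{n}: {FUNCTIONS[n]}" for n in retrieve(query))
    return f"Available functions:\n{sheet}\n\nUser: {query}"

print(retrieve("what is the weather forecast today"))
```

Because the model only ever sees descriptions that actually exist in the registry, it cannot invent a function that isn't there, which is the hallucination reduction the video describes.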

  • How does the development of smaller AI models like Octopus v2 impact the future of AI technology?

    -The development of smaller AI models like Octopus v2 suggests that advancements in AI can be achieved not just by increasing model size but also by optimizing smaller models for specific tasks. This can lead to more efficient, cost-effective, and privacy-friendly AI solutions deployable across a wide range of devices and applications.

  • What does the comparison between the performance of the Octopus models and GPT-4 indicate about the potential of on-device AI agents?

    -The comparison indicates that on-device AI agents can match or even surpass the performance of larger, cloud-based models like GPT-4 in terms of accuracy and latency. This suggests a promising future where AI agents can operate efficiently and effectively on personal devices without relying on cloud services.

  • How does the use of low-rank adaptation (LoRA) in the Octopus models affect their performance and deployment?

    -Low-rank adaptation fine-tunes a model by training a small number of additional parameters rather than the full model, while maintaining similar results. This yields AI agents that are robust and efficient enough for product use, while reducing computational requirements and potentially lowering costs.

  • What are some of the emerging trends in the AI industry highlighted by the development of on-device models like Octopus v2?

    -The development of on-device models like Octopus v2 highlights emerging trends such as the focus on creating AI agents that are highly efficient, lightweight, and capable of performing specific tasks with high accuracy. It also underscores the shift toward edge computing in AI, where processing power is brought closer to the source of data, enhancing speed, reducing latency, and improving privacy.

Outlines

00:00

🤖 Introducing Octopus v2: A Compact On-Device Language Model

Stanford University has introduced a new on-device language model called Octopus v2, which surpasses the performance of GPT-4 in both accuracy and latency. This compact model can run on devices such as computers and smartphones without cloud-based processing. Its efficiency is highlighted by a 95% reduction in context length and its suitability for deployment across a wide range of edge devices. The script discusses the importance of function calling in AI agents and the benefits of on-device models that address the privacy and cost concerns often associated with large-scale cloud models.

05:01

🏆 Benchmarking Octopus Models Against Industry Standards

The Octopus models, including Octopus 2, are benchmarked against industry standards like GPT-4 and GPT-3.5 to evaluate their performance. The models are tested on their ability to perform function calling and reduce hallucinations using the RAG (Retrieval-Augmented Generation) technique. The results show that while the performance of the models is close to 100%, the Octopus models outperform their counterparts, including GPT-4, in both accuracy and latency. The discussion also touches on the training methodologies, such as full model training and low-rank adaptation, and the impact of data set size on model performance. The significance of these findings lies in the potential for deploying highly efficient, cost-effective, and accurate AI agents on edge devices for tasks like creating reminders, fetching weather updates, and messaging.

10:03

📈 The Future of AI: Smaller is Smarter

The script concludes with a reflection on the future of AI, suggesting that advancements in the field are not limited to increasing model size. Instead, smaller, more efficient models like Octopus v2 and Apple's on-device models demonstrate that compact AI agents can perform specific tasks with high accuracy and speed. This trend challenges the conventional belief that larger models are inherently better, highlighting the potential for smaller models to deliver robust performance while addressing cost, efficiency, and privacy.

Keywords

💡Autonomous AI agent

An autonomous AI agent refers to an artificial intelligence system that can operate independently, without human intervention, to perform tasks or make decisions. In the context of the video, it highlights advancements in AI where agents like Octopus v2 and Apple's ReALM are developed to run efficiently on device, meaning on personal devices such as phones or computers, without relying on cloud services.

💡On-device language model

An on-device language model is a type of AI model designed to run directly on a user's device, such as a smartphone or personal computer, rather than relying on cloud-based processing. This approach offers benefits in terms of privacy, as data does not need to be transmitted off the device, and cost, as there are no per-token fees associated with cloud-based AI services.

💡Function calling

Function calling refers to the process of invoking a function or a set of instructions within a program that performs a specific task. In the context of AI agents, this involves the AI system's ability to execute certain actions or services when prompted by the user, such as taking a photo, retrieving news, or sending an email.
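The mechanics can be sketched as a registry of device functions plus a thin dispatcher for model-emitted calls. The function names and the JSON call format below are illustrative assumptions, not any specific platform's API:

```python
# Bare-bones sketch of "function calling": the device exposes a registry of
# callable functions with typed parameters, the model emits a structured call
# (name plus arguments), and a thin runtime dispatches it. The two functions
# here are hypothetical stand-ins for the Android-style examples in the video.
import json

def take_photo(camera: str = "back", resolution: str = "1080p") -> str:
    return f"photo taken with {camera} camera at {resolution}"

def get_trending_news(region: str = "US", language: str = "en", max_results: int = 5) -> str:
    return f"top {max_results} {language} headlines for {region}"

REGISTRY = {"take_photo": take_photo, "get_trending_news": get_trending_news}

def dispatch(model_output: str) -> str:
    """Execute a model-emitted call of the form {"name": ..., "arguments": {...}}."""
    call = json.loads(model_output)
    fn = REGISTRY[call["name"]]  # KeyError here means the model hallucinated a function
    return fn(**call.get("arguments", {}))

# What a model response might look like for the request "take a selfie":
result = dispatch('{"name": "take_photo", "arguments": {"camera": "front"}}')
print(result)
```

The accuracy numbers discussed in this video measure exactly this step: whether the model picks the right registry entry with the right arguments.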

💡Latency

Latency in technology refers to the delay between the initiation of an action and its completion. In AI systems, low latency is desirable because it indicates quick response times and efficient processing of tasks. The script discusses how on-device models like Octopus v2 achieve lower latency than cloud-based models, leading to faster and more responsive AI agents.
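One way to picture how per-call latency figures like those cited in the video are obtained is to time a single end-to-end request with a wall-clock timer. The `fake_model` below is a placeholder stand-in, not Octopus v2:

```python
# Hedged illustration of measuring function-calling latency: average the
# wall-clock time of an end-to-end request over several runs. `fake_model`
# is a stand-in for on-device inference; with a real model the measured
# time would be dominated by the forward pass.
import time

def fake_model(prompt: str) -> str:
    # Placeholder: a real model would generate this call from the prompt.
    return "get_weather_forecast(city='Boston')"

def measure_latency(fn, prompt: str, runs: int = 5) -> float:
    """Average seconds per call over several runs."""
    start = time.perf_counter()
    for _ in range(runs):
        fn(prompt)
    return (time.perf_counter() - start) / runs

latency = measure_latency(fake_model, "What's the weather in Boston?")
print(f"{latency:.6f} s per call")
```

Averaging over several runs smooths out scheduler noise, which matters when the differences being compared are fractions of a second.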

💡Privacy

Privacy in this context refers to the protection of personal information and data from unauthorized access or disclosure. The use of on-device AI models can enhance privacy as data processing occurs locally on the user's device, reducing the risk of sensitive information being transmitted and potentially exposed through cloud services.

💡Cost

In the context of AI models, cost refers to the financial expenditure associated with using or deploying these systems. Cloud-based models often involve per-token or subscription fees, which can become expensive with extensive use. On-device models, however, can reduce costs as they eliminate the need for continuous payment to cloud service providers.

💡Edge devices

Edge devices are any peripheral devices that are connected to a network and are designed to be used at the 'edge' of a network, away from the central data processing hub. This includes smartphones, cars, smart home devices, and more. The script talks about deploying AI agents on such edge devices, enabling a variety of applications and services to be more responsive and efficient.

💡Parameter

In machine learning and AI, a parameter is a variable that is tuned during training. The number of parameters in a model is an indicator of its complexity and capacity for learning. The script discusses the Octopus models, which have relatively few parameters compared to larger models like GPT-4, yet still outperform them on certain tasks.

💡Latency reduction

Latency reduction refers to minimizing the delay in a system's response time. In AI, reducing latency yields faster, more efficient agents and a better user experience. The script emphasizes the importance of latency reduction in the development of on-device AI models like Octopus v2.

💡AI advancements

AI advancements refer to the progress and improvements made in the field of artificial intelligence, including the development of new algorithms, models, and applications. The video discusses the rapid advancements in AI agents, with a focus on the development of smaller, faster, and more efficient models that can be deployed on personal devices.

💡Low-rank adaptation

Low-rank adaptation is a technique used in machine learning to reduce the complexity of a model while maintaining its performance. This involves simplifying the model by reducing the number of parameters it uses, which can lead to a smaller, faster, and more efficient AI system without significantly compromising on accuracy.
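The parameter savings can be sketched numerically. The dimensions below are toy values, and this is a simplified illustration of the general LoRA idea, not the Octopus training setup:

```python
# Sketch of the low-rank adaptation (LoRA) idea: instead of updating a full
# d x d weight matrix, learn two thin matrices B (d x r) and A (r x d) with
# r much smaller than d, and use W + B @ A at inference time. Toy dimensions.
import numpy as np

d, r = 512, 8                      # hidden size, adapter rank
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))    # frozen pretrained weight
B = np.zeros((d, r))               # LoRA factors: B starts at zero so the
A = rng.standard_normal((r, d))    # adapted model initially equals the base

def adapted_forward(x: np.ndarray) -> np.ndarray:
    """Apply the base weight plus the low-rank update."""
    return x @ (W + B @ A)

full_params = d * d                # parameters a full fine-tune would update
lora_params = d * r + r * d        # parameters LoRA actually trains
print(full_params, lora_params)
```

Here LoRA trains 8,192 parameters instead of 262,144, about 3% of the full count, which is why the video can describe a minor accuracy decrease in exchange for a much cheaper fine-tune.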

Highlights

Stanford University introduces Octopus v2, an on-device language model for super agents that outperforms GPT-4 in accuracy and latency.

Octopus v2 is a small model that can run on various devices, such as computers and phones, offering faster and more accurate function calling than GPT-4.

Apple has also developed an on-device model called ReALM, aimed at on-screen tasks, that is significantly smaller and more efficient than GPT-4 for that purpose.

Language models like Octopus v2 have the potential to be highly effective in automatic workflows because of their ability to quickly call functions.

Large-scale language models in cloud environments, while high-performing, often raise concerns over privacy and cost.

Stanford's research presents a new method that empowers an on-device model with two billion parameters, far smaller than GPT-4 yet surpassing it in performance.

The new on-device model decreases the context length by 95%, making it faster and more efficient across a variety of tasks.

Advancements in AI agents are rapid, leading to the development of more efficient and cost-effective on-device models.

The Octopus models can perform tasks like creating calendar reminders, getting weather forecasts, and texting, demonstrating their practical applications.

The research uses Google's open-source Gemma 2B model as the basis for the Octopus models, showcasing the potential of open-source models.

The Octopus models were tested against GPT-4 and other state-of-the-art models, with Octopus 2 showing superior accuracy and latency.

The development of on-device AI agents like Octopus v2 and Apple's ReALM indicates a trend toward smaller, faster, and more efficient models for specific tasks.

The Octopus models demonstrate that AI can be improved by making it smaller, challenging the notion that bigger models are always better.

The research suggests that for certain tasks, tiny AI agents can outperform larger models, offering a new direction for AI development.

The Octopus models' performance indicates that AI agents can be deployed on edge devices without massive models or extensive resources.

The study highlights the importance of function calling in AI agents and the potential for on-device models to excel at it.

The development and testing of the Octopus models mark a significant step forward in the creation of efficient, on-device AI agents.

Transcripts

[00:00] The autonomous AI agent space is heating up. Stanford University drops this gem: "Octopus v2: on-device language model for super agent," a small model that surpasses the performance of GPT-4 in both accuracy and latency (how fast it is). And what it is: it's an on-device language model for super agents. So "on-device" means it can run on your computer, on your phone, on whatever. And we've seen something very similar from Apple recently: they have an on-device model they're calling ReALM, and it's basically kind of like a vision model that is tiny compared to GPT-4, something like 8,000th of a percent the size of GPT-4. It can run on device, and for certain visual tasks, for understanding what's written on your computer screen or your phone screen, it exceeds GPT-4's capabilities. And this is kind of in the same vein.

[00:52] So this is Octopus 2. They're saying that language models, these LLMs, are potentially effective in automatic workflows; they possess the crucial ability to call functions. Now, really fast, let's talk about what calling functions means, just so everybody's on the same page; most people have probably heard me talk about it, but just really fast. For example, if you're dealing with an Android phone (and the same can be said for Apple phones, or your computer, or your thermostat, or your car, or pretty much anything nowadays), it has certain functions that you can call that define what it can do. For example, on the Android phone, or any Android system, it can take a photo, and you can specify which camera to use, like the back camera, and what resolution to take the photo at. And here are the parameters you can use: you can say the camera is the front or the back, depending on what phone you're using, the resolution, etc. Another function is "get trending news": in the US region, in English, give me the top five results. Or get the weather forecast, or send an email, or search YouTube videos, and so on. But that's function calling; that's what functions are, an example of a few functions, and they're essential in creating AI agents.

[02:01] They're saying: despite the high performance of large-scale language models in cloud environments, they are often associated with concerns over privacy and cost. Certainly I saw a number of applications with Claude or GPT-4 that were kind of cool, but boy, they cost a lot, because you're paying, you know, OpenAI or Anthropic to run their model in the cloud. It's not on your device; it's not local. So everything's going through their services, and you're paying them some per-token or per-million-token fee, so the more complicated this gets, the more you pay. And also, of course, privacy: they can see exactly what you're doing. The solution to that would be something that just runs on your device. But our current on-device models for function calling have issues with latency (how fast they're able to run) and accuracy (how good they are at actually calling the right function). And they're saying: our research presents a new method that empowers an on-device model with two billion parameters, which is rather small. It's not as tiny as some of the smaller ReALM models, the Apple models, but it's definitely on the really small side. Yet these on-device models surpass the performance of GPT-4 in both accuracy and latency, and they decrease the context length by 95%. Then they dunk on Zuckerberg for a bit, and they're saying this thing is fast enough to deploy across a variety of edge devices; so think, you know, in your phone, in your car, in your fridge, in your thermostat. Some examples are creating calendar reminders, getting the weather and text messaging either the user or somebody else about the weather, and searching YouTube for a Taylor Swift concert. And of course those things are completed, and the agent successfully does the thing it is asked.

[03:41] And so they're saying that the AI agents' presence is growing very rapidly; the advancement in agents is rapid. You have AI assistant tools like MultiOn (we've covered that here; I was pretty surprised about how good it was; that's where I realized we're certainly farther ahead than I thought), Adept AI (I've heard quite a bit about them), and then there's of course the Rabbit R1, the Humane AI Pin, and a number of other ones, including open-sourced ones. And they're talking about various research that went into that, like prompting techniques, chain-of-thought reasoning, and the rise of multi-agent systems. But this is a kind of new trend in this industry, showcasing the use of language models (so these GPTs and Claudes and Geminis and whatnot) to develop dependable software that empowers users using API calling, you know, function calling, and reasoning abilities. And while this works well, they want to create something that is on-device, something that can be run privately and not cost too much, and they want to be able to deploy these agents and these models on edge devices like smartphones, cars, VR headsets, and personal computers. I'll link the paper if you want to go through the methodology; there's quite a bit here, and we will just highlight the most important parts.

[04:48] It looks like they've used Google's Gemma 2 billion model, the small model that Google has made open source, and after training this model they compare it to the state-of-the-art models. [05:01] Specifically, they're going to test it against GPT-4, the January 25th checkpoint, or sort of that update, because they have multiple GPT-4 models. This is the Chatbot Arena leaderboard; you can see here this is the 0125 (January 25th) version, which is one of the better ones. It looks like the 1106 (November 6th) version is more highly rated, but I mean, they're also close; it's pretty much the same; that could be just a small variation and not significant. They also test against GPT-3.5, and they're also going to test the RAG technique: this ability for it to check against a database. You can think of it like having a cheat sheet you can look at to see which potential functions are available, which is going to reduce hallucinations.

[05:43] So they talk about Llama 7 billion with RAG, with that sort of retrieval-augmented generation, and it seems like it didn't do too well. The performance was modest even though they gave it few-shot learning (they gave it examples of how to do it); it was slow and had a 68% accuracy. So Llama kind of gets a thumbs down. GPT-3.5 with retrieval, with its little cheat sheet where it can look up answers, did pretty well: it has an impressive accuracy of 98.095%, and the latency was significantly improved, only 1.97 seconds. Here's the chart comparing all of them. This is Llama; boy, it did not do well. But the thing to notice is that all of the other ones are pretty close, and they're all pretty close to 100%. None of them are at 100%, but they're 98, 97, 98, 99, 99, 99, and then 98, with the three 99s being Octopus 0, Octopus 1, and Octopus 2. They beat all of the other ones, including GPT-4, which got a 98.57%.

[07:03] So, the models. For Octopus 0, they either used full model training, or they used LoRA, which is low-rank adaptation. We did a video a long time ago about it; basically it's a way to fine-tune and simplify models so they train fewer parameters while keeping similar results. You can think of it kind of like how you can sometimes make image files smaller without necessarily losing detail. And then they trained some of them on 1,000 data points, so their data set size was 1,000, and they've also tried 500 for Octopus 3 and then 1,000 for Octopus 4. It looks like that's why there was actually a drop-off for Octopus 3, perhaps because of the smaller data set. And then when they were testing GPT-4, interestingly, it exhibited superior accuracy at 98.5 and even lower latency than GPT-3.5, even though it's a bigger model. They're saying GPT-4's enhanced performance suggests OpenAI could be allocating more GPU resources to it, or that it experiences less demand compared to GPT-3.5. That's interesting. And this is the latency, how long it takes to run: the higher, the worse; the lower, the better. This is in seconds, so as you can see, again, Llama: oh my God, 13-plus seconds. Then you have GPT-4 at just over 1 second, but the Octopus models are at about a third of a second for most of them: 0.38, 0.37, 0.36. As for the low-rank adaptation, switching to LoRA training results in a minor accuracy decrease, but it's still high enough to be sufficiently robust for product deployment.

[08:45] But the point of all this is: what does this all mean? What is the importance of all this? And that is simply that (we've seen the same thing with Apple's research, and now this is Stanford) these on-device AI agents, the architecture behind them, doesn't have to be massive models like GPT-4 with its 1.7 trillion parameters, or whatever that exact number is. They can be tiny, and they can be very fast, and they can be very inexpensive, while still maintaining a lot of their accuracy. It's interesting to think about, because right now a lot of companies are betting that all the forward progress will come from building up more chips, bigger power plants, more parameters: just more, more, more, bigger. Meanwhile, these two papers from Apple and Stanford are showing that tiny models, tiny agents, can be extremely effective at certain specific tasks. You want an agent that does function calling? Well, here's a tiny one that does better than GPT-4. You want something that can read your screen and understand all the words on it so it knows what to click on? Well, here's a microscopic one from Apple; if I recall correctly, it was something like [09:55] 250 million parameters for the smallest one, going up to 3 billion for the biggest one. Compare that to 1.7-trillion-parameter GPT-4; these outperform the massive one. It's interesting, because we can make the AI better by making it bigger, and we can make it better by making it smaller. There doesn't seem to be a limit to where it can go.



Related Tags
AI Innovation, On-Device Model, Stanford Research, GPT-4 Challenged, Octopus v2, Language Processing, Data Privacy, Efficient AI, Edge Devices, Function Calling