New Llama 3 Model BEATS GPT and Claude with Function Calling!?

Cole Medin
21 Jul 2024 · 14:04

Summary

TL;DR: In this video, the presenter explores the new open-source Llama 3 models from Groq, which are fine-tuned for function calling and challenge proprietary models like GPT. The video compares GPT and Groq's Llama 3 using an AI personal assistant that manages tasks in Asana, demonstrating the new model's impressive speed and accuracy. The presenter highlights the significance of this open-source release for AI transparency and accessibility, marking a significant step forward for the community.

Takeaways

  • 🌟 For the first time, the best-performing large language model for function calling is an open-source model, introduced by Groq, challenging proprietary models like GPT and Claude.
  • 🏆 Groq's fine-tuned Llama 3 models have achieved top rankings on the Berkeley Function Calling Leaderboard, with both the 70 billion and 8 billion parameter versions performing exceptionally well.
  • 🔒 The 70 billion parameter Llama 3 model reaches 90% accuracy, ranking first on the leaderboard, while the 8 billion parameter version is only about 1% less accurate, placing it third.
  • 📊 Function calling performance is benchmarked with the Berkeley Function Calling Leaderboard, which aims to represent real-world use cases for large language models, such as agents and enterprise workflows.
  • 🛠️ The video demonstrates using Groq's Llama 3 model with the AI personal assistant developed in the AI Master Class series for task management in Asana.
  • 📝 The video walks through a comparison between GPT and Llama 3, showing the code change needed to switch the agent over to the new model for function calling.
  • 🔧 The AI agent interacts with Asana on the user's behalf to manage projects and tasks, using tools defined in the code.
  • ⏱️ The Llama 3 model generates responses noticeably faster than GPT, though it sometimes needs extra confirmation steps and can be slower when invoking tools.
  • 🗂️ The Llama 3 model successfully replicates the task management operations GPT performs, including creating tasks, marking them complete, and deleting tasks.
  • 🤖 The video highlights the potential of using local, open-source models as AI agents in workflows, emphasizing the importance of transparency and accessibility in AI.
  • 🎉 The video concludes by celebrating that the open-source Llama 3 model performs function calling almost as well as proprietary models, a significant advancement for open-source AI.

Q & A

  • What major milestone has been achieved in the field of AI language models?

    -For the first time, the best large language model for function calling is an open-source model that can be run locally, breaking away from proprietary models like GPT or Claude.

  • Which company has developed their own version of Llama 3 for function calling?

    -A company called Groq has developed its own version of Llama 3, specifically designed for high performance in function calling.

  • How does Groq's Llama 3 model perform on the Berkeley Function Calling Leaderboard?

    -Both the 70 billion and 8 billion parameter versions of Groq's Llama 3 rank highly on the Berkeley Function Calling Leaderboard, with the 70 billion parameter version in first place.

  • What is the significance of the 70 billion parameter version of Llama 3 being number one on the leaderboard?

    -The 70 billion parameter version of Llama 3 achieving a 90% accuracy on the leaderboard is significant as it demonstrates its superior performance in function calling compared to other AI models.

  • How does the 8 billion parameter version of Llama 3 compare to other models in terms of accuracy?

    -The 8 billion parameter version of Llama 3 is only 1% worse in overall accuracy compared to the 70 billion parameter version, making it a more efficient model in terms of size and performance.

  • What is the Berkeley function calling leaderboard and how is it used to benchmark AI models?

    -The Berkeley function calling leaderboard is a tool used to benchmark AI models based on their performance in function calling. It evaluates models based on how they are used in real-world scenarios like agents and enterprise workflows.

  • What AI personal assistant is being used in the video to test the Llama 3 model?

    -The AI personal assistant used in the video is one that the presenter has been developing in their AI Master Class video series, designed to help with task management.

  • How does the presenter plan to evaluate the effectiveness of the Groq Llama 3 model for function calling?

    -The presenter plans to evaluate the Groq Llama 3 model by comparing it to another powerful model, GPT-4o, using the same AI agent for task management and observing their performance.

  • What tasks does the presenter assign to test the function calling capabilities of the AI models?

    -The presenter assigns tasks such as creating a project in Asana, adding steps as tasks with due dates, marking tasks as complete, deleting tasks, and adding new tasks to test the function calling capabilities of the AI models.

  • What are the key differences in performance between GPT and the Groq Llama 3 model observed in the video?

    -GPT handles the tasks more smoothly, understanding and executing multi-step requests without extra prompts. The Groq Llama 3 model generates responses very quickly but needs additional prompting (for example, to confirm a due date and to look up task IDs), yet it still completes all the tasks, demonstrating its effectiveness as an open-source model.

  • What is the presenter's final verdict on the Groq Llama 3 models in comparison to GPT?

    -While the presenter acknowledges that GPT is slightly better at handling a large number of tokens and executing tasks, they are impressed with the Groq Llama 3 models, especially considering they are open source and perform almost as well as proprietary models like GPT.

Outlines

00:00

🚀 Introduction to Groq Llama 3: A New Benchmark Leader

The speaker introduces the news that Groq's Llama 3, an open-source model fine-tuned for function calling, has become the best-performing model in this category, surpassing proprietary models like GPT. Groq's blog post reveals that their 70 billion parameter version of Llama 3 leads the Berkeley Function Calling Leaderboard, with the smaller 8 billion parameter version also performing exceptionally well, coming in third place.

05:02

πŸ” GPT vs Grok Llama 3: A Function Calling Showdown

The speaker sets up an experiment to compare GPT and Grok Llama 3 in task management. Using a task management AI agent, they ask GPT to list ten steps for creating an AI agent application, create a project in Asana, and add tasks for each step. The experiment demonstrates GPT's capability in invoking various tools and handling complex instructions seamlessly, with tasks created and managed efficiently in Asana.

10:04

βš–οΈ Testing Grok Llama 3: Performance and Capabilities

The speaker transitions to testing the Grok Llama 3 model, noting its impressive speed compared to GPT, albeit with some limitations. Despite a slower performance in certain tasks and needing additional prompts, Grok Llama 3 successfully handles the creation, modification, and deletion of tasks in Asana. This showcases the model's potential as a strong open-source alternative, albeit with some areas for improvement. The speaker highlights the significance of these advancements for open-source AI and encourages viewers to explore using local models in their workflows.


Keywords

💡Open Source Model

An open source model refers to a type of software or algorithm whose source code is available to the public, allowing anyone to view, modify, and distribute it. In the context of the video, the open source model is the Llama 3 fine-tune released by Groq, which is designed for function calling and is positioned as a non-proprietary alternative to models like GPT or Claude. The significance of this model is highlighted by its high performance on the Berkeley Function Calling Leaderboard.

💡Function Calling

Function calling in the context of AI refers to the ability of a language model to execute specific functions or tasks, often by interacting with other software or systems. The video discusses the Llama 3 model's prowess in function calling, as demonstrated by its performance on benchmarks and its ability to perform complex tasks within Asana, a task management software.
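As a minimal, hypothetical illustration of this mechanism (not code from the video): the model does not run a function itself; it returns a structured request naming the tool and its arguments, which the application then executes. A sketch using LangChain might look like this:

```python
# Minimal illustration of function calling: the model returns a structured
# tool call (name + arguments); the application decides how to execute it.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI


@tool
def create_asana_task(task_name: str, due_on: str) -> str:
    """Create a task in Asana with a name and a due date in YYYY-MM-DD format."""
    return f"Created '{task_name}' due {due_on}"   # placeholder body for illustration


llm = ChatOpenAI(model="gpt-4o").bind_tools([create_asana_task])
reply = llm.invoke("Add a task to define the project scope, due 2024-07-26.")

# Instead of plain text, the reply carries the requested call, e.g.:
# [{'name': 'create_asana_task',
#   'args': {'task_name': 'Define the project scope', 'due_on': '2024-07-26'},
#   'id': '...', 'type': 'tool_call'}]
print(reply.tool_calls)
```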

💡Groq

Groq is an AI company mentioned in the video that builds fast inference infrastructure for running open models. They have developed their own versions of the Llama 3 model, optimized for function calling. The video discusses Groq's contribution to the field of AI by providing an open-source alternative to proprietary models.

💡Benchmarks

Benchmarks in the context of the video refer to standardized tests or measurements used to evaluate the performance of AI models, particularly in the area of function calling. The Berkeley function calling leaderboard is an example of a benchmark mentioned, where the Llama 3 model has achieved high rankings.

💡AI Personal Assistant

An AI personal assistant is a software agent that uses AI to perform tasks or provide assistance to users. In the video, the AI personal assistant is developed to manage tasks in Asana, and it is tested with both GPT and the Llama 3 model to evaluate their effectiveness in function calling.

💡Asana

Asana is a task management software used in the script to test the function calling capabilities of the AI models. The AI personal assistant interacts with Asana to create projects, add tasks, and manage due dates, demonstrating the practical application of AI in workflow management.
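For illustration only (not the repository's actual code), an Asana tool for such an agent could wrap Asana's REST API. The endpoint and fields below follow Asana's public API; the environment-variable names and the helper itself are assumptions:

```python
# Hypothetical sketch of an Asana task-creation tool; the real agent's tools may differ.
import os
import requests
from langchain_core.tools import tool

ASANA_TOKEN = os.environ["ASANA_ACCESS_TOKEN"]   # personal access token (assumed env var)
ASANA_PROJECT = os.environ["ASANA_PROJECT_ID"]   # target project GID (assumed env var)


@tool
def create_asana_task(task_name: str, due_on: str) -> str:
    """Create a task in the configured Asana project, due on YYYY-MM-DD."""
    response = requests.post(
        "https://app.asana.com/api/1.0/tasks",
        headers={"Authorization": f"Bearer {ASANA_TOKEN}"},
        json={"data": {"name": task_name, "due_on": due_on, "projects": [ASANA_PROJECT]}},
        timeout=30,
    )
    response.raise_for_status()
    return f"Created task '{task_name}' (due {due_on})"
```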

💡GPT

GPT, or Generative Pre-trained Transformer, is a type of AI language model developed by OpenAI. In the video, GPT is used as a point of comparison for the Llama 3 model, showcasing its capabilities in function calling within the context of task management in Asana.

💡Parameter

In the context of AI models, a parameter refers to a variable that the model learns during training to make predictions or decisions. The script mentions different versions of the Llama 3 model with varying numbers of parameters, such as the 70 billion parameter version and the 8 billion parameter version, indicating different capacities for learning and performance.

💡Accuracy

Accuracy in the context of AI models refers to the correctness of their predictions or outputs. The video script discusses the high accuracy of the Llama 3 model on the function calling leaderboard, with a 90% accuracy for the 70 billion parameter version, indicating its effectiveness.

💡Local Model

A local model is an AI model that runs on a user's own device or server rather than relying on cloud-based services. The video emphasizes the benefits of using a local model like Llama 3, including the ability to run it without internet connectivity and potential privacy advantages.

💡LangChain

LangChain is a framework mentioned in the video for building chatbots and agents on top of AI models. It simplifies using models like Groq's Llama 3 for function calling, as demonstrated when the video switches the task management agent from GPT to Llama 3 by changing only the chat model object.

Highlights

For the first time, the best large language model for function calling is an open source model.

The open source model can be run locally, unlike proprietary models such as GPT or Claude.

Groq, a company that builds fast AI inference infrastructure, has fine-tuned its own versions of Llama 3 for function calling.

Groq's Llama 3 is outperforming every other AI model on function calling benchmarks.

The 70 billion parameter version of Llama 3 is leading the Berkeley function calling leaderboard with 90% accuracy.

The 8 billion parameter version of Llama 3 is only 1% less accurate and ranks third on the leaderboard.

The Berkeley function calling leaderboard is a benchmark that represents typical use cases for function calling in AI.

The AI personal assistant developed in the AI Master Class video will be used to test the Llama 3 model's function calling capabilities.

The task management agent will be used to manage tasks in Asana, a task management software.

The testing will involve creating a project in Asana and adding tasks based on a list of steps provided by the AI.

GPT-4o will be used as the comparison model to evaluate the effectiveness of the Llama 3 model.

The code for the AI agent is available in a GitHub repo linked in the video description.

GPT successfully created a project in Asana and added tasks based on the provided steps.

The Llama 3 model generated responses faster than GPT, but needed the exact date for Friday to be supplied before it created the tasks in Asana.

The Llama 3 model successfully added, updated, and deleted tasks in Asana, demonstrating its function calling capabilities.

The Llama 3 model, while not as capable as GPT, performed impressively at function calling, especially for an open source model.

The success of the Llama 3 model is a major step forward for open source and local AI models, offering transparency and accessibility.

The demonstration shows that open source models can compete with proprietary models in practical applications like task management.

Transcripts

00:00

This week, history has been made: for the very first time ever, the best large language model for function calling is an open-source model that you can run locally. It's no longer a proprietary model like GPT or Claude. Groq, an AI company that builds infrastructure to help you work with open models, has recently developed their own version of Llama 3 that is specifically designed for function calling, and this thing is absolutely insane. It's crushing the benchmarks, beating every single AI model at function calling. So today I'm going to show you exactly how to use this model, and we're going to do some testing to really see if it is as good as the benchmarks say it is.

00:39

Here we have the blog post from Groq where they unveiled these Llama 3 models that have been specifically tuned for high-performance function calling. The first big question I had when I heard about this, because honestly it seems too good to be true, is: how can they actually say that their version of the Llama 3 model is the best at function calling? The way they're benchmarking this is with the Berkeley Function Calling Leaderboard, and we'll dive into that in just a second. A couple of things I want to call out from the article right away: the 70 billion parameter version of their Llama 3 is number one on this leaderboard right now, with a 90% accuracy, which is the big deal here. But what I find even more interesting is that their 8 billion parameter version is only 1% worse in overall accuracy, despite being a much smaller model, and it is number three on the leaderboard. So it's beating every GPT model and every Claude model except Claude 3.5 Sonnet at function calling right now. As you can probably guess from that, 3.5 Sonnet is number two on the leaderboard.

01:41

We can actually go over and take a look at the Berkeley Function Calling Leaderboard right now. It hasn't been updated with the Groq Llama 3 models yet, but before the update we had Claude 3.5 Sonnet in first, then GPT-4 and Claude 3 Opus, which is super cool. Just looking at it initially, it's a little vague what these accuracies and rankings really mean, but if you want, you can read up on everything that goes into this leaderboard and how they do their benchmarking for function calling. In short, they are trying to be very representative of most users' use cases for function calling, and they call out things like agents and enterprise workflows, so they're really trying to model their evaluations on how people actually use large language models for function calling. I've spent some time diving into it, and it really does seem accurate.

02:29

What we're going to do now is actually dive into using this new Groq Llama 3 model for function calling and see how accurate it really is. We're going to use the AI personal assistant that I've been developing in my AI Master Class videos, pair it with the Groq Llama 3 model, and see how well it can help me with my task management. So let's dive into the comparison, starting first with GPT and then trying out this new Llama 3 model.

02:57

In order to truly evaluate the effectiveness of this new Groq Llama 3 model for function calling, we need to compare it to another powerful model using the same AI agent. The model I'm going with here is GPT-4o, and the agent is the task management agent, essentially an AI personal assistant, that I've been developing in my AI Agents Master Class series here on my channel. This agent helps me manage my tasks in Asana, which is my favorite task management software. There's a UI for it as well, built with Streamlit, and it uses a lot of cool tools like LangChain that make it really easy to build up nicely. If you're curious about any of those things, you can check out the other videos on my channel or in the Master Class series, but I'm just going to go over this code really quickly here; then we'll dive into testing it with GPT, and after that I'll show you how to change it to use the Groq Llama 3 model and we'll test that as well.

03:48

Really quickly, the link to this code is in a GitHub repo in the description of the video, so you can check it out if you want, but I'm going to go over it at a high level. First of all, we have a section that defines all the tools we're giving the agent to interact with Asana on my behalf, to manage projects and to manage tasks; here are all the tools. Then we get into the next section, which is the function to actually interact with our AI agent: I build up the chatbot, bind all the tools to it, handle all the prompting, and handle any of the tool calls that come up when the AI wants to invoke a tool as an agent. Next up we have the main function, which is just where we define everything for the Streamlit UI, so I can interact with my AI in the browser and have it manage tasks through natural language typed into the chat component. And that is everything for this AI agent; now let's go ahead and see how well it does with GPT-4o.
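Since the repository code isn't reproduced on this page, here is a rough, illustrative sketch of the pattern just described: tools bound to a chat model, tool calls executed and fed back to the model, and a small Streamlit chat loop. The tool, names, and prompts are assumptions for illustration, not the author's actual code.

```python
# Rough sketch of the agent structure described above (illustrative, not the repo's code).
import streamlit as st
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI


@tool
def create_asana_task(task_name: str, due_on: str) -> str:
    """Create a task in Asana (stub body for illustration)."""
    return f"Created '{task_name}' due {due_on}"


tools = [create_asana_task]                           # the real agent binds several Asana tools
tools_by_name = {t.name: t for t in tools}
llm = ChatOpenAI(model="gpt-4o").bind_tools(tools)    # later swapped for ChatGroq


def prompt_ai(messages):
    """Invoke the model; if it requests tools, run them and feed the results back."""
    response = llm.invoke(messages)
    while response.tool_calls:
        messages.append(response)
        for call in response.tool_calls:
            result = tools_by_name[call["name"]].invoke(call["args"])
            messages.append(ToolMessage(content=str(result), tool_call_id=call["id"]))
        response = llm.invoke(messages)               # let the model see the tool results
    return response


def main():
    st.title("Asana Task Assistant")
    if "messages" not in st.session_state:
        st.session_state.messages = [
            SystemMessage(content="You are a personal assistant who manages tasks in Asana.")
        ]

    # Replay the visible part of the conversation on each Streamlit rerun.
    for msg in st.session_state.messages:
        if isinstance(msg, HumanMessage):
            st.chat_message("user").markdown(msg.content)
        elif isinstance(msg, AIMessage) and msg.content:
            st.chat_message("assistant").markdown(msg.content)

    if user_input := st.chat_input("What would you like me to do?"):
        st.chat_message("user").markdown(user_input)
        st.session_state.messages.append(HumanMessage(content=user_input))
        reply = prompt_ai(st.session_state.messages)
        st.session_state.messages.append(reply)
        st.chat_message("assistant").markdown(reply.content)


if __name__ == "__main__":
    main()
```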

04:42

All right, so here we are in the Streamlit UI for the task management AI agent, which is running with GPT right now. The way I ran this script is that I just ran the command streamlit run followed by the name of the Python script I just showed you; you do that in a terminal and it gives you this UI in the browser to interact with your agent. What I'm going to do now to test how good GPT is at function calling is give it a fairly difficult request, where it needs to invoke many different tools to interact with Asana and do something rather complex for me, and then we'll test the exact same thing with the Groq Llama 3 model.

05:19

I'm going to start out with a very simple question: "Give me the 10 steps to create an AI agent application." Basically, I'm having GPT start out by doing a little bit of research for me, so it gives me the top ten steps to make an AI agent app. It's a little vague, but we're just doing this as an example. Then I'll say: "Okay, great. Now create a project in Asana called 'AI Agent App' and add each step as a task that is due by Friday." Now we're kicking off many different things behind the scenes: GPT has to know to invoke the tool to create a project, then go into it and create a task for every single step. It also has to understand the due date I gave, and parse its own previous response to pick out each of those steps and turn them into a nice little title for each task. It's going to take a little bit because it has to invoke every single one of those tools, and I'm specifically letting it run rather than pausing and coming back when it's done, because I want to show the speed here and compare that to the Groq Llama 3 model. And here we go: it has created a project in Asana called "AI Agent App", added each of these tasks, and it gives the links as well. That worked flawlessly; that is awesome.

06:36

Now I'm going to do a couple of other little tests, and then we'll go and actually check it out in Asana. First I'll say: "Nice, I have finished defining the purpose and scope." I didn't spell it right, but that's totally fine, because I want it to mark this task as complete. All right, it has marked it as complete, nice. Then I'll do another test where I want it to delete a task: "I actually don't want to test the application." I do not recommend skipping testing, but this is just a test, because I want it to remove this task. There we go, it's removed. Now I'm going to test adding another task: "Instead, I want to hire someone to test my app," so I want that added as a task instead. Oh, nice: before it even adds the task, it asks me for the due date, which is really good, so I'll say Saturday. It thinks for a moment, and there we go: "Hire someone to test the app" has been added.

07:37

So now let's go into Asana and actually check to make sure all of these things worked the way the bot told me they did. Over in Asana we've got a new project called "AI Agent App"; I click into it and, boom, we've got a task for every single one of the steps to build an AI agent app. "Define the purpose and scope" is complete, we don't see "Test the application" anymore, and we do have a new task, due by Saturday, to hire someone to test the app (it determined the actual date itself, which is also nice). So everything worked great. Now we're going to go over to the Groq Llama 3 model and see if it can do this just as well, or maybe even better or faster, so let's dive into how we change the code to do that.

08:18

All right, I'm going to spend just a minute going over the changes it takes to use the Groq Llama 3 model, and then we'll test it just like we did with GPT to see how it fares with function calling. The first thing is that I import a new module from langchain-groq, which is just ChatGroq, and we'll use that to instantiate a Groq model for our chatbot instead of an OpenAI one. Then, for the model that we have defined through the environment variables, the default is now the Llama 3 Groq 70 billion parameter version. With that, all the tools are going to be exactly the same, so all of this code stays very similar; the only difference is that instead of using a ChatOpenAI object to instantiate the chatbot, we use ChatGroq, passing in that Groq Llama 3 70 billion parameter model. You could even test this with the 8 billion parameter one as well, because that one is apparently number three on the benchmarks, so that would be cool to play with too. And those are all the changes you have to make; using LangChain to work with Groq is so, so easy. They have documentation for how you can use Groq without LangChain, but this just makes it so simple. With that, let's go ahead and test out this new Groq Llama 3 model.
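As a rough sketch of the swap just described (the environment-variable name and the exact Groq model ID below are assumptions based on Groq's published tool-use models, and may not match the author's repository):

```python
# Illustrative sketch of the model swap described above; env-var name and model ID are assumptions.
import os
from langchain_core.tools import tool
from langchain_groq import ChatGroq   # pip install langchain-groq; requires GROQ_API_KEY to be set


@tool
def create_asana_task(task_name: str, due_on: str) -> str:
    """Create a task in Asana (stub body for illustration)."""
    return f"Created '{task_name}' due {due_on}"


# Groq's tool-use fine-tune of Llama 3 70B; the 8B variant can be substituted the same way.
model_name = os.getenv("LLM_MODEL", "llama3-groq-70b-8192-tool-use-preview")

# The only change from the GPT-4o version: ChatOpenAI(...) becomes ChatGroq(...).
llm_with_tools = ChatGroq(model=model_name).bind_tools([create_asana_task])
```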

09:29

All right, so here we are in the Streamlit UI for the task management AI agent again, but this time powered by the Groq Llama 3 model for function calling. I'm just going to go through the exact same process as before, and right off the bat you can see this thing is so freaking fast compared to GPT, which is so cool. It doesn't have the streaming, typewriter-style effect that GPT has, but I still appreciate the speed a ton. Now I'm going to give it the request to do all the things in Asana that we did with GPT, and right off the bat it's asking us to confirm the exact date for Friday. Okay, that's a little weird, and I think it's just because Llama 3 isn't as powerful as GPT, so I'll check my calendar really quickly and tell it that Friday is the 26th. Let's see if it can take this and run with it, adding the due dates and adding all these tasks into the new "AI Agent App" project.

10:24

It's going to take a little bit, because even though Groq is really fast, I think there's a little bit of rate limiting since I'm using the free tier: it makes one task, then prompts itself again to make the next task, and that starts to rate-limit itself. I was going to come back when it was done, but actually, never mind, there we go: all the tasks for the AI Agent App have been added successfully and are due by Friday. That is perfect. Okay, so it took a little bit to get there, and I had to give it a date when I didn't have to give that to GPT, but this is still pretty cool; the fact that an open-source model you can run locally can do this is freaking insane.

10:58

Now I'm going to give it another request. I'll say: "I have finished choosing a programming language and dev environment," because I want it to actually mark this task as complete. That is interesting: it seems there is an error updating the task, saying the task ID provided is not valid. I'll say: "No, you need to look up the task IDs." I don't want to have to give those to the model; it needs to be able to determine them itself, just like GPT did. Okay, it now says "Define the problem" has been updated successfully; I don't even know if that's the right one, so it's not doing the best here, but I'll test it a little bit more. "Create a new task to hire out a dev." Let's see if it can make a new task in this project to hire out a developer for me; hopefully it can do this one fine, so we'll see what happens. It's taking its sweet time here, and I'm not really sure why, because this one should be pretty quick; it seems like GPT is actually faster at invoking tools somehow. But here we go: the task "Hire developer" has been created successfully. Let's do one more test where I delete a task: "Delete the task 'test the AI model'; I don't want this anymore." Let's see if it can get rid of it fine, and then I'll go over to Asana after this and verify that everything actually looks the way it should based on what Llama 3 told me in this conversation. I'll just give it a little bit of time to delete that task, and then we'll swap over to Asana.

12:30

All right, it has successfully deleted the task for me, so now let's go over to Asana and check this out. I deleted the "AI Agent App" project from the GPT run, so the only project here now is the one created by Llama 3. I'll click into it and see how it looks. Okay: "Hire developer" has been added, all the other tasks are here, it has checked off the "Define the problem" task, and I don't see "Test the AI app" anymore. So there we go: it successfully did everything that GPT did. It took a little bit more to get it there, but it did work, and that is a huge victory for open-source and local models.

13:04

So, I honestly can't say I'm 100% impressed with these Groq Llama 3 models for function calling, because they're not quite as good as GPT, I think mostly because GPT is able to handle a bunch of tokens a lot better. But it's still insane how well this model is doing compared to other local and open-source models. I didn't even want to compare it to a base Llama 3 or Microsoft's Phi, for example, because those fall apart so badly it wouldn't even be a good demonstration; that's why I compared it to GPT, and it was almost as good, which is a huge victory for open-source models. If you're an advocate for transparency in AI, or for making AI accessible to anybody, then this is what you want to be rooting for: these models getting almost as good as proprietary ones is a big step forward. So I'm stoked about this, and I hope you can take this knowledge I've given you and apply it to add local models as AI agents in your workflows. If you found this useful in any way, I'd really appreciate a like and a subscribe, and with that, I will see you in the next one.


Related Tags
Llama 3, Function Calling, AI Models, Open Source, Benchmarks, Task Management, AI Agents, GPT Comparison, Local Models, AI Transparency