Llama 3.2 is INSANE - But Does it Beat GPT as an AI Agent?

Cole Medin
29 Sept 2024 · 16:29

Summary

TLDR: The video introduces Meta's latest LLaMA 3.2 language models, offered in versions from 1 billion to 90 billion parameters. The creator compares LLaMA 3.2 90B against GPT-4o Mini across AI agent tasks such as task management and file handling. LLaMA 3.2 performs well but falls short on certain tasks, such as Google Drive integration, where GPT-4o Mini clearly outperforms it. Despite these limitations, LLaMA 3.2 represents significant progress for local models, especially in function calling, and is promising for future AI agent development.

Takeaways

  • 🤖 Meta recently released its latest suite of large language models, LLaMA 3.2, with versions ranging from 1 billion to 90 billion parameters.
  • 🖥️ LLaMA 3.2 models can run on a wide range of hardware, supporting diverse generative AI use cases.
  • 📊 The 90 billion parameter version of LLaMA 3.2 posts impressive benchmark results, comparable to GPT-4o mini and even outperforming it in some areas.
  • 🎯 Local LLMs such as LLaMA 3.2 have typically struggled with function calling (the core of AI agent capability), but LLaMA 3.2 narrows the gap.
  • 🔧 The creator built an AI agent using LangChain and LangGraph that can test different models, such as LLaMA 3.2 and GPT-4o mini, by simply switching an environment variable.
  • 🧠 GPT-4o mini handles complex work like function calling, creating tasks, searching Google Drive, and RAG (retrieval-augmented generation); LLaMA 3.2 performs comparably but still has limitations, particularly with complex tool calls.
  • 🚀 LLaMA 3.2's AI agent capabilities show improvement but are still not as reliable as GPT-4o mini's on complex, multi-step tool calls.
  • 📂 In testing, GPT-4o mini handled various tool calls flawlessly, while LLaMA 3.2 struggled with more intricate tasks like formatting Google Drive search queries.
  • 🔄 LLaMA 3.2 handles retrieval tasks (RAG) well, but for tool calling it doesn't match GPT-4o mini's performance, especially on complex workflows like downloading files and adding them to a knowledge base.
  • 📈 Overall, LLaMA 3.2 is a significant step forward for local LLMs as AI agents, but further improvements are needed to match hosted models like GPT-4o mini.

Q & A

  • What are the parameter sizes available for Llama 3.2?

    -Llama 3.2 is available in four parameter sizes: 1 billion, 3 billion, 11 billion, and 90 billion.

  • How does Llama 3.2 90B compare to GPT-4o Mini in terms of performance?

    -Llama 3.2 90B is considered comparable to GPT-4o Mini, performing well on many benchmarks and even surpassing it in some cases. However, in the specific context of AI agents and function calling, GPT-4o Mini still performs better.

  • Why are local LLMs like Llama 3.2 important for some use cases?

    -Local LLMs allow users to run models on their own hardware without relying on external APIs. This is particularly important for users with privacy concerns or requirements for local processing due to data sensitivity.

  • What is a current limitation of local LLMs when used as AI agents?

    -Local LLMs have generally struggled with function calling, which is necessary for AI agents to perform tasks beyond generating text, such as sending emails, interacting with databases, and more.

  • What tools are used in the custom AI agent implementation described in the video?

    -The AI agent is built using LangChain and LangGraph, with integration for tools such as Asana (for task management), Google Drive (for file management), and a local Chroma instance (for vector database and retrieval-augmented generation).

  • What issue did Llama 3.2 90B encounter during the Google Drive test?

    -Llama 3.2 90B failed to properly format the search query when asked to retrieve a file from Google Drive, resulting in an incorrect tool call. It did not perform as well as GPT-4o Mini in this test.

  • How does the AI agent determine whether to invoke a tool?

    -The AI agent uses a router that checks if the LLM requests a tool call. If so, the agent invokes the tool and continues the process, looping back to the LLM for further instructions if necessary.

  • How did GPT-4o Mini perform with a more complex multi-step task involving Google Drive?

    -GPT-4o Mini was able to search for and download a specific file from Google Drive, add it to a knowledge base, and then use retrieval-augmented generation (RAG) to answer a query based on the file's content, although it unnecessarily downloaded the file multiple times.

  • What specific improvement does Llama 3.2 bring over Llama 3.1 in terms of function calling?

    -Llama 3.2, especially the 90B version, shows significant improvement in function calling over Llama 3.1, which struggled with this capability even at the 70B parameter size. However, it still does not reach the level of GPT-4o Mini.

  • What potential does the developer see for local LLMs as AI agents in the future?

    -The developer is optimistic that local LLMs will eventually excel in function calling and become highly capable AI agents. Once a local model reliably handles function calls, it will be a 'game-changer' for many applications.

Outlines

00:00

🚀 Meta Releases LLaMA 3.2: A New Step in Generative AI

Meta has just launched its latest suite of large language models, LLaMA 3.2, in four sizes (1B, 3B, 11B, and 90B parameters). The 90B model is especially impressive, approaching the performance of GPT-4o mini and sometimes surpassing it. This release is significant for anyone who needs local large language models (LLMs), since LLaMA 3.2 supports a wide range of hardware. Local LLMs have struggled with tasks requiring function calling or interacting with tools, so LLaMA 3.2 is exciting as a possible step toward improving these capabilities.

05:01

💻 Testing LLaMA 3.2 with AI Agents

The speaker shares their excitement about testing LLaMA 3.2, specifically its ability to function as an AI agent and perform function calling. They have built a custom AI agent using LangChain and LangGraph for testing different LLMs. The video compares LLaMA 3.2 (90B) against GPT-4o mini, exploring how the models interact with tools such as email or databases. The speaker outlines their AI agent's setup, including its ability to swap between LLMs without changing code, which simplifies testing various models.
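
To make the environment-variable switching concrete, here is a minimal sketch of what such a model mapping can look like with LangChain's provider packages. The variable name LLM_MODEL and the prefix-based dispatch are illustrative assumptions for the example, not the repo's exact code.

```python
import os

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_groq import ChatGroq

def get_chat_model(model_name: str):
    """Instantiate the right LangChain chat class for a given model name."""
    if model_name.startswith("gpt"):
        return ChatOpenAI(model=model_name)     # e.g. gpt-4o-mini
    if model_name.startswith("claude"):
        return ChatAnthropic(model=model_name)  # e.g. a Claude model
    return ChatGroq(model=model_name)           # e.g. a Groq-hosted Llama model

# Swapping models is then a one-line .env change; no code edits needed.
llm = get_chat_model(os.environ.get("LLM_MODEL", "gpt-4o-mini"))
```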

10:01

🛠️ Tools and Capabilities of the AI Agent

The speaker describes the tools that their AI agent uses, such as task management with Asana, file management with Google Drive, and vector database integration for retrieval-augmented generation (RAG). They explain how these tools are incorporated into the model and demonstrate basic tasks like creating and managing Asana projects, as well as using Google Drive for document management. The goal is to test how well the AI can use these tools to perform complex tasks, starting with basic operations and gradually increasing complexity.

15:02

📊 GPT-4o Mini: AI Agent Tool Performance

The speaker tests GPT-4o mini as an AI agent by running several tool-related tasks. It performs well, successfully managing tasks in Asana, searching for and downloading files from Google Drive, and adding documents to the AI's knowledge base for retrieval. There is one issue: GPT-4o mini downloads the same file from Google Drive multiple times, though it still completes the task correctly. Overall, GPT-4o mini impresses with strong function-calling abilities across various real-world tasks.

🤖 LLaMA 3.2: Performance Compared to GPT-4o Mini

LLaMA 3.2 (90B) is now put to the same test as GPT-4o mini. It performs similarly well on simpler tasks like creating tasks in Asana but falters when interacting with Google Drive, failing to format a search query correctly. While LLaMA 3.2 performs well in retrieval-augmented generation (RAG), it struggles with more complex tool invocations, unlike GPT-4o mini. Despite this, LLaMA 3.2 represents progress for local models in function calling, though it still falls short of what GPT-4o mini demonstrates.

📈 Improvements and Future Prospects for Local LLMs

The speaker concludes that while LLaMA 3.2 does not outperform GPT-4o mini on AI agent tasks, it is a significant improvement over LLaMA 3.1. Its function calling, although not perfect, signals progress for open-source and local models. Fine-tuning and improving the tools' docstrings could further enhance LLaMA 3.2's abilities, showing promise for future advancements. The speaker remains optimistic that local LLMs will eventually reach parity with leading hosted models for AI agent use cases.

Keywords

💡LLaMA 3.2

LLaMA 3.2 is Meta's latest suite of large language models, including versions with 1 billion, 3 billion, 11 billion, and 90 billion parameters. These models are designed to handle a wide range of hardware and AI use cases. The video's theme revolves around testing LLaMA 3.2's performance in AI agent scenarios, particularly its ability to perform function calling, which is critical for executing tasks beyond text generation.

💡GPT-4o Mini

GPT-4o Mini is the hosted OpenAI model that LLaMA 3.2 is compared against in the video. The speaker tests both models to see how well they function as AI agents, particularly in their ability to perform function calling. GPT-4o Mini serves as the benchmark for LLaMA 3.2's capabilities, with the 90 billion parameter version of LLaMA 3.2 compared against it directly.

💡Function Calling

Function calling, also known as tool calling, allows large language models to perform tasks beyond generating text, such as sending emails, managing tasks in external applications, or querying databases. The speaker emphasizes the importance of function calling for AI agents and tests whether LLaMA 3.2 has improved in this area compared to previous versions. Successful function calling is key to an LLM acting as a practical AI assistant.
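
A minimal LangChain sketch of what a function-calling round trip looks like; the create_task stub below is hypothetical and stands in for a real integration such as Asana.

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def create_task(project: str, name: str, due_date: str) -> str:
    """Create a task in the given project, due on due_date (YYYY-MM-DD)."""
    return f"Created '{name}' in {project}, due {due_date}"  # stub for illustration

llm = ChatOpenAI(model="gpt-4o-mini").bind_tools([create_task])
response = llm.invoke("Create a task in coding to end world hunger with code by Monday")

# If the model chose to call the tool, the structured request appears here
# instead of plain text; an agent loop would execute it and feed the result back.
for call in response.tool_calls:
    print(call["name"], call["args"])
```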

💡LangChain

LangChain is a framework used to build AI applications that integrate with multiple language models and tools. The speaker uses LangChain in combination with LangGraph to create a custom-coded AI agent that can swap between different LLMs, such as LLaMA 3.2 and GPT-4o Mini, depending on an environment variable set in the code. It plays a central role in enabling the dynamic tool usage and function-calling tests.

💡LangGraph

LangGraph is used in conjunction with LangChain to set up a complex but flexible AI agent that can handle multiple models and tools. The speaker explains how LangGraph helps manage different LLMs and dynamically switch between them based on environment variables. Its ability to route between the LLM node and the tool node, and to manage conversation state across steps, is crucial for testing AI agents.
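
A minimal LangGraph sketch of the two-node-plus-router shape the video describes, assuming a chat model with tools bound (llm_with_tools) and a tools list like the ones above; node names and state layout are illustrative.

```python
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode

class State(TypedDict):
    messages: Annotated[list, add_messages]  # the conversation is the only state

def call_llm(state: State):
    return {"messages": [llm_with_tools.invoke(state["messages"])]}

def route(state: State):
    # If the last LLM response requested a tool, run it; otherwise finish.
    last = state["messages"][-1]
    return "tools" if getattr(last, "tool_calls", None) else END

graph = StateGraph(State)
graph.add_node("llm", call_llm)
graph.add_node("tools", ToolNode(tools))
graph.add_edge(START, "llm")
graph.add_conditional_edges("llm", route)
graph.add_edge("tools", "llm")  # loop back so the agent can chain tool calls
runnable = graph.compile()
```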

💡AI Agent

An AI agent refers to an AI system that can perform tasks or take actions on behalf of a user, such as managing files, scheduling tasks, or querying knowledge bases. The main focus of the video is to evaluate whether LLaMA 3.2 can function as an effective AI agent, especially in comparison to GPT-4o Mini. An AI agent must be able to use function calling to interact with external tools and systems, which is a core part of the test.

💡Asana

Asana is a task management platform used as an example of tool integration in the video. The speaker demonstrates how both LLaMA 3.2 and GPT-4o Mini can interact with Asana to create tasks or retrieve project information. This is one of the external tools that the AI agent is expected to interface with as part of the function-calling test.

💡Google Drive

Google Drive is another tool integrated into the AI agent setup for testing. The speaker demonstrates how the models can search, download, and manage files in Google Drive. However, LLaMA 3.2 struggles to format the search query correctly for Google Drive, which exposes a limitation in its tool-calling ability compared to GPT-4o Mini.
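
For context on the formatting involved, here is a hedged sketch of a Drive v3 search with google-api-python-client; creds stands for an already-authorized credentials object (the auth flow is omitted).

```python
from googleapiclient.discovery import build

# creds: an authorized google.oauth2 credentials object (setup omitted).
service = build("drive", "v3", credentials=creds)

# Drive's query language is strict: the search term must be quoted inside q.
query = "name contains '823 meeting notes' and trashed = false"
files = service.files().list(q=query, fields="files(id, name)").execute()
for f in files.get("files", []):
    print(f["id"], f["name"])
```

In the video, LLaMA 3.2's failure mode was emitting "name contains" with an empty search term, which makes the query invalid.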

💡Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is a technique where an AI model retrieves external knowledge (e.g., from a database or knowledge base) to generate more informed responses. In the video, the speaker tests the models' ability to use RAG by querying a local Chroma instance with meeting notes stored in a vector database. LLaMA 3.2 and GPT-4o Mini are tested on how well they can retrieve and use this information to answer questions.
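
A hedged sketch of the retrieval step against a local Chroma store via langchain-chroma; the collection name, embedding model, and persist directory are illustrative.

```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# A local, persistent Chroma store; nothing fancy is needed to test RAG.
db = Chroma(
    collection_name="meeting-notes",
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma",
)

# Similarity search returns the chunks most relevant to the question,
# which the agent then feeds back to the LLM as context.
docs = db.similarity_search("action items from the 823 meeting", k=4)
context = "\n\n".join(d.page_content for d in docs)
```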

💡Knowledge Base

A knowledge base in this context refers to a collection of documents or data that the AI agent can retrieve information from using RAG. The video shows how the AI agents, specifically GPT-4o Mini and LLaMA 3.2, can add documents to the knowledge base and query it to answer questions about the content. Efficient interaction with a knowledge base is critical for making the AI agent useful in real-world applications.
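
A matching ingest sketch, reusing the db store from the previous example; the loader, chunk sizes, and helper name are illustrative assumptions.

```python
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

def add_to_knowledge_base(file_path: str) -> str:
    """Load a local file, split it into chunks, and store them in the vector DB."""
    docs = TextLoader(file_path).load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(docs)
    db.add_documents(chunks)  # db: the Chroma store from the RAG example
    return f"Added {len(chunks)} chunks from {file_path}"
```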

Highlights

Meta releases LLaMA 3.2, a new suite of large language models with 1B, 3B, 11B, and 90B parameter versions, showing impressive performance across generative AI use cases.

The 90B parameter version of LLaMA 3.2 competes with GPT-4o mini, matching or outperforming it on some benchmarks, which is notable for a locally runnable model.

LLaMA 3.2 supports a wide range of hardware and use cases, offering more accessibility for developers working with generative AI.

Local LLMs historically struggled with function calling, but this release shows promising improvements for AI agents.

The video tests whether LLaMA 3.2 can handle AI agent tasks, such as tool calls and automation, and compares it with GPT-4o mini in these areas.

A custom-coded AI agent using LangChain and LangGraph is used to test multiple LLMs, showcasing flexibility in swapping models and managing tools.

The AI agent setup includes integration with tools like Asana for task management, Google Drive for file operations, and a local Chroma instance for RAG (retrieval-augmented generation).

The 90B parameter version is the only model in the LLaMA 3.2 suite that handles function calling; in the creator's testing, the 1B, 3B, and 11B versions would not invoke tools at all.

GPT-4o mini performs well in executing tool calls, including creating Asana tasks, downloading files from Google Drive, and integrating with the agent's knowledge base using RAG.

LLaMA 3.2 fails at more complex tool calls, such as searching for and downloading files from Google Drive, highlighting areas where it lags behind GPT-4o mini.

LLaMA 3.2 is still useful for simpler AI agent tasks and retrieval-augmented generation but doesn't perform as well in more complex, multi-step operations.

The demonstration shows how AI agents can manage loops in tool calls, invoking multiple tools in a sequence before providing the final response.

GPT-4o mini can manage more complex tasks such as downloading, processing, and querying documents, showing superior performance over LLaMA 3.2.

The custom AI agent setup dynamically switches between OpenAI, Anthropic, and Groq-hosted models via a single environment variable, allowing for easy testing and comparison.

While LLaMA 3.2 90B shows significant improvements over LLaMA 3.1 for function calling, it still doesn't match the overall performance of GPT-4o mini.

This step forward for local LLMs like LLaMA 3.2 points to future breakthroughs for open-source models as AI agents, though GPT-4o mini still leads in complex agent applications.

Transcripts

[00:00] Just a couple of days ago, Meta released their latest suite of large language models, Llama 3.2, and it is super exciting because they released 1 billion, 3 billion, 11 billion, and 90 billion parameter versions. You can run Llama 3.2 on a very wide range of hardware with a big variety of generative AI use cases, and the benchmarks for Llama 3.2 are looking really impressive: the 90 billion parameter version is even getting up to the performance of GPT-4o mini and beating it in some ways, which is super impressive for a model that you can just download and run yourself. The progress for local LLMs is really promising, and if you have a requirement to use local LLMs for your use case, this is extra fantastic news.

[00:48] Now, local LLMs have generally not been good as AI agents because they don't do well with function calling, otherwise known as tool calling. This is what enables LLMs to actually do things for you besides just generating text: things like sending emails, chatting in Slack, interacting with a knowledge base, that sort of thing, and that is really valuable. I can't wait for the day when local LLMs are actually fantastic as AI agents and can do function calling well, so when Llama 3.2 was released I was really excited to test it out and see if we've gotten any closer to that point. Whatever local LLM manages to do function calling reliably first is going to be an absolute game changer, and you and I get to figure out right now if Llama 3.2 is that model.

[01:32] So today we're going to test out Llama 3.2 as an AI agent and compare it to the performance of GPT-4o mini, since the Llama 3.2 90B version is generally considered comparable. I'll start by very briefly walking you through the custom-coded AI agent that I've created with LangChain and LangGraph, and then we'll dive right into testing our agents.

[01:52] All right, so I want to kick us off by showing you the code that I've developed for this AI agent to test a bunch of different LLMs. I've built this with LangChain and LangGraph; it's a somewhat more complex implementation, but it's very robust. I'll go into it in just a little bit of detail here to spark some curiosity, but I'll also have a link in the description of this video to a GitHub repo with all of this code, so you can download it yourself and play around with a bunch of LLMs, just like I'm going to do right here with Llama 3.2 90B and GPT-4o mini.

[02:24] In the main function of my Python script, I'm just defining the Streamlit UI so we can interact with our large language model in the browser. I go into a lot more detail on topics like Streamlit, LangChain, and LangGraph in other videos on my channel, so feel free to check those out if you want something much more in depth, but I'm going to be pretty brief here so we can get to testing the LLMs. When we get a chat message from the user, we call prompt_ai to get our response, and this is what actually interacts with our LangGraph runnable to stream the response from the LLM.
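
(A minimal sketch of that Streamlit chat wiring; the prompt_ai signature is an assumption, and the repo's actual code streams the response rather than returning it whole.)

```python
import streamlit as st

st.title("AI Agent Playground")

if "messages" not in st.session_state:
    st.session_state.messages = []

# Re-render the conversation on every Streamlit rerun.
for role, text in st.session_state.messages:
    st.chat_message(role).write(text)

if prompt := st.chat_input("Ask the agent something..."):
    st.session_state.messages.append(("user", prompt))
    st.chat_message("user").write(prompt)
    answer = prompt_ai(prompt)  # hypothetical helper wrapping the LangGraph runnable
    st.session_state.messages.append(("assistant", answer))
    st.chat_message("assistant").write(answer)
```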

[02:58] I'll go over to the runnable so we can see how everything is set up with LangGraph; it's pretty simple overall. First, we have this model mapping, and this is what makes it so easy to swap different models in and out of this AI agent. Based on our LLM model environment variable, which I'll show in a little bit, we instantiate the right chat class from LangChain depending on whether it's an OpenAI model, an Anthropic model, or a model from Groq, and I'm even going to add support for Hugging Face in the near future. Going over to the environment variables, there's an example environment variable file in the repo so you can see how to set up all the API keys you need to play with the different models, and this is also where you define your LLM model. The whole script determines which service to set up dynamically based on the value you have here, so you don't have to change any code to go from Groq to Anthropic or from Anthropic to OpenAI; it is so easy, and that's part of the whole setup here.

[03:56] With that, I'll show how the graph is set up really quickly. We set up our chat instance and bind in all the tools we have, which I'll show in a little bit, and then the graph is really simple. For the state, we're just managing the messages in the conversation, and we have only two nodes: one to call the LLM and get the response, and another to invoke any tools the LLM wants to invoke. Then this router determines whether we need to make any tool calls: did the LLM ask to? If it did, we route to the tool node after getting the response from the LLM; otherwise we route to the end of the graph and return the response to the user. This handles loops as well: if the LLM wants to invoke a tool, it does, then control goes back to the LLM, and once it no longer wants to invoke anything it exits the graph. So it can invoke a bunch of different tools in a loop until the AI agent has done everything the user asked it to do. Finally, in our get_runnable function, we create the graph by defining all of the edges and nodes, compile it together with memory, and return it so we can use it in the other Python script where the Streamlit UI is set up.
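
(A sketch of that compile-with-memory step using LangGraph's in-memory checkpointer; the thread id and message format are illustrative.)

```python
from langgraph.checkpoint.memory import MemorySaver

# Compiling with a checkpointer gives the agent per-thread conversation memory.
runnable = graph.compile(checkpointer=MemorySaver())

config = {"configurable": {"thread_id": "streamlit-session-1"}}
result = runnable.invoke(
    {"messages": [("user", "What projects do I have in Asana?")]},
    config=config,
)
print(result["messages"][-1].content)
```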

[05:04] All right, that is everything for the LLM. Now I want to dive into the tools, because that's where we can really see what this AI agent can actually do, and that defines how I'm going to test Llama 3.2 and GPT-4o mini. First of all, at the top, like I showed briefly, we take all these tools that we import from other files and bind them into our model. The different files live in the tools directory that you'll see in the GitHub repository; I split the files by service, so Asana for task management, Google Drive for file management, and then the vector database tools for RAG, which just use a local Chroma instance because I don't need anything fancy to test LLMs with RAG.

[05:46] I'm not going to go through all these tools, obviously, but for Asana we have simple functions to create tasks, create projects, get the tasks in a certain project, all that good stuff. For Google Drive we can search for files, create files, and download files; pretty much everything you'd want for CRUD in Google Drive, including searching through folders. And then the vector database tools cover everything you need for RAG: we can query documents (this is what does the similarity search, the actual retrieval), add documents to our knowledge base given a file path, and clear the knowledge base if we want to empty it so we can retest or move on to the next model. And that is everything for the tools.

play06:36

for the tools and so now what I'm going

play06:37

to do is I'm going to set this up to

play06:40

work with GPT 40 mini to start and then

play06:43

I've got a couple of prompts that I want

play06:45

to run on it to see how well it does

play06:47

using all these different tools and then

play06:48

I'll do the exact same thing with llama

play06:50

3.2 90b all right so here I am in a

play06:53

streamlet UI for my large language model

play06:56

and right now I set the llm model

play06:58

environment variable to GP T4 mini so

play07:01

that is what we are playing with right

play07:02

here so what I'm going to do to test it

play07:04

is just start out with some more simpler

play07:06

tool call requests and then get up to

play07:08

things that are a bit more complicated

play07:09

and see how much it can handle and gbt

play07:11

40 mini is pretty impressive overall so

play07:14

I think you'll be surprised the kind of

play07:15

things that it's actually able to do so

play07:17

I'm going to start with a very basic

play07:19

request like what projects do I have in

play07:22

Assa we're going to ask the exact same

play07:24

things Al llama 3.2 90b when we test it

play07:27

and sure enough it listed all the

play07:29

projects that I have right here coding

play07:30

personal business YouTube and fitness

play07:32

very good all right so next up what I'm

play07:34

going to do is ask it to make a task for

play07:37

me so a bit more complicated because it

play07:39

has to Define some parameters now if I

play07:40

go into my terminal here for debugging

play07:43

when it wants to invoke the get Asana

play07:45

projects tool it doesn't need any

play07:46

arguments so it's a very basic tool call

play07:48

so now I'll ask it to create a task in

play07:51

my coding project to endworld hunger

play07:55

with code oh my goodness I can't spell

play07:58

with code by Monday big task but it

play08:01

doesn't care it's going to add it for me

play08:03

and so we'll give it a little bit to

play08:04

make that tool call and there we go it

play08:06

added it in Monday September 30th for

play08:08

the due date and it gives me a link as

play08:09

well and then if I go into my coding

play08:11

project sure enough this wasn't here

play08:12

before end world hunger with code due by

play08:15

Monday looking really really good all

play08:17

right so let's just keep getting more

play08:18

complicated here this thing is doing

play08:20

really great uh next up I wanted to

play08:21

actually do something with my Google

play08:23

Drive and so I have a bunch of meeting

play08:24

notes files that I want to search for

play08:26

and download right now so I'll start

play08:28

with get my and then I'll say 823

play08:31

meeting notes from Google Drive so it

play08:35

has to download it well first has to

play08:36

search for it and then download it and

play08:38

then give me the path to it locally as

play08:40

well so let's see if it can do

play08:42

everything and yes it is looking really

play08:44

good it's even even gave me the link

play08:46

here which I don't think this will

play08:48

actually work I'm not going to test that

play08:49

right now because it's just a local file

play08:50

but anyway that's looking good and yeah

play08:53

you can see all the tool calls I doing

play08:55

here it did a search First and it even

play08:57

formatted the search correctly there's

play08:59

some very specific ways that Google

play09:01

Drive has to uh format the searches in

play09:04

the API and then it downloaded the file

play09:06

once it got the ID from the search so

play09:08

very very good it's using the context

play09:10

from previous tool calls to make the

play09:12

next one so this is looking really nice

play09:14

and so next up um what I want to do is

play09:17

actually add this into my knowledge base

play09:18

so I'll say add this doc into your

play09:21

knowledge base and that way I can ask

play09:23

questions using Rag and it's going to

play09:25

have the information from this meeting

play09:27

notes in there to answer my question

play09:29

question so boom there we go then I can

play09:31

say what are the action items from the

play09:34

823

play09:35

meeting and yeah we'll go look at the

play09:38

terminal really quick it's really cool

play09:39

to see all the tool calls as they're

play09:40

happening um so yeah add to the

play09:42

knowledge base then it queries with the

play09:43

question gets the response and then

play09:45

gives it back out to me and this is

play09:47

perfect word for word if I go into my

play09:50

data folder here you can see that this

play09:51

was actually empty before I ran it so it

play09:53

downloaded 823 meeting notes and what we

play09:55

see here matches exactly what we have

play09:58

here so it is working really really well

play10:00

and so to make this even more

play10:02

complicated for gbt 40 mini I can make a

play10:05

request that would actually require it

play10:07

to download something from Google Drive

play10:09

add to the knowledge base then answer my

play10:11

question all in one and so I can do that

play10:12

by saying what are the action items from

play10:16

the 8:25 meeting and so in this case it

play10:19

doesn't have it downloaded it doesn't

play10:20

have it in the knowledge base so it has

play10:22

to intelligently know to do all of those

play10:24

and so it says right here it only has

play10:27

access to the 823 meeting bummer but

play10:29

what I can say is uh get it from the

play10:32

drive and do what you need to do to

play10:36

answer my question so hopefully this

play10:37

will prompt it to download it add it to

play10:39

the knowledge base then do the search

play10:41

with Rag and give me the response a lot

play10:43

going on here but gbt 40 mini is pretty

play10:46

good it can typically do this so yep it

play10:48

downloaded the file it looks like it's

play10:50

downloading a bunch of files which is

play10:51

really really weird I'm not sure why it

play10:53

did that but it added to the knowledge

play10:55

base and then it queried with what are

play10:56

the ACs from the 825 meeting and got the

play10:58

respon and there we go this is looking

play11:01

really really good so if I go over here

play11:03

I now have for some reason it decided to

play11:06

download the 825 meeting notes four

play11:08

times I'm not sure why it did that

play11:09

that's kind of weird but anyway it

play11:11

downloaded added to the knowledge base

play11:13

and we got the right answer so this is

play11:14

looking really really good I would say

play11:16

this is kind of the first time that GPT

play11:18

40 mini messed up I've never seen like

play11:20

clae 3.5 Sonet or GPT 40 do that kind of

play11:23

weird thing where it downloads the file

play11:24

four times but overall it is really

play11:27

really good as an agent and so now I'm

play11:29

going to go over to testing llama 3.2

play11:31

90b with the same questions and seeing

play11:34

how well it does okay so I stopped my

[11:36] Okay, so I stopped my Streamlit instance and changed my LLM model environment variable to Llama 3.2 90B, and now I'm back up and running with that model. I tried using the other Llama 3.2 models, the 1B, 3B, and 11B, for function calling, but they straight up don't work; they won't invoke tools. That is why I'm only using the 90B model here, and it's the one that's comparable to GPT-4o mini anyway. I'm going to start with the exact same queries I used for GPT-4o mini, so I'll say "what projects do I have in Asana?", and just like before it lists them out: coding, fitness, yep, there we go, business, personal, and YouTube. That is exactly right. Now I'll follow up, just like before, with "create a task in coding to create an AI pet startup by Tuesday". Let's see if it can pull this one off. There we go: the task "create an AI pet startup" has been created, due by Tuesday, and sure enough that looks absolutely perfect. So far it is keeping up with GPT-4o mini, and that is really exciting, because this is the moment of truth where we see whether we have a local model that can actually be a good AI agent.

[12:44] Next up, I'm going to have it interact with Google Drive, so just like before I'll say "download my 823 meeting notes from Google Drive". Let's see if it can pull this one off; I'll keep the terminal open so we can watch the tool calls come in. All right, the query came through, and it looks completely incorrect: it has "name contains" but no actual search term, so it is not looking as good as GPT-4o mini, and that is a bummer. Now, this is a more complicated tool because it has to format the search query in a very specific way, but even GPT-4o mini was able to handle it, so this is kind of disappointing. We'll give it a shot and see if it can correct itself; I'm going to pause and come back after it goes through the loop a few more times, and we'll see if it can hold itself together.

[13:32] Okay, after a while it failed to make the query, and it even told me it needs more information: "can you tell me more about the file name?" I should not have to do that when I ask for the 823 meeting notes and the file in Google Drive is basically just called "823 meeting notes". It is definitely failing here, so this unfortunately is not looking very good at all.

[13:55] So it seems to be failing with Google Drive, but I at least want to test it with RAG, because that's another really important capability, and if you just have a RAG use case you can probably still use Llama 3.2. Let's figure that out right now. First I'll ask it to clear my knowledge base, just to make sure it doesn't still have information from when I ran it with GPT-4o mini. There we go, it cleared my knowledge base. Now, since I can't have it download the file from Google Drive, determine the file path, and add that to the knowledge base, I'm just going to give it the file path directly: I'll copy the path to the file, go back in, paste it, and say "add this to your knowledge base". This invokes the function that adds the file to the vector database using the file path, and there we go, boom, the file has been added to the knowledge base. Now I can ask "what are the action items from the 823 meeting?", and it should give us the same answer as before now that the knowledge is in for RAG. All right, there we go, looking good: we've got the right answer for the action items from the meeting notes that were added to the knowledge base.

[15:02] So overall, Llama 3.2 is looking really good as an AI agent. It's not as good as GPT-4o mini, which is pretty disappointing, but Llama 3.1, aside from the 405B version of course, was unusable for function calling, even the 70B version. So this is looking really nice and is definitely a step forward for local LLMs as AI agents, and I'm pretty excited.

[15:30] Also, I just wanted to say that I did a lot more testing off camera comparing Llama 3.2 to GPT-4o mini for AI agents, and it really does seem that Llama 3.2 is great for function calling, better than Llama 3.1, but it doesn't quite reach the level of GPT-4o mini. Kind of disappointing, but at the same time it is still a huge step forward for these local LLMs. And there's a lot you can do: make the docstrings for the tools better so the LLM understands them, do some fine-tuning; you can really make it work if you want.
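
(One concrete way to act on that docstring advice, as a hedged sketch: LangChain turns a tool's docstring and type hints into the JSON schema the model actually sees, so you can print and tighten exactly what the model is told. The search_drive stub here is hypothetical.)

```python
from langchain_core.tools import tool
from langchain_core.utils.function_calling import convert_to_openai_tool

@tool
def search_drive(query: str) -> str:
    """Search Google Drive. `query` must use Drive's query syntax,
    e.g. "name contains '823 meeting notes'"; always include the quoted term."""
    ...

# Inspect the schema the model receives; clearer docstrings here are a cheap
# way to help weaker models format tool calls correctly.
print(convert_to_openai_tool(search_drive))
```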

[16:04] This demonstration is just to show that, at a base level, GPT-4o mini still surpasses Llama 3.2 90B as an AI agent. I hope you found this video super informative. I'm going to keep doing these as new models come out, until we get to the point where there's an open-source model that just crushes it for AI agents. If you appreciate this content, I would really appreciate a like and a subscribe, and with that, I will see you in the next video.


Related Tags
Llama 3.2 · GPT-4o mini · AI agents · LangChain · tool calls · AI benchmarks · local LLMs · function calling · task automation · generative AI