What are AI Agents?

IBM Technology
15 Jul 2024 · 12:28

Summary

TL;DR: The video discusses the evolution of AI, focusing on the shift from monolithic models to compound AI systems. It explains how compound AI systems combine multiple components, like tuned models and external tools, to solve complex problems more efficiently. The concept of AI agents, which use large language models for reasoning and planning, is introduced. These agents can handle intricate tasks by breaking them down and using various tools. The video contrasts programmatic and agentic approaches, highlighting their applications based on problem complexity, and emphasizes the growing importance of agentic AI systems in 2024.

Takeaways

  • 🧠 The shift from monolithic models to compound AI systems is a significant development in generative AI, with the latter offering more adaptability and integration into existing processes.
  • 🏰 Monolithic models are limited by the data they've been trained on, making them less adaptable and less capable of handling tasks that require personalized or sensitive information.
  • 🔍 Compound AI systems unlock their potential when integrated with external databases and tools, allowing for more accurate and personalized responses to queries.
  • 🛠 System design in AI involves multiple components, such as models, output verifiers, and programmatic components, which can be combined to solve complex problems more effectively than tuning a single model.
  • 🔄 The modular nature of compound AI systems makes them inherently more adaptable, allowing for the selection of appropriate components to tackle specific tasks.
  • 🌐 Retrieval Augmented Generation (RAG) is highlighted as a popular compound AI system, but it can fail if the query deviates from its predefined search path, emphasizing the importance of control logic in AI systems.
  • 🤖 AI agents represent a further evolution, where large language models (LLMs) are put in charge of the logic and reasoning of the system, thanks to their improved capabilities in reasoning and problem-solving.
  • 📝 The three core capabilities of AI agents are reasoning, acting through external tools, and accessing memory, which includes both internal logs and conversation history for personalized interactions.
  • 🔄 ReACT (Reasoning and Acting) is a method for configuring AI agents in which the agent is prompted to reason and plan before executing actions, potentially using external tools.
  • 🛑 The control logic of AI systems, whether programmatic or agentic, is crucial for determining the path to answer a query, with the agentic approach allowing for more complex problem-solving through iterative planning and execution.
  • 🌟 The video concludes by emphasizing the ongoing evolution and rapid progress in AI systems, suggesting that 2024 will be a year of significant growth for agentic AI approaches, offering a sliding scale of AI autonomy for different problem sets.

Q & A

  • What is the main topic discussed in the video script?

    -The main topic discussed in the video script is the evolution of AI agents, particularly the shift from monolithic models to compound AI systems and the concept of using large language models as agents in problem-solving.

  • What is the first shift mentioned in the field of generative AI?

    -The first shift mentioned is the move from monolithic models to compound AI systems, which are more adaptable and capable of solving a wider range of tasks by integrating models with external processes and tools.

  • Why are standalone AI models limited in their capabilities?

    -Standalone AI models are limited by the data they have been trained on, which impacts their knowledge about the world and the tasks they can solve. They are also hard to adapt without additional investment in data and resources.

  • Can you provide an example of how a compound AI system might work?

    -An example given is a system designed to plan a vacation, which integrates a language model with access to a database to fetch personal vacation days, thus providing a correct and personalized response to the user's query.

  • What is the significance of system design in compound AI systems?

    -System design in compound AI systems is significant because it allows for the integration of multiple components, such as different models and programmatic components, to solve complex problems more effectively than a single model could.

  • What is the role of programmatic control logic in compound AI systems?

    -Programmatic control logic defines the path a compound AI system takes to answer a query, determining how the system utilizes its components, such as searching databases or using external tools, to provide a solution.

  • What is an AI agent and how does it differ from a traditional AI system?

    -An AI agent is a system that uses a large language model to control its logic, allowing it to reason and plan how to tackle complex problems. It differs from traditional AI systems by having a higher degree of autonomy and the ability to break down and solve problems in a more human-like manner.

  • What are the three main capabilities of AI agents as discussed in the script?

    -The three main capabilities of AI agents are the ability to reason, the ability to act by calling external tools, and the ability to access memory for storing thoughts and conversation history to personalize the experience.

  • What is ReACT and how does it relate to AI agents?

    -ReACT stands for Reasoning and Acting. It is a method of configuring AI agents that combines the model's reasoning capabilities with the ability to act by calling external tools, creating a more dynamic and adaptable problem-solving approach.

  • How does the concept of memory play a role in AI agents?

    -Memory in AI agents can refer to the internal logs of the model's thought process or the history of human-agent interactions. This memory allows the agent to provide personalized experiences and to retrieve information at different points in time.

  • What is the significance of the sliding scale of AI autonomy mentioned in the script?

    -The sliding scale of AI autonomy represents the balance between a system's programmed responses and its ability to act independently. It highlights the trade-offs between efficiency and flexibility in AI systems, depending on the complexity of the tasks they are designed to perform.

Outlines

00:00

🔍 The Evolution of AI Agents

In 2024, AI agents are expected to become highly prominent. This involves a transition from monolithic models to compound AI systems, which integrate multiple components and adapt more easily than individual models. By combining language models with programmatic elements, these systems can handle more complex tasks, such as querying personal databases to provide accurate responses. The example of planning a vacation illustrates how a compound AI system can access and use various data sources to generate accurate answers, showcasing the advantages of modular, system-based approaches.

05:00

🤖 Agentic Approach to AI Systems

Large language models (LLMs) are now capable of reasoning and can be put in charge of controlling the logic of AI systems, leading to an agentic approach. This method contrasts with strict, fast-acting systems by allowing for slow, deliberate problem-solving. LLMs can break down complex tasks, plan steps, and use external tools or memory to achieve goals. The ReACT (Reason and Act) framework exemplifies this approach by combining reasoning with action, enhancing the model's ability to handle multifaceted queries through thoughtful planning and execution.

10:01

🧩 Modular Components of AI Agents

AI agents have three key capabilities: reasoning, acting, and accessing memory. Reasoning involves the model planning and solving problems step-by-step. Acting is facilitated by external tools, such as web searches or calculators, which the model can call upon as needed. Memory includes logs of the model's thought process and conversation history, enabling personalized interactions. The ReACT framework integrates these capabilities, allowing the agent to plan, execute actions, and iterate based on feedback, making it adaptable to complex, dynamic tasks.
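
To make those three capabilities concrete, here is a minimal, illustrative sketch (not code from the video) of how an agent's parts might be wired together; the class and field names are hypothetical, and the `llm` and tool callables are stand-ins for real components.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class AgentMemory:
    thoughts: List[str] = field(default_factory=list)                   # the model's inner reasoning logs
    conversation: List[Tuple[str, str]] = field(default_factory=list)   # (speaker, text) history

@dataclass
class Agent:
    llm: Callable[[str], str]                  # reasoning: the model that plans step by step
    tools: Dict[str, Callable[[str], str]]     # acting: external programs the model may call
    memory: AgentMemory = field(default_factory=AgentMemory)  # memory: logs plus conversation
```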

Keywords

💡AI agents

AI agents refer to autonomous systems that can perform tasks, make decisions, and interact with users based on programmed capabilities and learned behaviors. In the video, the concept of AI agents is central to the discussion of the evolution of generative AI, emphasizing the shift from static models to dynamic, interactive systems that can solve complex problems by integrating various tools and data sources.

💡Generative AI

Generative AI encompasses a range of technologies that can generate new content, such as text, images, or music, based on learned patterns. The script discusses the shift in generative AI towards compound systems, which combine multiple models and tools to create more sophisticated and adaptable AI agents.

💡Monolithic models

Monolithic models are large, single-block systems that perform specific tasks but are limited by the data they've been trained on and are not easily adaptable. The video contrasts these with compound AI systems, which are more flexible and can be integrated into various processes, as illustrated by the vacation planning example.

💡Compound AI systems

Compound AI systems are multi-component systems that integrate various AI models and programmatic components to solve problems more effectively. The script describes these systems as being modular, allowing for the combination of different models and tools to create tailored solutions for specific tasks or queries.

💡System design

System design in the context of the video refers to the architectural approach of creating AI systems that are composed of multiple interacting components. It emphasizes the importance of modularity and the integration of models with other tools to enhance problem-solving capabilities, as opposed to relying solely on the capabilities of a single AI model.

💡Retrieval augmented generation (RAG)

Retrieval augmented generation (RAG) is a type of compound AI system that combines retrieval mechanisms with generative models. The script mentions RAG as an example of a popular compound system, highlighting its use in scenarios where the system must follow a specific path to answer a query, such as searching a database.
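
As a rough illustration of the retrieve-then-generate pattern (not code from the video), the sketch below assumes two hypothetical stand-ins: `search_documents`, which returns relevant passages, and `generate`, which calls a language model.

```python
def rag_answer(query: str, search_documents, generate) -> str:
    """Minimal RAG flow: retrieve relevant passages, add them to the prompt,
    and let the model answer from that supplied context."""
    passages = search_documents(query, top_k=3)        # retrieval step
    context = "\n\n".join(passages)
    prompt = f"Answer the question using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)                            # generation step
```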

💡Control logic

Control logic in AI systems refers to the rules and procedures that dictate how a system processes and responds to input. The video discusses the importance of control logic in compound AI systems, where it can be either programmatically defined or managed by a large language model, affecting how the system approaches problem-solving.

💡Large language models (LLMs)

Large language models (LLMs) are AI models trained on vast amounts of text data, capable of understanding and generating human-like language. The script highlights the improvements in LLMs' reasoning capabilities, which enable them to be at the core of AI agents, prompting them to devise plans and reason through complex problems.

💡Agentic approach

The agentic approach in AI involves giving a large language model the role of overseeing the logic and decision-making process within a system. The video describes this as a method where the model is prompted to 'think slow,' breaking down complex problems into manageable parts and seeking external help when needed, leading to a more effective problem-solving process.

💡ReACT

ReACT stands for Reasoning and Acting and is a method for configuring AI agents. The script explains ReACT as a process in which the model is prompted to reason through a problem, plan its actions, execute them by calling external tools, observe the results, and iterate until a satisfactory solution is reached.

💡Autonomy

In the context of AI, autonomy refers to the degree to which a system can operate independently without human intervention. The video discusses a sliding scale of AI autonomy, where systems can be designed with varying levels of independence based on the complexity of the tasks they are expected to perform, from narrowly defined problems to a wide spectrum of queries.

Highlights

2024 will be the year of AI agents.

The first shift in generative AI is the move from monolithic models to compound AI systems.

Monolithic models are limited by the data they've been trained on and are hard to adapt.

Compound AI systems integrate models into existing processes to improve adaptability and accuracy.

Example: A system can fetch vacation data from a database and provide accurate responses.

Systems are modular and can include tuned models, language models, image generation models, and programmatic components.

Programmatic components can include output verifiers, query breakdown tools, database search functions, and other tools.

Retrieval augmented generation (RAG) systems are popular compound AI systems.

Control logic defines the path a query follows within a program.

Large language models (LLMs) can control the logic of a compound AI system, enabling an agentic approach.

LLMs can reason, plan, and break down complex problems.

Agents have the ability to act using external programs known as tools, such as web search, calculators, or other APIs.

Agents can access memory, including inner logs and conversation history, to personalize experiences.

ReACT combines reasoning and acting components in LLM agents.

For complex tasks, agentic systems are more effective, while for narrow problems, programmatic approaches are more efficient.

Agentic systems provide a sliding scale of AI autonomy, balancing efficiency and complexity.

The video emphasizes the ongoing rapid progress in agent systems and the importance of human oversight.

Transcripts

00:00

2024 will be the year of AI agents. So what are AI agents? To start explaining that, we have to look at the various shifts we're seeing in the field of generative AI. The first shift I'd like to talk to you about is the move from monolithic models to compound AI systems.

00:26

Models on their own are limited by the data they've been trained on, which impacts what they know about the world and what sort of tasks they can solve. They are also hard to adapt: you could tune a model, but it would take an investment in data and resources.

00:51

So let's take a concrete example to illustrate this point. I want to plan a vacation for this summer, and I want to know how many vacation days are at my disposal. What I can do is take my query and feed it into a model that can generate a response. I think we can all expect that this answer will be incorrect, because the model doesn't know who I am and does not have access to this sensitive information about me.

01:30

Models on their own can be useful for a number of tasks, as we've seen in other videos: they can help with summarizing documents and with creating first drafts of emails and reports. But the magic gets unlocked when I start building systems around the model and integrate it into my existing processes.

01:52

So if we were to design a system to solve this, I would have to give the model access to the database where my vacation data is stored. That same query would get fed into the language model; the difference now is that the model would be prompted to create a search query, one that can go into the database that I have. That query would fetch the information from the database and output an answer, which would then go back into the model so it can generate a sentence in response: "Maya, you have ten days left in your vacation database." The answer I get here would be correct.
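
As a rough sketch of the flow just described (not the video's code), assume a hypothetical `llm` callable and a `vacation_db` object with a `lookup` method standing in for the real components:

```python
def vacation_days_answer(user_query: str, llm, vacation_db) -> str:
    """Compound-system flow: the LLM writes a search query, the database
    supplies the facts, and the LLM phrases the final answer."""
    # Step 1: prompt the model to turn the user's question into a database query.
    search_query = llm(f"Write a database search query for: {user_query}")
    # Step 2: run that query against the vacation database (an external component).
    record = vacation_db.lookup(search_query)
    # Step 3: have the model turn the raw record into a sentence for the user.
    return llm(f"Using this record {record!r}, answer the question: {user_query}")
```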

02:42

This is an example of a compound AI system, and it recognizes that certain problems are better solved when you apply the principles of system design. So what does that mean? By the term "system", you can understand that there are multiple components.

03:02

Systems are inherently modular. I can have a model, and I can choose between tuned models, large language models, and image generation models, but I also have programmatic components that can come around them: output verifiers, programs that take a query and break it down to increase the chances of the answer being correct, database searches, and other tools. So when we're talking about a systems approach, I can break down what I want my program to do and pick the right components to solve it. This is inherently easier than tuning a model, which makes the system much faster and quicker to adapt.

03:54

The example I just walked through is a compound AI system. You might also be familiar with retrieval augmented generation (RAG), which is one of the most popular and commonly used compound AI systems out there. Most RAG systems, like my example, are defined in a certain way: if I bring a very different query, say a question about the weather, it's going to fail, because the path this program has to follow is to always search my vacation policy database, and that has nothing to do with the weather.

04:34

When we say "the path to answer a query", we are talking about something called the control logic of a program. Most compound AI systems have programmatic control logic, that is, something I defined myself as the human.
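
To show what programmatic control logic means in code, here is a tiny, illustrative sketch using the same hypothetical `llm` and `vacation_db` stand-ins as above: the human has fixed the path in advance, so there is no branching, and an off-topic question about the weather still gets forced through the vacation-policy search.

```python
def fixed_path_answer(query: str, llm, vacation_db) -> str:
    # Programmatic control logic: every query follows the same human-defined path.
    # A weather question is still turned into a vacation-database lookup, which is
    # exactly why this narrow system fails on queries it was not designed for.
    search_query = llm(f"Turn this into a vacation-policy database query: {query}")
    result = vacation_db.lookup(search_query)
    return llm(f"Answer {query!r} using only this result: {result!r}")
```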

04:55

Now let's talk about where agents come in. One other way of controlling the logic of a compound AI system is to put a large language model in charge, and this is only possible because we're seeing tremendous improvements in the reasoning capabilities of large language models. You can feed large language models complex problems and prompt them to break them down and come up with a plan on how to tackle them.

05:23

Another way to think about it: on one end of the spectrum, I'm telling my system to think fast, act as programmed, and not deviate from the instructions I've given it. On the other end of the spectrum, I'm designing my system to think slow: create a plan, attack each part of the plan, see where you get stuck, and see whether you need to readjust the plan. If I give you a complex question and you just give me the first answer that pops into your head, the answer will very likely be wrong; you have a higher chance of success if you break the problem down, figure out where you need external help to solve some parts of it, and maybe take an afternoon to solve it. When we put an LLM in charge of the logic, that is when we're talking about an agentic approach.

06:13

So let's break down the components of LLM agents. The first capability is the ability to reason, which we talked about. This means putting the model at the core of how problems are solved: the model is prompted to come up with a plan and to reason about each step of the process along the way.

06:35

Another capability of agents is the ability to act, and this is done through external programs known in the industry as tools. Tools are external pieces of the program, and the model can decide when and how to call them in order to best execute a solution to the question it has been asked. An example of a tool is search, whether searching the web or searching a database at its disposal. Another example is a calculator for doing some math. A tool could be a piece of program code that manipulates a database, or even another language model, say, for a translation task. There are many other possibilities: these can be APIs, basically any piece of external program you want to give your model access to.
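
As an illustration of what a tool can be (the names and registry shape are made up for this sketch, not any particular framework's API), tools are just ordinary functions the model is allowed to invoke:

```python
import math

def web_search(query: str) -> str:
    # Placeholder: a real tool would call a search API or query a database here.
    return f"(top results for {query!r})"

def calculator(expression: str) -> str:
    # Restricted eval keeps the sketch short; a real tool would parse the expression properly.
    return str(eval(expression, {"__builtins__": {}}, {"sqrt": math.sqrt}))

TOOLS = {
    "web_search": web_search,   # search the web or a database at the model's disposal
    "calculator": calculator,   # do some math
}

def call_tool(name: str, argument: str) -> str:
    # The model decides when and how to call a tool; this function only dispatches.
    return TOOLS[name](argument)
```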

07:30

The third capability is the ability to access memory, and the term "memory" can mean a couple of things. We talked about the model thinking through the problem, a bit like how you think out loud when you're trying to work through one; those inner logs can be stored and are useful to retrieve at different points in time. Memory can also be the history of conversations you as a human have had while interacting with the agent, which allows the experience to be much more personalized.
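
Here is a minimal sketch of those two senses of memory; the structure is illustrative, not taken from the video:

```python
class Memory:
    """Two kinds of agent memory: the model's inner reasoning logs,
    and the history of the human-agent conversation."""

    def __init__(self):
        self.inner_logs = []     # the model "thinking out loud" while it works
        self.conversation = []   # prior (speaker, text) turns, used for personalization

    def remember_thought(self, thought: str) -> None:
        self.inner_logs.append(thought)

    def remember_turn(self, speaker: str, text: str) -> None:
        self.conversation.append((speaker, text))

    def recent_turns(self, n: int = 5):
        # Retrieve the most recent exchanges; a real agent might search instead.
        return self.conversation[-n:]
```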

08:01

There are many ways to approach configuring agents. One of the most popular is something called ReACT, which, as you can tell by the name, combines the reasoning and acting components of LLM agents. So let's make this very concrete: what happens when I configure a ReACT agent?

08:23

You have your user query, which gets fed into a model, and the LLM is given a prompt. The instruction is: don't give me the first answer that pops into your head; think slow, plan your work, and then try to execute something, that is, try to act. When you want to act, you can decide whether to use external tools to help you come up with the solution. Once you call a tool, you get an answer back, and maybe it's the wrong answer or it produces an error. The LLM can observe that: it determines whether the result answers the question at hand or whether it needs to iterate on the plan and tackle it differently, and it keeps going until it reaches a final answer.
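
A compact, illustrative ReACT-style loop might look like the sketch below. It assumes the same stand-in `llm` and `tools` as earlier, plus a simple text format in which the model writes either `Final Answer: ...` or `Action: tool_name | argument`; real implementations differ in prompt format and parsing.

```python
def react_agent(question: str, llm, tools, max_steps: int = 5) -> str:
    """Think slow: reason, optionally act with a tool, observe the result,
    and iterate until the model produces a final answer (or the step budget runs out)."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript + "Thought:")          # reasoning step
        transcript += f"Thought:{step}\n"
        if "Final Answer:" in step:                  # the model is satisfied with its answer
            return step.split("Final Answer:")[-1].strip()
        if "Action:" in step:                        # the model chose to call a tool
            name, _, argument = step.partition("Action:")[2].partition("|")
            observation = tools[name.strip()](argument.strip())
            transcript += f"Observation: {observation}\n"   # feed the result back in
    return "No final answer within the step budget."
```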

09:17

So let's go back and make this very concrete again with my vacation example. As you can tell, I'm really excited to go on one, so I want to take the rest of my vacation days. I'm planning to go to Florida next month, I'm planning on being outdoors a lot, and I'm prone to burning. So I want to know: how many two-ounce sunscreen bottles should I bring with me?

09:43

This is a complex problem, and there are a number of things to plan. One: how many vacation days am I planning to take? Maybe that is information the system can retrieve from its memory, because I asked that question before. Two: how many hours do I plan to be in the sun? I said I plan to be out there a lot, so that might mean looking at the weather forecast for next month in Florida and seeing the average sun hours expected. Three: maybe going to a public health website to find the recommended dosage of sunscreen per hour in the sun. And four: doing some math to determine how much of that sunscreen fits into two-ounce bottles. So that's quite complicated.
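
Step four is ordinary arithmetic once the earlier lookups have produced numbers. As a worked sketch with made-up inputs (the real vacation days, forecast sun hours, and dosage would come from memory, the weather forecast, and the public-health site):

```python
import math

OUNCES_PER_BOTTLE = 2.0

def bottles_needed(vacation_days: int, sun_hours_per_day: float, ounces_per_hour: float) -> int:
    """Total sunscreen required, rounded up to whole two-ounce bottles."""
    total_ounces = vacation_days * sun_hours_per_day * ounces_per_hour
    return math.ceil(total_ounces / OUNCES_PER_BOTTLE)

# Hypothetical inputs: 10 vacation days, 6 sun hours per day, 0.5 oz of sunscreen per hour
# -> 30 oz total, which is 15 two-ounce bottles.
print(bottles_needed(10, 6, 0.5))
```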

10:25

But what's really powerful here is that there are so many paths that can be explored to solve a problem, which makes the system quite modular, and I can hit it with much more complex problems. Going back to the concept of compound AI systems: compound AI systems are here to stay. What we're going to observe this year is that they're going to become more agentic.

10:44

The way I like to think about it is that you have a sliding scale of AI autonomy, and the person defining the system examines what trade-offs they want in terms of autonomy for a given problem. For problems that are narrow and well defined, where you don't expect someone to ask about the weather when they need to ask about vacations, you can define a narrow system like this one. It's more efficient to go the programmatic route, because every single query will be answered the same way; if I were to apply the agentic approach here, there might be unnecessary looping and iteration. So for narrow problems, a programmatic approach can be more efficient than going the agentic route. But if I expect the system to accomplish very complex tasks, say, trying to solve GitHub issues independently, and to handle a whole spectrum of queries, that is where an agentic approach can be helpful, because it would take too much effort to configure every single path in the system.

11:59

We're still in the early days of agent systems, and we're seeing rapid progress when you combine the effects of system design with agentic behavior. Of course, you will have a human in the loop in most cases while accuracy is still improving. I hope you found this video useful, and please subscribe to the channel to learn more.


Related Tags
AI Agents, Generative AI, System Design, Reasoning Models, Compound Systems, Language Models, Vacation Planning, Search Queries, External Tools, ReACT Model, Autonomy Scale