The Future of Generative AI Agents with Joon Sung Park
Summary
TL;DR: In a discussion about AI agents, Joon Sung Park provides background on the evolution of agents, from early assistive agents like Clippy to modern conversational agents. He outlines two branches of agent development: tool-based agents meant to automate complex tasks, and simulation agents that mimic human behavior. Large language models enable more advanced, personalized agents, though interaction challenges remain around deploying agents for high-risk tasks. Park expects agents to succeed first in soft-edge problem spaces like games and entertainment before expanding to other areas. Though applications like ChatGPT show promise, he questions whether conversational agents are the ultimate killer app when compared to historic examples like Microsoft Excel.
Takeaways
- 😀 LLMs like GPT-3 made AI agents possible by providing the ability to predict reasonable next actions given a context
- 👥 There are two main types of AI agents - tool-based agents to automate tasks, and simulation agents to model human behavior
- 💡 LLMs still need additional components like long-term memory and planning for full agent capabilities
- 🎮 Games were an inspiration for early agent research aiming to create human-like NPCs
- 🚦 Current LLM limitations around safety and fine-tuning may limit the range of possible agent behaviors
- 🎭 Simulation agents for 'soft edge' problems like games and entertainment may succeed sooner than tool agents
- 🔮 Multimodal (text + image) agents are an exciting area for future research
- ❓ It's unclear if ChatGPT represents the 'killer application' for LLMs we expected
- 📚 Agent hype cycles have spiked and faded as expectations exceeded capabilities
- 🤔 Carefully considering human-agent interaction and usage costs will be key to adoption
Q & A
What was the initial motivation for Joon to research generative agents?
-Joon was motivated by the question of what new and unique interactions large language models like GPT-3 would enable. He wanted to explore whether these models could be used to generate believable human behavior and agents when given a micro context.
How does Joon define 'tool-based' agents versus 'simulation' agents?
-Tool-based agents are designed to automate complex tasks like buying plane tickets or ordering pizza. Simulation agents are used to populate game worlds or simulations, focusing more on replicating human behavior and relationships.
What capability did large language models add that enabled new progress in building agents?
-Large language models provided the ability to predict reasonable next sequences given a micro context or moment. This could replace manually scripting all possible agent behaviors.
What does Joon see as a current limitation in using models like ChatGPT for simulation agents?
-Models like ChatGPT have been fine-tuned to remain safe and avoid surfacing unsafe content. This limits their ability to reflect the full range of human experience, including conflict.
Where does Joon expect agent technologies to first succeed commercially in the next few years?
-Joon expects agent technologies to first succeed commercially in 'soft edge' problem spaces like simulations and games, where there is more tolerance for failure.
What does Joon see as a key open question around why previous periods of hype around agents failed?
-Joon wonders whether past agent hype cycles failed because not enough thought was given to interaction: how agents would actually be used, and whether they solved needs users really had.
What future line of questioning around large language models is Joon interested in pursuing?
-Joon wonders whether ChatGPT represents the 'killer app' for large language models that people were waiting for. He thinks it's worth discussing whether ChatGPT is actually as transformational as expected.
How does Joon suggest thinking about future model architectures that could replace Transformers?
-Joon suggests treating Transformer capabilities as an abstraction layer, focusing on the reasoning capacity they provide. The implementation could be replaced over the next 5-10 years while still building useful applications today.
Where does Joon look for inspiration on new research directions?
-Joon looks to foundational insights from early artificial intelligence researchers that have stood the test of time. He believes great ideas are timeless, even as hype cycles come and go.
What aspect of current agent capabilities is Joon most interested in improving further?
-Joon is interested in improving accuracy so agents better reflect real human behavior and diversity. This could enable personalized, scalable simulations grounded in real communities.
Outlines
👨💻 Joon introduces himself and his work on generative agents
Joon provides background on himself: he is a PhD student working on human-computer interaction and natural language processing. He discusses his interest in exploring how large language models can enable new capabilities, which led him to a project called Generative Agents, which uses language models to create AI agents that can populate and behave realistically within simulated worlds.
🤔 What motivated Joon's research direction and focus on agents
Joon explains his thought process in identifying a research direction. When large language models like GPT-3 emerged, it was unclear what novel capabilities they would truly enable. Simple tasks like classification weren't fundamentally new, whereas simulating human behavior via agents was ambitious, tied to longstanding AI goals, and could open new interaction possibilities.
😎 The evolution of agents and how they split into two communities
Joon provides historical context on agents: tool-based agents focused on automation versus simulation agents focused on modeling human communities. With large language models, these have split into two present-day communities, one using tools and one focused on personalized, multi-agent simulation.
🌟 How LLMs enabled a breakthrough in quality and scale for agents
Previously, generating sophisticated agent behaviors required manual scripting, which didn't scale. Large language models provide the ability to predict reasonable next actions given a situation context. Adding memory and planning on top of this provides a scalable path to coherent, long-term agent behavior.
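The loop described above can be sketched in a few lines. This is a minimal illustration, not the actual Generative Agents implementation: the `Agent` class, its fields, and the `fake_llm` stand-in are all hypothetical names, and a real system would call a language model API where `fake_llm` appears.

```python
# Minimal sketch of the agent loop described above: a language model predicts
# the next action from a "micro context", while an agent architecture supplies
# the long-term memory and plan. All names here are illustrative.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    memory: list = field(default_factory=list)  # long-term memory stream
    plan: list = field(default_factory=list)    # high-level plan steps

    def observe(self, event: str) -> None:
        # Every observation is appended to the memory stream.
        self.memory.append(event)

    def act(self, situation: str, llm) -> str:
        # Build the micro context: current situation plus recent memories,
        # then ask the model to predict a reasonable next action.
        context = (f"{self.name} remembers: {'; '.join(self.memory[-3:])}. "
                   f"Current situation: {situation}. Next action:")
        action = llm(context)
        self.observe(f"did: {action}")
        return action

# Stand-in for a real LLM call (an API request in practice).
def fake_llm(prompt: str) -> str:
    return "greet the visitor" if "cafe" in prompt else "continue routine"

agent = Agent("Isabella")
agent.observe("opened the cafe at 8am")
print(agent.act("a visitor enters the cafe", fake_llm))  # prints: greet the visitor
```

The key design point is the division of labor: the model only ever predicts the next moment, while the surrounding architecture decides what context to feed it and records the result for future coherence.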
🎥 Multimodal inputs (image, video) will enable the next breakthrough
Currently, agents operate on text: visual inputs are translated into language descriptions. As multimodal models expand to handle images, video, and other modalities directly, future agents could take raw perceptual inputs and better simulate human perspective and behavior.
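As a concrete picture of the translation step mentioned above, today's pipeline turns structured game state into a sentence before the language model ever sees it. The function below is a hypothetical sketch of that step; all names are invented for illustration.

```python
# Hypothetical sketch of the current perception pipeline: the visual game
# world is translated into a natural-language observation for the LLM.
def describe_scene(location: str, nearby: list) -> str:
    """Render structured game state as a text observation for the agent."""
    people = " and ".join(nearby) if nearby else "no one"
    return f"You are in the {location}, with {people} nearby."

# The text observation is what gets fed to the agent architecture today.
print(describe_scene("kitchen", ["Isabella"]))
# prints: You are in the kitchen, with Isabella nearby.
```

A multimodal model could instead consume the rendered frame directly, skipping this translation step entirely.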
🎯 Opportunity for more accurate agents that model actual communities
Joon sees an opportunity to evolve agents to accurately reflect real human social dynamics instead of fictional scenarios. This is challenging today due to safety fine-tuning but may become feasible over time. Accurate community simulation could enable applications in markets, personalization, policy, and beyond.
😅 Deployments today constrained to "soft failure" cases, with a long road ahead
Agents face adoption challenges where mistakes incur high costs, as in purchasing transactions. History shows hype cycles followed by fading interest within 6-12 months. The first deployments are likely in entertainment and gaming; standards for auditability and controllability are needed before agents handle complex real-world tasks.
🔮 Don't assume Agents will work now just because technology improved
Despite progress, it is worth questioning why past agent hype cycles failed. Just having greater technological capacity doesn't guarantee actual usefulness and adoption. Key questions remain around real user need, interaction model fit, and compelling benefit over cost.
❓Is ChatGPT actually the "killer app" for LLMs we've been waiting for?
The breadth of ChatGPT adoption seems incredible. However, it is largely an interface wrapper around an existing large language model. We should question whether it truly constitutes the "killer app" that maximizes the impact of LLMs, and if not, what applications are still missing that could deliver more generalizable value.
📚 Historical inspiration from Simon and Newell as timeless resources
Rather than recent papers, Joon is inspired by timeless foundational works from researchers like Herbert Simon and Allen Newell. Their insights launched entire fields and won major accolades. Unlike hype cycles, their work showcases ideas that have stood the test of time and continue enabling impact decades later.
Keywords
💡agents
💡large language models (LLMs)
💡generative agents
💡multi-agent systems
💡killer application
💡limitations
💡commercial deployment
💡future research
💡interaction challenges
💡multimodality
Highlights
Agents have finally made their way into real enterprises with real use cases
Large language models provided the key capability to generate believable human behavior
Agents can be categorized as tool-based or simulation-based
Tool-based agents aim to automate complex tasks while simulation agents model communities
The agent architecture gives agents long-term memory and planning abilities
Multimodal capabilities like images will make agents more powerful
Accuracy limitations due to safety constraints may need to be addressed
Agents will likely succeed first in soft-edge problem spaces before hard-edge ones
The interaction challenges, not the technology, caused past agent hype cycles to fail
It's worth questioning if ChatGPT is the killer app we've been waiting for
The killer app should enable manipulating the key data type the technology generates
Learn from past insights that had impact and stood the test of time
Review recently published cutting-edge papers for the latest developments
Refer to pioneering works in AI and cognitive science for foundational ideas
Current hype cycles shouldn't discount timeless, foundational concepts
Focus on capacities and modalities over specific underlying technologies
Transcripts
The number of users who use ChatGPT, that's incredible, but I think it's worth asking ourselves: is that, quote unquote, the killer application that we were waiting for? ChatGPT does feel like a fairly simple wrapper around a large language model, because that's what it mainly is. OpenAI has done fantastic things to make it safer and more useful through fine-tuning, which I think is really great. But I think it's worth asking if that is actually the killer application, and why it is a killer application. The answer might actually come out that maybe it isn't the killer application that we were waiting for, in which case, what is going to be the killer application that's really going to add value in a much more generalizable way?

Welcome to AI in the Real World. I'm your host, my name is Joanne Chen, and I am a general partner at Foundation Capital. I work closely with startups that are reshaping business with AI. In this series, I'll be holding in-depth discussions with leading AI researchers. We'll explore how state-of-the-art AI models are being applied in real enterprises today. To kick things off, I'm excited to speak with Joon Sung Park, a PhD student in computer science at Stanford. Joon works at the intersection of human-computer interaction and natural language processing, and he is best known for his research on AI agents. We break down how AI is transforming agent design, share advice for builders working with these models, and unpack why we haven't yet found the perfect killer app for AI agents. Here's our conversation.

How are you?
Good, what about you? Great seeing you again. Good to see you again. It's been a while, because the unconference was last, I want to say, May or June. Last May or June, wow. Time flies, and the world has changed. I think that agents have finally made their way into real enterprises with real use cases, and back then it was a lot of "what could this be?" Thanks to you and some of your work, which is why I'm super excited to have this conversation together, especially right now, since enterprises are thinking in a real way about adopting. So I thought, who can we chat with that would have a really interesting perspective? That's why we reached back out to you, so I really appreciate the time. Of course, thanks for having me. Do you mind, maybe just to start, giving us a quick rundown of what's happened, maybe some of the background that you have building this technology? Yeah, so let's see. Do you want me to just speak about what has happened in the past six months, or what would be interesting for you? Just a brief overview of what you've worked on, and also what's happened in the last six months to a year in terms of evolution. All right, that sounds good.
Right, so I guess I'll do a quick intro. I'm a PhD student here, working in the area of HCI and NLP. The work that I'm mainly known for is this paper called Generative Agents. Generative Agents in particular was a project that asked: can we use large language models to create generative agents that can populate a simulation world? So if you play something like SimCity or The Sims, can we actually create these NPC-like characters that would actually flood into the city and live like humans? By definition, that is everything from how they would wake up in the morning, talk to each other, and form routines and relationships, all the way to creating communities and emergent social dynamics.
My interest in this area really stems from a question that people at the intersection of human-computer interaction, natural language processing, and machine learning like to ask: we now have these really amazing models, large language models and foundation models, so what are you going to do with them? These models are new and they're great, and we think they have great capacity, but are they really going to enable us to do something new and unique? That has been the focal point for a lot of the research that I do. Ultimately, the conversation that we got down to was this idea that these models are trained on broad data, like the web, Wikipedia, and so forth, so they can actually be used to generate a lot of believable human behavior when given a very micro context. So can we piece this together to create human-like agents, which is something that AI more broadly has envisioned since its founding days? We decided that this was the time to do that, and that's how we got to where we are. So that's Generative Agents. This is the paper that we put on arXiv in April last year and that was officially published in November, which is crazy given how much the world has developed.

I'm curious what initially motivated this topic for you. I'm sure you had lots of different options in terms of what to research and study. Why did you decide to focus on this?
Yeah, so ultimately it really was the question of what large language models, these new models that are being trained, are really going to enable us to do. I started my PhD around 2020, and that was when GPT-3 was just about to come out. During my first year, we wrote a paper called Foundation Models, which made the observation that there was going to be a new wave of models that we would not be training for a specific task, but rather for a modality: we would be training a language model that can process language, and so forth. We thought there was going to be a big opportunity in terms of what we could do with them, but what exactly we were going to do with them was incredibly unclear. Our first instinct, as researchers drawn to the machine learning and NLP community, was this idea of: can we do classification or generation with these models? Seeing that these models could do that was really exciting, because we didn't train them to do that, and yet they could. But from the interaction perspective, classification and simple generation were things we already knew how to do, so that didn't feel fundamentally new. So really the question again became: what are we going to do that's going to be truly new and transformative in the sense of interaction?
That's what really drew us to look for these kinds of ideas. Simulating human behavior with general computational agents felt like a big problem, in part because it's something our community had wanted for many decades. It was an idea that the people in cognitive science who inspired early AI research, like Allen Newell and Herbert Simon, were asking about, and we were certainly inspired by those ideas. And of course, we thought it would be a lot of fun, because we grew up with The Sims, Pokemon, and those kinds of games in the 90s and early 2000s, and we were certainly inspired by those games as well. I love those games too, and it's nice to see some of that play out in the real world now. I agree. I think games are fun in the sense that they are inspirational in many ways, because they are very forward-looking: you can be a little more playful, and I think research can be playful in many ways, especially when you're trying to do really forward-looking research. So it certainly is a big inspiration. And I was just going to end that comment by saying that I think it's worth asking, for us as a community, what's going to be the new, quote unquote, killer application of these models. When we had personal computers in the early 80s, the computers were very cool, but what really made them into household items was the existence of what we would now consider the killer applications of PCs, like Microsoft Excel, which made tabular information usable and scalable. I think we in the large language model community should also be looking for those kinds of ideas, because that's ultimately what's going to really transform the user experience around these models. I think we're seeing some great usage of these models, but there's a lot more to do going forward.

Makes a lot of sense. When you look at what's happened since April, a lot of things have changed: we have new LLM capabilities, and we have a whole flurry of startups building in the space. Could you maybe summarize what you've seen?
Right, so agents have been a big thing, especially in the latter half of 2023. This is how I'm seeing it: the agent community, in the way I view it, has split into two communities, I would argue. So maybe it makes a little more sense to talk about the history of agents first, because agents became a big thing last year, but this is not a new idea in and of itself. Even in the commercial space, we had agents like Microsoft's Clippy. I'm not sure how many of us actually remember that, but there used to be these agents in our industry and in research, so this is certainly not a new idea. If you go all the way back, we had agents like Clippy, and in many ways, especially in the reinforcement learning and machine learning communities, agents were elements that could basically simulate human behavior. I think that is ultimately the underlying thesis. But many of these agents were given tools to automate certain tasks, and the tasks they were meant to automate were not simple. It's not something like running a for loop in your Python code; it's a bit more complex than that. They operate in much more embodied spaces, or in spaces that we ourselves often operate in, like the web. The simplest examples of these kinds of tool-based agents are: can it order me pizza, can it buy plane tickets? Those might sound simple, but we know from experience that even ordering pizza requires multiple steps: we need to travel to certain websites, look through the menus, actually make the payment, and deal with entering an address, and so forth. So that was one strand of agents that had already existed for a long time. Or I would say all strands of agents existed, but that was the one highlighted in the past. Clippy is in that strand as well: if you were a Microsoft Office user, Clippy would try to automate some tasks for you based on your prior interactions with the software. To clarify on that point, those agents are single agents, correct? They can be single agents; they were often implemented as single agents, that's right, but I don't think by definition they had to be. At least in research, you're starting to see glimpses of people trying to imagine what it would look like for these agents to be in a multi-agent setting. A research paper that I remember coming out after Generative Agents was basically: what if you have a company of agents? There's a CEO, but there's also a designer agent who works on some other aspect, and there's an editor in this company. Those are still very much within the literature of what I would call tool-based agents: they're trying to automate complex tasks for users. I think there are going to be a lot of really big opportunities in that space; it's something people have been working on for a long time, for all the right reasons.
Now, another community that has formed, but to some extent has a slightly different root, is agents that were created for simulations. These agents were certainly a part of games: in the past we had The Sims, but we also had NPC characters that we could interact with. Those NPCs and agents back then were much simpler, either rule-based, or in some cases reinforcement learning agents. Another kind we could think about were agents used in social science: economic agents, or agents that would simulate policy decision-making, and so forth. Those agents were also part of this literature. What we're seeing today is, first, a recognition that a large language model simulating human behavior touches on all of these; it can be a foundational architectural layer for creating all these different sorts of agents. But in terms of initial application spaces, we're seeing a split: there's one community that's now deeply interested in agents using tools, and another community that's deeply interested in this idea of, can we simulate? This is where I would say multi-agents, as well as personalization, are really starting to be highlighted in the simulation space, because they're a little more directly incorporated into the idea of simulations: who are we simulating for, what are we simulating, who are we simulating? And by definition, simulations often happen in a multi-agent space. So those are the two communities you're starting to see. Generative agents certainly stand on the far end of the simulation-based agents, whereas some other projects that were also really cool last year, like OpenAI's GPTs, I would say are on the other end, the tool-based agents. Those are the axes you're seeing right now. I'll end by saying that my hunch is that, because they all start from the same technical thesis, that we can simulate human behavior, they will merge in the end. I don't think they will be completely separate theses five to ten years down the line. It's more going to be a question of where we make our short-term bets, and what's going to be an interesting and meaningful application space in the next two to five years. So that's the field I'm seeing and how it's developing right now.

Before we go into that, could you describe how LLMs specifically have affected especially the latter cohort? What is the before, what is the after, and what is the magnitude of improvement because of this technology that's now cheap enough to use? Right, so large language models are really what made this possible.
That is really the fundamental tech stack that we needed. In the past, when you wanted to create agents, and this goes for both types, tool-based and simulation, what you really needed was rule-based agents. That was the most common approach, and rule-based agents are a more sophisticated way of saying that we're scripting all the behaviors. Imagine you're building an NPC for a game: a human author would actually write every sentence that the agent says to the user. The author would describe, in either code or language, "if this happens, you do this." You basically design all the possible behaviors. That is expensive and not scalable, and that was the fundamental block we had. Tool-based agents had similar issues, in that in many of the contexts they had to operate in, they were not very generalizable tools. If you look at how Clippy worked, or even some of the agents we're using today, very simple types of agents are already embedded into our daily usage. You may have used Google Sheets or Google Docs, which autocomplete in some very rudimentary way; that could be considered, in some ways, an agent in this tool-based direction. And the rules they were using were very simple. It's not exactly rule-based, but it was very much hardcoded into the agent's behavior. There was some learning going on, but it was very rote: very straightforward, simple statistics. What large language models change is that they give us a single ingredient, which is: given a micro context, a micro moment, let's say I'm sitting in this room talking to Joanne about, say, generative agents or simulations, given that micro-moment description, a language model is extremely good at predicting the next moment. What is the reasonable set of things that Joon might say in this particular conversation, given what he knows? It's very good at doing that. On its own, that is not a perfect agent; it's not the complete ingredient you need to create agents that are meant to live for many years or decades. But it is the right ingredient or building block that we needed, because it can replace what was, in the past, manual authoring. In the past, we had to manually author all the possible sequences given any micro moment, but a large language model can come in.
So, given that ingredient, what we really could do is bake in long-term memory, a reflection module, and a planning module on top of it. Given that micro ingredient, plus an agent architecture on top of it, these agents can basically start to function as something that can operate in a world much like ours, with a fairly decent degree of long-term coherence. So that's where we are, and that's really the difference it made. I'd say this is a zero-to-one difference, not a difference of degree, because before large language models, this was not possible.
So large language models gave memory, gave context, gave interactions to these agents. What else, in a perfect world, would these agents have in order to better mimic the real world? What's maybe the next step, just out of curiosity? Right, so to clarify, the large language model doesn't actually provide all of that; it provides one element, the micro module for predicting the next sequence. It is the agent architecture that actually ends up giving the memory and planning ability. But those two paired become a fantastic combination. Now, going forward, what I do think is going to be interesting is this: right now we're using large language models, but we may have all noticed that things like ChatGPT can now deal not only with language but also with other modalities, like images. I think that's going to be really interesting.
So right now, and this is based on my prior work called Generative Agents, where we had this game world like The Sims that we called Smallville, the way these agents perceived and operated in their world was by our system translating the visual world into natural language. We would tell the agent, "you are in your apartment," or "you are in the kitchen talking to someone." So we would take the visual world, use our system to translate it into natural language, and then feed that to the agent architecture, which would use a large language model to process it. But now, with these models being able to deal with multimodal input, we might be able to bypass that phase and go straight to: here is the visual world or space you're seeing right now, that is your memory, now act on it. I think that's going to be potentially very powerful, in part because an image is much richer; it conveys a lot more. I do come from a natural language processing background, at least that's the other half of my academic background, so I have a bias toward believing that natural language is profound, and I think that will remain the case going forward. But images offer something that language alone does not. So images are going to be a big thing, and then imagine that in the future, video is going to be a big thing as well. Gradually, these agents will basically get increasingly more powerful as each new modality gets piled on. So that's something we should be looking forward to. That's great.
On the downside, what are some of the limitations that you're seeing in terms of these agents, especially generative agents? Right, so there are limitations I can mention in the context of our work, and then there are going to be interesting limitations that are much more application-specific. For generative agents today, the main technical limitation might have to do with the model you're using. Right now we use OpenAI's models, and OpenAI has actually done a lot of work to make the models safer. For OpenAI, I think that was the right approach, in the sense that what they really wanted to create was a chatbot, ChatGPT, that was a safe tool for most people to use. But if you want to run a simulation or create truly accurate and believable agents with something like ChatGPT, that could become a limitation, because what we really experience as humans is that we fight, we sometimes have conflicts, we disagree with each other. That might not be something that a model like ChatGPT, fine-tuned to not behave that way and to remain safe, will try to surface, and that could be a potential block in creating more accurate, more believable simulations, or agents for that matter. So that certainly is one limitation we're seeing right now. An interesting way to tackle this going forward, I think, is to use open-source models, or other models that have less of this fine-tuned nature, but it's going to be highly dependent on the models we'll be using. So I think that's one thing to look forward to. Got it.
That's super helpful. Maybe one last question on the research side: when you think about future areas to explore for you specifically, what are some of the narrower topics that you're hoping to dive deeper into?

So ultimately, I think making the agents more accurate, making them a more accurate reflection of who we are, is going to be really interesting research, and it's going to be an area with both research and broader impact. Right now, in the simulation demo you may have seen, the agents that live in the simulation are fictional. For instance, we have an agent named Isabella: we told Isabella that she is a cafe owner, and the large language model basically makes up a persona that is reasonable given that description. But I think it's going to be far more interesting if we can make these simulations closely model our actual human communities, so they're not just fictional but actually grounded. From our perspective, that's going to open up an entirely new set of application spaces, as well as research impact. It could be used, for instance, to accurately model or predict markets, or to more closely personalize many of these agents for individual use cases. So that's one particular topic we're diving into. That, plus of course scaling the agents; I think that's another big one.
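The persona-seeding idea described here, giving an agent a one-line description and letting the language model improvise the rest, can be sketched roughly as follows. This is a minimal illustration, not the actual generative-agents implementation; the `Agent` class, the prompt wording, and the `toy_model` stand-in are all assumptions:

```python
# Minimal sketch of seeding a simulation agent with a short natural-language
# persona and letting a language model improvise its behavior. The prompt
# wording and the injectable `model` callable are illustrative assumptions,
# not the paper's actual implementation.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    name: str
    persona: str          # seed description, e.g. "a cafe owner"
    memories: list[str]   # running record of what the agent has experienced

    def prompt_for(self, situation: str) -> str:
        """Compose the context the model sees before choosing an action."""
        recent = "; ".join(self.memories[-3:]) or "nothing yet"
        return (
            f"{self.name} is {self.persona}. "
            f"Recently: {recent}. "
            f"Situation: {situation}. "
            f"What does {self.name} do next?"
        )

    def act(self, situation: str, model: Callable[[str], str]) -> str:
        """Ask the model for the next action and remember the outcome."""
        action = model(self.prompt_for(situation))
        self.memories.append(f"{situation} -> {action}")
        return action

# A stand-in "model" so the sketch runs offline; in practice this would be
# a call to an LLM API.
def toy_model(prompt: str) -> str:
    return "greets the customer and takes their order"

isabella = Agent("Isabella", "a cafe owner", [])
print(isabella.act("a customer walks in", toy_model))
# prints: greets the customer and takes their order
```

The design point is that the seed description is tiny; everything else about the persona is filled in by the model at generation time, which is exactly why grounding these agents in real communities, as discussed above, would be a substantive change.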
But those two. That makes sense. You know, one of the things that's missing in most AI technologies is really the emotional part, how humans feel. All of that data is largely not captured, and therefore not part of any models today. Language is one small output of what we have; it's a very important output, for sure, but it's still one small output. So I wonder how we might be able to incorporate some of the data around our emotions (I agree) and thoughts in the future. Maybe let's move on to applications today, since you talked about some of the challenges for agents. Many organizations are thinking about how to use large language models today; there's a huge amount of aspiration, and a subset of them are also thinking about which agent-technology applications are viable within an enterprise, which has limitations around infrastructure, data silos, security, and all that. Are there any particular areas where you've seen companies be successful at using these technologies in production?

Right. So I think the answer is going to be incredibly case-by-case, so let me think.

Or if not, any hypothesis as to where you might see the first commercial deployments at scale?
Right. So there's a message I've been trying to communicate in different settings, so this isn't something I'm conveying for the first time, and my opinion has been getting updated, but I think it's fundamentally right. The way I've been describing it is: in human-computer interaction, or in most task settings, there are two types of problem spaces in which we deploy our machines or agents. One type is hard-edge problem spaces. These are things like "hey, order me a pizza" or "buy me a plane ticket": tasks with a very concrete outcome, where there often is a right or wrong answer the agent has to achieve. At least from the user's perspective, there is something that will absolutely be a yes or a no. And then there are soft-edge problem spaces: problems where we can incrementally hill-climb toward being better, and at a certain level it starts to actually become useful. To make this a little more intuitive: the worst-case scenario is that I ask the agent to buy me a plane ticket and it buys me the wrong ticket, one that goes to a different place; that's a hard no. Whereas if I ask an agent to simulate behavior that is fun, so that when I'm in a game it's entertaining and interesting, the agent doesn't need to be quite perfect; it can get there quite quickly, and then we can gradually improve. Those are the two spaces to consider when we think about where to deploy these systems and where they will actually make their first impact. If I were to make my bet, agents will likely succeed first in the soft-edge problem spaces and will gradually inch into making it work in the hard-edge problem spaces. This has been an intuition within the agent research community for some time. When Clippy, for instance, failed, our intuition, at least from a research perspective, wasn't that it failed because we didn't have the technology; it was deployed, and there was some confidence in the technology. The problem was actually with interaction. When agents are deployed in hard-edge problem spaces, they often have to execute a fairly long chain of steps, and the stakes are fairly high. When such an agent fails, the cost of correcting its error is quite high, and the cost of auditing its error is quite high. So agents deployed in hard-edge problem spaces have to reckon with the fact that they will undoubtedly make mistakes, and when they do, they have to be increasingly auditable and controllable by the users, so that the cost of correcting the error stays manageable. From the user's perspective, the cost-benefit analysis basically has to make sense, and that's been a fundamental challenge with agents. That's why, in every era, we see interest in agents spike for a while and then quickly subside after maybe half a year or two years.
There's a real question now, though, given large language models and the progress we've seen, whether this might not be the case this time, or whether at some point we might be able to make it work. I'm closely monitoring this, and I think we all should. I don't think we should just say that because it didn't work before, it's not going to work this time. But my hunch is that we will likely see a very similar pattern arise, at least for the near future, because we haven't quite dealt with the interaction problems with those types of agents. So I think it's much safer to assume it's going to be the soft-edge problem spaces. That's why, in many respects, our team was also interested in this idea of simulation: simulation is the prime example of a soft-edge problem space, where the simulation just has to be good enough to start being useful. That's also why I think a lot of the really promising early AI startups going into the agent space are ones doing NPCs for games, because those are very safe soft-edge problem spaces where the agents can fail and that's okay. Gradually we'll go to the other areas as well, but I think that's where the impact is going to start in the next couple of years.

I also think, just seeing the startups in the space, that sectors and functions which allow for failure, like you said, include things like marketing: if you market incorrectly, it's not that big a deal. If you write the wrong code, that's probably a bigger deal. If you pick the wrong security features, that's a huge deal, and if you pick the wrong things in healthcare, that's an even bigger deal. So there are degrees of fault tolerance within the enterprise; that's one thing. The second thing, on the consumer side: especially if the agents are just assisting consumers without executing on anything, that could probably also work. For example, there's a company called Rewind, which I believe is using some of these agent technologies, and they're getting a lot of consumer demand. But what consumers are doing is just searching for behavior they've had before; the product is helping them do that rather than doing anything in the real world. Really interesting, and the way that you frame it is very useful.

What about just from an architecture standpoint? Large language models are enabled by Transformer architectures, but that's a whole different direction: we're already seeing companies saying, hey, Transformers are not the most efficient, inference costs are very high, let's look at the next thing. Have you spent much time thinking about that, and if so, is there any impact on the work you're doing?

Right. So certainly, the next model we're going to be banking on is always an important topic, and something that we as a community always have to monitor, because I think you're right: the Transformer is not going to be the end model. The hope here is that we won't be using the Transformer ten years down the line. But one way we view this, and it's very much a programmer's way of looking at it, is in terms of abstractions. What the Transformer has gotten us right now is this amazing capacity for reasoning, processing information, and generating information. It might be the case that, in the future, that task will be done by even better models, and hopefully that will be the case. But for the sake of building applications, we can view this as a layer of abstraction: there might be some other technology powering it in the future, but what we're really focusing on is the capacity and the modality. What kind of reasoning, using what modality, can the technologies that exist today perform? We're going to build on top of that. So that's our way of looking at it in, I'd say, the medium term, the next three to five years.
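The abstraction-layer view described here, writing application code against a capability rather than against a specific architecture, might be sketched like this. The interface and backend names are illustrative assumptions, not any real API:

```python
# Sketch of treating the underlying model as a swappable abstraction layer:
# the application depends only on a narrow capability interface, so a
# Transformer backend could later be replaced by, say, a state-space model
# without touching application code. All names here are illustrative.
from typing import Protocol

class TextReasoner(Protocol):
    """The capability the application relies on, independent of architecture."""
    def complete(self, prompt: str) -> str: ...

class TransformerBackend:
    def complete(self, prompt: str) -> str:
        # In practice this would call a Transformer-based LLM API.
        return f"[transformer] response to: {prompt}"

class StateSpaceBackend:
    def complete(self, prompt: str) -> str:
        # A hypothetical future backend (e.g. a Mamba-style model).
        return f"[ssm] response to: {prompt}"

def summarize(model: TextReasoner, text: str) -> str:
    """Application logic written against the interface, not the model."""
    return model.complete(f"Summarize: {text}")

# Swapping architectures is a one-line change at the call site.
print(summarize(TransformerBackend(), "agents and abstraction"))
print(summarize(StateSpaceBackend(), "agents and abstraction"))
```

The design choice being gestured at is that the application layer only assumes "something that can reason over text", so progress in model architecture flows through without rewrites.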
If you look further out: right now there are some promising architectures being created at the forefront, more in the machine learning and natural language processing communities, that I'm personally getting a little bit excited about, but at the moment those are still very much in the research phase.

Can you share some examples of that?

Yeah. One model that recently came out is Mamba, by folks from Stanford; I think the authors are now at CMU and Princeton, all within that community. That's one example of a potentially promising and interesting model, one I recently heard about and think is worth looking at. But for these models to be deployed at scale in a commercial way, if we decide to go with a certain model being created today, it will take maybe a two-to-five-year timeline before they can really take off. The Transformer is a relatively modern model, but depending on how you look at the timeline, it took about seven years for ChatGPT to really come out. Hopefully, if we find something like that this time, it will go much faster, but it's not immediate. Whereas there are a lot of interactions we can build today to create really cool experiences. So that's how we're looking at this: there's a medium term, where we focus on the level of abstraction and the capacity we'll have, and then maybe five to ten years down the line we can really look forward to some new models making an impact.

That's great.
Very cool. Maybe more generally, if you just zoom out for a moment: when you look at the ecosystem today, what are some of the problems that you want to see solved? We talked about multimodality a little bit; we talked about new models that might come after Transformers. What are some of the problems that you are most excited about someone solving? Not necessarily you personally, but someone.

I have two in mind, and they're less specific problems I want someone to solve and more questions that I think more of us should be thinking about. This happens a lot in the way I do my research as well: I get inspired by big, foundational problems from previous decades, because oftentimes there's a lot of insight we can learn from the past as we build the future.

One, certainly, since I'm embedded in this agent space: in the past, agents had their hype cycle, but it failed. The hype cycle lasted a couple of years, and then people very quickly lost interest, basically because the technology didn't quite deliver on the promises it had made. I think it's worth asking ourselves why that was the case. I think the opportunity this time is real, but the opportunity in the past was also real to some extent. Just because the opportunity is real and language models are really cool doesn't necessarily guarantee, at least from my perspective, that agents will finally be a thing everyone uses. I think there is a future where that happens at some point; it might even happen this cycle. But it's really worth asking, as a community, why it failed in the past, so that we don't repeat those mistakes. The main thing I'm curious about, which I don't think a lot of people are thinking about, is actually not the technology part but the interaction: how are these agents going to be used, and in what way? Ultimately, that's where they really deliver value to the end users, and that's where agents in the past have failed. They were really cool technology, but we didn't seriously ask ourselves whether they were something people really needed, and whether the cost-benefit analysis of using these agents, and learning how to use them well, really made sense for the broader user base. So that's one.

The other one is a little bit of a hot take, but it's also a shorter take. We have large language models, and I think they have made a huge impact already; the number of people who use ChatGPT is incredible. But I think it's worth asking ourselves: is that, quote-unquote, the killer application we were waiting for? ChatGPT does feel like a fairly simple wrapper around a large language model, because that's what it is, and OpenAI has done fantastic things to make it safer and more useful by tuning, which is really great. But if it is actually the killer application, it's worth asking why it's the killer application, and the answer might come out that maybe it actually isn't the one we were waiting for. In which case, what is going to be the killer application that really adds value in a much more generalizable way? That's a very abstract question; for now, it's just a hunch that there's something to be asked there. And if I'm wrong, I would also love to hear somebody really say: we already have the killer application, maybe it's Copilot or ChatGPT, and here's why. But for now, this is a question I'm still asking myself.

That makes sense.
Thank you for sharing that. What are some of your favorite AI apps today that you use?

I love ChatGPT; I use it every day, and it did make a difference in my workflow. As a researcher, one of the main things I do is program, every day or at least most days, or write papers, and ChatGPT is fantastic at both. As all programmers know, we sometimes don't bother remembering all the different functions or documentation, and it's very good at generating a lot of the code when I have an idea: really impressive. It's also quite a good editor, so if I make a grammar error in my sentences, ChatGPT will usually catch it for me. It's a simple and easy thing, but it's good enough now that it's actually making a difference in the workflow. So I'd say ChatGPT for sure, and by extension I think Copilot will make a difference.

Maybe going back to the question around the killer application, it's worth asking: what is the definition of a killer application? Some people define it as the application with the most users, and that always has to be part of it; no killer application has no users, and by default a killer application is the one with the most users. But I think there's a more theoretical definition of a killer application, one that implies a lot of users, or the most users. For instance, if we look back to the prior era of the PC, a killer application that gets mentioned is something like Microsoft Excel: the thing that would let us manipulate tabular data. So the more theoretical definition of a killer application here is that there's a new technology stack being developed, a new file type being generated, and the killer application is the one that lets us manipulate that file type. That's one theoretical definition one could give; at least it's the one I've been toying with. I think it's an interesting one, though I don't think it's the only one. But those are the ways that I'm looking at this.
That makes a lot of sense. I also use ChatGPT every single day, and it's been very helpful: everything from coming up with menu names to rewriting emails that don't sound as nice. I've also experimented a little with giving it files and images. Actually, I helped my mother create a background: she's a dancer, and she was performing and wanted a very specific background for her dance, and I created that for her using ChatGPT. So all sorts of utility there. But I love the way that you framed the last potential application. Maybe just one last question from my end: any resources or books that you love on this topic?
Right. This is often the case with cutting-edge spaces: I think a lot of the papers coming out and gaining attention are worth checking out as resources. It's not exactly a case of "here's one book we can all look at"; things are moving fast enough that the interesting resources are the things being created today. So that's my generic answer. I do think, and this has been a running theme in some of the things I've mentioned today, that I get inspired by insights that had impact and stood the test of time. The reason is that I personally think all great ideas are timeless: just because the current hype cycle is over doesn't mean they're less interesting or less meaningful. They're foundational ideas that will continue to have impact. So when I look for resources, I actually look back to books from truly the prior generations. Some of the works I often go back to are by Herbert Simon and Allen Newell, founders of AI and many of these fields, who would later go on to win the Turing Award, the Nobel Prize, and so forth. Those works, by early cognitive psychologists and scientists, inspired my work, and those people actually wrote books, because their fields were much more established than the cutting-edge spaces of today. So I go back to those as my personal resources for getting ideas.

That's great. Thank you so much. This was super helpful to me personally, because what we do as investors is try to understand the impacts of technology and start to invest in companies at the beginning of when the technology becomes commercially viable. So, to your point around the problem spaces and the applications where this can be applied in a cost-effective and secure way, where the end user is willing to interact and get value: that's when we start to come in and invest in these companies, which will hopefully be much bigger companies in the future. So I really appreciate this chat.

Yeah, it was fun.