Will "Claude Investor" DOMINATE the Future of Investment Research?" - AI Agent Proliferation Begins
Summary
TLDRThe transcript discusses the rapid advancements in AI, particularly in agentic workflows, which are expected to drive significant progress in the field. It highlights the potential release of GPT-5 and the evolution of AI models like CLA, which has transitioned from creating self-portraits to investment analysis. The importance of iterative processes and multi-agent collaboration is emphasized, with examples such as the development of an open-source alternative to Devon and the iterative improvement in AI's ability to write code and play games like Super Mario 64. The transcript also touches on the educational resources available for learning about AI and the potential impact of AI agents on various industries.
Takeaways
- ๐ Andrew A emphasizes the significance of AI agentic workflows, predicting they will drive substantial AI progress, potentially outpacing the next generation of foundation models like the anticipated GPT-5.
- ๐ฎ Sam Altman expects a major model launch later in the year, with CLA 3 demonstrating a shift from creating self-portraits to more complex tasks like investment analysis.
- ๐ The creator of Claud investor, an AI investment analyst, has open-sourced the tool, showcasing the potential for AI in financial analysis and decision-making.
- ๐ค AI development is moving towards more collaborative and iterative processes, with multiple agents working together and refining tasks through planning, tool use, and reflection.
- ๐ AI's capability in coding is improving, with GPT 3.5 and 4 demonstrating higher correctness rates in coding tasks compared to their predecessors.
- ๐ ๏ธ Tool use is a critical component in AI workflows, allowing AI to gather information, take action, or process data more effectively.
- ๐ Iterative processes and multi-agent collaboration significantly enhance the quality of AI output, yielding results that surpass those of single-pass writing or isolated AI models.
- ๐ Andrew A shares a framework for categorizing design patterns for building agents, highlighting reflection, tool use, planning, and multi-agent collaboration as key elements.
- ๐ Claud Investor illustrates the application of AI in investment analysis, providing an example of how AI can synthesize financial data and news to make investment recommendations.
- ๐ฎ Experiments with GPT-4's ability to play games like Super Mario 64 show the potential for AI in interactive environments, despite latency and decision-making challenges.
Q & A
What does Andrew A. urge everyone to pay attention to?
-Andrew A. urges everyone to pay attention to AI agentic workflows, as he believes they will drive massive AI progress.
What is expected to be released later this year in terms of AI foundation models?
-GPT-5, the next generation of OpenAI's foundation models, is expected to be released later this year.
What is CLA, and what is its recent development?
-CLA is an AI model that has evolved from drawing somewhat disturbing self-portraits to trying its hand at beating Warren Buffett as an investor.
What does the creator behind Claud investor say about potentially open-sourcing it?
-The creator behind Claud investor has mentioned that they may open-source it the next day, and indeed, they do open-source it.
What is Devon and how was it received in the AI community?
-Devon is an open-source alternative to impressive seeming AI agents that was recently released and is currently available to a small group of testers.
What is the significance of Andrew A.'s statement about agentic workflows being more important than the next generation of foundation models?
-Andrew A.'s statement highlights the potential impact of agentic workflows on AI progress, suggesting that their importance may surpass that of foundational models like GPT-5.
How does an agentic workflow with an LM (Language Model) differ from a zero-shot approach?
-An agentic workflow involves the LM iterating over a document multiple times, planning, researching, writing drafts, and revising, whereas a zero-shot approach involves the LM answering a question or completing a task in one go without prior examples or iterations.
What is the 'Reflection' tool use case mentioned in the script?
-In the context of AI, 'Reflection' refers to the AI examining its own work to come up with ways to improve it, which is a critical part of an agentic workflow.
How does multi-agent collaboration improve the results of AI tasks?
-Multi-agent collaboration involves more than one AI agent working together, splitting up tasks, and discussing or debating ideas to come up with better solutions than a single agent would be able to achieve.
What is the improvement rate when incorporating an iterative agent workflow with GPT 3.5?
-Incorporating an iterative agent workflow with GPT 3.5 can achieve up to a 95.1% improvement rate, which is significantly better than the 48% rate of GPT 3.5 alone and even surpasses the performance of GPT 4.
What is the potential application of AI agents in the stock market according to the script?
-AI agents can be used to analyze financial data, news, sentiment, and industry trends for stocks within a given industry, rank them by investment potential, and provide price targets, although it's emphasized that these are for educational or informational use only.
Outlines
๐ Andrew A's Prediction on AI Agentic Workflows
Andrew A emphasizes the significance of AI agentic workflows, predicting they will drive substantial AI progress this year, potentially outpacing the next generation of foundation models like the anticipated GPT 5. He highlights the rapid advancement of these workflows, noting their improvement from mere ideas six months ago to highly effective tools now. Andrew A, a respected AI researcher, teaches a broader audience about AI through his courses on deeplearning.ai, many of which are free, making him a valuable resource for those interested in diving deep into the field.
๐ Iterative AI Workflows and Their Impact
The paragraph discusses the iterative nature of AI workflows, comparing them to human writing processes. It suggests that iterative AI workflows yield significantly better results than single-pass writing. The concept of a 'Society of Minds,' where multiple agents focus on different tasks, is introduced as a way to further enhance outcomes. The example of GPT 3.5's iterative process achieving a 95.1% success rate in a coding benchmark is provided, showcasing the dramatic improvement over non-iterative models. The paragraph also touches on the importance of providing examples to AI models, such as GPT 3.5 and GPT 4, to improve their performance significantly.
๐ ๏ธ Devon's Workspace and Multi-Agent Collaboration
The paragraph describes Devon's workspace, which includes various tools like a shell, browser, and editor, as well as a planner for breaking down tasks into subtasks. It highlights the problem-solving process Devon goes through when initializing a chart component, showcasing the iterative planning and execution mechanism. The paragraph also discusses the effectiveness of multi-agent collaboration, where multiple AI agents work together to discuss, debate, and refine ideas, leading to better solutions than a single agent could achieve. The potential applications of this collaborative approach in various fields are emphasized, along with the importance of open-source tools and frameworks for AI agent development.
๐ฎ GPT-4's Gaming Capabilities with Multimodal Gamer
Josh Biet, an engineer on the Hyperight AI project, explores GPT-4's ability to play Super Mario 64 using a multimodal gamer framework. Despite GPT-4's latency, the model demonstrates decision-making and navigation skills within the game. Biet iteratively improves the model's performance by adjusting the prompt and providing more context about the game's controls. The paragraph details the step-by-step process of how GPT-4 interacts with the game, including its successes and failures, and ends with a call for others to modify and expand the repository for playing other games, especially those less sensitive to latency issues.
๐ Final Iteration and Code Overview of Multimodal Gamer
The final iteration of the multimodal gamer project is presented, showing GPT-4's improved performance in navigating and playing Super Mario 64. A fast-forwarded version of the gameplay demonstrates the model's ability to learn and adjust its strategy over time. The paragraph then provides an overview of the code repository, explaining the simple structure and the use of a prompt to guide GPT-4's actions. The potential for the repository to be adapted for other games is discussed, with a focus on games that can accommodate the model's latency. Biet encourages others to modify the repository and experiment with different games, offering to share further developments through YouTube videos.
Mindmap
Keywords
๐กAI Agentic Workflows
๐กFoundation Models
๐ก่ฟญไปฃ่ฟ็จ (Iterative Process)
๐กOpen Source
๐กMulti-Agent Collaboration
๐กTool Use
๐กReflection
๐กPlanning
๐กFew-Shot Learning
๐กLatency
๐กMultimodal Gamer
Highlights
Andrew Ng emphasizes the importance of AI agentic workflows for driving significant AI progress, potentially more than the next generation of foundation models.
Expectations for the release of GPT-5 or the next iteration of OpenAI's foundation models are high.
CLA 3 showcases its versatility from creating self-portraits to attempting to outperform Warren Buffett as an investor.
The creator behind Claud investor has open-sourced the model, which could revolutionize investment analysis.
Devon, an impressive AI agent, is currently in a limited testing phase and demonstrates the rapid advancement of AI capabilities.
Andrew Ng advocates for AI education and offers free courses on deeplearning.ai for those interested in learning about AI.
AI agentic workflows are improving at an astonishing rate, with significant advancements observed over just a 6-month period.
The transition from using LLMs in zero-shot mode to iterative agent workflows marks a significant shift in AI development.
An iterative workflow with AI yields much better results than a single-pass approach, similar to human writing processes.
The concept of 'Society of Minds' where multiple agents with different focuses collaborate further improves AI outcomes.
GPT-3.5, when used iteratively in an agent loop, can achieve up to 95.1% correctness, surpassing the capabilities of GPT-4.
Open source agent tools and academic literature on agents are becoming more prevalent, indicating an exciting yet confusing time in AI development.
A framework for categorizing design patterns for building agents is shared by Andrew Ng, highlighting the practical applications of AI in various fields.
Reflection, tool use, planning, and multi-agent collaboration are identified as key components of effective AI agentic workflows.
Devon's workspace showcases the ability of AI to plan, execute, and troubleshoot tasks, demonstrating a human-like approach to problem-solving.
Multi-agent collaboration is extremely effective, as seen in various examples, including the iterative improvement of GPT 3.5's capabilities.
The potential of using AI agents like Claud investor for financial analysis and investment tracking is discussed, highlighting the educational and informational use of such tools.
Open sourcing of AI models and frameworks, such as the multimodal gamer, encourages experimentation and innovation in AI gaming applications.
The rapid iteration and improvement of AI models, as seen in the development of multimodal gamer, demonstrate the potential for AI in complex tasks like playing video games.
Transcripts
so Andrew a urges everyone to pay
attention to AI agentic workflows saying
that they will drive massive AI progress
this year potentially more than the next
generation of foundation models keep in
mind we're expecting GPT 5 to come out
later this year or whatever the next big
iteration of open ai's foundation models
will be Sam Alman said on Lex Freeman
podcast that he expects that the next
big model launch is later this year CLA
3 goes from drawing somewhat disturbing
self-portraits to trying its hand on
beating Warren Buffett as an investor
the Creator behind Claud investor even
says that he may open source it tomorrow
but will he spoiler alert he does
meanwhile if you got the technical chops
there's a team that's building in public
creating the open source alternative to
Devon Devon is of course one of the more
impressing seeming AI agents right now
came out just last week I believe and
currently is available to a small group
of testers I am not one of them why am I
not one of them am I not cool enough am
I not worthy of Devon fine see if I care
but I I do care I care a lot but let's
get back to Andrew a now Andrew a is one
of the most well-known well respected AI
researchers that's doing a lot to teach
a greater audience about AI about how to
use AI he's got a lot of courses at
deeplearning.ai a lot of them free so if
you're ready to dive deep he's got tons
of stuff on here with VAR specialists in
the the field a great resource a lot of
it is free I think he has a few paid
courses but a lot of this is free and so
he just posted this a few days ago
saying I think AI gentic workflows will
drive massive AI progress perhaps even
more than the next generation of
foundation models again that's that's
saying a lot knowing what Foundation
models we expect next this is kind of a
big deal he's saying this is an
important Trend and I urge everyone who
works in AI to pay attention to it and
if you've been following we've covered a
lot of these agents these agentic
workflows on his channel and they're
getting scary good scary Fast 6 months
ago it was just an idea there are some
examples of it but nothing really too
exciting that can use and slowly as they
start coming out each time shockingly
better the rate of progress is insanely
fast so he's saying today we use mostly
llms in zero shot mode so like you ask
it a question and it answers you which
is similar to asking somebody to just
write an essay start to finish typing
straight through without using
Backspaces you know not brainstorming
beforehand right just start typing it
out beginning to end with an agentic
workflow however we can ask the LM to
iterate over a document many many times
it might plan an outline decide what
kind of research it needs to do for
example do some Google searches to
gather more information it can write a
first draft read it over revise iterate
Etc when I did my review of Chad Dev
which was the sort of agentic workflow
where you had multiple agents each
responsible for their own area of
producing say a code or app or a little
game asked them to create something I
think it might have been Flappy Bird
right some simple game and each of these
agents each little person that you see
here they represent an agent that
actually existed in that sort of
environment so each one of these little
faces that you see here that was a
separate instance of GPT 3.5 that's what
they were using at the time you could I
believe KCK get up to GPT 4 but this
thing would you know use up a lot of
tokens so if you use gbt 4 you could
potentially you know run up quite a bill
but even with GPT 3.5 they were able to
produce very impressive results because
they would work together this was the
CTO the chief technical officer he would
go from design have them kind of write
the outline and everything that was
needed so this was kind of the planning
designing Etc then it would go into
coding they would actually create all
the code the CTO would kind of go along
with them and kind of pingpong the
process back and forth until it was
refined then they would kick it over to
testing and testing would like run it
see if they can spot any bugs make sure
everything works so they see this guy
kind of a no bug symbol appears here and
when I tested this last year I have a
video about it testing would keep
kicking the code back to coding coding
would refine it and this happened I mean
three four five times as I was sitting
and I can assumed this was a glitch this
was a doom Loop right I was like okay
it's just going to get stuck it's just
going to keep sitting there forever
burning through my API credits but no
eventually it was like okay yeah we got
rid of all the bugs and they kicked it
over to the documentation step that
built a manual for the game when I saw
this happening live on my computer I I
was kind of floored I was like all right
there's something big here because this
I mean they use the waterfall model of
development they discuss the code they
break it down into steps is this is very
very similar to what a development
agency would do it's eerily similar to
how humans would handle this sort of
process and the fact that they can
iterate and test bugs to make sure
everything's working I mean I think now
more people are aware that this can
happen but back then when I saw it for
the first time live on my computer like
happening you know locally with open AI
API but they were running on my computer
spitting out useful code that I could
run those games or apps or whatever and
troubleshooting them that that was weird
and also the fact that gbt 3.5 by
running multiple instances that all work
together kind of went back and forth It
produced GPT 4 like results what happens
when you string a bunch of GPT 4S
together right right when the next
Generation model comes out what happens
if you let's say GPT 5 right or Claude 4
or whatever what happens when you string
all of those together where each of them
has its own job its own focus and they
work together to refine that idea we're
not that far from that so Andrew Ang
continues the iterative process is
critical for most human writers to write
good text with AI such an iterative
workflow yields much better results than
writing in a single pass this is true we
know this to be true we also again from
a lot of the studies that I've seen as
well as real life results having
iterative results plus kind of this
Society of Minds kind of multiple agents
with their own Focus this even further
improves the results deon's splashy demo
recently received a lot of social media
Buzz my team has been closely following
the evolution of AI that writes code by
the way he he had this course for quite
you know like 6 months maybe pair
programming with a large language model
on deep learning that AI by the way I'm
obviously not getting paid to say any of
this this is a free course I just do
think this is a pretty cool resource if
if you're looking for the Deep technical
dive and so he's saying that they've
been doing quite a bit of research into
this he's saying GPT 3.5 zero shot was
48.1% correct they're using the human
eval coding Benchmark GPT 4 zero shot so
again zero shot meaning we're not giving
you examples of how to create that
particular code how to solve the
particular problem we're just saying
here's the problem and it spits out the
answer so GP 4 does better at 67% by the
way giving these models examples like
few shot learning can be massive so this
is Matt Schumer we'll talk about him
later so he is at hyper right AI so he
was he created one of the AI agents that
we reviewed here I believe this is the
team behind self-operating computer and
Hyper right AI they've had a lot of
updates on this so we'll definitely
check them out in a different video but
he posted this a few days ago he's
saying highest Alpha Secret in AI right
now if you provide around 10 examples to
claw 3 Hau so Hau is the tiny CLA 3
Model the smallest one I believe it goes
Hau Sonet and then Opus Opus is the one
that everyone's kind of focusing on as
the really good one right but he's
saying you give 10 examples you know 10
shot learning to Cloud 3 ha coup it will
often outperform Cloud 3 Opus and far
outperform GPT 4 at a fraction of the
cost with blazing speeds meaning that if
you if you tell clo theopus please do
XYZ right whatever you're asking to do
right and then whatever output it gives
you you'd say oh that's an a good job
CLA 3 Opus but you take CLA 3 ha coup
the small model it's super cheap very
fast and you tell it please do XYZ but
you give it 10 examples right you give
it here's an example 1 2 3 you give it
10 examples well that output that it
gives you that might be an A+ it might
be better than Cloud 3 Opus the bigger
model so that's an important point to
understand that a lot of the stuff it
Stacks few shot examples right it
improves creating multiple agents each
responsible for its own thing it
improves the results but next Andrew a
continues however the improvement from
gbt 3.5 to gbt 4 so from 48% to 67 why
it's it's dwarfed by incorporating an
iterative agent workflow indeed wrapped
in an agent Loop GPT 3.5 achieves up to
95.1% so it's massively massively better
than GPT 4 open source agent tools and
the academic literature on agents are
are proliferating making this an
exciting time but also a confusing one
and so to simplify he's sharing a
framework for categorizing design
patterns for building agents he's saying
his team AI fund is successfully using
these patterns in many applications and
I hope you find them useful isn't it
interesting how a lot of the stuff that
you know the top AI Minds sharing this
on Twitter so the rest of us can learn
it and use it and hopefully when we
learn something new also share it with
the world is kind of similar to a lot of
the stuff that we're talking about in
regards to how these AI agents work
together kind of interesting I think so
here are the things that they've been
finding extremely useful one is
reflection where the AI examines its own
work to come up with ways to improve it
tool use the LM is given tools such as
web search code execution or any other
function to help it gather information
take action or process data planning the
LM comes up with and executes a
multi-step plan to achieve a goal for
example writing an outline for an essay
then doing online research then writing
a draft and so on this is something that
I think d does extremely well here's
from Ethan mik's tweet that we went over
yesterday so here's Devon's workspace
right so he's got multiple things like
shell browser editor but he's also got
this planner where whatever task you
give him Devon breaks it up into
multiple steps kind of subtasks to
complete that and then one by one goes
through does it checks it off does it
checks it off Etc right including
troubleshooting so here looks like he
ran into I say he I mean so they call
this one Devon the this person building
the open source version is calling it
Dev cut so but whatever the case is so
Devon runs into some problem in
initializing a charts component right he
tries to figure out how to do it and
resolves it by you know importing
something that he needs the point is
there's some bug or some error that it
solves right and then checks it off
going yep now that we resolve that issue
we're going to redeploy the web app
check and so it just keeps going down
this list so a lot of the things that
people were complaining about GPT 4
being stupid and not being able to
complete certain tasks I mean how much
of that just goes up in smoke when you
add a really strong ability for it to
plan out its steps think through you
know step by step but also have some
sort of this some sort of a iterative
planning and executing mechanism and
then of course the final one multi-agent
collaboration more than one AI agent
work together splitting up tasks and
discussing and debating ideas to come up
with a better to come up with better
Solutions than a single agent would now
a lot of people might dismiss this like
isn't this the same thing as planning
and flection and iterating no now we've
covered multiple examples here on this
channel not just Chad but many other
where multi-agent collaboration is
extremely effective so you can see here
depending on what kind of architecture
we use right zero shot reflection tool
use planning multi-agent all right so
out of the box GPT 3.5 is here 47 46%
whatever that was GPT 4 much better
right very impressive 68 whatever 66 but
this is the Improvement when we add
those other types of architecture look
at this massive shot from where GPT 3.5
started to where it could be right the
same model but massive massive massive
Improvement by the way if you're
interested in this stuff Matt Schumer so
the guy behind hyper R AI is a great
follow that whole idea of using Claude
ha cou to get the quality of Opus at the
fraction of cost and latency so he made
a collab notebook right put all the code
in there and is open sourcing it so if
you wanted to try it out you now here it
is on GitHub and here's his latest so
this is the Claude investor the first
Claud 3 investment analyst agent just
provide an industry and it will one find
Financial data/ newws for key companies
two analyze sentiments SL trends for
each three rank Stocks by investment
potential plus price targets and it's
open sourced now you might look at the
specific things that goes into this and
say well these aren't the best things
that I would use for investment tracking
or whatever but you might have your own
sort of process that's fine but you can
this you can use this model to plug in
whatever process you use for finding
good Investments and and run it have
this have this AI workflow do all the
research for you and come back with
short summaries price targets Etc here's
the explanation of how Claud investor
works so user provides an industry agent
finds a few stocks to explore retrieves
key financial data and news for these
stocks analyzes sentiment industry
Trends and peer comparisons for each
generates investment recommendations
ranked my potential and obviously this
is just for educational or informational
use only don't use this for real stock
picking of course now I could see
someone that takes this framework and
builds their own using whatever data
that maybe people aren't really looking
at for example using Twitter sentiment
for example finding some sort of viral
trends that might translate to companies
doing better or worse there's a lot of
information online about the global
movement of ships and airplanes and
stuff like that they could give you
advanced warning if a company's in
trouble or about to do really well on
next on their next earnings call now I'm
not saying people should dive into this
but this or or something like this will
be used to make stock trades and he's
open sourcing it here so you can
actually check it out his framework for
doing this by the way and this is not
Financial advice it is just my opinion
on what I will do but the better these
AI tools get the more AI agents are out
there snooping for information the more
and more I will stay away from Trading
because I feel like that would be the
way to get slaughtered in the markets to
me the older I get the smarter buff it
seems just buy good businesses when
people are freaking out and then don't
sell just chill eat your burgers and
Coke somewhere far far away from wall
streets so you don't get all riled up
about whatever is happening so just
chill until you see an opportunity and
then furiously attack it like he does
this hamburger then chill I'm curious
let me know in the comments do you agree
that the more AI agents are out there
the the less most of us should try to
you know outperform their market and
outtrade the competition you know if
you're even into that but before you go
here's Josh biet so he's another
engineer on the hyperight AI project he
asks a simple question can gp4 with
vision play Super Mario 64 to answer
that he created the multimodal gamer
I'll link his profile below but uh check
this up I wrote some code to let the
model behind chat DBT play Super Mario
64 often said that these models
are uh predictors Not actors but I
thought I would give it a try and see if
the results speak for
themselves these models such as gp4 have
a bit of latency and I found that as the
primary issue um in most cases about how
it navigated and made
decisions it would be interesting if
latency was non-existent how this model
would could do if it could get more
frames per second I created a repository
called M multimodal gamer
and um it's basically a framework to
enable m multimodal models to play games
on a computer okay I have an initial
implementation so I'm going to try it
now okay so it took a screenshot
so let's see if it okay so it moved
forward moved
up and it said Mario is facing the path
forward let's start moving up The Path
moving continue moving up The Path
yeah so just an initial Pro of concept
let's make it
better okay I have the next
iteration and let's see if Mario can
cross the
bridge okay I'm going to start up
Mario okay my hands are off so now it
can GPT 4 Vision can decide on the
amount of time to hold it oh okay he
made it across the bridge so Mario needs
to go towards the bridge continue his
Advent Adventure hold up for three
seconds is what it did now it's
jumping Mario is facing a possible under
should jump over it well that's
wrong okay moving
up okay all right the duration is
helpful but 3 seconds is probably too
long since how infrequent the
screenshots are okay I'm going to try to
iterate
it so I made some
adjustments in the prompt so that gp4
can make multiple actions at a time and
it's a little more logical on the
duration of time it takes an action so
let's see how Mario does passing this
guy okay I'm going to start it
up okay it probably took the
screenshot
okay Mario's over there kind of in that
corner okay he's running round him oh
that he's not doing very
well uh okay he got
hurt hopefully he
goes facing turn around towards the star
okay he's running
away
he's
stuck Retreat
further possibly circle around it okay
so now he's
running head towards the star behind the
gate which involves freeing train Chop
or finding another way to
access how close to the gate now I
should approach the wooden post and
attempt to free train Chop okay
I don't know I don't know how what that
is she grabbed the
re okay it seems like it gp4 was on to
something
there need to repossession Mario to
grab oh ran into train
Chom
need to coin quickly to
recover yeah Mario's
struggling okay let's keep adjusting and
iterating
[Music]
it so I adjusted the prompt for gbd4 to
give it more context about the
controllers it has and the ones it
doesn't have like it can't toggle the
view um it might have helped so let's
see
how uh how Mario does
[Music]
now you can get past these
things okay so he's running
stopped
[Music]
the top of the
hill okay he's running he's running oh
he's in the middle wow okay that was
pretty good
actually oh he's running to the left oh
he got hit okay he passed him oh it
finished him okay he was
close so now that you've seen each
iteration uh of the project as I've
adjusted it I thought I would just share
a longer fast forwarded version of the
final interation I worked on I still
think there's a lot more that could be
improved but it's kind of fun to see how
it would do um not just in Snippets but
um over a longer period of time so
here's the fast forward version of it
navigating
let's look at the code so it's just a
few files in this repository it's uh
relatively small and let's start with a
prompt so the prompt that we're sending
to GPD for vision preview it's pretty
simple it's we're just saying you know
you're playing the game and um I set up
this prompt so it could be used with
other games but basically the game is
Super Mario 64 and then it has a goal to
collect Power Stars scattered across the
various levels in the game um and uh
which access through the paintings in
Prince uh Princess Peach's castle um and
then it has a controller uh the N64
controller just to give it some contacts
and I pass those into this long promp
string and we just say here are your
actions up down left right attack jump
and some context about what it's seeing
it's seeing a snapshot of the screen at
every iteration so yeah I mean that's
really the system prompt pretty basic um
and where we send that P system prompt
is in the API file so we take a
screenshot and uh we just save that to
the uh locally to the computer and then
we pass up that screenshot as a base 64
to the um open AI API um and we you send
it with a user prompt which says uh see
the screenshot uh of the game to and you
know do your next action basically here
is what we're saying I told the GPD 4
Vision that it can go up down left right
attack but then we have to convert that
to the keys of the keyboard so we do
that here and a function called press
which just uses this Library Pi Auto uh
guey and this Library let's do just uh
Fire keyboard events or Mouse events the
same as we do when we use a computer and
that's really the code um I hope this
repo can be adjusted by others and and
you can make it play at games I think
the greatest potential for this repo is
games where latency is not a problem
where um there may be step you take the
the game is step by step and it's okay
if you take a while at each step so um I
hope that others try to modify this repo
and try to build other games I might do
so myself um and if I do I will uh share
more uh YouTubes of it
Browse More Related Video
SHOCKING Robots EVOLVE in the SIMULATION plus OpenAI Leadership Just... LEAVES?
OpenAI'S "SECRET MODEL" Just LEAKED! (GPT-5 Release Date, Agents And More)
STUNNING Step for Autonomous AI Agents PLUS OpenAI Defense Against JAILBROKEN Agents
AI News: The AI Arms Race is Getting INSANE
Digital Transformations and AI - Video 3
OpenAI Reveals New ChatGPT-5 Details
5.0 / 5 (0 votes)