Can GPT-4-Vision Play Texas Hold'em Poker?
Summary
TLDR: In this video, GPT-4 Vision, a multimodal AI model, is shown autonomously playing Texas hold'em poker. The system takes screenshots so the model can analyze the game state and choose an action such as folding, checking, or raising, then uses OCR to locate the matching button and click it with the mouse. Despite having no poker-specific training, GPT-4 Vision wins its first hand, demonstrating the potential for AI in gaming and raising questions about the future of online gaming with AI participants.
Takeaways
- 🤖 GPT-4 Vision, an AI model, was given the ability to autonomously play Texas hold'em poker by controlling the mouse and browser to take screenshots and make decisions based on the game state.
- 🎲 The AI was not specifically trained for poker but managed to perform impressively, suggesting potential for further improvement with targeted training.
- 🔗 Multimodal Gamer is the code enabling the AI to interact with the game, and its source code is available for curious viewers through a link in the video description.
- 📸 The AI takes screenshots to analyze the game hand and then decides on actions such as folding, checking, or raising based on the visual input.
- 👀 Optical Character Recognition (OCR) is crucial for reading the text on the screenshots and locating the button that matches the action the AI has chosen.
- 🔄 The AI's decision-making process includes providing a reason for its chosen action, which was improved in later iterations of the project.
- 🃏 GPT-4 Vision was able to play multiple hands, demonstrating the ability to adapt to different game situations and make strategic decisions.
- 🏆 The AI successfully won its first hand of poker, showcasing its potential as a gaming assistant.
- 🚀 The video discusses a potential future where AI systems like GPT-4 Vision could outperform most people in games like poker, making online gaming for money risky due to the difficulty of distinguishing between human and AI players.
- 💡 The project 'multimodal gamer' is not limited to poker and could be adapted to work with other games, opening up possibilities for AI to assist in a variety of gaming environments.
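As a rough illustration of the thought, action, and reason output mentioned in the takeaways, the sketch below parses one model reply. The JSON format, the key names, and the function name `parse_decision` are assumptions made for illustration; the video does not show the exact response format Multimodal Gamer uses. Only the list of actions comes from the video.

```python
import json
from typing import Optional

# Actions named in the video: four poker moves plus three "logistics" actions.
ALLOWED_ACTIONS = {"fold", "check", "call", "raise", "wait", "okay", "continue"}


def parse_decision(reply: str) -> Optional[dict]:
    """Parse a model reply assumed to be JSON with thought/action/reason keys.

    Returns None when the reply is not valid JSON or names an unknown action,
    so the caller can simply take a new screenshot and ask again.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None

    action = str(data.get("action", "")).strip().lower()
    if action not in ALLOWED_ACTIONS:
        return None

    # The thought and reason are optional: the video notes that the model
    # sometimes leaves the thought out despite the prompt.
    return {
        "thought": data.get("thought", ""),
        "action": action,
        "reason": data.get("reason", ""),
    }


reply = '{"thought": "My hand is weak", "action": "Check", "reason": "No need to bet"}'
print(parse_decision(reply))
```

Treating a malformed reply as "no decision" keeps the loop simple, since the next screenshot gives the model a fresh chance to answer.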
Q & A
What game did GPT-4 Vision autonomously play in the video?
-GPT-4 Vision autonomously played Texas hold'em, a form of poker.
How did GPT-4 Vision control the gameplay?
-GPT-4 Vision controlled the gameplay by taking screenshots to analyze the game hand and then using the mouse to click on available game options such as fold, check, or raise.
What was the performance of GPT-4 Vision in its first game?
-GPT-4 Vision won its first hand of poker, an impressive result considering it wasn't specifically trained for the game.
What is multimodal gamer?
-Multimodal gamer is the code that enables the project, allowing GPT-4 Vision to make decisions and interact with the game through screenshots and mouse control.
How did the creator improve GPT-4 Vision's gameplay?
-The creator improved GPT-4 Vision's gameplay by refining the prompt, which now asks the model to provide a clear reason and thought process behind each decision it makes during the game.
What role does OCR play in this project?
-OCR (Optical Character Recognition) is critical to the project: it reads the text in each screenshot and gives its on-screen coordinates, so the action GPT-4 Vision chooses can be matched to a button and clicked.
What are the possible actions GPT-4 Vision is aware of in the game?
-GPT-4 Vision is aware of the following actions: fold, check, call, raise, wait, okay, and continue.
How does the system handle situations where a button is not available?
-In situations where a button is not available, the system may wait until the next turn or when the button becomes available to execute the desired action.
What are the potential implications of this technology for online gaming?
-The technology could potentially be used to play online games, including those with real money involved, making it difficult to distinguish between human and AI players in the near future.
How can someone contribute to the multimodal gamer project?
-Anyone interested in contributing to the multimodal gamer project can make a PR (Pull Request) to add support for other games.
What was the creator's final verdict on GPT-4 Vision's poker playing capabilities?
-The creator concluded that GPT-4 Vision performed well, especially considering it wasn't trained specifically for poker, and could potentially be as good as an average person who doesn't play poker often.
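The waiting and retry behavior described in the answers above can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not the project's actual code: `play_turns`, `decide`, `locate`, and `click` are hypothetical names, where `decide` stands in for the screenshot plus model call, `locate` for the OCR lookup, and `click` for the mouse action.

```python
from typing import Callable, List, Optional, Tuple

Position = Tuple[float, float]


def play_turns(
    decide: Callable[[], str],
    locate: Callable[[str], Optional[Position]],
    click: Callable[[Position], None],
    max_cycles: int,
) -> List[str]:
    """Run up to max_cycles decide/locate/click cycles; return the actions clicked."""
    clicked: List[str] = []
    for _ in range(max_cycles):
        action = decide()  # hypothetical: take a screenshot and ask the model
        if action == "wait":
            continue  # nothing to click, for example while other players act
        position = locate(action)  # hypothetical: OCR lookup of the button text
        if position is None:
            # The button was not on screen when the screenshot was taken,
            # so skip this cycle and let the next screenshot try again.
            continue
        click(position)
        clicked.append(action)
    return clicked


if __name__ == "__main__":
    # Stubs: the first lookup fails, as in the video's failed "check" attempt.
    decisions = iter(["check", "check", "wait", "call"])
    lookups = iter([None, (0.45, 0.92), (0.60, 0.92)])
    clicks: List[Position] = []
    print(play_turns(lambda: next(decisions), lambda a: next(lookups), clicks.append, 4))
```

Skipping a cycle when the lookup fails mirrors what the video shows: the system simply takes another screenshot and tries again.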
Outlines
🎲 Introducing GPT-4 Vision's Poker Playing Capabilities
The video begins with the host explaining that GPT-4 Vision, an AI model, has been given the ability to autonomously play Texas hold'em poker. The AI controls the mouse and browser to take screenshots of the game, analyze the hand, and then make decisions to fold, check, or raise. The host is impressed with the AI's performance despite it not being specifically trained for poker and believes there is room for improvement. The video also mentions the 'multimodal gamer' code that enables this project and provides a link for viewers to explore further.
🏆 GPT-4 Vision's Progress and First Poker Win
The host demonstrates GPT-4 Vision playing multiple hands of poker, showcasing its decision-making process and the iterative improvements made to the 'multimodal gamer' code. The AI successfully plays a full hand and even wins a round, which is highlighted as a significant achievement. The video also addresses technical issues such as OCR failures, which occur when a screenshot is taken before a button is available, and shows the system recovering by trying again.
👀 Behind the Scenes: GPT-4 Vision's Poker Strategy
The host delves into the technical aspects of how GPT-4 Vision plays poker. The model is described as 'multimodal' because it can process both text and visual inputs. The prompt sent to the model, together with the screenshot, lists the actions it can take, such as folding, checking, calling, and raising. The host explains the role of OCR (optical character recognition) in translating the AI's decisions into mouse clicks on the game screen: the OCR step finds the on-screen text that matches the chosen action and supplies the coordinates to click. The video also discusses the potential for the system to be used with other games and the impressive capabilities of GPT-4 Vision in making poker decisions without specific training.
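The OCR-to-click step described above can be sketched as follows. This is an illustrative sketch, not the project's implementation: `find_click_percent` is a hypothetical helper, and the input is assumed to be a list of `(bounding_box, text, confidence)` tuples with four `[x, y]` corner points per box, which is the shape EasyOCR's `readtext()` returns. The sample hits are made-up values.

```python
from typing import List, Optional, Sequence, Tuple

# One OCR hit: (four [x, y] corner points, recognized text, confidence).
OcrHit = Tuple[Sequence[Sequence[float]], str, float]


def find_click_percent(
    action: str,
    hits: List[OcrHit],
    screen_width: int,
    screen_height: int,
) -> Optional[Tuple[float, float]]:
    """Return the center of the matching text as fractions of the screen size.

    Returns None when no OCR hit matches the action, for example when the
    screenshot was taken before the button appeared.
    """
    target = action.strip().lower()
    for box, text, _confidence in hits:
        if text.strip().lower() == target:
            xs = [point[0] for point in box]
            ys = [point[1] for point in box]
            center_x = (min(xs) + max(xs)) / 2
            center_y = (min(ys) + max(ys)) / 2
            return center_x / screen_width, center_y / screen_height
    return None


# Made-up OCR hits for a 1000 x 1000 pixel screenshot.
hits: List[OcrHit] = [
    ([[100, 900], [200, 900], [200, 950], [100, 950]], "Fold", 0.98),
    ([[400, 900], [500, 900], [500, 950], [400, 950]], "Check", 0.97),
]
print(find_click_percent("check", hits, 1000, 1000))  # → (0.45, 0.925)
```

Returning fractions rather than pixels keeps the result independent of the screenshot's resolution, which matches the host's description of clicking "at percentage".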
🤖 Ethical Considerations: AI in Online Poker
The host reflects on the implications of AI playing poker, especially online for money. The video raises concerns about the difficulty of distinguishing between human and AI players in the near future. The host suggests that by 2025, it may be challenging to know if you're playing against a real person or an AI, particularly in games like poker that rely on visual inputs and do not require long-term strategic planning. The video concludes with a call to action for viewers to subscribe and comment if they want to see more games added to the project.
Keywords
💡GPT-4 Vision
💡Texas hold'em
💡Autonomous play
💡OCR (Optical Character Recognition)
💡Multimodal Gamer
💡Latency
💡Poker strategy
💡Iterations
💡Machine learning
💡Online gaming
Highlights
GPT-4 Vision autonomously plays Texas hold'em poker by controlling the mouse and browser.
The AI takes screenshots to analyze the game hand and make decisions based on the visual input.
GPT-4 Vision is not specifically trained for poker, so its decision-making showcases its general problem-solving capabilities.
The AI can click available game options such as fold, check, or raise using OCR (optical character recognition).
Multimodal Gamer is the code behind the project, enabling GPT-4 Vision to play poker by interpreting text and visual inputs.
The AI provides a clear reason for its actions, improving its performance in the game.
GPT-4 Vision won its first hand of poker, demonstrating its potential in gaming applications.
The project has been iterated upon, with improvements made to the AI's decision-making and OCR processing.
The AI's gameplay suggests it could perform at the level of an average, non-professional poker player.
The video showcases the potential for AI to play online games, raising questions about the authenticity of online gaming interactions.
The technology could potentially be used for playing poker for money, indicating a shift in online gaming dynamics.
The project could be expanded to include other games, with the possibility of community contributions through PR (pull requests).
The video demonstrates the evolving capabilities of AI in gaming and its potential impact on the future of online poker.
The AI's ability to play poker without specific training highlights the potential for general AI applications beyond gaming.
The video serves as a proof of concept for AI's potential in multimodal gaming and decision-making.
Transcripts
yes okay GPT-4 Vision just won its first
game hand of
Poker in this video I give GPT-4 the
model behind ChatGPT the ability to
autonomously play Texas hold'em a form of
Poker I give it control of my mouse and
my browser and it takes a screenshot to
see what the game hand looks like and
then after the screenshot it can use my
mouse to control uh to control and then
to click one of the available game
options such as to fold to check or to
raise I was impressed with how it did
considering it wasn't trained
specifically for poker I think it could
be improve improved upon a lot more but
I think it's an interesting look at what
could be possible in the
future real quick I just wanted to
mention multimodal gamer which is the
code that makes this project possible if
you're curious to take a look there's a
link in the description now let's get
started okay let's see how how the first
version
does um so we're going to start it up
okay now multimodal gamer is going to
make a decision to click one of these
three buttons
and we'll see what it decides to
do so it decides to check so it should
be able to click that check
button okay so check worked okay so I
have a better working version of
multimodal gamer
playing Texas hold'em so I thought let's
start a new
game and uh let it play through a little
bit longer so you can see how it plays
multiple hands so here we go we got the
game started up let me start up
multimodal
gamer we just wait for the deck to be
there okay let's go ahead and get it
okay
cool so now it's taking a
screenshot and it should come back with
the decision
here takes a little while cuz GPT-4
Vision has some
latency so we're waiting for the
decision okay so it decided to call so
it should be able to click call and move
on
to uh um to continue playing this hand
so it called now this part was a little
tricky because sometimes a button's not
available so sometimes it will wait
until its next turn so now it's right
now it's coming back with an action it
decided to check Okay cool so it should
go ahead and check
now something
failed okay I know why um it it failed
because the screenshot was taken before
the button was available so the OCR step
failed now it's going to wait um because
okay now it's going to check again it's
going to try to play play again okay so
it found check this time so it clicked
it all right
it's not got a great hand I wonder if it
will
fold okay it's going to oh it said check
but it um check's not available on the
screen call is available so it did the
wrong thing there but it has the right
idea so I think now it's probably going
to try to call um as an next
action okay it's going to
call OCR work to click
call okay really is not a great
hand
it wants to raise
okay I don't know what it sees that I
don't okay so that was
failed so it failed there
because actually I don't know it failed
there let's see what it tries to
do okay now it wants to check
sure let's check
okay so it checked now let's see what
everyone else has ah see yeah it should
have it should have folded but it's able
to play a full hand which is which is
progress
cool so I improved the GPT-4 prompt so
that it provides a clear reason and uh
the thought for why it decides to do an
action I think it's performing better so
I wanted to play another hand let's uh
go ahead and get multimodal gamer
started Okay so
this
hand it's an okay hand it should try to
play this hand I don't think it'll
fold and if it plays this hand it
should give us a little information
about how it's making its decision so I
have a 10 to four my hand is weak it
said but I didn't I didn't read the rest
but it's going to check okay we have a
pair okay so let's see what it says
about
that so any second now it should provide
the thought and then I have to read it
really quick before it goes
on okay I should check uh no need to bet
with a weak hand I mean it's not super
weak
but okay something on the OCR failed not
a big deal it's just going to try
again so we should have another thought
here with no potential for improvement
weak
hand
okay so I decided to check I didn't
quite read the whole thought but I think
this hand is
okay the pair is a
pair okay it's going to check I think
that makes sense it didn't provide a
thought there sometimes GPT-4 doesn't
listen to The Prompt
um there was an OCR issue there so it's
going to try
again maybe it'll have a better thought
this time let's
see okay it's going to check I I think
that's the right move and it and it
consistently decides that so okay so
that
succeeded so everyone's checked sweet
okay we were good we got a three
fours that's
awesome check it should have bet but hey
maybe it's just playing the safe side
okay so that's an OCR issue I have to
figure out for some reason it's
not clicking where it
should maybe it'll decide to raise this
time no it's going to
check
cool I think we might have this
game all right oh
yes okay GPT-4 Vision just won its first
game a hand of
Poker
nice all right so you saw the different
iterations of the project and how it
improved over time and then finally won
now I'll just let it play through for a
longer period of time and I'll put it on
four times speed so that you can just
see how it works
[Music]
wow I enjoyed watching that and it won
that last
hand I think it could be as good as your
average person who doesn't play poker
very often like me um of course I'm
probably not a good judge of Poker cuz
I'm not very good at it but um
definitely nowhere near like a
professional level level but for
something that wasn't trained
specifically for poker it seems to know
what to
do let's look at the code a little bit
so we're looking at the prompt right now
this AI prompt is sent to the model
gpt-4-vision-preview is the full name and this
is a multimodal model which uh can
receive both text and uh visual inputs
so we send both we send this text to the
model as well as the screenshot of our
screen so if we look a little bit at
what this um prompt is saying so we say
you know you're a poker expert and um
your goal is to win for context
you will be selecting buttons
on a screen and here we can
see all the actions it's aware of so it
can fold check call raise wait um okay
or continue um um the first four are
just common poker moves and the next
three are more related to like the
logistics of how it needs to play so
sometimes it needs to wait because maybe
the other card players the other players
are
going or it's changing to a new hand
sometimes it needs to click okay
specifically when it raises it has to
click okay to confirm and then continue
is when the hand ends it can click
continue and so this prompt is sent to
GPT-4 and its then expected output is a
thought and an action and a reason so it
can provide the main critical part is
this action uh and the action will be
one of those that we just uh looked at
and what we do with that action so if we
look at the system
prompt is we send this system prompt uh
with a user prompt that says see the
screen so basically we're sending all
this information into the model GPT-4
Vision preview and we're getting back a
response which is the action and we do
some parsing with the action but the
critical part is um to that makes this
work is basically OCR which stands for
optical character recognition we take
the screenshot and then on our end we um
use a library called EasyOCR to find all
the text on the on the screenshot the
reason we need to do that is we have to
let GPT-4 learn uh have the ability to
click text so GPT-4 says my action is to
fold well what do we do with that we
take that action fold and we pass it
over uh the OCR return which OCR will
say here's all the text on the screen
and we find the text on the screen that
says
fold and it has a coordinate system with
it and we have we then pass that
coordinates to our operating
system to click that that coordinates on
the screen so we can click at percentage
which we basically calculate percentage
of fold and it's clicked so all GPT-4
does is decide the action fold the rest
is a bunch of different processing which
we do in order to have fold be clicked
um at a high level that's how it
works
so that's really the critical part of
how it works but this this project
multimodal gamer also works with
Mario 64 and we can add other games if
anyone wants to make a PR um but
basically it's a system to allow GPT-4 to
see the screen decide an action and then
we do a bunch of processing to make sure
that action is fired with our keyboard
and mouse so that's a high level of the
code if you're interested in a more
detailed view let me know in the
comments and I'll definitely uh consider
making another video with more
detail well that concludes the review of
seeing how GPT-4 Vision does at playing
Texas hold'em I was somewhat impressed
that it ended at uh a higher score than
it started it folded when it had bad
hands and it checked when it had an okay
hand and it happened to win on that on
that uh four times speed demo so GPT-4
Vision is just like very generally
knowledgeable but it appeared to know
enough to make those decisions um in
that in that longer gameplay it would be
interesting to see what would happen if
I let it play through for like an hour
but that would cost me an arm and a leg
and I won't do that right now but um
maybe someday okay some final thoughts
this was played on 247 poker.com with
other computers you could probably set
it up to play with Live players and you
could Pro probably um have it play even
for money and that's kind of scary to me
um to think that when you're playing
games online especially if there's games
online with money it's really hard to
know in the next year or two it'll be
really hard to know if you're playing
with a real person and at some point
these systems will likely outperform
most people so you probably don't makes
me think you probably don't want to be
playing games for money online in 2025
um you just there's no way to know I
don't think there'll be a way to know um
there'll be a lot of games where like
the systems are probably not good enough
but something like poker seems like ripe
for for this because it's not like it's
long-term planning and and it's
um and it's it's just just like an image
is all you need and these multimodal
models or the image multimodal models
are going to be getting better and
better so yeah I have a lot of thoughts
about it but I thought this video would
be valuable that it would show people
that this is going to be possible and
people can probably set up these systems
to play like paid poker so something to
think about uh for all of us I think
yeah okay if you enjoyed this video uh
definitely subscribe and like and let me
know in the comments if you want me to
add other games see you