Can GPT-4-Vision Play Texas Hold'em Poker?

Josh Bickett
3 Mar 2024 · 17:08

Summary

TLDR: In this video, GPT-4 Vision, a multimodal AI model, plays Texas hold'em poker autonomously. The program takes a screenshot so the model can see the game state, the model chooses an action such as fold, check, or raise, and OCR locates the matching button on screen so the mouse can click it. Despite no poker-specific training, GPT-4 Vision wins its first hand, which points to the potential of AI in gaming and raises questions about online gaming once AI players take part.

Takeaways

  • 🤖 GPT-4 Vision, an AI model, was given the ability to autonomously play Texas hold'em poker: the program controls the mouse and browser, takes screenshots, and the model makes decisions based on the game state.
  • 🎲 The AI was not specifically trained for poker but managed to perform impressively, suggesting potential for further improvement with targeted training.
  • 🔗 Multimodal Gamer is the code enabling the AI to interact with the game, and its source code is available for curious viewers through a link in the video description.
  • 📸 The AI takes screenshots to analyze the game hand and then decides on actions such as folding, checking, or raising based on the visual input.
  • 👀 Optical Character Recognition (OCR) is crucial: it finds the text of the available game options in each screenshot so that the chosen button can be located and clicked.
  • 🔄 The AI's decision-making process includes providing a reason for its chosen action, which was improved in later iterations of the project.
  • 🃏 GPT-4 Vision was able to play multiple hands, demonstrating the ability to adapt to different game situations and make strategic decisions.
  • 🏆 The AI won its first hand of poker, showcasing its potential in gaming applications.
  • 🚀 The video discusses a possible future where AI systems like GPT-4 Vision outperform most people in games like poker, making online gaming for money risky because human and AI players would be hard to tell apart.
  • 💡 The project 'multimodal gamer' is not limited to poker and could be adapted to work with other games, opening up possibilities for AI to assist in a variety of gaming environments.

Q & A

  • What game did GPT-4 Vision autonomously play in the video?

    -GPT-4 Vision autonomously played Texas hold'em, a form of poker.

  • How did GPT-4 Vision control the gameplay?

    -The program took screenshots so GPT-4 Vision could analyze the hand, and then used the mouse to click one of the available game options such as fold, check, or raise.

  • What was the performance of GPT-4 Vision in its first game?

    -GPT-4 Vision won its first hand of poker, an impressive result considering it wasn't specifically trained for the game.

  • What is multimodal gamer?

    -Multimodal Gamer is the code that enables the project, allowing GPT-4 Vision to make decisions and interact with the game through screenshots and mouse control.

  • How did the creator improve GPT-4 Vision's gameplay?

    -The creator improved the prompt sent to GPT-4 Vision so that the model now states a clear thought and reason behind each decision it makes during the game.

  • What role does OCR play in this project?

    -OCR (Optical Character Recognition) is critical to the project: it reads the text in each screenshot and returns its on-screen coordinates, so the action GPT-4 Vision chooses can be matched to a button and clicked.

  • What are the possible actions GPT-4 Vision is aware of in the game?

    -GPT-4 Vision is aware of the following actions: fold, check, call, raise, wait, OK, and continue.

  • How does the system handle situations where a button is not available?

    -In situations where a button is not available, the system may wait until the next turn or when the button becomes available to execute the desired action.

  • What are the potential implications of this technology for online gaming?

    -The technology could potentially be used to play online games, including those with real money involved, making it difficult to distinguish between human and AI players in the near future.

  • How can someone contribute to the multimodal gamer project?

    -Anyone interested in contributing to the multimodal gamer project can make a PR (Pull Request) to add support for other games.

  • What was the creator's final verdict on GPT-4 Vision's poker playing capabilities?

    -The creator concluded that GPT-4 Vision performed well, especially considering it wasn't trained specifically for poker, and could potentially be as good as an average person who doesn't play poker often.
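The wait-and-retry behaviour described in the Q&A (a button is sometimes not on screen yet, so the OCR step fails and the system tries again) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `click_when_available`, `read_labels`, and `click` are hypothetical names, and the screenshot-plus-OCR step is passed in as a plain function so the sketch runs without any OCR library.

```python
import time


def click_when_available(label, read_labels, click, attempts=3, delay=1.0,
                         sleep=time.sleep):
    """Try to click `label`, re-reading the screen if the button is missing.

    read_labels: hypothetical callable that takes a fresh screenshot, runs OCR,
                 and returns the list of button texts currently visible.
    click:       hypothetical callable that clicks the button with that text.
    Returns True if the button was found and clicked, False otherwise.
    """
    for _ in range(attempts):
        if label in read_labels():
            click(label)
            return True
        # Button not on screen yet (for example, other players are still
        # acting), so pause before taking the next screenshot.
        sleep(delay)
    return False


# Usage with stand-in functions: "check" only appears on the third screenshot.
frames = iter([["call"], ["call"], ["check", "fold"]])
clicked = []
found = click_when_available("check", lambda: next(frames), clicked.append,
                             delay=0, sleep=lambda seconds: None)
```

Injecting `read_labels` and `click` keeps the retry logic separate from the screenshot, OCR, and mouse code, which is one possible way to make it easy to test.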

Outlines

00:00

🎲 Introducing GPT-4 Vision's Poker Playing Capabilities

The video begins with the host explaining that GPT-4 Vision, an AI model, has been given the ability to autonomously play Texas hold'em poker. The program controls the mouse and browser, takes screenshots of the game so the model can analyze the hand, and then clicks the model's decision to fold, check, or raise. The host is impressed with the AI's performance despite it not being specifically trained for poker and believes there is room for improvement. The video also mentions the 'Multimodal Gamer' code that enables this project and provides a link for viewers to explore further.

05:01

🏆 GPT-4 Vision's Progress and First Poker Win

The host demonstrates GPT-4 Vision playing multiple hands of poker, showcasing its decision-making process and the iterative improvements made to the 'Multimodal Gamer' code. The AI plays a full hand and later wins one, which is highlighted as a significant achievement. The video also shows technical issues such as OCR failures, after which the system retries until the chosen button is found.

10:36

👀 Behind the Scenes: GPT-4 Vision's Poker Strategy

The host delves into the technical aspects of how GPT-4 Vision plays poker. The model is described as 'multimodal' because it accepts both text and visual inputs. The prompt sent to the model lists the actions it can take, such as folding, checking, calling, and raising, and asks for a thought, an action, and a reason in return. The host explains the role of OCR (optical character recognition) in translating the AI's decisions into mouse clicks on the game screen. The video also discusses the potential for the system to be used with other games and the impressive capabilities of GPT-4 Vision in making poker decisions without specific training.
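As a rough illustration of the prompt-and-response step described in this outline, the sketch below validates a model reply against the action list named in the video. The reply format is an assumption: the video only says the expected output is a thought, an action, and a reason, so the JSON shape, the `parse_decision` name, and the fallback to "wait" are all hypothetical choices rather than the project's real behaviour.

```python
import json

# Action names as listed in the video; the exact strings used in the real
# prompt are an assumption.
ALLOWED_ACTIONS = {"fold", "check", "call", "raise", "wait", "ok", "continue"}


def parse_decision(raw_reply):
    """Parse a reply assumed to be a JSON object with thought/action/reason.

    Falls back to the "wait" action when the reply cannot be parsed or names
    an action that is not in ALLOWED_ACTIONS.
    """
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        return {"thought": "", "action": "wait", "reason": "unparseable reply"}

    action = str(data.get("action", "")).strip().lower()
    if action == "okay":
        action = "ok"  # accept either spelling of the confirm button
    if action not in ALLOWED_ACTIONS:
        return {"thought": str(data.get("thought", "")),
                "action": "wait",
                "reason": "unknown action"}

    return {"thought": str(data.get("thought", "")),
            "action": action,
            "reason": str(data.get("reason", ""))}


# Usage with a made-up reply:
decision = parse_decision(
    '{"thought": "My hand is weak", "action": "Check", "reason": "No need to bet"}'
)
```

Defaulting to "wait" on a bad reply is just one cautious option; it mirrors the video's observation that the model sometimes ignores the prompt, without committing chips on an unreadable answer.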

15:39

🤖 Ethical Considerations: AI in Online Poker

The host reflects on the implications of AI playing poker, especially online for money. The video raises concerns about the difficulty of distinguishing between human and AI players in the near future. The host suggests that by 2025, it may be challenging to know if you're playing against a real person or an AI, particularly in games like poker that rely on visual inputs and do not require long-term strategic planning. The video concludes with a call to action for viewers to subscribe and comment if they want to see more games added to the project.

Keywords

💡GPT-4 Vision

GPT-4 Vision is an AI model showcased in the video for its ability to autonomously play Texas hold'em poker. The program is given control over the user's mouse and browser so it can take screenshots, and the model makes decisions based on the game's state. It makes poker moves such as folding, checking, or raising without specific training for poker, which the host attributes to its broad general knowledge.

💡Texas hold'em

Texas hold'em is a popular variant of the card game poker. It is played with a standard deck of 52 cards and involves betting, dealing community cards, and determining the best hand at the table. In the video, the AI model GPT-4 Vision is shown playing this game, making strategic decisions like folding, checking, or raising based on the game's state and its understanding of poker rules and strategies.

💡Autonomous play

Autonomous play refers to the ability of a system or AI model to independently engage in and control gameplay without human intervention. In the context of the video, GPT-4 Vision autonomously plays Texas hold'em: the program takes screenshots, the model analyzes the game state, and actions such as folding, checking, or raising are executed without any specific training for the game.

💡OCR (Optical Character Recognition)

OCR, or Optical Character Recognition, is a technology that converts images of text, such as screenshots, into machine-readable text. In the video, OCR finds all the text on the poker game screen together with its coordinates, so the action GPT-4 Vision chooses, such as fold, check, or raise, can be matched to the right button and clicked.
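The matching step can be sketched as below, under stated assumptions. The sample data imitates the (box, text, confidence) tuples that OCR libraries such as EasyOCR return, with each box given as four corner points; `find_click_point`, the sample boxes, and the 1000 x 1000 screen size are made up for illustration and are not taken from the project.

```python
def find_click_point(ocr_results, label, screen_width, screen_height):
    """Return the centre of the text box matching `label` as screen fractions.

    ocr_results: list of (box, text, confidence) tuples, where box is a list
                 of four [x, y] corner points in pixels.
    Returns (x_fraction, y_fraction), or None if the label is not on screen.
    """
    wanted = label.strip().lower()
    for box, text, _confidence in ocr_results:
        if text.strip().lower() == wanted:
            # Average the four corners to get the centre of the box.
            center_x = sum(point[0] for point in box) / len(box)
            center_y = sum(point[1] for point in box) / len(box)
            # Express the centre as a fraction of the screen so the same
            # value works at any resolution.
            return center_x / screen_width, center_y / screen_height
    return None


# Made-up OCR output for a screenshot with three buttons.
sample_results = [
    ([[100, 500], [200, 500], [200, 540], [100, 540]], "Fold", 0.98),
    ([[300, 500], [400, 500], [400, 540], [300, 540]], "Check", 0.97),
    ([[500, 500], [600, 500], [600, 540], [500, 540]], "Raise", 0.95),
]

point = find_click_point(sample_results, "fold", 1000, 1000)
missing = find_click_point(sample_results, "call", 1000, 1000)
```

The returned fractions would then be handed to whatever mouse-control code is in use; a `None` result corresponds to the OCR failures seen in the video, where the button was not yet on screen.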

💡Multimodal Gamer

Multimodal Gamer is the code or project name used in the video to enable GPT-4 Vision to play games like Texas hold'em. It is designed to handle both text and visual inputs, allowing the AI to process information from screenshots and make decisions accordingly. The project demonstrates the potential for AI to interact with various types of games and applications by combining text and image recognition capabilities.

💡Latency

Latency in the context of the video refers to the delay or lag in the AI's response time when processing information and making decisions during the poker game. High latency can affect the AI's gameplay by causing slow reactions and potentially missing out on timely actions.

💡Poker strategy

Poker strategy encompasses the various tactics and decisions a player makes during a game of poker. This includes understanding the hand rankings, betting patterns, and the ability to read opponents. In the video, GPT-4 Vision demonstrates a basic poker strategy by making decisions like folding with weak hands and checking when no strong cards are present, indicating an understanding of fundamental poker principles.

💡Iterations

Iterations are the repeated cycles of development and refinement in a project or process. In the context of the video, the creator discusses various iterations of the Multimodal Gamer project, highlighting improvements made over time to enhance GPT-4 Vision's performance in playing poker.

💡Machine learning

Machine learning is a subset of artificial intelligence that involves the development of algorithms and models that allow computers to learn from and make predictions or decisions based on data. GPT-4 Vision is itself a machine-learned model, but it was not trained specifically for poker; the improvements shown in the video come from changes to the prompt and the OCR handling rather than from further training.

💡Online gaming

Online gaming refers to playing video games via the internet, where players can interact with each other in real-time. The video discusses the implications of AI models like GPT-4 Vision being used in online gaming, particularly in games involving real money, and raises concerns about the difficulty in distinguishing between human and AI players.

Highlights

GPT-4 Vision autonomously plays Texas hold'em poker, with the program controlling the mouse and browser.

The AI takes screenshots to analyze the game hand and make decisions based on the visual input.

GPT-4 Vision is not specifically trained for poker, so its decision-making showcases its general problem-solving capabilities.

The AI can click available game options such as fold, check, or raise using OCR (optical character recognition).

Multimodal Gamer is the code behind the project, enabling GPT-4 Vision to play poker by interpreting text and visual inputs.

The AI provides a clear reason for its actions, improving its performance in the game.

GPT-4 Vision won its first hand of poker, demonstrating its potential in gaming applications.

The project has been iterated upon, with improvements made to the AI's decision-making and OCR processing.

The AI's gameplay suggests it could perform at the level of an average, non-professional poker player.

The video showcases the potential for AI to play online games, raising questions about the authenticity of online gaming interactions.

The technology could potentially be used for playing poker for money, indicating a shift in online gaming dynamics.

The project could be expanded to include other games, with the possibility of community contributions through PR (pull requests).

The video demonstrates the evolving capabilities of AI in gaming and its potential impact on the future of online poker.

The AI's ability to play poker without specific training highlights the potential for general AI applications beyond gaming.

The video serves as a proof of concept for AI's potential in multimodal gaming and decision-making.

Transcripts

play00:01

yes okay GPT-4 Vision just won its first

play00:07

game hand of

play00:09

Poker in this video I give GPT-4 the

play00:12

model behind ChatGPT the ability to

play00:15

autonomously play Texas hold'em a form of

play00:17

Poker I give it control of my mouse and

play00:20

my browser and it takes a screenshot to

play00:23

see what the game hand looks like and

play00:25

then after the screenshot it can use my

play00:27

mouse to control uh to control and then

play00:30

to click one of the available game

play00:32

options such as to fold to check or to

play00:34

raise I was impressed with how it did

play00:37

considering it wasn't trained

play00:38

specifically for poker I think it could

play00:39

be improved upon a lot more but

play00:43

I think it's an interesting look at what

play00:46

could be possible in the

play00:48

future real quick I just wanted to

play00:50

mention multimodal gamer which is the

play00:51

code that makes this project possible if

play00:53

you're curious to take a look there's a

play00:55

link in the description now let's get

play00:58

started okay let's see how how the first

play01:00

version

play01:02

does um so we're going to start it up

play01:06

okay now multimodal gamer is going to

play01:09

make a decision to click one of these

play01:11

three buttons

play01:14

and we'll see what it decides to

play01:17

do so it decides to check so it should

play01:19

be able to click that check

play01:26

button okay so check worked okay so I have

play01:29

have a better working version of

play01:33

multimodal gamer

play01:35

playing Texas hold'em so I thought let's

play01:38

start a new

play01:39

game and uh let it play through a little

play01:42

bit longer so you can see how it plays

play01:45

multiple hands so here we go we got the

play01:47

game started up let me start up

play01:49

multimodal

play01:51

gamer we just wait for the deck to be

play01:53

there okay let's go ahead and get it

play01:58

okay

play02:01

cool so now it's taking a

play02:03

screenshot and it should come back with

play02:06

the decision

play02:16

here takes a little while cuz GPD 4

play02:19

Vision has some

play02:21

latency so we're waiting for the

play02:23

decision okay so it decided to call so

play02:25

it should be able to click call and move

play02:27

on

play02:28

to uh um to continue playing this hand

play02:32

so it called now this part was a little

play02:35

tricky because sometimes a button's not

play02:37

available so sometimes it will wait

play02:40

until its next turn so now it's right

play02:42

now it's coming back with an action it

play02:44

decided to check Okay cool so it should

play02:47

go ahead and check

play02:49

now something

play02:51

failed okay I know why um it it failed

play02:56

because the screenshot was taken before

play02:58

the button was available so the OCR step

play03:01

failed now it's going to wait um because

play03:04

okay now it's going to check again it's

play03:06

going to try to play play again okay so

play03:08

it found check this time so it clicked

play03:11

it all right

play03:14

it's not got a great hand I wonder if it

play03:17

will

play03:21

fold okay it's going to oh it said check

play03:25

but it um check's not available on the

play03:27

screen call is available so it did the

play03:30

wrong thing there but it has the right

play03:32

idea so I think now it's probably going

play03:35

to try to call um as an next

play03:42

action okay it's going to

play03:49

call OCR worked to click

play03:53

call okay really is not a great

play03:58

hand

play04:04

it wants to raise

play04:06

okay I don't know what it sees that I

play04:09

don't okay so that was

play04:15

failed so it failed there

play04:18

because actually I don't know it failed

play04:20

there let's see what it tries to

play04:24

do okay now it wants to check

play04:27

sure let's check

play04:31

okay so it checked now let's see what

play04:32

everyone else has ah see yeah it should

play04:35

have it should have folded but it's able

play04:38

to play a full hand which is which is

play04:39

progress

play04:41

cool so I improved the GPT-4 prompt so

play04:44

that it provides a clear reason and uh

play04:47

the thought for why it decides to do an

play04:50

action I think it's performing better so

play04:52

I wanted to play another hand let's uh

play04:54

go ahead and get multimodal gamer

play04:56

started Okay so

play05:01

this

play05:02

hand it's an okay hand it should try to

play05:04

play this hand I don't think it'll

play05:07

fold and if it plays this hand it should

play05:10

should give us a little information

play05:11

about how it's making its decision so I

play05:14

have a 10 to four my hand is weak it

play05:17

said but I didn't I didn't read the rest

play05:18

but it's going to check okay we have a

play05:21

pair okay so let's see what it says

play05:22

about

play05:26

that so any second now it should provide

play05:29

the thought and then I have to read it

play05:30

really quick before it goes

play05:33

on okay I should check uh no need to bet

play05:38

with a weak hand I mean it's not super

play05:39

weak

play05:42

but okay something on the OCR failed not

play05:44

a big deal it's just going to try

play05:51

again so we should have another thought

play05:54

here with no potential for improvement

play05:57

weak

play05:58

hand

play06:00

okay so I decided to check I didn't

play06:01

quite read the whole thought but I think

play06:04

this hand is

play06:05

okay the pair is a

play06:12

pair okay it's going to check I think

play06:15

that makes sense it didn't provide a

play06:16

thought there sometimes GPT-4 doesn't

play06:19

listen to the prompt

play06:22

um there was an OCR issue there so it's

play06:25

going to try

play06:27

again maybe it'll have a better thought

play06:30

this time let's

play06:34

see okay it's going to check I I think

play06:36

that's the right move and it and it

play06:38

consistently decides that so okay so

play06:42

that

play06:43

succeeded so everyone's checked sweet

play06:46

okay we were good we got a three

play06:50

fours that's

play06:53

awesome check it should have bet but hey

play06:57

maybe it's just playing the safe side

play07:01

okay so that's an OCR issue I have to

play07:04

figure out for some reason it's

play07:06

not clicking where it

play07:11

should maybe it'll decide to raise this

play07:13

time no it's going to

play07:17

check

play07:19

cool I think we might have this

play07:24

game all right oh

play07:28

yes okay GPT-4 Vision just won its first

play07:33

game a hand of

play07:36

Poker

play07:38

nice all right so you saw the different

play07:40

iterations of the project and how it

play07:43

improved over time and then finally won

play07:46

now I'll just let it play through for a

play07:48

longer period of time and I'll put it on

play07:50

four times speed so that you can just

play07:52

see how it works

play07:55

[Music]

play07:58

unfiltered

play08:00

[Music]

play08:26

[Music]

play08:42

[Music]

play09:02

[Music]

play09:19

[Music]

play09:31

[Music]

play09:54

[Music]

play10:36

[Music]

play10:44

wow I enjoyed watching that and it won

play10:46

that last

play10:47

hand I think it could be as good as your

play10:50

average person who doesn't play poker

play10:52

very often like me um of course I'm

play10:54

probably not a good judge of Poker cuz

play10:56

I'm not very good at it but um

play10:58

definitely nowhere near like a

play10:59

professional level level but for

play11:01

something that wasn't trained

play11:02

specifically for poker it seems to know

play11:05

what to

play11:06

do let's look at the code a little bit

play11:09

so we're looking at the prompt right now

play11:13

this AI prompt is sent to the model GPT-4

play11:16

Vision preview is the full name and this

play11:19

is a multimodal model which uh can

play11:23

receive both text and uh visual inputs

play11:27

so we send both we send this text to the

play11:30

model as well as the screenshot of our

play11:32

screen so if we look a little bit at

play11:34

what this um prompt is saying so we say

play11:37

you know you're a poker expert and um

play11:42

you get your goals to win for context

play11:45

will be uh you will be selecting buttons

play11:47

on a screen and here we can

play11:50

see all the actions it's aware of so it

play11:53

can fold check call raise wait um okay

play11:57

or continue um um the first four are

play12:00

just common poker moves and the next

play12:03

three are more related to like the

play12:05

logistics of how it needs to play so

play12:07

sometimes it needs to wait because maybe

play12:10

the other card players the other players

play12:12

are

play12:13

going or it's changing to a new hand

play12:17

sometimes it needs to click okay

play12:19

specifically when it raises it has to

play12:21

click okay to confirm and then continue

play12:24

is when the hand ends it can click

play12:27

continue and so this prompt is sent to

play12:29

GPT-4 and its expected output is a

play12:34

thought and an action and a reason so it

play12:37

can provide the main critical part is

play12:39

this action uh and the action will be

play12:41

one of those that we just uh looked at

play12:44

and what we do with that action so if we

play12:45

look at the system

play12:47

prompt is we send this system prompt uh

play12:53

with a user prompt that says see the

play12:55

screen so basically we're sending all

play12:57

this information into the model GPT-4

play12:59

Vision preview and we're getting back a

play13:03

response which is the action and we do

play13:06

some parsing with the action but the

play13:07

critical part is um to that makes this

play13:12

work is basically OCR which stands for

play13:14

optical character recognition we take

play13:16

the screenshot and then on our end we um

play13:19

use a library called EasyOCR to find all

play13:22

the text on the on the screenshot the

play13:24

reason we need to do that is we have to

play13:26

let GPT-4 learn uh have the ability to

play13:29

click text so GPT-4 says my action is to

play13:32

fold well what do we do with that we

play13:34

take that action fold and we pass it

play13:37

over uh the OCR return which OCR will

play13:41

say here's all the text on the screen

play13:43

and we find the text on the screen that

play13:45

says

play13:46

fold and it has a coordinate system with

play13:49

it and we have we then pass that

play13:53

coordinates to our operating

play13:56

system to click that that coordinates on

play13:59

the screen so we can click at percentage

play14:01

which we basically calculate percentage

play14:04

of fold and it's clicked so all GPT-4

play14:08

does is decide the action fold the rest

play14:11

is a bunch of different processing which

play14:14

we do in order to have fold be clicked

play14:19

um at a high level that's how it

play14:22

works

play14:23

so that's really the critical part of

play14:27

how it works but this this project

play14:29

multimodal gamer also works with

play14:32

Mario 64 and we can add other games if

play14:35

anyone wants to make a PR um but

play14:37

basically it's a system to allow GPT-4 to

play14:40

see the screen decide an action and then

play14:42

we do a bunch of processing to make sure

play14:43

that action is fired with our keyboard

play14:45

and mouse so that's a high level of the

play14:47

code if you're interested in a more

play14:49

detailed view let me know in the

play14:51

comments and I'll definitely uh consider

play14:53

making another video with more

play14:56

detail well that concludes the review of

play14:58

seeing how GPT-4 Vision does at playing

play15:01

Texas hold'em I was somewhat impressed

play15:04

that it ended at uh a higher score than

play15:06

it started it folded when it had bad

play15:09

hands and it checked when it had an OK

play15:12

hand and it happened to win on that on

play15:14

that uh four times speed demo so GPT-4

play15:19

vision is just like very generally

play15:20

knowledgeable but it appeared to know

play15:22

enough to make those decisions um in

play15:25

that in that longer gameplay it would be

play15:27

interesting to see what would happen if

play15:29

I let it play through for like an hour

play15:31

but that would cost me an arm and a leg

play15:33

and I won't do that right now but um

play15:35

maybe someday okay some final thoughts

play15:39

this was played on 247 poker.com with

play15:42

other computers you could probably set

play15:45

it up to play with Live players and you

play15:47

could probably um have it play even

play15:50

for money and that's kind of scary to me

play15:54

um to think that when you're playing

play15:55

games online especially if there's games

play15:56

online with money it's really hard to

play15:58

know in the next year or two it'll be

play16:00

really hard to know if you're playing

play16:01

with a real person and at some point

play16:03

these systems will likely outperform

play16:06

most people so you probably don't makes

play16:09

me think you probably don't want to be

play16:10

playing games for money online in 2025

play16:14

um you just there's no way to know I

play16:15

don't think there'll be a way to know um

play16:17

there'll be a lot of games where like

play16:19

the systems are probably not good enough

play16:20

but something like poker seems like ripe

play16:24

for for this because it's not like it's

play16:27

long-term planning and and it's

play16:30

um and it's it's just just like an image

play16:34

is all you need and these multimodal

play16:36

models or the image multimodal models

play16:38

are going to be getting better and

play16:40

better so yeah I have a lot of thoughts

play16:42

about it but I thought this video would

play16:45

be valuable that it would show people

play16:47

that this is going to be possible and

play16:51

people can probably set up these systems

play16:52

to play like paid poker so something to

play16:56

think about uh for all of us I think

play16:59

yeah okay if you enjoyed this video uh

play17:02

definitely subscribe and like and let me

play17:04

know in the comments if you want me to

play17:05

add other games see you

Related Tags
AI Gaming, Texas Hold'em, Poker AI, Multimodal Gaming, OCR Technology, GPT-4 Vision, AI Learning, Gambling Concerns, Online Poker, Tech Innovation