SHOCKING New AI Models! | All new GPT-4, Gemini, Imagen 2, Mistral and Command R+

AI Unleashed - The Coming Artificial Intelligence Revolution and Race to AGI
9 Apr 2024 · 10:43

Summary

TLDR: Google DeepMind introduces Gemini 1.5 Pro, an AI model available in public preview on the Google Cloud and Vertex AI platforms, boasting a 1 million token context window. GPT-4 Turbo with Vision has been released with a 128,000 token context window and training data up to December 2023, offering significant improvements and enabling developers to build innovative applications. Meanwhile, Devin AI, an AI software engineering assistant, has garnered attention but also skepticism. HealthifyMe leverages GPT-4 Turbo with Vision for nutrition insights through food photo recognition. The video also touches on the potential implications of AI agents for various industries and the economy.

Takeaways

  • 🚀 Google DeepMind has released Gemini 1.5 Pro in public preview on the Google Cloud and Vertex AI platforms.
  • 🖼️ The new and improved Imagen 2 can create 4-second live images from a single prompt, showcasing significant advancements in AI image generation.
  • 📱 GPT-4 Turbo with Vision is now generally available in the API, having moved out of preview mode and featuring important improvements.
  • 💡 GPT-4 Turbo with Vision offers a 128,000 token context window and training data up to December 2023, enhancing its capabilities.
  • 🤖 Devin AI, an AI software engineering assistant, is making waves as an application of GPT-4 Turbo's vision capabilities.
  • 🕵️‍♂️ The YouTube channel 'Internet of Bugs' critically examines AI software development demos, questioning the authenticity of some recent presentations.
  • 🛠️ The potential impact of AI agents like Devin on the job market, economy, and remote work is vast and raises many questions about the future.
  • 🎨 tldraw leverages GPT-4 Vision to transform user-doodled ideas into functional software, representing a potential shift in UI design.
  • 📈 Google Cloud's updates to Gemini, Imagen, Gemma, and MLOps on Vertex AI include enhanced image generation and multimodal content analysis.
  • 📅 The release of Gemini 1.5 Pro includes a 1 million token context window, which could significantly improve its performance on various tasks.
  • 🏆 The AI model leaderboard shows tight competition at the top, with OpenAI's GPT-4 and Anthropic's Claude 3 Opus neck and neck.

Q & A

  • What is the new AI model released by Google DeepMind in the public preview?

    -The new AI model released by Google DeepMind is Gemini 1.5 Pro, which is available in public preview on the Google Cloud and Vertex AI platforms.

  • What improvements have been made to the GPT model recently?

    -The recent improvements include the release of GPT-4 Turbo with Vision, which has a 128,000 token context window and training data up to December 2023. Vision requests can now also use JSON mode and function calling (a code sketch of such a request appears at the end of this Q&A section).

  • What is Devon AI, and what role does it play in software engineering?

    -Devin AI is an AI software engineering assistant powered by GPT-4 Turbo that uses vision for a variety of tasks. It has been making significant noise in the industry, with demos showing it completing an Upwork side hustle and taking website-building requests.

  • What are some concerns regarding the authenticity of AI software engineering demos like Devon AI?

    -There are concerns that the demos shown for AI software engineering tools like Devin AI may not be entirely genuine. Critics believe there could be some misrepresentation or 'shenanigans' going on, as evidenced by the thorough debunking done by the Internet of Bugs YouTube channel.

  • How does the HealthifyMe app utilize GPT-4 Turbo with Vision?

    -HealthifyMe has built an app, Snap, using GPT-4 Turbo with Vision that provides users with nutrition insights by recognizing photos of foods from around the world.

  • What is the significance of the 1 million token context window in Gemini 1.5 Pro?

    -The 1 million token context window in Gemini 1.5 Pro is significant because it allows the model to handle very large documents and find specific information within them efficiently ('needle in a haystack' retrieval). This capability is particularly useful for tasks like searching and analyzing multimodal content.

  • What is the potential impact of AI agents like Devon AI on the job market and economy?

    -The potential impact of AI agents includes the automation of various jobs, which could lead to changes in the economy and remote work. There are concerns about knowing who is real and who is not online, as well as how to protect against cyber attacks and maintain the quality of software development.

  • How does the GPT-4 Turbo with Vision model facilitate user interface design?

    -GPT-4 Turbo with Vision facilitates user interface design by allowing users to draw and annotate their ideas, which the model then turns into actual software. This rapid prototyping process can significantly speed up the development and iteration of user interfaces.

  • What are the capabilities of Google Cloud's updated imaging and multimodal models?

    -The updated Imagen 2 model on Google Cloud can now create 4-second live images from a single prompt, while Gemini 1.5 Pro on Vertex AI supports processing audio inputs, including music, speech, and the audio portion of video. It can provide high-quality transcriptions or be used to search and analyze multimodal content.

  • How does the GPT-4 Turbo model perform in the Gladiator Arena for LLM chatbots?

    -GPT-4 Turbo (the 04-09 version) has just been added to the Gladiator Arena, and its performance is being closely monitored to see where it will rank among the top AI models. In the sample battle shown in the video, Model A (OpenChat 3.5) produced noticeably better writing than Model B (Claude 3 Haiku).

  • What are the current rankings of the top AI models in the Gladiator arena?

    -As of the latest update, the top AI models in the Gladiator Arena are Claude 3 Opus as the reigning king, followed by GPT-4, and then Bard (Gemini Pro). The new GPT-4 Turbo model is expected to join the rankings soon.
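
As referenced in the answer about GPT-4 Turbo's improvements above, here is a minimal sketch of what a vision request combined with function calling might look like using the OpenAI Python SDK. The `log_meal` tool, the image URL, and the prompt are illustrative placeholders, not something shown in the video; only the general call shape follows the OpenAI Chat Completions API.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A hypothetical tool the model can choose to call after looking at the photo.
tools = [{
    "type": "function",
    "function": {
        "name": "log_meal",
        "description": "Log a recognized dish and its estimated calories.",
        "parameters": {
            "type": "object",
            "properties": {
                "dish": {"type": "string"},
                "calories": {"type": "number"},
            },
            "required": ["dish", "calories"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",
    tools=tools,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Identify the dish in this photo and log it."},
            {"type": "image_url", "image_url": {"url": "https://example.com/meal.jpg"}},
        ],
    }],
)

# If the model decided to call the tool, the structured arguments appear here.
print(response.choices[0].message.tool_calls)
```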

Outlines

00:00

🚀 New AI Releases and Improvements

This paragraph discusses the latest developments in AI technology. Google DeepMind has launched Gemini 1.5 Pro in public preview on the Google Cloud and Vertex AI platforms. The new and improved Imagen 2 has been introduced, capable of creating 4-second live images from a single prompt. Additionally, GPT-4 Turbo with Vision has moved out of preview mode and includes significant improvements such as JSON mode, function calling, a 128,000 token context window, and training data up to December 2023. The paragraph also highlights the skepticism around Devin AI's authenticity and the emergence of Internet of Bugs, a YouTube channel debunking AI software development demos.

05:01

🖼️ Advancements in Image Generation and AI Models

The second paragraph focuses on advancements in image generation and AI models. Google Cloud has announced updates to Gemini, Imagen, Gemma, and MLOps on Vertex AI. Gemini 1.5 Pro now supports audio inputs and can provide high-quality transcriptions. Google Cloud's Vertex AI Studio Vision allows for image generation, and the paragraph discusses the potential of these technologies. It also touches on the capabilities of GPT-4 Turbo with Vision in user interface design and the potential future applications of AI in various fields.

10:03

🏆 AI Model Rankings and Upcoming Developments

This paragraph covers the current rankings of AI models and upcoming developments. It mentions the close competition between GPT-4 and Claude 3 Opus, with the latter recently surpassing GPT-4. It also introduces a new competitor, Command R+, which has been making waves in the AI community. The paragraph concludes with a teaser about upcoming big news in the AI field, hinting that the new GPT-4 Turbo (2024-04-09) model will soon appear on the leaderboard and speculating about its expected impact on the current rankings.


Keywords

💡Gemini 1.5 Pro

Gemini 1.5 Pro is an advanced AI model developed by Google DeepMind. It is mentioned in the script as being available in public preview on the Google Cloud and Vertex AI platforms. The model is significant for its 1 million token context window and its ability to process multimodal inputs, including audio such as music, speech, and the audio portion of video. Its release is a major update from the previous version, indicating a leap in AI's ability to understand and generate content based on user input.
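
Here is a minimal sketch, using the Vertex AI Python SDK, of how one might send an audio file to Gemini 1.5 Pro for transcription, as described above. The project ID, region, bucket path, and prompt are placeholders; the model ID shown is the preview version mentioned in the video.

```python
import vertexai
from vertexai.generative_models import GenerativeModel, Part

# Placeholder project and region; use your own Google Cloud settings.
vertexai.init(project="my-gcp-project", location="us-central1")

model = GenerativeModel("gemini-1.5-pro-preview-0409")

# Reference an audio file stored in Cloud Storage (placeholder path).
audio = Part.from_uri("gs://my-bucket/interview.mp3", mime_type="audio/mpeg")

# The large context window lets you pass long audio plus instructions in one request.
response = model.generate_content([audio, "Transcribe this audio, then summarize the key points."])
print(response.text)
```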

💡GPT-4 Turbo with Vision

GPT-4 Turbo with Vision refers to an AI model that has been enhanced with vision capabilities, allowing it to process and understand visual data in addition to text. This model is a significant upgrade from its predecessors, as it can now handle more complex tasks that involve both text and image data, and vision requests can use JSON mode and function calling. The integration of vision into the GPT (Generative Pre-trained Transformer) model represents a step forward in AI's ability to interact with and understand the world, providing more nuanced and context-aware outputs.
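
A minimal sketch of a vision request using JSON mode with the OpenAI Python SDK is shown below; the image URL, prompt, and JSON keys are placeholders, and the exact fields you request are up to you.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",
    response_format={"type": "json_object"},  # JSON mode: the reply is a valid JSON object
    messages=[
        {"role": "system", "content": "Return a JSON object with keys 'caption' and 'objects'."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image as JSON."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        },
    ],
)

print(response.choices[0].message.content)  # e.g. {"caption": "...", "objects": [...]}
```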

💡AI software engineering assistant

An AI software engineering assistant is an artificial intelligence system designed to aid in software development tasks. These assistants can automate certain aspects of coding, debugging, and other software development processes, thereby increasing efficiency and reducing the time required for developers to complete tasks. The script mentions Devin, an AI software engineering assistant powered by GPT-4 Turbo, which uses vision for a variety of tasks, indicating the growing role of AI in assisting with complex technical work.

💡Internet of Bugs

The term 'Internet of Bugs' is used in the script to refer to a YouTube channel that critiques AI software development demos, pointing out potential issues or inaccuracies. This channel represents a form of oversight or quality control within the AI community, ensuring that claims made about AI capabilities are scrutinized and validated. The existence of such channels is important for maintaining transparency and trust in the rapidly evolving field of AI.

💡Debunking

Debunking refers to the process of revealing the truth about a claim, idea, or phenomenon, often with the intention of discrediting false or exaggerated statements. In the context of the script, debunking is associated with the Internet of Bugs channel, which aims to critically examine and challenge the demonstrations of AI capabilities. This process is crucial for fostering a balanced understanding of AI technology and its limitations, as well as for promoting honest and accurate representation of AI advancements.

💡Ethan Mollick

Ethan Mollick is mentioned in the script as an individual who has worked with AI, specifically with the Devin AI agent. His work involves exploring the potential of AI to interact with online platforms like Reddit, where the AI can take on tasks such as building websites based on user requests. This showcases the evolving capabilities of AI to engage in real-world tasks, solve problems, and potentially transform various aspects of online interaction and service provision.

💡Automation

Automation refers to the process of using technology, such as AI, to perform tasks with minimal human intervention. In the context of the script, automation is discussed in relation to the potential for AI agents to take over various jobs, transforming the economy, remote work, and software development. The script raises questions about the implications of widespread automation, including the need to discern what is real from what is fake online and the potential impact on job markets and work processes.

💡Sybil attacks

A Sybil attack is one in which an adversary creates large numbers of fake identities to manipulate or overwhelm a system, a concern the script raises as AI agents make it harder to know who is real online. The script mentions such attacks alongside Distributed Denial of Service (DDoS) attacks, which aim to overwhelm a system with traffic to cause it to crash, highlighting the importance of cybersecurity in an increasingly digital and AI-dependent world.

💡HealthifyMe

HealthifyMe is a company mentioned in the script that has developed an application, Snap, using GPT-4 Turbo with Vision. This application provides users with nutrition insights by recognizing foods from around the world through photo recognition. The use of AI in health and nutrition represents the growing trend of applying advanced technology to improve personal well-being and make information more accessible to the general public.

💡User Interface Design

User interface design refers to the process of creating the look and feel of software applications, ensuring they are user-friendly and intuitive to use. The script discusses a potential future of UI design where AI plays a significant role, allowing for rapid prototyping and iteration of designs based on user needs and preferences. The integration of AI into UI design could revolutionize the way interfaces are developed, making the process more efficient and tailored to user experiences.
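
As a rough illustration of the 'draw it, then make it real' workflow described above, here is a sketch of sending a local wireframe image to GPT-4 Turbo with Vision and asking for a self-contained HTML page. The file names and prompt are hypothetical; tldraw's actual pipeline is more involved than this.

```python
import base64
from openai import OpenAI

client = OpenAI()

# Hypothetical doodle exported as a PNG from a drawing tool.
with open("wireframe.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Turn this annotated wireframe into a single self-contained HTML file."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)

# Save the generated markup so it can be opened in a browser and iterated on.
with open("prototype.html", "w") as out:
    out.write(response.choices[0].message.content)
```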

💡Gladiator Arena

The Gladiator Arena, as mentioned in the script, is a platform where different AI models compete against each other based on their performance in handling user prompts. It serves as a ranking system that evaluates the effectiveness and capabilities of various AI models, providing insights into which models are the most advanced and effective in understanding and responding to user inputs.

💡Elo Rating

The Elo rating system is a method for calculating the relative skill levels of players in zero-sum games such as chess. In the context of the script, it is used to rank AI models in the Gladiator Arena based on their performance in handling user prompts. The Elo rating provides a quantitative measure of an AI model's ability to respond effectively to user inputs, offering a standardized way to compare different models.
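
For intuition, here is a small sketch of the standard Elo update after a single head-to-head vote; the arena's exact scoring details may differ, and the K-factor used here is an arbitrary illustrative value.

```python
def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if model A wins, 0.0 if it loses, 0.5 for a tie.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Example: a 1097-rated model beats an 1182-rated one in a single vote;
# the underdog gains more points than a favored winner would.
print(elo_update(1097, 1182, 1.0))
```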

Highlights

Google DeepMind releases Gemini 1.5 Pro in public preview on the Google Cloud and Vertex AI platforms.

The new and improved Imagen 2 can create 4-second live images from a single prompt.

GPT-4 Turbo with Vision is now generally available in the API, out of preview mode with important improvements.

GPT-4 Turbo with Vision has a 128,000 token context window and training data up to December 2023.

Devin AI, an AI software engineering assistant powered by GPT-4 Turbo with Vision, is gaining attention.

The Internet of Bugs YouTube channel debunks Devin's demo, raising questions about the authenticity of AI demos.

Ethan Mollick's work with the Devin AI agent on Reddit shows potential for AI in website building and problem-solving.

AI agents may open cans of worms in areas like remote work, software development, and cybersecurity.

HealthifyMe uses GPT-4 Turbo with Vision to provide users with nutrition insights through photo recognition of foods.

tldraw demonstrates the potential of GPT-4 Vision in user interface design through rapid prototyping.

Google DeepMind's Imagen 2 can create 4-second live images from a single prompt.

Gemini 1.5 Pro on Vertex AI supports processing audio inputs, including music, speech, and video audio.

Gemini 1.5 Pro has a 1 million token context window, excellent for finding specific information in large documents.

GPT-4 Turbo is added to the Gladiator Arena for LLM chatbots to compete for the best model.

Model A (OpenChat 3.5) and Model B (Claude 3 Haiku) showcase differences in AI models' ability to capture nuances in conversation.

Claude 3 Opus surpasses GPT-4 in the arena rankings, with interesting implications for the future of AI models.

Command R+ is a new competitor in the AI space, making waves and challenging existing models.

Transcripts

00:00

Google DeepMind wakes up this morning and releases Gemini 1.5 Pro, now available in public preview on the Google Cloud and Vertex AI platforms, which is actually really cool; we'll look at this in just a second. They also announced the new and improved Imagen 2, which is able to create 4-second live images from a single prompt: crashing waves, a mountain range, opening eyes, like this one. OpenAI, of course, had to drop something too: GPT-4 Turbo with Vision is now generally available in the API, so it's out of preview mode and has been rolled out with some important improvements. There isn't a huge amount of specifics about what was improved, but here's what's new. This is the new model, GPT-4 Turbo with Vision, the latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling, there's a 128,000 token context window, and training data goes up to December 2023. They also give some examples of what developers are building with vision, and they're asking people to drop whatever they're building in the replies as well.

01:07

They highlight Devin. Devin AI has been making tons of noise, tons of waves. It's an AI software engineering assistant powered by GPT-4 Turbo that uses vision for a variety of tasks. By the way, not everyone is convinced that Devin and the demos that have been shown are the real deal; they think there may be some shenanigans going on. Internet of Bugs is a relatively new YouTube channel, about one month old, that's gaining some traction by pointing out some of the issues with these AI software development demos. Its latest video is a debunking of Devin, the "first AI software engineer," and its Upwork claims. It specifically takes a look at the demo that Cognition, the company behind Devin, posted showing Devin's Upwork side hustle. The Internet of Bugs channel goes through and does a thorough debunking of what the video claimed, including 30 minutes of unedited footage of him doing everything that Devin supposedly did to complete that task.

02:08

Now, we've covered Ethan Mollick's work, where he gets the Devin AI agent to go on Reddit and start a thread where it takes actual website-building requests, and it does that, solving numerous problems along the way, even at some point attempting to charge people for the work. As Ethan Mollick says, agents are going to open a whole bunch of cans of worms. These cans of worms are things like: knowing who is real and who is not online; how jobs and the economy change when a lot of these jobs can be automated by these agents; how remote work changes; how software development changes; how we protect against things like Sybil attacks and DDoS attacks. There are a million questions, if you imagine these agents will continue developing and getting better. There's a whole bunch of cans of worms that are going to open, and these agents are the can openers. That's not even a joke; it's not me being funny, it's just what's coming.

03:02

Now, if Agent Schrader here is correct, then maybe these changes aren't quite as close as we think. Maybe the reliability of agents still hasn't been solved quite that well yet, and software engineers have nothing to fear yet, because that specific skill set is still irreplaceable. Things like Devin AI will be an assistant, a very important assistant, that's going to allow them to build more, do a lot of the boring tasks, and just improve their productivity instead of actually replacing that work. Me calling him Agent Schrader is a joke, because he looks like Hank Schrader from Breaking Bad; I don't actually know his real name.

03:38

Also, another company is HealthifyMe, which built Snap using GPT-4 Turbo with Vision to give users nutrition insights through photo recognition of foods from around the world. Then there's tldraw; we covered this in another video. If you haven't played with this thing, I highly, highly suggest you do. Imagine something like Microsoft Paint where you just paint whatever you want on the screen: you paint buttons, you make little annotations, you write out what you want, then you click "make it real," and this thing makes it real. It uses GPT-4 Vision to take whatever you've just doodled and turn it into an actual version, actual software, of what you did. Things I've tried include drawing a game and having it actually make that game in real time; it takes seconds to code the game up. I mean, these are simple games; one of them was running around chasing chickens in a little enclosure. But you can do web pages, you can do forms, you can do tons and tons of stuff. It is surprisingly good, and I think something like this will be the future of user interface design, just because of how quickly you can get stuff out there, test it, iterate, etc.

04:49

Google DeepMind's Imagen 2 can now make little 4-second live images from a single prompt. If you wanted to try Imagen 2, just the regular Imagen 2, ImageFX is probably the easiest way of doing it. It's pretty good; I was surprised. I still prefer Midjourney, but Google is getting very good at image generation.

05:07

In other news, Google Cloud announces updates to Gemini, Imagen, Gemma, and MLOps on Vertex AI. Gemma, the small open-source model from Google (kind of like the open-source version of Gemini, which might be an accurate way to describe it), has been improved. Gemini 1.5 Pro on Vertex AI also supports processing audio inputs, including music, speech, and even the audio portion of video. It can give high-quality transcriptions or be used to search and analyze multimodal content. In Google Cloud you can find Vertex AI Studio Vision; it looks like you're able to generate images, and you can request access here. I don't have it yet, but it looks like I have Gemini Experimental, which is the default setting for me, and then there's this Gemini 1.5 Pro preview 0409, so I'm guessing that's April 9th, today, with a 1 million token context window. We might do a deeper dive into this, but it looks like it is available, and yes, it does have the 1 million token context window, which I have to say is kind of exciting. As we've covered before, the papers show that it's really good at, for example, finding the needle in the haystack: if you have a large document and need to find a specific thing in that document, it will do so very well. A 1 million token context window is of course massive, and Gemini 1.5 Pro was very good at a number of tasks. The jump from 1.0 to 1.5 was pretty massive; if I recall correctly, the big change was that they went to a mixture of experts, kind of copying the GPT-4 design, and made a big leap forward.

06:44

We might test this out in a different video, but if you wanted to play around, here's how you can do it. Google Cloud has a free trial that doesn't auto-charge you at the end, so if you wanted to jump in and mess around with it, you can do so for free. They have text-to-speech and speech-to-text ("Hello there, happy to be here"); the voice sounds okay, pretty good. Here we're going to try to generate a caption for this lovely thumbnail I made, "More agents is all you need," with these wonderful, friendly little kitties. The caption reads: "Four hairless cats are sitting at a table with the words 'more agents is all you need' above them." I mean, that's pretty perfect; they nailed it. They're sitting at a table. I've got to say, this is excellent.

07:26

Now, we don't quite yet know what exactly has changed; we're probably going to see how much better the model is once we start testing it and playing around with it. It has been added to the gladiator arena for LLM chatbots, to see which one is the best; they added GPT-4 Turbo, the 04-09 version. This is what that looks like: you basically get assigned two random models, Model A and Model B. You're not told what they are, but you're able to put in your prompt and then see how well they respond. I'm going to say: "Write a quick screenplay about Hank Schrader and Walter White talking about using AI to do their jobs. The scene is set at Walter's house in the backyard as they make barbecue." And off they go, Model A and Model B. Walter starts, smirking: "You know, Hank, I've been thinking." Hank, raising an eyebrow: "Oh great, another one of your brilliant ideas. What's this one?" All right, that's a good start, I've got to say. Walter, leaning in: "What if we use AI to do our jobs?" Hank: "Are you serious? You want to replace me with a robot?" Walter: "No, no, not replace. Augment. Improve efficiency. You know, like using a computer to analyze data instead of sifting through mounds of paperwork." Okay, I've got to say, so far Model A, whatever it is, is excellent. It really captures the precision of the words Walter uses and the simplicity of how Hank talks, so Model A did an excellent job. Model B, I mean, it's okay, but it's very, very basic. It just kind of repeats some talking points about AI, you know, "Oh, AI could write up reports for me," and Walter is like, "But it might put you out of a job." Kind of simplistic; not bad, it's okay writing, but A is significantly better. And now they reveal that Model A was OpenChat 3.5 and Model B was Claude 3 Haiku, which is surprising.

09:13

Let's take a look at the leaderboards. Claude 3 Haiku's Elo rating, the arena rating they have for it, is 1182, whereas OpenChat is much lower; well, much lower in the rankings, but not that much lower in the actual rating, at 1097. Currently we have Claude 3 Opus as our reigning king; it has recently surpassed GPT-4. Interestingly, Bard from Gemini Pro is right behind, basically the third model in the ranking: so we have Claude, GPT-4, then Bard. It's number four on the list, but it's the third-best model if you count the two versions of GPT-4 as the same model. Then we have a new competitor, Command R+, that we have to look into, because it has been making a lot of waves and a lot of people are questioning whether it does indeed belong up here. We'll do a full deep dive into this later, but the point is that very soon we're going to see the new model up here; I guess it'll be called gpt-4-turbo-2024-04-09, and we'll be able to see exactly where it falls. Will OpenAI take back their crown and become number one once again? I've got to say, GPT-4 and Claude 3 Opus are neck and neck; they're two points apart, which you could say is not even a gap. They're pretty much the same; the difference might not be statistically significant.

10:35

With that said, my name is Wes Roth. Make sure you're subscribed; I think there's going to be some big news coming soon. Thank you for watching.


Related Tags
AI Innovations, Cloud Computing, Software Engineering, Image Generation, Deep Learning, AI Assistants, Technology Trends, Google Cloud, AI Development, Future Tech