AI Realism Breakthrough & More AI Use Cases

The AI Advantage
16 Aug 2024 · 25:52

Summary

TLDR: This week's AI news focuses on hyperrealistic image generation, with breakthroughs that impact e-commerce, as seen with platforms like Let's AI integrating it for virtual try-ons. The release of Grok 2 by Twitter, incorporating the open-source Flux model for largely uncensored image generation, is a highlight. Additionally, updates on language models like GPT-4o and Google's new voice assistant, Gemini Live, are discussed. The script also touches on how these technologies are redefining the word 'photo' and on their potential for misuse, emphasizing the importance of education and ethical considerations in AI advancements.

Takeaways

  • 🌐 The script discusses a breakthrough in hyperrealistic image generation with AI, noting its potential impact on e-commerce and social media platforms.
  • 🎨 The release of Grok 2 is highlighted, which integrates the Flux model for image generation and stands out for its largely uncensored capabilities, barring nudity and other explicit content.
  • 🔍 Grok 2's integration with Twitter's data firehose is emphasized, allowing it to serve as a powerful Twitter search engine that provides real-time news and information.
  • 📈 The script mentions the open-source nature of Flux, enabling users to customize and enhance the model with additional data, such as personal images or hyperrealistic photos.
  • 🛍️ The potential use of AI-generated images in e-commerce is explored, where customers can virtually try on clothes using AI, predicting a shift in online shopping experiences.
  • 🤖 The script introduces the concept of 'LoRA' for image models, which involves low-rank adaptation to improve the generation of specific types of images, like personalized content.
  • 🔑 The importance of understanding code when using AI tools for code generation is stressed, so that generated code can be handled and debugged effectively.
  • 📒 A new GPT-4o model is quietly released within the ChatGPT app, optimized for chat interactions and dialogue, with minimal noticeable differences to the user.
  • 🗣️ Google's release of its voice assistant, Gemini Live, is critiqued as feeling like a beta release, lacking the advanced features and integrations expected.
  • 🕺 The update to the Viggle app, allowing users to create dancing videos with two people, is presented as a fun and engaging use of AI technology.
  • 💡 Anthropic's announcement of prompt caching with Claude is highlighted, which could significantly reduce costs and latency, making it an exciting development for AI integration and conversational agents.

Q & A

  • What is the main focus of the video script?

    -The main focus of the video script is the recent advancements in AI, particularly in the area of hyperrealistic image generation and the integration of these technologies into various applications like e-commerce and social media platforms.

  • What is the significance of the breakthrough in hyperrealistic image generation mentioned in the script?

    -The breakthrough in hyperrealistic image generation is significant because it has led to the creation of images that are indistinguishable from real photos, which has implications for how we define and perceive 'photos' in the digital age.

  • What is the role of the Flux model in the recent developments?

    -The Flux model, developed by Black Forest Labs, plays a central role: it is an open-source, Midjourney-level model that has been integrated into Grok and is capable of generating hyperrealistic images, which has sparked various adaptations and use cases.

  • What is LoRA and how does it relate to the Flux model?

    -LoRA stands for low-rank adaptation, a technique where extra data is used to fine-tune an image model so that it generates images with specific characteristics. In the context of the Flux model, it allows for the creation of hyperrealistic images with added realism through fine-tuning.
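To make the low-rank idea concrete, here is a minimal NumPy sketch of what "low-rank adaptation" means mathematically. The dimensions and rank are illustrative, not Flux's actual layer sizes: instead of retraining the full weight matrix of a layer, LoRA learns two small factors whose product is added to the frozen weights.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 512, 512, 8           # layer dimensions and adapter rank (r << d, k)
W = rng.normal(size=(d, k))     # frozen pretrained weight, never updated

# LoRA trains only the small factors B and A; the effective weight is W + B @ A
B = rng.normal(size=(d, r)) * 0.01
A = rng.normal(size=(r, k)) * 0.01
W_adapted = W + B @ A

x = rng.normal(size=(k,))
y = W_adapted @ x               # forward pass with the adapter folded in

# Trainable parameters shrink from d*k to r*(d + k)
full_params = d * k             # 262,144
lora_params = r * (d + k)       # 8,192, about 3% of the full matrix
```

Because only `B` and `A` are trained, a LoRA checkpoint stays tiny and can be shared and stacked on top of the base model, which is why community "realism LoRAs" for Flux appeared within days of the open-source release.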

  • How does the script address the ethical concerns related to the generation of hyperrealistic images?

    -The script acknowledges the potential for misuse of hyperrealistic image generation, such as creating deep fakes, and emphasizes the importance of education to help people understand and protect themselves from these technological advancements.

  • What is the current state of AI-generated code and its usability?

    -AI-generated code has become more sophisticated, but the script points out that without a basic understanding of coding, users may struggle to utilize or debug the code effectively, highlighting the need for education in coding alongside AI tools.

  • What new features does the Grok 2 model offer compared to its predecessors?

    -Grok 2 offers integration with Twitter's data firehose, providing real-time access to news and opinions from Twitter, and improved image generation using the Flux model. It also has a more relaxed content policy compared to some other models.

  • What is the significance of the new ChatGPT model release mentioned in the script?

    -The new ChatGPT model release is significant because it has been optimized for chat conversations, offering a more interactive and dialogue-focused experience, and is already integrated into the ChatGPT app.

  • What are the limitations of Google's new image generator compared to Flux or Midjourney?

    -While Google's new image generator is an improvement over their previous efforts, it does not match the level of detail and realism offered by Flux or Midjourney, which are considered the current benchmarks in the field.

  • What is the potential impact of prompt caching with Claude on AI-generated content?

    -Prompt caching with Claude can significantly reduce costs and latency, making it more feasible to integrate complex personas into AI models. This could lead to faster and more cost-effective generation of content that requires contextual understanding.
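A rough cost model shows why this matters. The multipliers below come from Anthropic's launch announcement (writing to the cache costs about 1.25x the base input price, cache reads about 0.1x), and the $3-per-million-token base rate matches Claude 3.5 Sonnet input pricing at the time; treat this as an illustrative sketch, not a billing calculator.

```python
# Illustrative cost comparison for resending a large fixed context
# (persona, documents, few-shot examples) on every conversation turn.
BASE_INPUT_PRICE = 3.00 / 1_000_000  # $/input token (Claude 3.5 Sonnet, Aug 2024)
CACHE_WRITE_MULT = 1.25              # the first write to the cache costs extra
CACHE_READ_MULT = 0.10               # subsequent cache hits are ~90% cheaper

def conversation_cost(context_tokens: int, turns: int, cached: bool) -> float:
    if not cached:
        # The full context is billed at the base rate on every turn.
        return turns * context_tokens * BASE_INPUT_PRICE
    # With caching: pay the write premium once, then the discounted read rate.
    write = context_tokens * BASE_INPUT_PRICE * CACHE_WRITE_MULT
    reads = (turns - 1) * context_tokens * BASE_INPUT_PRICE * CACHE_READ_MULT
    return write + reads

plain = conversation_cost(100_000, turns=20, cached=False)   # $6.00
cached = conversation_cost(100_000, turns=20, cached=True)   # $0.945
```

For a 100k-token context reused over 20 turns, the cached conversation costs roughly a sixth of the uncached one, and latency drops similarly because cached tokens skip reprocessing.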

Outlines

00:00

🎨 Hyperrealistic Image Generation Breakthroughs

The script discusses a significant advancement in hyperrealistic image generation, highlighting the integration of the Flux model by Black Forest Labs into Grok 2, an AI model. The open-source nature of Flux allows for customization, such as with LoRAs, which enhance the model's realism by training it with additional images. The summary touches on the implications of this technology, including its potential to redefine the concept of a 'photo' and its application in e-commerce for virtual try-ons, as well as the ethical concerns regarding the generation of politically sensitive images or deepfakes.

05:01

🛠️ Democratization of AI-Generated Code and Education

This section addresses the increasing capability of AI to generate code and the challenges faced by users who lack the knowledge to debug or utilize the generated code effectively. The script promotes Brilliant.org as a valuable resource for learning programming, offering hands-on courses that can enhance users' ability to work with AI tools. It emphasizes the importance of understanding code to leverage the full potential of AI-generated outputs.

10:02

📈 Updates on Large Language Models (LLMs) and Their Applications

The script provides an overview of updates in the LLM space, focusing on the release of Grok 2, which is integrated with Twitter's data firehose, allowing it to provide real-time news and information. It compares Grok 2 with other models like Anthropic's Claude 3.5 Sonnet and discusses the practicality of these models in various use cases, such as research and browsing. The summary also mentions the release of a new ChatGPT model and the significance of these updates in the context of AI advancements.

15:04

🔊 Google's Voice Assistant and Its Reception

The script discusses Google's release of a voice assistant, Gemini Live, which offers voice input and output capabilities for Android users. It contrasts the expectations set by the hype around the product with the actual functionality, which is currently limited to basic voice interactions without advanced features like voice modulation or multimodal capabilities. The summary includes firsthand user feedback, highlighting issues with the interruption feature and the overall impression that the product is a beta release.

20:05

🕺 Fun Application Update: Viggle App for Dancing Videos

The script introduces an update to the Viggle app, which now allows users to create dancing videos featuring two people. The summary demonstrates the app's functionality by creating a Matrix-themed video with Tyrion Lannister, showcasing the app's potential as a fun communication tool for friends and family.

25:05

🚀 Prompt Caching with Claude: Efficiency and Cost-Effectiveness

The script highlights a new feature from Anthropic called prompt caching, which significantly reduces costs and latency when integrating complex personas into Claude's AI model. The summary explains the practical benefits of prompt caching, such as the ability to handle long contexts and multi-shot prompts efficiently. It also expresses a desire to investigate the feature further and understand its limitations compared to fine-tuning.

🌐 Community Engagement and Future AI Developments

The script concludes with an announcement about the restart of the presenter's LLM Innovations event series, which will delve into topics like prompt caching and the best practices for using AI tools. The summary emphasizes the value of community engagement in exploring AI advancements and the presenter's commitment to sharing evidence-based insights and experimental results.

Keywords

💡Hyperrealism

Hyperrealism in the context of this video refers to the generation of images that are incredibly lifelike, to the point where they are difficult to distinguish from actual photographs. It is a significant theme of the video, as advancements in AI have led to breakthroughs in this area, with applications in e-commerce and other industries. The script discusses the impact of hyperrealistic image generation on our understanding of what constitutes a 'photo'.

💡Image Generation

Image generation is the process by which AI algorithms create new images based on given prompts or data inputs. It is central to the video's narrative, as the script explores the latest developments in AI-driven image generation, particularly the release of models like Flux and its integration into platforms like Grok 2, which are pushing the boundaries of realism in AI-generated images.

💡Flux

Flux is an open-source AI model mentioned in the script that has contributed to the advancements in hyperrealistic image generation. It is significant because its open-source nature allows for community contributions and adaptations, such as the addition of a 'LoRA' for low-rank adaptation, enhancing the model's ability to generate highly realistic images.

💡Grok 2

Grok 2 is an AI model discussed in the script that integrates the Flux model for image generation. It is notable for its largely uncensored nature, allowing the generation of a wide range of images, and for its integration of Twitter data, providing a rich source of information for the model to draw upon. The script highlights its capabilities and potential use cases.

💡E-commerce

E-commerce is used in the script to illustrate a practical application of hyperrealistic image generation. The video discusses how AI-generated images can be used in online shopping environments, allowing customers to 'try on' products virtually, which could revolutionize the way people shop online.

💡LoRA (Low-Rank Adaptation)

LoRA, or low-rank adaptation, is a concept introduced in the script that refers to fine-tuning an AI model with additional data to improve its performance in specific areas. In the context of image generation, LoRAs allow models like Flux to generate images that are more personalized or realistic, based on the additional data provided.

💡Deepfakes

Deepfakes are synthetic media in which a person's likeness is swapped with another's using AI. The script mentions Deep-Live-Cam, a tool that generates deepfakes in real time, which raises questions about the authenticity of digital media and the potential for misuse, emphasizing the need for public awareness and understanding of AI capabilities.

💡LLM (Large Language Models)

Large Language Models, or LLMs, are AI models capable of understanding and generating human-like text. The script discusses several LLMs, including Grok 2, ChatGPT, and Claude Sonnet, highlighting their updates, capabilities, and use cases, such as research, writing, and code generation.

💡ChatGPT

ChatGPT is an AI chatbot platform mentioned in the script that has released an updated model for better dialogue interaction. It is part of the ongoing discussion about the evolution of AI and its increasing ability to mimic human conversation and assist with various tasks.

💡Anthropic

Anthropic is an organization that develops AI models, such as Claude Sonnet, which is mentioned in the script for its excellence in code generation. The company's approach to AI development and its focus on creating human-like responses are highlighted, contrasting with other models like ChatGPT.

💡Prompt Caching

Prompt caching, introduced by Anthropic in the script, is a technique that saves context into a cache memory, reducing API costs and latency when using LLMs. This innovation is positioned as a significant advancement, allowing for faster and more cost-effective integration of AI models in various applications, though the script also expresses a desire to understand potential downsides.

Highlights

Hyperrealistic image generation has seen a breakthrough with practical use cases emerging in e-commerce.

The Grok 2 release includes image generation from the Flux model, which is open source and has been integrated into Grok.

Flux model allows for the generation of political figures and copyrighted materials, pushing the boundaries of image generation capabilities.

LoRA, or low-rank adaptation, enables customization of the Flux model with additional data for personalized image generation.

The concept of 'photo' is being redefined as AI-generated images become indistinguishable from real-life photos.

Small companies and indie hackers are finding innovative use cases for hyperrealistic image generation in e-commerce and beyond.

Grok 2's integration with Twitter data provides a powerful search engine and news reference tool.

New advancements in AI tools for code generation require a basic understanding of coding to effectively utilize and debug the generated code.

The new chatgpt-4o-latest model has been optimized for chat conversations and integrated into the ChatGPT app.

Google's new image generator, while an improvement, does not compare to the capabilities of Flux or Midjourney.

Google's Gemini Live is a voice assistant with integrated functionalities but lacks the advanced features teased in previous releases.

The Viggle app update allows for creating dancing videos with two people, offering a fun way to communicate.

Anthropic's prompt caching with Claude reduces costs and latency significantly, making complex conversational agents more accessible.

The community-driven LLM Innovations event series is restarting, focusing on in-depth exploration of AI advancements.

AI developments are accelerating, with new models and features being released that have practical implications for various industries.

Transcripts

play00:00

okay listen so this week in news we can

play00:02

use is quite different than the usual

play00:04

weeks as you know every week me and the

play00:06

team go in and pull together all the new

play00:08

AI releases research them test them for

play00:10

you and then in this video I present you

play00:11

all the results and usually we start

play00:13

with chbt upgrades llm upgrades but this

play00:16

week I want to lead with Hyper realistic

play00:18

image generation because I think we

play00:20

literally had a breakthrough in this

play00:21

space and the first actual use cases

play00:23

like e-commerce are already popping up

play00:25

so I'm very excited to bring you a

play00:27

packed week of news you can use although

play00:29

it is mid August we're going to be

play00:30

covering hyperrealism and its use cases

play00:33

what happened there since last week but

play00:34

there's also a new cat GPT model that

play00:36

ranks number one above everything else

play00:38

now which is already inside of cat GPT

play00:40

and we have Google releasing a voice

play00:42

assistant of their own that you can

play00:44

actually use on your phone the top

play00:45

comment on last week's video was already

play00:47

saying that it feels like we're

play00:48

beginning to accelerate from Frisco

play00:50

fatsis and I can only agree last week

play00:53

was intense but it feels like we're

play00:54

entering a whole new era and I'm not

play00:56

just saying that lightly let me prove my

play00:58

point here by showing you this week AI

play01:00

news and you can actually use starting

play01:02

out with the grock release and this is

play01:05

linked to this hyper realism story

play01:07

because grock 2 has released and it

play01:09

includes image generation 2 and the

play01:11

image generation is from the flux one

play01:13

model by black forest Labs that we

play01:16

covered last week and that's where I

play01:17

want to start we're going to talk about

play01:19

grock 2 the llm and how it compares to

play01:21

other llms later on when we talk about

play01:23

that but the fact is that because flux

play01:25

is open source we covered that last week

play01:26

if you haven't seen that check it out

play01:27

it's a legit breakthrough to get a mid

play01:29

journey level model model that is open

play01:31

source and people can build up on and in

play01:32

the next few minutes I'll show you why

play01:34

but the point is that it's integrated

play01:35

into Gro and Gro already shipped it's

play01:37

here this is grock 2 mini there's the

play01:39

larger model again we'll talk about that

play01:41

later but you already have this flux

play01:43

integration in here and it's quite

play01:45

unhinged like not completely uncensored

play01:48

okay so you can't do nudity and things

play01:50

like that but you can generate political

play01:52

figures and compromising situations and

play01:54

you can also generate all sorts of

play01:56

copyrighted materials like company logos

play01:59

this is a generated right here I made it

play02:01

live so fair enough now one of the most

play02:03

popular social media platforms on planet

play02:04

Earth can generate copyrighted materials

play02:07

or political images like this one and

play02:09

all sorts of other weird stuff that is

play02:11

related to politics and tragedies and

play02:13

sometimes combining the both I don't

play02:14

even want to show that stuff in this

play02:15

video the point is it's quite unhinged

play02:17

but that's where the story only begins

play02:18

because flux is open source so people

play02:20

can do all sorts of stuff with it and if

play02:22

you've seen last week's video my review

play02:24

of it was wow it's really good it's Best

play02:26

in Class A text generation in

play02:28

hyperrealism it's quite good but M

play02:30

Journey still King but that was last

play02:31

week because people have done a lot work

play02:33

since then and the fact that it is open

play02:35

source allows for something that is

play02:37

called Aura and if you're not familiar

play02:38

let me introduce you to the concept of

play02:40

Aura for a second Laura basically stands

play02:42

for low rank adaptation and what that

play02:44

means in human terms is that you can add

play02:46

extra data to the Imaging model in

play02:49

Practical terms you can add images of

play02:50

yourself and then train the model to

play02:52

generate images of you or you could add

play02:54

a whole bunch of hyper realistic images

play02:56

that look really crisp and super

play02:58

realistic real photos and then the model

play03:00

will be able to pick that up and that's

play03:02

exactly what people have been doing and

play03:03

that's why we have various offshoots of

play03:05

this flux model now because it is open

play03:06

source and you can do things like

play03:08

combine it with luras and we get

play03:09

something like flux def realism which is

play03:12

basically the flux model with a realism

play03:15

Laura attached to it now running this is

play03:17

not free it costs a few cents you need

play03:19

to sign in with giab on this replicates

play03:21

base I'll just briefly do that and also

play03:23

I should note that what we learned since

play03:24

last week's testing is that the

play03:26

prompting is a little more intricate

play03:27

with flux so you need to be using a

play03:29

promp generator or be very detailed in

play03:32

your promptings a lot of the simple

play03:33

prompts that might reduce stunning

play03:35

images in my Journey won't work as well

play03:36

in flux but before we even get into this

play03:38

app I want to address the question of

play03:40

like okay so like hyper realistic images

play03:42

why should I even care like image

play03:44

generation is really good but I have no

play03:46

use case for it either in my work or my

play03:48

everyday life and to that concern that

play03:50

is very common by the way these days I

play03:52

would say fair enough I for myself found

play03:54

this use case of creating these amazing

play03:55

custom thumbnails of me in various

play03:57

situations a lot of the times but most

play03:59

people don't really have a use case but

play04:01

what you do have is the fact that the

play04:03

word photo is kind of a term that

play04:06

everybody uses and everybody has a fixed

play04:08

definition of that now and the point of

play04:10

this might not even be a use case it

play04:12

might be the fact that you need to

play04:13

change your vocabulary or change the

play04:15

definition of what you consider a photo

play04:17

because what we're about to generate

play04:19

with this flux Dev realism model here is

play04:21

indistinguishable from real life like

play04:24

literally and I don't mean sort of

play04:25

indistinguishable if you would see this

play04:27

image and let's say National Geographic

play04:29

no nobody would be able to tell not even

play04:31

a trained eye the fingers are perfect

play04:33

the skin texture the beard the focal

play04:36

plane it's all just like a real photo

play04:39

just like this other images here and

play04:40

what I hope that this segment here in

play04:42

this video does is that you might start

play04:43

questioning what even a photo is because

play04:46

up until now a photo is a moment in real

play04:48

life that was captured through a camera

play04:50

whether that was done back in the day

play04:51

with film or through digital everybody

play04:54

agreed on what a photo is but now also

play04:56

this is a photo and this is not real

play04:57

life and sure you might argue that

play04:59

photoshop took in that direction already

play05:00

but that was still a skill that was hard

play05:02

to access now it really gets

play05:03

democratized like Heck if you're

play05:05

watching this video you can just log in

play05:06

here add a few sense to your replicate

play05:08

account and go ahead and run this

play05:10

yourself like so and all of a sudden you

play05:13

can generate all sorts of fake images

play05:15

but that's only my first point the

play05:17

second point is actually use case

play05:19

related because okay sure we might have

play05:20

to redefine what we perceive as real

play05:22

when we see digital imagery from here on

play05:24

out and there you go this is the

play05:25

generation so the eyes is a little weird

play05:27

no problem I'll just rerun it and

play05:29

another 4 seconds we'll have another

play05:30

alternative cuz that's how simple this

play05:32

is but then certain small companies and

play05:34

Indie hackers already found use cases

play05:36

for this in the real world and they're

play05:38

first because they're the most agile

play05:40

right a big Corporation is going to take

play05:41

12 to 24 months to actually implement

play05:43

this meaningfully but this is the moment

play05:45

where that process begins this is sort

play05:47

of the Tipping Point of realism cuz this

play05:48

model is open source look at that this

play05:50

one is Flawless except of maybe this

play05:52

little text piece the text is right the

play05:54

background this could be from any

play05:55

conference so what did these small teams

play05:57

or individuals find well I have two

play05:59

examples here one of them is called

play06:00

let's Ai and they basically plugged in

play06:02

flux into their product that allows

play06:04

people to try on various clothing in an

play06:06

online setting and keep in mind this is

play06:08

just the first version of it look this

play06:09

is lonus trying on rayb bands from some

play06:11

e-commerce store without actually trying

play06:14

them on same example with a Monclair

play06:17

jacket like so so it's quite easy to

play06:19

imagine a future where online shopping

play06:20

turns into hey upload five images of

play06:23

yourself and then here's the product

play06:24

catalog with you actually wearing the

play06:26

products I mean that will convert so

play06:28

much better than you just seeing a image

play06:30

of some random model wearing it that

play06:32

might have a completely different body

play06:33

type than you so that's one very

play06:35

interesting use case and the Second Use

play06:36

case actually relates to what Peter

play06:38

levels here on X has been experimenting

play06:41

with he's a popular Indie hacker that is

play06:42

always up to some new project and right

play06:44

now he's playing with flux and he added

play06:46

his own Laura to the model and here you

play06:48

can see he generated himself in four

play06:51

different Generations which is

play06:53

interesting but I think even more

play06:54

interesting than that he actually did a

play06:56

little pipeline where he generated an

play06:57

image with flux and then fed it to link

play06:59

to generate the AI YouTuber that looks

play07:02

hyper real and the video aspect here is

play07:04

really the next step but we'll cover

play07:06

that once it's relevant for now

play07:07

character consistency and the lip

play07:09

syncing is just not there yet but hyper

play07:11

real images are with these Fluxx models

play07:12

that we just covered here and just to

play07:14

round out this segment I want to just

play07:15

point your attention towards this GitHub

play07:17

report that popped up over the last week

play07:19

it's called Deep live cam and in case

play07:21

you haven't seen this yet it's very

play07:23

simply described you basically can

play07:25

install this locally and with one image

play07:27

it creates deep fakes of anybody and it

play07:29

creates a webcam image that you could

play07:31

then feed to zoom or Google meets or

play07:34

whatever you might be using and all of a

play07:35

sudden you could potentially Get Fooled

play07:37

by somebody using something like this

play07:39

into thinking that you're talking to

play07:40

somebody else so this is why I wanted to

play07:42

feature this first because these

play07:43

incremental AI advancements often seem

play07:45

meaningless like okay new model who

play07:47

cares I'm not going to be using it but

play07:49

in a case like this I want you to think

play07:50

about what this means for the current

play07:53

digital world and for things that we

play07:54

take for granted like if a family member

play07:56

sends you image you don't question if

play07:58

that's a real image or if they

play07:59

Photoshopped it right with technology

play08:01

like this being accessible inside of

play08:03

WhatsApp Instagram their models are not

play08:05

so good but now with Twitter RX

play08:06

integrating flux into their platform

play08:09

it's just a question of weeks or months

play08:11

until this is widely available to

play08:13

billions of people and not just people

play08:14

who watch this videos and use something

play08:16

like replicate or premium subscribers on

play08:18

X as it is now and one more thing before

play08:20

we move on consider sharing this video

play08:22

with a loved one because no matter how I

play08:24

look at this education is the only way

play08:26

that I can see on how to protect

play08:28

yourself from these technological

play08:30

advancements and these potentially

play08:31

malicious use cases and then on the

play08:33

bright side there will probably

play08:34

transform Ecom very soon here and that

play08:37

should be relevant to everyone involved

play08:39

with marketing or entrepreneurship in

play08:41

any sense so more and more AI tools are

play08:43

becoming incredible at generating code

play08:45

which is great unless you don't know

play08:48

what to do with it we've seen a lot of

play08:49

people recently hop into something like

play08:51

Sonet 3.5 by anthropic that is really

play08:53

good at generating code and they ask it

play08:55

to generate something like a snake game

play08:57

just to get an error which they don't

play08:58

know how to resp solve and they

play09:00

completely hit a brick wall and that's

play09:01

why having at least a little bit of

play09:03

understanding of how code works is

play09:05

really beneficial while trying to

play09:07

utilize the latest AI tools and one

play09:09

fantastic resource that you can use to

play09:11

get up to speed on how to get these

play09:13

Basics under your belt is brilliant.org

play09:15

the sponsor of today's video they have

play09:17

beginner level courses to teach you all

play09:19

the basics but then they also have more

play09:21

advanced courses like this one called

play09:23

designing programs that can really take

play09:24

your coding skills to the next level

play09:26

here you can actually learn how to build

play09:28

games and apps that respond to live user

play09:30

input it also teaches you how to

play09:31

properly check for errors and debug if

play09:33

problems come up and by the way that's a

play09:35

skill that's really useful when working

play09:37

with AI tools because they do a lot of

play09:39

the writing for you it's just that bugs

play09:41

make their way into the code sometimes

And you need to know how to deal with that. One thing that I really like about Brilliant is that you're always hands-on, building something or interacting with an exercise; you're never forced to sit through an hour-long lecture on something you really don't care about, like in traditional education. Anyway, if you really want to level up your own skill set and take full advantage of the tools available to you today, head on over to brilliant.org or click the link in the description to try it for free for a full 30 days. If you decide to stick with it, you'll get 20% off an annual subscription. A big thank you to Brilliant for sponsoring this video, and now let's get back to some AI news you can use.

Okay, and now it's time to talk

about LLMs. I put my headphones on here to get a little more serious about this, because there have been quite a few updates, and I'm going to keep it short. I'm not going to go too deep; I don't think we had anything that is a complete game changer this week, I would tell you that, but we did have various releases: on one side X/Twitter with Grok 2, and then a brand-new model out of OpenAI, the chatgpt-4o-latest release, which you can find in the ChatGPT app, too. And then there was this entire story that unfolded with a new model called sus-column-r that popped up on the LMSYS Chatbot Arena and ranked really high. Nobody knew what it was; people were speculating that it was a new ChatGPT model, but now it has been revealed that it was actually the Grok 2 beta release.

And this was the proper Grok 2 model: as of right now, at least for me and the people that I know, when you go to X you can only access the Grok 2 mini model, which is sort of like GPT-4o mini. So what is unique about Grok 2, and what is new about the new ChatGPT model? Well, first of all, the story really begins with the Chatbot Arena, because as I told you, this new model sort of just popped up out of nowhere and was ranking really high. This seems to be the new default way a lot of these companies, like OpenAI and now also X, test their new models, because it's a great way to get them into users' hands and get feedback on how users actually use and enjoy them without revealing the model. So they're released under anonymous names, and this is also how a few recent OpenAI models were introduced. And speaking of the Chatbot Arena, I actually made a mistake last week that I want to correct this week. Thank you so much, Influential Studio, for the comment on the video pointing out that when you vote on Chatbot Arena and can see which model you're voting on, those votes don't actually count; only the anonymous votes do. That makes a lot of sense, as the ranking is fully user-voted. So for example, in the view where you can actually see what you're comparing, those votes do not count; only the ones from the arena, where the models are anonymous, actually count. Just wanted to correct that.

But back to Grok 2. So it has released, and what's the story here? Well, it's a top-tier model; it's a GPT-4-level model that is not quite best in class at anything in particular, but it's really well-rounded. And the biggest selling point is the following: it's plugged into all of the Twitter data, the full Twitter firehose. All of the news stories, all of the opinions that go down on Twitter day to day are being infused into the model, so you can use it for use cases that require browsing and that don't work as well with other models, like "What are the top news stories relating to AI for today?" And keep in mind, this is the mini model; this is not the Grok 2 main model we are looking at here. By the way, while this generates, it will take a few seconds, because it does need to look at the Twitter API and all the data there. As for the released benchmarks, it's interesting how they structured them, and it's a bit deceptive, so I want to clarify: the chart compares Grok 2 to the GPT-4 Turbo model and Gemini 1.5 Pro, while some of the more competitive models, like Claude 3.5 Sonnet or Llama 3.1 405B, are all the way on the right. The reason I say that is because GPT-4 Turbo, for example, predates the GPT-4o release that came with the voice assistant announcement back in May, and back then a lot of people argued that GPT-4o actually felt worse in practice than GPT-4 Turbo, even though its benchmarks were slightly better. The point is, this is not the fairest comparison, and they're placed right next to each other, so don't take that delta too seriously. What you really want to compare with Grok 2 is Sonnet over here and Llama 405B; those are the most up-to-date versions. Again, the GPT-4o and Turbo numbers are from back in May; it says that down here. And if you compare Grok 2 to something like Sonnet, it actually loses out closely on all benchmarks except MathVista over here. But again, as I always say, these minimal differences in benchmarks are not a game changer; what matters is how it performs in practice.

Okay, right now it's actually still loading, which is a little weird; I'll regenerate this. I've got to say, during my testing before the recording of this video, this went super smoothly. But there you go, on the second try it gets the stories right away, and as you'll see at the bottom, it references the tweets it pulled them from. So this is actually a fantastic Twitter search engine, and I think that is the main use case here: this combination of the Twitter data firehose and an LLM that has access to all of it is actually quite powerful. As you can see, it talks about the SearchGPT prototype, Google's AI integrations with the Pixel 9, and of course the Grok 2 launch. So this is a fantastic use case that you can be using today if you're subscribed to X Premium, which in Europe comes in at about 10 a month — it's actually 8.60 if you go monthly — and Premium Plus is 20, but Grok already comes with regular Premium. So yeah, this is a paid thing, but it's a brand-new way to use Twitter. And then, of course, it also has the image generation features with Flux that we talked about in the first part of the video. But there's even more here, because what's coming down the line is also some multimodal capability, where it has vision, and they'll be offering an enterprise API, which could be interesting; I mean, they'll give you access to an LLM that has all of the world's knowledge that is on Twitter. Time will show what that will be used

for. And what's my personal first impression of Grok and its outputs? Well, it's good. It's certainly usable, and if you only want the text generation features, it can certainly act as a replacement for ChatGPT. Now, I personally, like many others, still prefer Anthropic's voice; it's just more human and less robotic than ChatGPT's. But this is decent. Again, I didn't get my hands on the full Grok 2 version, so I can't really give my full opinion. But I'll also say this: it lacks the tooling. Some of these other tools also do file uploads, functional mobile apps, GPTs; I use these things all the time. Now, I do consider myself a power user, but still, if you use those features, maybe code interpreter or image input, you won't have those here. So what should you use? Well, let me sum it up briefly as of the 15th of August 2024. As a general-purpose AI assistant, ChatGPT is still best because of all the functionality I just named. When it comes to writing tone, though, Anthropic's Sonnet 3.5 is my go-to. When it comes to code generation specifically, it's also Sonnet 3.5, head and shoulders above everybody else right now. When it comes to research, Perplexity is your friend. And when it comes to actually using LLMs with live data, well, I think Grok sort of takes the crown here, because it is plugged into all the Twitter data and references all of it. And as Twitter is the place where news breaks first — I mean, heck, a lot of this video is just me and the team spending every single day on Twitter, pulling everything together, and then me digesting it for you — well, Grok can sort of do that already too. So that would be the one use case where this really stands out. And one more thing: Grok is actually sort of uncensored, and by "sort of" I mean it's the same thing as with Flux: it won't produce R-rated content, but it doesn't have a problem with curse words or things that are in an ethical gray area. Anthropic is on the other side of the spectrum; they're extremely strict. ChatGPT is quite restrictive too, but not as much as Anthropic; Anthropic is really extreme, while Grok, on the other hand, doesn't have a problem with profanities. Now, moving forward,

there's actually also a brand-new ChatGPT model, and this is not just in the API; it has actually been integrated into the ChatGPT product that you might be using every single day. If you look at this tweet from the official ChatGPT account: there's a new GPT-4o model out in ChatGPT since last week, hope you are all enjoying it, and check it out if you haven't; we think you'll like it. And the funny thing is, nobody really noticed. That's how minuscule the differences between the models are these days; they ship a new thing and have to announce that there's some update, because nobody would have noticed otherwise. But yeah, this is a slightly upgraded model; apparently the biggest difference is in how it handles chat conversations, so it has been optimized to interact with users in a dialogue. There's also a brand-new API endpoint for people to use, but it's funny, because the dev account still says: hey, if you're a dev, you probably still want to use the 08-06 API endpoint, not this latest one that was released for ChatGPT; that one is just best for chat use cases.
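As a sketch of what that advice means in code: the two model IDs below are the publicly documented ones (`chatgpt-4o-latest` for the continuously updated chat-tuned model, `gpt-4o-2024-08-06` for the pinned snapshot devs are pointed to), while the little routing helper is my own illustration, not anything official:

```python
# Choosing between the chat-tuned endpoint and the pinned snapshot.
# Model IDs are OpenAI's published names; the routing rule is illustrative.

CHAT_TUNED = "chatgpt-4o-latest"       # updated continuously, dialogue-optimized
PINNED_SNAPSHOT = "gpt-4o-2024-08-06"  # stable snapshot, safer for pipelines

def pick_model(conversational: bool) -> str:
    """Chat-style features get the chat-tuned model; automated pipelines
    pin a dated snapshot so behavior doesn't shift underneath them."""
    return CHAT_TUNED if conversational else PINNED_SNAPSHOT

def build_request(messages, conversational=False):
    """Assemble the kwargs for a chat.completions.create(...) call."""
    return {"model": pick_model(conversational), "messages": messages}

req = build_request([{"role": "user", "content": "Hi!"}], conversational=True)
print(req["model"])  # chatgpt-4o-latest
```

With the official Python SDK, `req` would then be sent as `openai.OpenAI().chat.completions.create(**req)`.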

So there you go, a minor update on that front if you were confused by this. I will be reporting back once the full Grok 2 comes out and I get to test it a little bit more for my personal use cases; for now I only have the mini version. So moving on to the next story, here are a few releases out of Google. One of them is Imagen 3, and you know, I'll keep this as short as possible: it's a good image generator, it's their best image generator, but compared to something like Flux or Midjourney it just doesn't hold up. It does text well, but so do others, and they're open source. But yeah, it's better than anything Google has done before with image generation, and it will be introduced into their hardware and software offerings, just like this second announcement, which might be more interesting: Gemini Live.

Okay, and this is the voice assistant that OpenAI promised, but for Google. Or is it? Because the reality of this product is probably the biggest delta between what some people hyped it up to be and what it actually is. The reality is: yes, it is a voice assistant, and yes, it also already shipped; Android users already have this on their phones. I'm an Apple user, but I'm lucky enough that team member Daniel actually went ahead, gave this a shot, and tested it, and I'll just quote some of the points he forwarded to me. Keep in mind that this is coming from an angle where we're comparing it to what the voice assistant promos promised and what is available in the OpenAI app today, because if you're not familiar, there's a voice assistant there already. It might not be the sophisticated one with the voice changes, interruptions, and multimodal capabilities, but there's a voice assistant; you can use the voice function to talk to ChatGPT. Quick spoiler: that's what this really is. Google shipped a voice input and output function that you can also interrupt, but it's not great. Okay, so what's the review? Well, apparently the Gemini Live voice assistant feels more like a beta release than something actually on the level of the voice assistant teased by OpenAI. The voices are good ("How can I help you today?"), but so are ChatGPT's voices today. There's no voice modulation and no multimodal capabilities, like using the camera to actually infer context and use it as this advanced voice assistant. What it does have is the ability for you to interrupt it, the lack of which is actually my biggest gripe with the current version of the ChatGPT voice features. But the problem is, it's not great: Daniel reported back that if he has the speaker volume on the phone over 75%, the voice assistant actually starts interrupting itself, because it hears its own output and then stops. I think you get the point; ChatGPT never does that. Because of this, he concluded that the interrupting feature is currently more annoying than useful. Not sure they can fix that over time, but again, it just goes to underline the point that this feels a little half-baked, like it was maybe rushed out. But not to bash it too hard: what it does have is access to integrations like your Google Calendar or your Gmail, and you can interact with those on your Android phone. That is absolutely fantastic and something you cannot get inside of ChatGPT as of yet. So there you go, that would be my first little look at the voice assistant feature. I do have to add that at the end of their presentation they showed that they're looking at an advanced multimodal voice assistant in the future, but as of what's released today, it's just voice input and output with Gemini, which is nice to have. That doesn't mean they've pulled ahead of OpenAI; they just caught up, and that's fine, but let's call a spade a spade. If you have a different experience, by the way, please leave a comment below; we would love to hear about it.

All right, this is going to be a quick but fun one. Viggle, the app that came out a few months ago that lets you put yourself or somebody else into dancing-type videos, has a new update where you can actually do it with two people, and I think that's sort of fun, because you can use it as a fun way to communicate with friends or family. I just briefly want to show it to you. So if you go in here, you can see the new update right here; I just logged in with Google on a free account, by the way, so you can try this right away. If you head on over to this Multi tab, you can pick a template; I'm going to simply take a Matrix fight, that sounds perfect. All right, use the template, and now I can pick the two characters. For one character right here I'll just use the camera real quick; okay, ideally it should be a full-body photo, but I'll just take a selfie here. And then, as the second one, how about a picture of Tyrion Lannister, because you guys seemed to enjoy the Game of Thrones clip we did with the sponsor last week. And that's it, I'll just go ahead and generate; by the way, this is not sponsored, it's just an interesting release of a cool app. Let's see what we get here... okay, no way, it's Tyrion! This is epic. Well, this is, as I thought, really sort of funny and quirky, and you can just download it with the watermark, no problem, and send it to somebody. Again, this is possible on the free plan; sure, they have other tiers, but I haven't even looked into them so far. There you go, I found a little use case, because AI can be that too; it doesn't always just have to be useful and productive. And then, last but

certainly not least, there is a very, very interesting release out of Anthropic coming this week, and this was sort of shocking to me: prompt caching with Claude. What this essentially does is save context into a cache that goes along with the API, with the practical result of reducing costs by up to 90% and latency by up to 85%. Meaning, you can integrate really complex personas into Claude and then call the API; it will cost up to 90% less and be roughly five to ten times as fast.
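Here is roughly what that looks like against the Messages API, going by Anthropic's beta announcement: you mark a large, stable prefix (a persona, few-shot examples, or the book from their example) with `cache_control` so repeat calls reuse it. The helper function and the placeholder text are my own; only the payload shape and field names come from the documented beta:

```python
# Sketch of Anthropic's prompt-caching beta: flag the big static prefix
# as cacheable so only the short, changing question is processed at full
# cost on subsequent calls.

def cached_request(book_text: str, question: str) -> dict:
    """Build a Messages API payload whose large prefix is marked cacheable."""
    return {
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 1024,
        "system": [
            {"type": "text",
             "text": "You answer questions about the book provided below."},
            {"type": "text",
             "text": book_text,  # the ~100,000-token book from their example
             # Everything up to and including this block gets cached.
             "cache_control": {"type": "ephemeral"}},
        ],
        "messages": [{"role": "user", "content": question}],
    }

payload = cached_request("<full book text here>", "Who is the narrator?")
print(payload["system"][1]["cache_control"])  # {'type': 'ephemeral'}
```

During the beta this was sent with the `anthropic-beta: prompt-caching-2024-07-31` header via the official SDK; per the announcement, writing to the cache costs about 25% more than base input tokens, while reading from it costs about 10% of the base price, which is where the up-to-90% savings come from.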

I mean, that's a bold claim, and reading through this, it sounds really impressive: all of these use cases, conversational agents, coding assistants, anything that needs a little bit more context. For example, if you upload a book of 100,000 tokens into one of Claude's models, the latency without caching would have been 11 seconds to get a response; so you give it all this context, then you ask something about the book, and it takes 11 seconds to reply. That's what this means in practical terms. And with caching: 2.5 seconds, at a 90% cost reduction. Honestly, this sounds a little too good to be true, so I had a closer look at it, and I have to be honest, I have one single gripe, which is: what is the downside here? What is the negative part? Honestly, this looks too good to be true. They say it's still in beta; they give you explanations of how it works and how it's priced. One limitation is that it doesn't work on the Opus model, but the Sonnet model is best right now anyway. There's even a prompt-caching cookbook on their GitHub if you want to check that out. And hey, let me tell you, this just came out; I didn't really have time to dive deep into it, but over the weekend I'll be having a closer look and experimenting with it, because again, it just sounds too good to be true. This is sort of like having RAG up to a certain context limit, I suppose, but without the downside of the long loading times and the embeddings being created and retrieved; and compared to just adding a lot of context, which was the RAG alternative, it's way faster now too. So that's amazing, but I want to know what the downside is, and how this compares to, for example, fine-tuning, because they do say that one of the best use cases is actually putting multi-shot prompts into the cache so the model can consider them as extra context for generations. So I'll have to run a few experiments and I'll report

back. And if you're interested in this sort of topic, I do want to point out that I'm actually restarting my LLM Innovations event series. Before, this was called ChatGPT Innovations, and I used to do it every two weeks, for a year straight, for all course members; now I hold it in the community, it's part of the membership, and I'm going to hold it once a month. We went with this Einstein image because this is where I'll be presenting some of the experiments that I run to the community. It's a long format: usually the lecture takes about an hour, and then we do a Q&A afterwards. In the next session, in September, we'll be looking at when you should use prompts versus a GPT versus fine-tuning, and apparently now I'll have to extend this with when you should be using prompt caching. All of the results I'll be showing there will be evidence-based and include results of the experiments we run internally with the team. Moving forward I'll do one of these a month, as this has always been the most popular format within the community. We do a lot of things there, but I thought I'd tell you about this one, and as I've pointed out many times, that's the whole idea behind the community: we can go deep on single topics rather than doing what we do on YouTube, which is brushing over many different topics. That's because it will assume you already took my prompting course and the GPT-building course that is also accessible within the community; I cannot make that assumption in a video like this. But what I can do is test prompt caching more and then come back with a video on that. So there you go: AI has been wild lately, and these are some very exciting developments. We'll be playing with all of it, and if I find something interesting, I'll be reporting back next Friday in our weekly show, AI News You Can Use. And that's all I got for today. See you soon.
