AI Realism Breakthrough & More AI Use Cases
Summary
TLDRThis week's AI news focuses on hyperrealistic image generation with breakthroughs that impact e-commerce, as seen with platforms like Let's AI integrating it for virtual try-ons. The release of Grock 2 by Twitter, incorporating Flux's open-source model for uncensored image generation, is a highlight. Additionally, updates on language models like Chat GPT 4 and Google's new voice assistant, Gemini Live, are discussed. The script also touches on the implications of these technologies for redefining 'photo' and the potential for misuse, emphasizing the importance of education and ethical considerations in AI advancements.
Takeaways
- π The script discusses a breakthrough in hyperrealistic image generation with AI, noting its potential impact on e-commerce and social media platforms.
- π¨ The release of Grock 2 is highlighted, which integrates the Flux model for image generation and stands out for its uncensored capabilities, except for nudity and other explicit content.
- π Grock 2's integration with Twitter's data firehose is emphasized, allowing it to serve as a powerful Twitter search engine and providing real-time news and information.
- π The script mentions the open-source nature of Flux, enabling users to customize and enhance the model with additional data, such as personal images or hyperrealistic photos.
- ποΈ The potential use case of AI-generated images in e-commerce is explored, where customers can virtually try on clothes using AI, predicting a shift in online shopping experiences.
- π€ The script introduces the concept of 'Aura' for image models, which involves low-rank adaptation to improve the generation of specific types of images, like personalized content.
- π The importance of understanding code when using AI tools for code generation is stressed, to effectively handle and debug the generated code.
- π’ A new Chat GPT model is quietly released within the Chat GPT app, optimized for chat interactions and dialogue, with minimal noticeable differences to the user.
- π£οΈ Google's release of its voice assistant, Gemini Live, is critiqued as feeling like a beta release, lacking the advanced features and integrations expected.
- πΊ The update to the Vigle app, allowing users to create dancing videos with two people, is presented as a fun and engaging use of AI technology.
- π‘ Anthropic's announcement of prompt caching with Claude is highlighted, which could significantly reduce costs and latency, making it an exciting development for AI integration and conversational agents.
Q & A
What is the main focus of the video script?
-The main focus of the video script is the recent advancements in AI, particularly in the area of hyperrealistic image generation and the integration of these technologies into various applications like e-commerce and social media platforms.
What is the significance of the breakthrough in hyperrealistic image generation mentioned in the script?
-The breakthrough in hyperrealistic image generation is significant because it has led to the creation of images that are indistinguishable from real photos, which has implications for how we define and perceive 'photos' in the digital age.
What is the role of the Flux model in the recent developments?
-The Flux model, developed by Black Forest Labs, plays a central role as it is an open-source, mid-journey level model that has been integrated into Gro and is capable of generating hyperrealistic images, which has sparked various adaptations and use cases.
What is Aura and how does it relate to the Flux model?
-Aura stands for low-rank adaptation and is a technique where extra data can be added to an imaging model to train it to generate images with specific characteristics. In the context of the Flux model, it allows for the creation of hyperrealistic images with added realism through fine-tuning.
How does the script address the ethical concerns related to the generation of hyperrealistic images?
-The script acknowledges the potential for misuse of hyperrealistic image generation, such as creating deep fakes, and emphasizes the importance of education to help people understand and protect themselves from these technological advancements.
What is the current state of AI-generated code and its usability?
-AI-generated code has become more sophisticated, but the script points out that without a basic understanding of coding, users may struggle to utilize or debug the code effectively, highlighting the need for education in coding alongside AI tools.
What new features does the Gro 2 model offer compared to its predecessors?
-Gro 2 offers integration with Twitter's data firehose, providing real-time access to news and opinions from Twitter, and improved capabilities in image generation using the Flux model. It also has a more relaxed content policy compared to some other models.
What is the significance of the new Chat GPT model release mentioned in the script?
-The new Chat GPT model release is significant as it has been optimized for chat conversations, offering a more interactive and dialogue-focused experience for users, and is already integrated into the Chat GPT app.
What are the limitations of Google's new image generator compared to Flux or Mid Journey?
-While Google's new image generator is an improvement over their previous efforts, it does not match the level of detail and realism offered by Flux or Mid Journey, which are considered to be the current benchmarks in the field.
What is the potential impact of prompt caching with Claude (CLA) on AI-generated content?
-Prompt caching with CLA can significantly reduce costs and latency, making it more feasible to integrate complex personas into AI models. This could lead to faster and more cost-effective generation of content that requires contextual understanding.
Outlines
π¨ Hyperrealistic Image Generation Breakthroughs
The script discusses a significant advancement in hyperrealistic image generation, highlighting the integration of the Flux model by Black Forest Labs into Grock 2, an AI model. The open-source nature of Flux allows for customization, such as with Aura, which enhances the model's realism by training it with additional images. The summary touches on the implications of this technology, including its potential to redefine the concept of a 'photo' and its application in e-commerce for virtual try-ons, as well as the ethical concerns regarding the generation of politically sensitive images or deepfakes.
π οΈ Democratization of AI-Generated Code and Education
This section addresses the increasing capability of AI to generate code and the challenges faced by users who lack the knowledge to debug or utilize the generated code effectively. The script promotes Brilliant.org as a valuable resource for learning programming, offering hands-on courses that can enhance users' ability to work with AI tools. It emphasizes the importance of understanding code to leverage the full potential of AI-generated outputs.
π Updates on Large Language Models (LLMs) and Their Applications
The script provides an overview of updates in the LLM space, focusing on the release of Grock 2, which is integrated with Twitter's data firehose, allowing it to provide real-time news and information. It compares Grock 2 with other models like Anthropic's Sonet 3.5 and discusses the practicality of these models in various use cases, such as research and browsing. The summary also mentions the release of a new model by Chat GPT and the significance of these updates in the context of AI advancements.
π Google's Voice Assistant and Its Reception
The script discusses Google's release of a voice assistant, Gemini Live, which offers voice input and output capabilities for Android users. It contrasts the expectations set by the hype around the product with the actual functionality, which is currently limited to basic voice interactions without advanced features like voice modulation or multimodal capabilities. The summary includes feedback from a user experience, highlighting issues with the interrupting feature and the overall impression of the product being a beta release.
πΊ Fun Application Update: Vigle App for Dancing Videos
The script introduces an update to the Vigle app, which now allows users to create dancing videos featuring two people. The summary demonstrates the app's functionality by creating a Matrix-themed video with Tyrion Lannister, showcasing the app's potential as a fun communication tool for friends and family.
π Prompt Caching with Claude: Efficiency and Cost-Effectiveness
The script highlights a new feature from Anthropic called prompt caching, which significantly reduces costs and latency when integrating complex personas into Claude's AI model. The summary explains the practical benefits of prompt caching, such as the ability to handle long contexts and multi-shot prompts efficiently. It also expresses a desire to investigate the feature further and understand its limitations compared to fine-tuning.
π Community Engagement and Future AI Developments
The script concludes with an announcement about the restart of the presenter's LLM Innovations event series, which will delve into topics like prompt caching and the best practices for using AI tools. The summary emphasizes the value of community engagement in exploring AI advancements and the presenter's commitment to sharing evidence-based insights and experimental results.
Mindmap
Keywords
π‘Hyperrealism
π‘Image Generation
π‘Flux
π‘Grok 2
π‘E-commerce
π‘Aura (Low Rank Adaptation)
π‘Deepfakes
π‘LLM (Large Language Models)
π‘Chat GPT
π‘Anthropic
π‘Prompt Caching
Highlights
Hyperrealistic image generation has seen a breakthrough with practical use cases emerging in e-commerce.
Grock 2 release includes image generation 2 from the Flux model, which is open source and has been integrated into Gro.
Flux model allows for the generation of political figures and copyrighted materials, pushing the boundaries of image generation capabilities.
Aura, or low rank adaptation, enables customization of the Flux model with additional data for personalized image generation.
The concept of 'photo' is being redefined as AI-generated images become indistinguishable from real-life photos.
Small companies and Indie hackers are finding innovative use cases for hyperrealistic image generation in e-commerce and beyond.
Grock 2's integration with Twitter data provides a powerful search engine and news reference tool.
New advancements in AI tools for code generation require a basic understanding of coding to effectively utilize and debug the generated code.
Chat GPT 40's latest model release has been optimized for chat conversations and integrated into the Chat GPT app.
Google's new image generator, while an improvement, does not compare to the capabilities of Flux or Mid Journey.
Google's Gemini Live is a voice assistant with integrated functionalities but lacks the advanced features teased in previous releases.
Vigle app update allows for creating dancing videos with two people, offering a fun way to communicate.
Anthropic's prompt caching with Claude AI reduces costs and latency significantly, making complex conversational agents more accessible.
The community-driven llm Innovations event series is restarting, focusing on in-depth exploration of AI advancements.
AI developments are accelerating, with new models and features being released that have practical implications for various industries.
Transcripts
okay listen so this week in news we can
use is quite different than the usual
weeks as you know every week me and the
team go in and pull together all the new
AI releases research them test them for
you and then in this video I present you
all the results and usually we start
with chbt upgrades llm upgrades but this
week I want to lead with Hyper realistic
image generation because I think we
literally had a breakthrough in this
space and the first actual use cases
like e-commerce are already popping up
so I'm very excited to bring you a
packed week of news you can use although
it is mid August we're going to be
covering hyperrealism and its use cases
what happened there since last week but
there's also a new cat GPT model that
ranks number one above everything else
now which is already inside of cat GPT
and we have Google releasing a voice
assistant of their own that you can
actually use on your phone the top
comment on last week's video was already
saying that it feels like we're
beginning to accelerate from Frisco
fatsis and I can only agree last week
was intense but it feels like we're
entering a whole new era and I'm not
just saying that lightly let me prove my
point here by showing you this week AI
news and you can actually use starting
out with the grock release and this is
linked to this hyper realism story
because grock 2 has released and it
includes image generation 2 and the
image generation is from the flux one
model by black forest Labs that we
covered last week and that's where I
want to start we're going to talk about
grock 2 the llm and how it compares to
other llms later on when we talk about
that but the fact is that because flux
is open source we covered that last week
if you haven't seen that check it out
it's a legit breakthrough to get a mid
journey level model model that is open
source and people can build up on and in
the next few minutes I'll show you why
but the point is that it's integrated
into Gro and Gro already shipped it's
here this is grock 2 mini there's the
larger model again we'll talk about that
later but you already have this flux
integration in here and it's quite
unhinged like not completely uncensored
okay so you can't do nudity and things
like that but you can generate political
figures and compromising situations and
you can also generate all sorts of
copyrighted materials like company logos
this is a generated right here I made it
live so fair enough now one of the most
popular social media platforms on planet
Earth can generate copyrighted materials
or political images like this one and
all sorts of other weird stuff that is
related to politics and tragedies and
sometimes combining the both I don't
even want to show that stuff in this
video the point is it's quite unhinged
but that's where the story only begins
because flux is open source so people
can do all sorts of stuff with it and if
you've seen last week's video my review
of it was wow it's really good it's Best
in Class A text generation in
hyperrealism it's quite good but M
Journey still King but that was last
week because people have done a lot work
since then and the fact that it is open
source allows for something that is
called Aura and if you're not familiar
let me introduce you to the concept of
Aura for a second Laura basically stands
for low rank adaptation and what that
means in human terms is that you can add
extra data to the Imaging model in
Practical terms you can add images of
yourself and then train the model to
generate images of you or you could add
a whole bunch of hyper realistic images
that look really crisp and super
realistic real photos and then the model
will be able to pick that up and that's
exactly what people have been doing and
that's why we have various offshoots of
this flux model now because it is open
source and you can do things like
combine it with luras and we get
something like flux def realism which is
basically the flux model with a realism
Laura attached to it now running this is
not free it costs a few cents you need
to sign in with giab on this replicates
base I'll just briefly do that and also
I should note that what we learned since
last week's testing is that the
prompting is a little more intricate
with flux so you need to be using a
promp generator or be very detailed in
your promptings a lot of the simple
prompts that might reduce stunning
images in my Journey won't work as well
in flux but before we even get into this
app I want to address the question of
like okay so like hyper realistic images
why should I even care like image
generation is really good but I have no
use case for it either in my work or my
everyday life and to that concern that
is very common by the way these days I
would say fair enough I for myself found
this use case of creating these amazing
custom thumbnails of me in various
situations a lot of the times but most
people don't really have a use case but
what you do have is the fact that the
word photo is kind of a term that
everybody uses and everybody has a fixed
definition of that now and the point of
this might not even be a use case it
might be the fact that you need to
change your vocabulary or change the
definition of what you consider a photo
because what we're about to generate
with this flux Dev realism model here is
indistinguishable from real life like
literally and I don't mean sort of
indistinguishable if you would see this
image and let's say National Geographic
no nobody would be able to tell not even
a trained eye the fingers are perfect
the skin texture the beard the focal
plane it's all just like a real photo
just like this other images here and
what I hope that this segment here in
this video does is that you might start
questioning what even a photo is because
up until now a photo is a moment in real
life that was captured through a camera
whether that was done back in the day
with film or through digital everybody
agreed on what a photo is but now also
this is a photo and this is not real
life and sure you might argue that
photoshop took in that direction already
but that was still a skill that was hard
to access now it really gets
democratized like Heck if you're
watching this video you can just log in
here add a few sense to your replicate
account and go ahead and run this
yourself like so and all of a sudden you
can generate all sorts of fake images
but that's only my first point the
second point is actually use case
related because okay sure we might have
to redefine what we perceive as real
when we see digital imagery from here on
out and there you go this is the
generation so the eyes is a little weird
no problem I'll just rerun it and
another 4 seconds we'll have another
alternative cuz that's how simple this
is but then certain small companies and
Indie hackers already found use cases
for this in the real world and they're
first because they're the most agile
right a big Corporation is going to take
12 to 24 months to actually implement
this meaningfully but this is the moment
where that process begins this is sort
of the Tipping Point of realism cuz this
model is open source look at that this
one is Flawless except of maybe this
little text piece the text is right the
background this could be from any
conference so what did these small teams
or individuals find well I have two
examples here one of them is called
let's Ai and they basically plugged in
flux into their product that allows
people to try on various clothing in an
online setting and keep in mind this is
just the first version of it look this
is lonus trying on rayb bands from some
e-commerce store without actually trying
them on same example with a Monclair
jacket like so so it's quite easy to
imagine a future where online shopping
turns into hey upload five images of
yourself and then here's the product
catalog with you actually wearing the
products I mean that will convert so
much better than you just seeing a image
of some random model wearing it that
might have a completely different body
type than you so that's one very
interesting use case and the Second Use
case actually relates to what Peter
levels here on X has been experimenting
with he's a popular Indie hacker that is
always up to some new project and right
now he's playing with flux and he added
his own Laura to the model and here you
can see he generated himself in four
different Generations which is
interesting but I think even more
interesting than that he actually did a
little pipeline where he generated an
image with flux and then fed it to link
to generate the AI YouTuber that looks
hyper real and the video aspect here is
really the next step but we'll cover
that once it's relevant for now
character consistency and the lip
syncing is just not there yet but hyper
real images are with these Fluxx models
that we just covered here and just to
round out this segment I want to just
point your attention towards this GitHub
report that popped up over the last week
it's called Deep live cam and in case
you haven't seen this yet it's very
simply described you basically can
install this locally and with one image
it creates deep fakes of anybody and it
creates a webcam image that you could
then feed to zoom or Google meets or
whatever you might be using and all of a
sudden you could potentially Get Fooled
by somebody using something like this
into thinking that you're talking to
somebody else so this is why I wanted to
feature this first because these
incremental AI advancements often seem
meaningless like okay new model who
cares I'm not going to be using it but
in a case like this I want you to think
about what this means for the current
digital world and for things that we
take for granted like if a family member
sends you image you don't question if
that's a real image or if they
Photoshopped it right with technology
like this being accessible inside of
WhatsApp Instagram their models are not
so good but now with Twitter RX
integrating flux into their platform
it's just a question of weeks or months
until this is widely available to
billions of people and not just people
who watch this videos and use something
like replicate or premium subscribers on
X as it is now and one more thing before
we move on consider sharing this video
with a loved one because no matter how I
look at this education is the only way
that I can see on how to protect
yourself from these technological
advancements and these potentially
malicious use cases and then on the
bright side there will probably
transform Ecom very soon here and that
should be relevant to everyone involved
with marketing or entrepreneurship in
any sense so more and more AI tools are
becoming incredible at generating code
which is great unless you don't know
what to do with it we've seen a lot of
people recently hop into something like
Sonet 3.5 by anthropic that is really
good at generating code and they ask it
to generate something like a snake game
just to get an error which they don't
know how to resp solve and they
completely hit a brick wall and that's
why having at least a little bit of
understanding of how code works is
really beneficial while trying to
utilize the latest AI tools and one
fantastic resource that you can use to
get up to speed on how to get these
Basics under your belt is brilliant.org
the sponsor of today's video they have
beginner level courses to teach you all
the basics but then they also have more
advanced courses like this one called
designing programs that can really take
your coding skills to the next level
here you can actually learn how to build
games and apps that respond to live user
input it also teaches you how to
properly check for errors and debug if
problems come up and by the way that's a
skill that's really useful when working
with AI tools because they do a lot of
the writing for you it's just that bugs
make their way into the code sometimes
and you need to know how to deal with
that one thing that I really like about
brilliant is that you're always Hands-On
building something or interacting with
exercise you're never forced to sit for
an hourong lecture on something that you
really don't care about traditional
education anyway if you really want to
level up your own skill set and take
full advantage of the tools of available
to you today head on over to
brilliant.org or click the link in the
description to try it for free for a
full 30 days if you decide to stick with
it you'll get 20% off an annual
subscription a big thank you to
brilliant for sponsoring this video and
now let's get back to some AI news you
can use okay and now it's time to talk
about llms I'm I put my headphones on
here to get a little more serious about
this because there has been quite a few
updates and I'm going to keep it short
I'm not going to go too deep I don't
think we had anything that is like a
complete Game Changer this week I would
tell you that but we did have various
releases from well on one side x/
Twitter with groc 2 and then we have a
brand new model out of chat GPT the chat
GPT 40 latest
22488 release you can find that in the
new cat GPT app2 and then there was this
entire story that unfolded with this new
model called sus column r that popped up
on LMS Ys Arena and it ranked really
high nobody knew what it was people were
rumoring that it's a new chb model but
now it has been revealed that it was
actually the grock 2 Beta release okay
and this was the proper grock 2 model as
of right now at least for me and the
people that I know when you go to x you
can only access the grock 2 mini model
which is sort of like GPT for om mini so
what is unique about grock 2 and what is
new about the new cat GPT model well
first of all the story really begins
with this chatbot Arena here because as
I told you this new model sort of just
popped up out of nowhere and was ranking
really high and this seems to be the new
default way like a lot of these
companies like open AI now also X test
their new models because it's a great
way to get it into users hands to get
some feedback on how users actually use
it and enjoy it without revealing the
model so they're released under
Anonymous names and this is also how a
few recent openai models were introduced
and one comment on chatbot Arena
actually made a mistake last week that I
want to correct this week thank you so
much influential studio for this comment
here on the video pointing out that when
you vote on chatbot Arena and you can
see which model you're voting on those
votes don't actually count and only
counts the anonymous votes that makes a
lot of sense as this ranking here is
fully user voted so for example in this
view where you can actually see what
you're comparing these votes do not
count only the ones from the arena here
actually count where the models are
Anonymous just wanted to correct that
but back to Gro 2 so it has released and
what's the story here well it's a top
tier model it's a GPD 4 level model that
is not quite Best in Class at anything
in particular but it's really
well-rounded and the biggest selling
point here is the following it's plugged
into all of the Twitter data the full
Twitter fire Hol all of the news story
all of the opinions that go down on
Twitter day-to- day they are being
infused into the model so you can use it
for some use cases that require browsing
and that don't work as well with other
models like what are the top news
stories relating to AI for today and
keep in mind this is the Mini model this
is not the grock 2 main model that we
are looking at here by the way while
this generates this will take a few
seconds CU it does need to look at the
Twitter API and all the data there but
as of these released benchmarks it's
interesting how they structured it and
it's a bit deceptive so I want to
clarify this because it Compares it to
the turbo model or Gemini 1.5 Pro and
some of these more competitive models
like claw 3.5 Sonet or llama 3E 405b are
all the way on the right and the reason
I say that is because for example gbd4
turbo this is the release that happened
right after the voice assistant
announcement back in May and back at
that point a lot of people argue that
GPT 4 is actually worse than GPT 4 the
benchmarks were slightly better than GPT
4 but the point is this is not the
fairest comparison and they're right
next to each other so don't take this
Delta too seriously what you really want
to compare is Sonet over here here with
grock 2 because what you really want to
be looking at is Sonet over here and
llama 405b those are the most up-to-date
versions again this gbt 40 and turbo
models are back from May says that down
here and if you compare to something
like Sonet it actually loses out on all
benchmarks closely but it's worse except
of MAF Vista over here but again as I
always see these minimal differences in
benchmarks are not a game changer what
matters is how it performs in practice
and O okay right now it's actually still
loading which is a little weird I'll
regenerate this I got to say during my
testing of this before the recording of
this video this went actually super
smoothly but there you go on second try
it just gets the stories right away and
as you'll see at the bottom it will
reference the tweets it pulled it from
so this is actually a fantastic Twitter
search engine and I think that is the
main use case here this combination of
the Twitter data fire host and having an
llm that has access to all of it is
actually quite powerful and as you can
see it talks about the search GPT
prototype and Google's AI Integrations
with the pixel 9 and of course the grock
2 launch so this is a fantastic use case
that you can be using today if you're
subscribed to X premium which I believe
in Europe comes in at 10 a month it's
actually 860 yeah that's correct if you
go to monthly and then Premium Plus is
20 but Gro already comes with this so
yeah this is a paid thing but it's just
a brand new way to use Twitter and then
of course it also has the image
generation features with flux that we
talked about in the first part of the
video but there's even more here because
as it showed what's coming down the line
is also some multimodal capabilities
where it has Vision capabilities they'll
be offering Enterprise API which could
be an interesting way I mean they'll
give you access to llm that has all of
the world's knowledge that is in Twitter
hm time will show what that will be used
for and what's my personal first
impression of groc and its outputs well
it's good it's certainly usable and if
you only want the text generation
features then it can certainly act as a
replacement to chat GPT now I personally
and many others still prefer anthropics
voice it's just more human and less
robotic like chat GPT but this is decent
but again I didn't get my hands on the
full grto version so I can't really give
my full opinion but I also say this it
lacks the tooling just some of these
other tools also do file uploads a
functional mobile apps gpts I use these
things all of time now I do consider
myself a power user but still if you use
those features maybe code interpreter or
image input then you won't have those in
here so what should you use well let me
sum it up briefly as of 15th of August
2024 as a general purpose AI assistant
chat GPT still is best because of all
the functionality that I just named when
it comes to writing tone though
anthropic Sonet 3.5 is my go-to when it
comes to code generation specifically
also Sonet 3.5 Head and Shoulders above
everybody else right now but when it
comes to research perplexity is your
friend and when it comes to actually
using llms with live data well I think
grock actually sort of takes the crown
here because it is plugged into all the
Twitter data and it references all of
that and as Twitter is the place where
news breaks first I mean heck a lot of
this video is just me and the team
spending every single day on Twitter and
pulling everything together and then me
digesting it for you well Gro can sort
of do that already too so that would be
the one use case where this really
stands out and also one more thing is
that Gro is actually sort of uncensored
and by sort of I mean again it's the
same thing as with flux it won't produce
R-rated content but it doesn't have
problem with cuss wordss or things that
are in like ethical gray area anthropic
is on the other side of the spectrum
they're extremely strict and CH is quite
restrictive but not as much as anthropic
anthropic is really extreme and Gro on
the other hand doesn't have a problem
with profanities and now moving forward
there's actually also a brand new Chad
GPT model and this is not just the API
this has actually been integrated into
cat GPT that you might be using every
single day because if you look at this
tweet from the official chat GPT app
there's a new GPT 4 model out in cat GPT
since last week hope you are all
enjoying it and check it out if you
haven't we think you like it and the
funny thing is nobody really noticed
that's how minuscule the differences
between the models are these days they
ship a new thing and they have to
announce that there's some update
because nobody has noticed otherwise but
yeah this a slightly upgraded model
apparently the biggest difference is and
how it handles chat conversations so it
has been optimized to interact with
users in a dialogue and there's also a
brand new API endpoint for people to use
but it's funny cuz the dev account still
says hey if you're a Dev you probably
still want to use the 0806 API endpoint
not this latest one that was released
for chat GPT that one is just best for
chat use cases so there you go a minor
update on that front if you were
confused by this I will be reporting
back once full grock 2 comes out and
I'll get to test it a little bit more
for my personal use cases for now I only
have the min version so moving on to the
next story here is a few releases out of
Google one of them is image and free and
you know I'll keep this as short as
possible it's a good image generator
it's their best image generator but
compared to something like flux or M
Journey it just doesn't hold up it does
text well but so do others and their
open source but yeah it's better than
anything that Google has done before
with image generation and it will be
introduced into their hardware and their
software offerings just like this second
announcement which might be more
interesting here this is Gemini live
okay and this is the voice assistant
that open AI promised but for Google or
is it because the reality of this
product is probably the biggest Delta
between what some people hyped it up to
be and between what it actually is
because the reality of it is yes it is a
voice assistant and yes it also already
shipped Android users already have this
on their phone I'm an Apple user but I'm
lucky enough that team member Daniel
actually went ahead and gave this a shot
and tested it and I'll just quote some
of the pointy forwarded here to me keep
in mind that this is coming from an
angle where we're comparing it to what
the voice assistant PR and what is
available in the open AI app today
because if you're not familiar there's a
voice assistant already it might not be
the sophisticated one with the voice
changes that you can interrupt and the
multimodal capabilities but there's a
voice assistant you can use the voice
function to talk to chat GPT right uh
quick spoiler that's what this really is
Google shipped a voice input and output
function that you can also interrupt but
it's not great okay so what's the review
well apparently the Gemini live voice
assistant feels more like a Beta release
than something that is actually on the
level of The Voice Voice Assistant
teased by open ey the voices are good
how can I help you today but so are cat
gpt's voices today there's no voice
modulation and no multimodal
capabilities like using the camera to
actually infer context and to use it as
this advanced voice assistant what it
does have is the ability for you to
interrupt it which is actually my
biggest gripe with the current version
of the chat GPT voice features but the
problem is it's not great and Daniel
reported back that if he has a speaker
volume on the phone over 75% The Voice
Assistant actually starts interrup in
itself cuz it here's the output and then
it stops I think you get the point CET
GPT never does that and because of this
he concluded that the interrupting
feature is something that's currently
more annoying than useful not sure they
can fix that over time but again it just
goes to underline this point that it
feels like something a little Half Baked
that was maybe rushed out but not to
bash it too hard what it does have is
access to Integrations like your Google
calendar or your Gmail and then you can
interact with those on your Android
phone that is absolutely fantastic and
something you cannot get inside of C GPT
as of yet so there you go that would be
my first little look at the voice
assistant feature I do have to add that
at the end of their presentation they
showed that they're looking at this
advanced multimodal voice assistant in
the future right but as of what's
released today it's just voice input and
output with Gemini which is a nice to
have but that doesn't mean they're
pulled ahead of open AI they just caught
up and that's fine but let's call us Spa
Spade if you have a different experience
by the way please leave a comment below
we would love to hear about it all right
this is going to be a quick but fun one
vigle the app that came out a few months
ago that lets you put yourself or
somebody else into dancing type video
has a new update where you can actually
do it for two people and I think that's
sort of fun because you can use it as a
fun way to communicate with friends or
family I just briefly want to show it to
you so if you go in here you can see the
new update right here I just logged in
with Google on a free account by the way
you can just try this right away and if
you head on over to this multi tab then
you can pick a template I'm going to
Simply take a matrix fight that sounds
perfect
all right use the template and now I can
pick the two characters right so one
character right here I'll just use the
camera real quick okay ideally should be
a full body photo but I'll just a selfie
here and then as the second one how
about a picture of Tyrion Lannister here
because you guys seem to enjoy the Game
of Thrones clip we did with the sponsor
last week and that's it I'll just go
ahead and generate and by the way this
is not sponsor it's just an interesting
release of a cool app let's see what we
get
here okay no way it's
Tyrion this
epic well this is as I thought this is
really sort of funny and quirky and you
could just download it with the
watermark lights so no problem you could
send it to somebody again this is just
possible on the free plan sure they have
other tiers I haven't even looked into
them so far but there you go I found
little use case cuz a I can be that too
it doesn't always just have to be useful
and productive and then last but
certainly not not least there is a very
very interesting release out of
anthropic coming this week and this was
sort of shocking to me prompt caching
with CLA and what this essentially is
that it saves context into a cache
memory that goes along with the API with
the Practical result of reducing costs
up to 90% and latency by up to
85% meaning you can integrate really
complex personas into CLA and then call
the API it will cost 90% less and it
will be roughly 5 to 10 times as fast I
mean that's a bold claim and reading
through this this sounds really
impressive all of these use cases
conversational agents coding assistants
anything that needs a little bit more
context like for example here if you
upload a book of 100,000 tokens into one
of claude's models the latency without
the caching would have been 11 seconds
to get a response okay so you give it
all this context then you ask something
about the book and then it takes 11
seconds to reply that's what this means
in Practical terms and with the caching
2.5 at a 90% cost
reduction honestly this sounds a little
too good to be true so I had a closer
look at this and I have to be honest I
have one single gripe of this which is
like what is the downside here what is
the negative part honestly this looks
too good to be true they say it's still
in beta they give you explanations on
how it works and how it's priced one
limitation is that it doesn't work on
the Opus model but the Sonet model is
best right now anyway there's even a
prompt caching cookbook on their GitHub
if you want to check that out and hey
let me tell you what this just came out
I didn't really have time to dive deep
into this but over the weekend I'll be
having a closer look at this and
experimenting with this because again it
just sounds too good to be true this is
sort of like having rag up to a certain
context limit I suppose but without the
downside of the long loading times and
the embeddings being created and
retrieved and compared to just adding a
lot of context which was the rag
alternative it's way faster now too so
that's amazing but I want to know what's
downside here and what does this
compared to for example fine tuning
because they do say that one of the best
use cases here is actually giving it
multi-shot prompts into this cache and
then it can consider that as extra
context for the generations so I'll have
to run a few experiments and I'll report
back and if you're interested in this
sort of topic I do want to point out the
fact that I'm actually restarting my llm
Innovations event series before this was
called chat GPT Innovations and I used
to do it every two weeks for a year
straight for all course members and now
I hold it in the community and it's a
part of the membership and I'm going to
hold it once a month we went with this
image of eigor Einstein because this is
going to be where I'll be presenting
some of the experiments that I run to
the community it's a long format usually
the lecture takes about an hour and then
we do a Q&A afterwards and the next
session in September we'll be looking at
when you should using prompts versus
using a GPT versus using finetuning and
apparently now I'll have to extend this
with when should you be using prompt
caching and all of the results I'll be
showing there will be evidence-based and
include results of the experiments we
run internally with the team and then
moving forward I'll do one of these a
month as this has always been the most
popular format within the community so
we do a lot of things there but I
thought I'd tell you about this one and
as I pointed out many times that's the
whole idea behind the community we can
go deep on single topics rather than
doing what we do on YouTube which is
brushing over many different topics and
that's because this will be assuming
that you already took my prompting
course and the GPT building course that
is also accessible within the community
I cannot make that assumption in a video
like this but what I can do is test
prompt caching more and then come back
with a video on that so there you go AI
has been wild lately and these are some
very exciting developments we'll be
playing with all of it and if you find
something interesting I'll be reporting
back next Friday in our weekly show AI
news you can use and that's all I got
for today see you soon
Browse More Related Video
These AI Use Cases Will Affect Everyone You Know
Wake up babe, a dangerous new open-source AI model is here
New AI Tools Anyone Can Use Today
BIG AI NEWS: 10,000X Bigger Than GPT-4, AGI 2025, New Boston Dynamics Demo And More
[ML News] Groq, Gemma, Sora, Gemini, and Air Canada's chatbot troubles
AI News: Will AI Art Be Outlawed?
5.0 / 5 (0 votes)