Google I/O 2024 keynote in 17 minutes
Summary
TL;DR: At Google I/O 2024, Google announced a broad set of AI updates across its products. Gemini 1.5 Pro, with a 1 million token context window, is now available to developers globally and to consumers in Gemini Advanced, and the context window is expanding to 2 million tokens for developers. Multimodality allows richer questions and answers, and Gemini 1.5 Flash is introduced as a lighter-weight model for developers. Project Astra previews future AI assistance, with demos such as naming the parts of a speaker and explaining what a block of encryption code does. Google also introduced Imagen 3 for photorealistic image generation, Music AI Sandbox for professional music creation, and Veo, a generative video model. Trillium, the sixth generation of TPUs, delivers a 4.7x improvement in compute performance per chip over the previous generation. Google Search will add multi-step reasoning for complex queries, and Gmail mobile will gain summarization and Q&A features. Other announcements include the VideoFX tool, personalized trip planning in Gemini Advanced, and Gemini Nano with multimodality, which will improve the TalkBack accessibility feature. The event closed with PaliGemma, an open vision-language model available now, and Gemma 2, due in June.
Takeaways
- 🌟 Google I/O introduces a revamped AI experience with expanded capabilities for context understanding and multimodal interactions.
- 🚗 Gemini, Google's AI, facilitates tasks like identifying a car in a parking station and providing the license plate number.
- 🏊♂️ Gemini's advanced search capabilities can recognize different contexts, such as swimming laps in a pool versus snorkeling in the ocean.
- 🔍 Gemini 1.5 Pro, with a 1 million token context window, is rolling out to developers globally and to consumers in Gemini Advanced, across 35 languages.
- 📈 Expansion of the context window to 2 million tokens, marking progress towards the goal of infinite context for more complex queries and answers.
- 🎥 Project Astra is Google's vision for the future of AI assistance; separately, Google announced updates to its generative media tools for images, music, and video.
- 📚 NotebookLM can create personalized learning experiences, like a science discussion tailored to a student's interests.
- 📈 Imagen 3, a new model for photorealistic image generation, is announced, offering richer details and fewer visual artifacts.
- 📹 Veo, a new generative video model, can create high-quality 1080p videos from text, image, and video prompts in various styles.
- 🧘♀️ Google search will soon include multi-step reasoning to answer complex questions, such as finding the best yoga studios and their offers.
- 📧 Gmail mobile will receive new features powered by Gemini, including a summarize option and a Q&A feature for quick responses to emails.
Q & A
What is the new feature that Google is launching for a fully revamped experience?
-Google is launching a fully revamped AI Overviews experience in Search, rolling out to everyone in the US this week and to more countries soon.
How does Gemini assist in recognizing and identifying a user's car in a parking station?
-With Gemini, Ask Photos recognizes the cars that appear often in the user's photos, triangulates which one is the user's, and returns the license plate number.
What does the term 'multimodality' refer to in the context of Gemini's capabilities?
-Multimodality in Gemini refers to the ability to recognize and process different types of data and contexts, such as text, images, audio, and video, to provide more comprehensive answers.
What is the significance of the 1 million token context window in Gemini 1.5 Pro?
-The 1 million token context window in Gemini 1.5 Pro allows for the processing of long contexts, such as hundreds of pages of text, hours of audio, or a full hour of video, which is a significant step towards handling infinite context.
How does Gemini help in summarizing a long meeting recording?
-If the meeting is recorded using Google Meet, Gemini can be asked to provide highlights of the meeting, summarizing the key points without the need to listen to the entire recording.
What is the purpose of the 'flash' model in Gemini 1.5?
-Gemini 1.5 Flash is a lighter-weight, lower-cost model than Gemini 1.5 Pro. Both models can be used with up to 1 million tokens in Google AI Studio and Vertex AI.
What is the new generative media tool introduced by Google called?
-The new image generation model is called Imagen 3. It produces more photorealistic images, with richer details and fewer visual artifacts.
What is the name of the new generative video model announced by Google?
-The new generative video model is called Veo. It creates high-quality 1080p videos from text, image, and video prompts.
What is the name of the sixth generation of TPUs developed by Google?
-The sixth generation of TPUs developed by Google is called Trillium, which offers a significant improvement in compute performance per chip.
How does the new Gemini powered side panel enhance Gmail mobile?
-The new Gemini powered side panel in Gmail mobile provides a summary of the salient information from emails, allows users to ask questions directly from the mobile card, and offers quick answers without having to open the email.
What is the purpose of the 'gems' feature in the Gemini app?
-The 'gems' feature in the Gemini app allows users to create personalized experts on any topic. These gems can be customized with specific instructions and used whenever the user needs information or assistance on that topic.
What is the new capability that allows users to interact with Gemini using voice?
-The new capability is called 'live', which enables users to have in-depth conversations with Gemini using their voice and allows Gemini to see what the user sees through the camera and respond to the surroundings in real time.
Outlines
🚀 Launch of AI Overviews and Gemini Features
The script introduces the audience to Google I/O and announces the launch of a revamped AI experience. It discusses the rollout of AI Overviews across the US and upcoming availability in more countries. The role of Gemini in simplifying tasks such as identifying a user's car at a parking station and providing the license plate number is highlighted. The script also covers Gemini's ability to recognize different contexts and handle complex queries through multimodality and long-context support. Gemini 1.5 Pro is introduced with a 1 million token context window, and an expansion to 2 million tokens is announced. Gemini's ability to provide meeting highlights and draft replies is also mentioned, along with the introduction of Gemini 1.5 Flash, a lighter-weight model.
🎨 New AI Tools and Project Astra
The paragraph discusses the introduction of Imagen 3, an image generation model that produces more photorealistic images with richer details. It also covers Music AI Sandbox, a suite of professional music AI tools built together with YouTube, and the unveiling of Veo, a generative video model that creates 1080p videos from text, image, and video prompts. The importance of keeping objects and subjects consistent in space and over time in generated video is emphasized. The paragraph also mentions Trillium, the sixth generation of TPUs, which delivers a 4.7x improvement in compute performance per chip over the previous generation. Additionally, it covers updates to Google Search, including multi-step reasoning and the ability to ask questions with video. New features for Gmail mobile are also announced, such as the summarize option and Q&A, along with a Gemini workflow for organizing and tracking receipts.
🤖 Virtual Teammate Chip and Personalized AI Tools
The script introduces Chip, a virtual Gemini-powered teammate designed to monitor and track projects, organize information, and provide context. Chip is shown to flag potential issues and create documentation to address them. The paragraph also discusses the upcoming feature 'live' which allows for real-time interaction with Gemini using voice and camera input. The concept of 'gems' is introduced, which are personalized AI experts on various topics, created by users for their specific needs. The trip planning experience in Gemini Advanced is detailed, showcasing how it gathers information to create a personalized vacation plan. The ability to upload and analyze academic and business-related documents with Gemini is also highlighted, along with the upcoming expansion of the long context window and context-aware features.
📈 Pricing, Accessibility, and Future AI Developments
The final paragraph covers pricing for Gemini 1.5 Pro and 1.5 Flash, including a 50% lower rate for prompts up to 128k tokens. It introduces PaliGemma, Google's first vision-language open model, and announces that Gemma 2 will arrive in June. SynthID is being expanded to text and video, with plans to open-source SynthID text watermarking. LearnLM, a new family of models based on Gemini and fine-tuned for learning, is introduced, with pre-made gems being developed for educational needs. The script ends with a light-hearted note on how often AI was mentioned during the presentation and a forward-looking statement on future possibilities.
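To make the quoted prices concrete, here is a rough sketch of the arithmetic in Python. The rates come from the keynote ($7 per 1 million tokens for 1.5 Pro, $3.50 for prompts up to 128k tokens, and 35 cents for 1.5 Flash). Everything else is an assumption made for illustration: the function name `estimate_prompt_cost`, reading "128k" as 128,000 tokens, applying the lower Pro rate to the whole prompt when it fits under the threshold, and treating the Flash "starts at" price as a flat rate. Actual billing rules may differ.

```python
from decimal import Decimal

# Rates in USD per 1 million tokens, as quoted in the keynote.
PRO_RATE = Decimal("7.00")
PRO_DISCOUNTED_RATE = Decimal("3.50")  # "50% less" for prompts up to 128k
FLASH_RATE = Decimal("0.35")           # "starts at" price, treated as flat (assumption)

# Assumption: "128k" is read as 128,000 tokens.
DISCOUNT_THRESHOLD = 128_000
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_prompt_cost(prompt_tokens: int, model: str) -> Decimal:
    """Estimate the cost in USD of a single prompt (hypothetical helper)."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if model == "pro":
        # Assumption: the lower rate covers the whole prompt when it fits
        # under the threshold; longer prompts pay the full rate throughout.
        if prompt_tokens <= DISCOUNT_THRESHOLD:
            rate = PRO_DISCOUNTED_RATE
        else:
            rate = PRO_RATE
    elif model == "flash":
        rate = FLASH_RATE
    else:
        raise ValueError(f"unknown model: {model!r}")
    return rate * prompt_tokens / TOKENS_PER_UNIT


if __name__ == "__main__":
    for tokens, model in [(100_000, "pro"), (1_000_000, "pro"), (1_000_000, "flash")]:
        print(f"{model:>5} {tokens:>9} tokens -> ${estimate_prompt_cost(tokens, model)}")
```

Under these assumptions, a 100,000-token prompt to 1.5 Pro costs the same as a 1,000,000-token prompt to 1.5 Flash, which shows how large the gap between the two models' list prices is.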
Keywords
💡Google IO
💡AI Overviews
💡Gemini
💡Multimodality
💡Gemini 1.5 Pro
💡Project Astra
💡TPUs (Tensor Processing Units)
💡Google Search Updates
💡Gmail Mobile
💡Gemini Nano
💡Gems
Highlights
Google I/O introduces a fully revamped AI experience, launching AI Overviews in the US with plans to expand to more countries.
Gemini's AI capabilities are enhanced to recognize different contexts and provide more complex answers, such as identifying a user's car in a parking station.
Gemini 1.5 Pro, with a 1 million token context window, is rolling out to developers globally and to consumers in Gemini Advanced, across 35 languages.
Expansion of the context window to 2 million tokens, marking a step towards the goal of infinite context.
Google Meet integration allows Gemini to provide meeting highlights from hour-long recordings.
Introduction of Gemini 1.5 Flash, a lighter-weight model than Pro, available in Google AI Studio and Vertex AI.
Project Astra aims to advance the future of AI assistance with new capabilities in sound and code analysis.
Imagen 3, a new image generation model, offers more photorealistic images with richer details and fewer visual artifacts.
Music AI Sandbox by Google and YouTube allows creation of new instrumental sections and style transfers between tracks.
Veo, a generative video model, creates high-quality 1080p videos from text, image, and video prompts in various visual and cinematic styles.
Sixth generation TPUs, called Trillium, offer a 4.7x improvement in compute performance per chip.
Google search will introduce multi-step reasoning to answer complex questions, such as finding the best yoga studios in Boston.
A new feature in Gmail mobile allows for quick summarization of emails and a Q&A feature for quick answers within the inbox.
Gemini is becoming context aware, and can generate images from a prompt (such as tennis with pickles) in a window that hovers above the current app.
TalkBack, an accessibility feature, will be enhanced with the multimodal capabilities of Gemini Nano for richer descriptions, even without a network connection.
Gemini 1.5 Pro is priced at $7 per 1 million tokens, with a 50% discount for prompts up to 128k tokens.
PaliGemma, the first vision-language open model, is now available, and Gemma 2, the next generation of Gemma, will be available in June.
SynthID is being expanded to two new modalities, text and video, with plans to open-source SynthID text watermarking in the coming months.
LearnLM, a new family of models based on Gemini and fine-tuned for learning, is introduced, alongside pre-made gems such as Learning Coach.
Transcripts
[Applause]
[Music]
Google we all ready to do a little
Googling welcome to Google I/O it's great
to have all of you with us we'll begin
launching this fully revamped experience
AI overviews to everyone in the US this
week and we'll bring it to more
countries soon with Gemini you're making
that a whole lot easier say you're at a
parking station ready to pay now you can
simply ask photos it knows the cars that
appear often it triangulates which one
is yours and just tells you the license
plate number you can even follow up with
something more complex show me how Luci
swimming has progressed here Gemini goes
beyond a simple search recognizing
different contexts from doing laps in
the pool to snorkeling in the ocean we
are rolling out Ask Photos this
summer with more capabilities to come
multimodality radically expands the
questions we can ask and the answers we
will get back long context takes this a
step further enabling us to bring in
even more information hundreds of pages
of text hours of audio a full hour of
video or entire code repos you need a 1
million token context window now
possible with Gemini 1.5 Pro I'm excited
to announce that we are bringing this
improved version of Gemini 1.5 Pro to
all developers globally Gemini 1.5 Pro
with 1 million context is now directly
available for consumers in Gemini
Advanced and can be used across 35
languages so today we are expanding the
context window to 2 million
tokens this represents the next step on
our journey towards the ultimate goal of
infinite context and you couldn't make
the PTA meeting the recording of the
meeting is an hour long if it's from
Google Meet you can ask Gemini to give
you the
highlights there's a parents group
looking for volunteers you're free that
day of course Gemini can draft a reply
Gemini 1.5 Pro is available today in
Workspace Labs NotebookLM is going to
take all the materials on the left as
input and output them into a lively
science discussion personalized for him
so let's uh let's dive into physics
what's on deck for today well uh we're
starting with the basics force and
motion okay and that of course means we
have to talk about Sir Isaac Newton and
his three laws of motion and what's
amazing is that my son and I can join
into the conversation and steer it
whichever direction we want when I tap
join hold on we have a question what's
up
Josh yeah can you give my son Jimmy a
basketball
example hey Jimmy that's a fantastic
idea basketball is actually a great way
to visualize force and motion let's
break it down okay so first imagine a
basketball just sitting there on the
court it's not moving right that's
because all the forces acting on it are
balanced the downward pull of gravity it
connected the dots and created that age
appropriate example for him making AI
helpful for everyone last year we
reached a milestone on that path when we
formed Google DeepMind so today we're
introducing
Gemini 1.5 Flash Flash is a lighter
weight model compared to Pro starting
today you can use 1.5 Flash and 1.5 Pro
with up to 1 million tokens in Google AI
Studio and Vertex AI today we have some
exciting new progress to share about the
future of AI assistance that we're
calling Project Astra tell me when you
see something that makes
sound I see a speaker which makes sound
what is that part of the speaker
called that is the Tweeter it produces
high frequency
sounds what does that part of the code
do this code defines encryption and
decryption functions it seems to use
AES-CBC encryption to encode and decode data
based on a key and an initialization
vector
(IV) what can I add here to make this
system
faster adding a cache between the server
and database could improve speed today
we're introducing a series of updates
across our generative media tools with
new models covering image music and
video today I'm so excited to introduce
Imagen 3 Imagen 3 is more
photorealistic you can literally count
the whiskers on its snout with richer
details like this incredible sunlight in
the shot and fewer visual artifacts or
distorted images you can sign up today
to try Imagen 3 in ImageFX part of our
suite of AI tools at labs.google
together with YouTube we've been
building Music AI Sandbox a suite of
professional music AI tools that can
create new instrumental sections from
scratch transfer Styles between tracks
and more today I'm excited to announce
our newest most capable generative video
model called
Veo Veo creates high quality 1080p videos
from text image and video prompts it can
capture the details of your instructions
in different visual and cinematic styles
you can prompt for things like aerial
shots of a landscape or time lapse and
further edit your videos using
additional prompts you can use Veo in our
new experimental tool called VideoFX
we're exploring features like
storyboarding and generating longer
scenes not only is it important to
understand where an object or subject
should be in space it needs to maintain
this consistency over time just like the
car in this video over the coming weeks
some of these features will be available
to select creators through VideoFX
at labs.google and the waitlist is
open now today we are excited to announce
the sixth generation of TPUs called
Trillium Trillium delivers a 4.7x
improvement in compute performance per
chip over the previous generation we'll
make Trillium available to our Cloud
customers in late 2024 we're making AI
overviews even more helpful for your
most complex questions to make this
possible we're introducing multi-step
reasoning in Google search soon you'll
be able to ask search to find the best
yoga or Pilates studios in Boston and
show you details on their intro offers
and the walking time from Beacon Hill
you get some studios with great ratings
and their introductory offers and you
can see the distance for each like this
one it's just a 10-minute walk away
right below you see where they're
located laid out visually it breaks your
bigger question down into all its parts
and it figures out which problems it
needs to solve and in what
order next take planning for example now
you can ask search to create a 3-day
meal plan for a group that's easy to
prepare and here you get a plan with a
wide range of recipes from across the
web if you want to get more veggies in
you can simply ask search to swap in a
vegetarian dish and you can export your
meal plan or get the ingredients as a
list just by tapping here soon you'll be
able to ask questions with video right
in Google search I'm going to take a
video and ask
Google why will this not stay in
place and in a near instant Google gives me
an AI Overview I get some reasons this
might be happening and steps I can take
to troubleshoot you'll start to see
these features rolling out in search in
the coming weeks and now we're really
excited that the new Gemini powered side
panel will be generally available next
month three new capabilities coming to
Gmail mobile it looks like there's an
email thread on this with lots of emails
that I haven't read and luckily for me I
can simply tap the summarize option up
top and Skip reading this long back and
forth now Gemini pulls up this helpful
mobile card as an overlay and this is
where I can read a nice summary of all
the salient information that I need to
know now I can simply type out my
question right here in the mobile card
and say something like compare my roof
repair bids by price and availability
this new Q&A feature makes it so easy to
get quick answers on anything in my
inbox without having to First search
Gmail then open the email and then look
for the specific information and
attachments and so on I see some
suggested replies from Gemini now here I
see I have decline the service
suggest new time these new
capabilities in Gemini and Gmail will
start rolling out this month to Labs
users it's got a PDF that's an
attachment from a hotel as a receipt and
I see a suggestion in the side panel
help me organize and track my receipts
step one create a drive folder and put
this receipt and 37 others it's found
into that folder step two extract the
relevant information from those receipts
in that folder into a new spreadsheet
Gemini offers you the option to automate
this so that this particular workflow is
run on all future emails Gemini does the
hard work of extracting all the right
information from all the files in
that folder and generates this sheet for
you show me where the money is
spent Gemini not only analyzes the data
from the sheet but also creates a nice
visual to help me see the complete
breakdown by category this particular
ability will be rolling out to Labs
users this September we're prototyping a
virtual Gemini powered teammate Chip's
been given a specific job role with a
set of descriptions on how to be helpful
for the team you can see that here and
some of the jobs are to Monitor and
track projects we've listed a few out to
organize information and provide context
and a few more things are we on
track for
launch Chip gets to work not only
searching through everything it has
access to but also synthesizing what's
found and coming back with an up-to-date
response there it is a clear timeline a
nice summary and notice even in this
first message here Chip flags a
potential issue the team should be aware
of because we're in a group space
everyone can follow along anyone can
jump in at any time as you see someone
just did asking Chip to help create a
doc to help address the issue and this
summer you can have an in-depth
conversation with Gemini using your voice
we're calling this new experience live
when you go live you'll be able to open
your camera so Gemini can see what you
see and respond to your surroundings in
real time so we're rolling out a new
feature that lets you customize it for
your own needs and create personal
experts on any topic you want we're
calling these gems just tap to create a
gem write your instructions once and
come back whenever you need it for
example here's a gem that I created that
acts as a personal writing coach it
specializes in short stories with
mysterious twists and it even Builds on
the story drafts in my Google Drive gems
will roll out in the coming months that
reasoning and intelligence all come
together in the new trip planning
experience in Gemini Advanced we're
going to Miami my son loves art my
husband loves seafood and our flight and
hotel details are already in my Gmail
inbox to make sense of these variables
Gemini starts by gathering all kinds of
information from search and helpful
extensions like maps and Gmail the end
result is a personalized vacation plan
presented in Gemini's new Dynamic UI I
like these recommendations but my family
likes to sleep in so I tap to change the
start time and just like that Gemini
adjusted my itinerary for the rest of
the trip this new trip planning
experience will be rolling out to Gemini
Advanced this summer you can upload your
entire thesis your sources your notes
your research and soon interview audio
recordings and videos too it can dissect
your main points identify improvements
and even role-play as your professor
maybe you have a side hustle selling
handcrafted products simply upload all
of your spreadsheets and ask Gemini to
visualize your
earnings Gemini goes to work calculating
your returns and pulling its analysis
together into a single chart and of
course your files are not used to train
our models later this year we'll be
doubling the long context window to two
million tokens we're putting AI powered
search right at your fingertips create
let's say my son needs help with a
tricky physics word problem like this
one if he's stumped on this question
instead of putting me on the spot he can
circle the exact part he's stuck on and
get step-by-step
instructions right where he's already
doing the work this new capability is
available today now we're making Gemini
context aware so my friend Pete is
asking if I want to play pickleball
this weekend so I'm going to reply and
try to be funny and I'll say uh is that
like tennis but with uh pickles and I'll
say uh create image of tennis with
Pickles now one new thing you'll notice
is that the Gemini window now hovers in
place above the app so I stay in the
flow okay so that generated some pretty
good images uh what's nice is I can then
drag and drop any of these directly into
the messages app below so like so cool
let me send that and because it's
context aware Gemini knows I'm looking
at a video so it proactively shows me an
ask this video chip what is can't
type the two bounce rule by the way this
uses signals like YouTube's captions
which means you can use it on billions
of videos so give it a moment and there
starting with Pixel later this year
we'll be expanding what's possible with
our latest model Gemini Nano with
multimodality so several years ago we
developed TalkBack an accessibility
feature that helps people navigate their
phone through touch and spoken feedback
and now we're taking that to the next
level with the multimodal capabilities
of Gemini Nano so when someone sends
Cara a photo she'll get a richer and
clearer description of what's happening
and the model even works when there's no
network connection these improvements to
TalkBack are coming later this year 1.5
Pro is $7 per 1 million tokens and I'm
excited to share that for prompts up to
128k it'll be 50% less for
$3.50 and 1.5 Flash will start at 35
cents per 1 million tokens and today's
newest member PaliGemma our first
vision language open model and it's
available right now I'm also excited
to announce that we have Gemma 2 coming
it's the next generation of Gemma and it
will be available in June today we're
expanding SynthID to two new
modalities text and
video and in the coming months we'll be
open sourcing SynthID text
watermarking I'm excited to introduce
LearnLM our new family of models based on
Gemini and fine-tuned for learning we're
developing some pre-made gems which will
be available in the Gemini app and web
experience including one called learning
coach I have a feeling that someone out
there might be
counting how many times we have
mentioned AI today we went ahead and
counted so that you don't have
[Applause]
to that might be a record in how many
times someone has said
AI here's to the possibilities ahead and
creating them together thank you