Google I/O 2024: Everything Revealed in 12 Minutes

CNET
14 May 2024 · 11:26

Summary

TL;DR: Google I/O 2024 unveiled a wave of AI advancements across Google's platforms. Project Astra, an AI-assistant initiative, speeds up information processing by continuously encoding video frames into a timeline of events for efficient recall. Google's new generative video model, Veo, produces high-quality 1080p videos from text, image, and video prompts, giving users creative control. Trillium, the sixth generation of TPUs, promises a 4.7x improvement in compute performance per chip. Google Search has been revamped with AI Overviews, allowing users to ask complex questions, search with photos, and receive dynamic, organized results. Gemini is also set to become the new AI assistant on Android, with context awareness and multimodal capabilities, and the integration of on-device AI into Android aims to enhance the smartphone experience while preserving privacy.

Takeaways

  • 📈 **Gemini Model Usage**: Over 1.5 million developers utilize Gemini models for debugging, gaining insights, and developing AI applications.
  • 🚀 **Project Astra**: An advancement in AI assistance that processes information faster by encoding video frames and combining them with speech input into a timeline for efficient recall.
  • 📚 **Caching for Speed**: Introducing a cache between the server and database to improve system speed.
  • 🎥 **Veo Video Model**: A new generative video model that creates high-quality 1080p videos from text, image, and video prompts, offering creative control and various cinematic styles.
  • 🧠 **Sixth-Generation TPUs**: The introduction of the Trillium TPU, offering a 4.7x improvement in compute performance per chip over the previous generation.
  • 🔍 **Google Search Transformation**: The use of Gemini in Google Search has led to a new generative experience, allowing for more complex queries and innovative search methods.
  • 🍽️ **AI Overviews**: A revamped search experience that clusters results and provides dynamic, whole-page experiences for categories like dining, recipes, movies, and more.
  • 🤖 **Live Interaction with Gemini**: A new feature allowing real-time interaction with Gemini using Google's latest speech models, enabling more natural conversations.
  • 📱 **Personalized AI with Gems**: The ability to create personalized AI experts, or 'gems,' for any topic, offering tailored assistance.
  • 📱 **AI-Powered Android**: Android's multi-year journey to integrate AI more deeply, starting with AI-powered search, a new AI assistant, and on-device AI for fast, private experiences.
  • 📚 **Educational Assistance**: The use of on-device AI for educational purposes, such as providing step-by-step instructions for homework problems directly on the device.

Q & A

  • What is the significance of Gemini models for developers?

    -Gemini models are crucial for developers as they are used across various tools to debug code, gain new insights, and build the next generation of AI applications.

  • What is Project Astra and how does it improve AI assistance?

    -Project Astra is an advancement in AI assistance that builds on the Gemini model. It developed agents that can process information faster by continuously encoding video frames, combining video and speech input into a timeline of events, and caching this for efficient recall.
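    The "timeline of events" idea can be pictured as a bounded, timestamped event cache that is cheap to append to and to search backwards. The sketch below is purely illustrative, with hypothetical names; Astra's actual pipeline has not been published.

    ```python
    from collections import deque

    # Minimal sketch of a timeline cache: frames and speech are stored as
    # timestamped events in a bounded buffer and recalled by keyword.
    class EventTimeline:
        def __init__(self, max_events=1000):
            self.events = deque(maxlen=max_events)  # bounded cache

        def add(self, timestamp, kind, description):
            self.events.append((timestamp, kind, description))

        def recall(self, keyword):
            """Return the most recent event whose description mentions the keyword."""
            for ts, kind, desc in reversed(self.events):
                if keyword in desc:
                    return ts, kind, desc
            return None

    timeline = EventTimeline()
    timeline.add(1.0, "video", "desk with a red apple")
    timeline.add(2.0, "video", "glasses on the desk near a red apple")
    timeline.add(3.0, "speech", "tell me when you see something that makes sound")

    # Mirrors the demo: "do you remember where you saw my glasses?"
    print(timeline.recall("glasses"))
    ```

    Searching from the newest event backwards is what makes the "where did you last see X" style of recall cheap: the buffer is scanned once, most recent first, and the bounded deque keeps memory constant no matter how long the session runs.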

  • How does adding a cache between the server and database improve the system's speed?

    -Adding a cache between the server and database can significantly improve speed by reducing the need to access the database for every request, thus speeding up data retrieval times.
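    The answer above describes a classic read-through cache: consult the cache first, and go to the database only on a miss. A minimal Python sketch of the pattern, with hypothetical names standing in for the server and database (an illustration of the general technique, not anything shown at I/O):

    ```python
    # Hypothetical stand-in for a slow database; each call simulates a round trip.
    DB = {"user:1": "Ada", "user:2": "Grace"}
    db_reads = 0

    def query_database(key):
        global db_reads
        db_reads += 1  # count round trips so the saving is visible
        return DB.get(key)

    cache = {}

    def get(key):
        """Read-through cache: hit the database only on a cache miss."""
        if key not in cache:
            cache[key] = query_database(key)
        return cache[key]

    get("user:1")    # miss: goes to the database
    get("user:1")    # hit: served from the cache
    print(db_reads)  # the database was queried only once
    ```

    Repeated reads of the same key now cost a dictionary lookup instead of a database round trip, which is exactly why the demo's answer improves system speed. A production cache would also need eviction and invalidation policies, which this sketch omits.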

  • What is the new generative video model announced at Google I/O?

    -The new generative video model is called Veo. It creates high-quality 1080p videos from text, image, and video prompts, offering unprecedented creative control and the ability to capture details in various visual and cinematic styles.

  • What is the improvement in compute performance per chip offered by the sixth generation of TPUs, called Trillium?

    -Trillium, the sixth generation of TPUs, delivers a 4.7x improvement in compute performance per chip over the previous generation, making it the most efficient and performant TPU to date.

  • How has Gemini transformed Google Search?

    -Gemini has transformed Google Search by enabling a generative experience that allows users to search in entirely new ways, ask new types of questions, and even search with photos, leading to an increase in search usage and user satisfaction.

  • What is the new feature in Android that allows for AI-powered search at the user's fingertips?

    -The new feature in Android is an AI-powered search that provides step-by-step instructions and answers directly on the device, making it easier for users to get the information they need without having to switch between apps.

  • How does Gemini become a more helpful assistant on Android?

    -Gemini becomes a more helpful assistant on Android by becoming context-aware, which allows it to anticipate what the user is trying to do and provide more helpful suggestions in the moment.

  • What is the 'gems' feature in Gemini and how does it work?

    -The 'gems' feature in Gemini allows users to create personalized experts on any topic they want. Users can set up a gem by tapping to create it, writing their instructions once, and then coming back to it whenever they need it.

  • How does the new live experience with Gemini using Google's latest speech models enhance user interaction?

    -The new live experience with Gemini enhances user interaction by allowing Gemini to better understand the user and answer naturally. Users can even interrupt while Gemini is responding, and it will adapt to the user's speech patterns.

  • What is the significance of Android being the first mobile operating system to include a built-in on-device Foundation model?

    -The inclusion of a built-in on-device Foundation model in Android signifies a major step forward in integrating AI directly into the OS. This allows for faster experiences while also protecting user privacy by bringing the capabilities of models like Gemini directly to the user's device.

Outlines

00:00

🚀 Project Astra and AI Advancements at Google I/O

The first paragraph introduces the audience to Google I/O and highlights the widespread use of Gemini models by developers for various purposes, including debugging code and building AI applications. It also mentions the integration of Gemini's capabilities into Google products such as Search, Photos, Workspace, Android, and more. The speaker then presents Project Astra, which builds on the Gemini model to process information faster by encoding video frames and combining video and speech input. The system's efficiency is discussed, along with the announcement of a new generative video model called Veo that creates high-quality videos from various prompts. The paragraph concludes with the unveiling of the sixth generation of TPUs, Trillium, and the introduction of new CPUs and GPUs, emphasizing Google's commitment to offering diverse and powerful hardware options for cloud customers.

05:04

🔍 Enhanced Search and Personalized AI with Gemini

The second paragraph delves into the transformation of Google search with the help of Gemini, where users are engaging with a new generative search experience that allows for more complex queries and even photo-based searches. The speaker announces the launch of an AI-driven search experience that will be available to users in the US, with plans for global expansion. The paragraph also covers a new live conversational experience with Gemini, which utilizes Google's latest speech models for better understanding and natural responses. Additionally, the concept of 'gems' is introduced, allowing users to create personalized AI experts on any topic. The paragraph concludes with a demonstration of how Gemini can be used to assist with tasks such as solving physics problems and understanding sports rules, showcasing its contextual awareness and ability to provide helpful suggestions.

10:05

📱 AI Integration in Android and the Future of Mobile Experiences

The third paragraph focuses on the integration of Google AI directly into the Android operating system, enhancing the smartphone experience by making Android the first mobile OS to include a built-in, on-device foundation model. This integration aims to bring the benefits of Gemini to users' pockets while protecting their privacy. The speaker discusses the upcoming expansion of capabilities with the latest model, Gemini Nano with Multimodality, which allows the phone to understand the world through text, sights, sounds, and spoken language. The paragraph ends with a light-hearted moment in which the speaker acknowledges the frequent mention of AI during the presentation and reveals that Google counted the occurrences.

Keywords

💡Gemini models

Gemini models refer to a set of advanced AI tools utilized by over 1.5 million developers for debugging code, gaining insights, and developing AI applications. They are central to the theme of leveraging AI for innovation and efficiency, as mentioned in the context of using these models across Google's various products like search, photos, workspace, and Android.

💡Project Astra

Project Astra is an exciting new development in AI assistance that builds upon the Gemini model. It involves developing agents capable of processing information more rapidly by encoding video frames continuously, combining video and speech input into a timeline of events, and caching this information for efficient recall. This project is pivotal to the video's narrative on advancing AI capabilities for faster and more integrated processing of multimedia data.

💡TPUs (Tensor Processing Units)

TPUs, or Tensor Processing Units, are specialized hardware accelerators designed to speed up machine learning tasks. The announcement of the sixth generation of TPUs, named Trillium, highlights a significant 4.7x improvement in compute performance per chip. This keyword is key to understanding Google's commitment to enhancing AI processing capabilities and its role in the video's focus on the future of AI.

💡AI overviews

AI overviews represent a revamped search experience that uses AI to provide users with a more comprehensive and dynamic set of search results. This feature is set to roll out to users in the US and then globally. It is a prime example in the script that illustrates the integration of AI into everyday online activities, offering a more intuitive and personalized search experience.

💡Live using Google's latest speech models

This refers to an upcoming feature that will allow users to have in-depth, natural conversations with Gemini using voice commands. It signifies Google's progress in speech recognition and natural language processing, enabling more interactive and accessible AI experiences. The feature is a part of the broader theme of making AI more user-friendly and integrated into daily life.

💡Gems

Gems are customizable features within the Gemini app that allow users to create personal experts on any topic they desire. They are designed to be simple to set up and use, offering personalized AI assistance. Gems are an example of the video's emphasis on tailoring AI technology to individual needs and preferences.

💡Android with AI at the core

This phrase encapsulates Google's vision for the future of its mobile operating system, where AI is an integral part of the user experience. The integration of AI in Android aims to create new ways to access information, provide contextual assistance, and maintain privacy. It is a key component of the video's narrative on the transformational potential of AI in mobile technology.

💡Gemini Nano

Gemini Nano is the latest model of Google's AI technology that is set to expand the capabilities of Android devices with multimodality. This means the phone will understand the world not just through text but also through sight, sounds, and spoken language. It represents the video's focus on making AI more intuitive and responsive to various forms of human interaction.

💡VideoFX

VideoFX is an experimental tool mentioned in the script that will allow users to create high-quality 1080p videos from text, image, and video prompts using the new generative video model, Veo. This tool signifies the video's theme of providing creative control and enhancing the ability to generate content through AI.

💡Search generative experience

The search generative experience is a feature that has been tested and is planned for wider rollout. It involves using AI to answer queries in new ways, including handling more complex and longer queries, and even searching with photos. This experience is tied to the video's overarching theme of enhancing search capabilities with AI to improve user satisfaction and search efficiency.

💡Custom arm-based CPU

The custom arm-based CPU is Google's first in-house processor designed with industry-leading performance and energy efficiency. It represents a significant step in the video's discussion about the hardware advancements that support advanced AI functionalities and computational demands, highlighting Google's commitment to building end-to-end solutions for AI.

Highlights

Google I/O welcomes over 1.5 million developers using Gemini models for debugging code, gaining insights, and building AI applications.

Gemini's capabilities are being integrated across Google's products like search, photos, workspace, Android, and more.

Project Astra is introduced, an advancement in AI assistance that processes information faster by encoding video frames and combining inputs into a timeline.

A new generative video model called Veo is announced, capable of creating high-quality 1080p videos from various prompts.

The sixth generation of TPUs, named Trillium, offers a 4.7x improvement in compute performance per chip.

Google is offering CPUs and GPUs, including the new Axion processors and Nvidia's Blackwell GPUs, for cloud customers.

Gemini has transformed Google search, enabling new ways to search with longer and more complex queries, including photo searches.

A fully revamped AI overview experience for search is being launched in the US with plans for global expansion.

Google's new search experience uses Gemini to uncover interesting angles and organize results into helpful clusters.

An AI overview feature provides instant troubleshooting steps for issues, like why a device might not be staying in place.

Live conversational experiences with Gemini using the latest speech models allow for natural interactions and real-time adjustments.

Customization of Gemini through 'gems' allows users to create personal experts on any topic.

Android is being reimagined with AI at its core, starting with AI-powered search, a new AI assistant, and on-device AI for fast, private experiences.

The Circle to Search feature on Android provides step-by-step instructions for solving problems, like physics word problems.

Gemini becomes context-aware on Android, offering more helpful suggestions in the moment.

Google AI is being integrated directly into the OS, starting with Android being the first mobile OS with a built-in on-device Foundation model.

Gemini Nano, with multimodality, will be expanded to understand the world through text, sights, sounds, and spoken language.

Google counted the number of times 'AI' was mentioned during the presentation as part of the theme of letting Google do the work.

Transcripts

[00:00] Welcome to Google I/O. It's great to have all of you with us. More than 1.5 million developers use Gemini models across our tools. You're using it to debug code, get new insights, and build the next generation of AI applications. We've also been bringing Gemini's breakthrough capabilities across our products in powerful ways. We'll show examples today across Search, Photos, Workspace, Android, and more.

[00:30] Today we have some exciting new progress to share about the future of AI assistants that we're calling Project Astra. Building on our Gemini model, we developed agents that can process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this for efficient recall.

[00:49] "Tell me when you see something that makes sound." "I see a speaker, which makes sound." "Do you remember where you saw my glasses?" "Yes, I do. Your glasses were on the desk near a red apple." "What can I add here to make this system faster?" "Adding a cache between the server and database could improve speed." "What does this remind you of?" "Schrödinger's cat."

[01:25] Today I'm excited to announce our newest, most capable generative video model, called Veo. Veo creates high-quality 1080p videos from text, image, and video prompts. It can capture the details of your instructions in different visual and cinematic styles. You can prompt for things like aerial shots of a landscape or a time lapse, and further edit your videos using additional prompts. You can use Veo in our new experimental tool called VideoFX, and we're exploring features like storyboarding and generating longer scenes. Veo gives you unprecedented creative control. The core technology is Google DeepMind's generative video model, which has been trained to convert input text into output video. It looks good. We are able to bring ideas to life that were otherwise not possible. We can visualize things on a time scale that's 10 or 100 times faster than before.

[02:29] Today we are excited to announce the sixth generation of TPUs, called Trillium. Trillium delivers a 4.7x improvement in compute performance per chip over the previous generation. It's our most efficient and performant TPU to date. We'll make Trillium available to our cloud customers in late 2024. Alongside our TPUs, we are proud to offer CPUs and GPUs to support any workload. That includes the new Axion processors we announced last month, our first custom Arm-based CPU with industry-leading performance and energy efficiency. We are also proud to be one of the first cloud providers to offer Nvidia's cutting-edge Blackwell GPUs, available in early 2025.

[03:19] One of the most exciting transformations with Gemini has been in Google Search. In the past year, we answered billions of queries as part of our search generative experience. People are using it to search in entirely new ways and asking new types of questions: longer and more complex queries, even searching with photos, and getting back the best the web has to offer. We've been testing this experience outside of Labs, and we are encouraged to see not only an increase in search usage but also an increase in user satisfaction. I'm excited to announce that we will begin launching this fully revamped experience, AI Overviews, to everyone in the US this week, and we'll bring it to more countries soon.

[04:07] Say you're heading to Dallas to celebrate your anniversary, and you're looking for the perfect restaurant. What you get here breaks AI out of the box, and it brings it to the whole page. Our Gemini model uncovers the most interesting angles for you to explore and organizes these results into helpful clusters, like ones you might never have considered: restaurants with live music, or ones with historic charm. Our model even uses contextual factors, like the time of year, so since it's warm in Dallas, you can get rooftop patios as an idea. And it pulls everything together into a dynamic whole-page experience. You'll start to see this new AI-organized search results page when you look for inspiration, starting with dining and recipes, and coming to movies, music, books, hotels, shopping, and more.

[05:02] I'm going to take a video and ask Google: why will this not stay in place? And in a near instant, Google gives me an AI Overview. It gives some reasons this might be happening and steps I can take to troubleshoot. So it looks like, first, this is called a tonearm (very helpful), and it looks like it may be unbalanced, and there are some really helpful steps here. And I love that, because I'm new to all this, I can check out this helpful link from Audio-Technica to learn even more.

[05:36] And this summer, you can have an in-depth conversation with Gemini using your voice. We're calling this new experience Live. Using Google's latest speech models, Gemini can better understand you and answer naturally. You can even interrupt while Gemini is responding, and it will adapt to your speech patterns. And this is just the beginning. We're excited to bring the speed gains and video understanding capabilities from Project Astra to the Gemini app. When you go Live, you'll be able to open your camera so Gemini can see what you see and respond to your surroundings in real time.

[06:17] Now, the way I use Gemini isn't the way you use Gemini, so we're rolling out a new feature that lets you customize it for your own needs and create personal experts on any topic you want. We're calling these Gems. They're really simple to set up: just tap to create a Gem, write your instructions once, and come back whenever you need it.

[06:39] We've embarked on a multi-year journey to reimagine Android with AI at the core, and it starts with three breakthroughs you'll see this year. First, we're putting AI-powered search right at your fingertips, creating entirely new ways to get the answers you need. Second, Gemini is becoming your new AI assistant on Android, there to help you any time. And third, we're harnessing on-device AI to unlock new experiences that work as fast as you do while keeping your sensitive data private.

[07:19] One thing we've heard from students is that they're doing more of their schoolwork directly on their phones and tablets, so we thought: could Circle to Search be your perfect study buddy? Let's say my son needs help with a tricky physics word problem like this one. My first thought is, oh boy, it's been a while since I've thought about kinematics. If he's stumped on this question, instead of putting me on the spot, he can circle the exact part he's stuck on and get step-by-step instructions right where he's already doing the work.

[07:53] Now we're making Gemini context-aware, so it can anticipate what you're trying to do and provide more helpful suggestions in the moment; in other words, to be a more helpful assistant. So let me show you how this works, and I have my shiny new Pixel 8a here to help me. My friend Pete is asking if I want to play pickleball this weekend, and I know how to play tennis, sort of (I had to say that for the demo), but I'm new to this pickleball thing. So I'm going to reply and try to be funny, and I'll say: is that like tennis, but with pickles? This would actually be a lot funnier with a meme, so let me bring up Gemini to help with that, and I'll say: create image of tennis with pickles. Now one thing you'll notice is that the Gemini window now hovers in place above the app, so that I stay in the flow. OK, so that generates some pretty good images. What's nice is I can then drag and drop any of these directly into the Messages app below, like so.

[09:00] And now I can ask specific questions about the video. So, for example, I'll type: what is the two bounce rule? Because that's something that I've heard about but don't quite understand in the game. By the way, this uses signals like YouTube's captions, which means you can use it on billions of videos. So give it a moment, and there, I get a nice, distinct answer: the ball must bounce once on each side of the court after a serve.

[09:28] So instead of trawling through this entire document, I can pull up Gemini to help. And again, Gemini anticipates what I need and offers me an "ask this PDF" option. So if I tap on that, Gemini now ingests all of the rules to become a pickleball expert, and that means I can ask very esoteric questions, like, for example: are spin serves allowed? And there you have it. It turns out, nope, spin serves are not allowed. So Gemini not only gives me a clear answer to my question, it also shows me exactly where in the PDF to learn more.

[10:07] Building Google AI directly into the OS elevates the entire smartphone experience, and Android is the first mobile operating system to include a built-in, on-device foundation model. This lets us bring Gemini goodness from the data center right into your pocket, so the experience is faster while also protecting your privacy. Starting with Pixel later this year, we'll be expanding what's possible with our latest model, Gemini Nano with Multimodality. This means your phone can understand the world the way you understand it: not just through text input, but also through sights, sounds, and spoken language.

[10:46] Before we wrap, I have a feeling that someone out there might be counting how many times we have mentioned AI today. [Applause] And since the big theme today has been letting Google do the work for you, we went ahead and counted, so that you don't have to. [Applause] That might be a record in how many times someone has said AI.


Related Tags
AI Future, Google I/O, Gemini Models, Project Astra, TPU Generation, Developer Tools, AI Assistance, Generative Video, Search Innovation, Android AI, Custom CPU, Nvidia GPU, User Satisfaction, AI Overview, Live Speech, Personalization, Multimodal AI, Data Privacy