Google I/O 2024 keynote in 17 minutes

The Verge
14 May 2024 · 17:03

Summary

TL;DR: Google I/O showcased a wide range of AI advancements aimed at improving user experiences across Google's platforms. The event highlighted Gemini 1.5 Pro, a model with a 1 million token context window that is expanding to 2 million tokens for developers. New features include multimodality, which allows richer interactions, and Gemini 1.5 Flash, a lighter-weight model for developers. Project Astra previews future AI assistance, with demos such as naming the parts of an object seen through the camera and explaining what a piece of encryption code does. Google also introduced Imagen 3 for photorealistic image generation, Music AI Sandbox for professional music creation, and Veo, a generative video model. Trillium, the sixth generation of TPUs, delivers a 4.7x improvement in compute performance per chip. Google Search will add multi-step reasoning for complex queries, and Gmail mobile will gain summarization and Q&A features. Other announcements included VideoFX, personalized trip planning in Gemini Advanced, and Gemini Nano with multimodality, which will power richer TalkBack descriptions for accessibility. The event closed with PaliGemma, an open vision-language model, and the upcoming Gemma 2.

Takeaways

  • 🌟 Google IO introduces a revamped AI experience with expanded capabilities for context understanding and multimodal interactions.
  • 🚗 Ask Photos, powered by Gemini, handles tasks like working out which car in your photos is yours and telling you its license plate number when you are at a parking station.
  • 🏊‍♂️ Gemini's advanced search capabilities can recognize different contexts, such as swimming laps in a pool versus snorkeling in the ocean.
  • 🔍 The launch of Gemini 1.5 Pro with a 1 million token context window, available globally for developers and consumers in multiple languages.
  • 📈 Expansion of the context window to 2 million tokens, marking progress towards the goal of infinite context for more complex queries and answers.
  • 🎥 Project Astra is Google's vision for the future of AI assistance; separately, Google announced new generative media models for images, music, and video.
  • 📚 Google's AI can create personalized learning experiences, like a science discussion tailored for a student's interests.
  • 📈 Imagen 3, a new model for photorealistic image generation, is announced, offering richer details and fewer visual artifacts.
  • 📹 Veo, a new generative video model, can create high-quality 1080p videos from text, image, and video prompts in various styles.
  • 🧘‍♀️ Google search will soon include multi-step reasoning to answer complex questions, such as finding the best yoga studios and their offers.
  • 📧 Gmail mobile will receive new features powered by Gemini, including a summarize option and a Q&A feature for quick responses to emails.

Q & A

  • What is the new feature that Google is launching for a fully revamped experience?

    -Google is launching a fully revamped AI overviews feature, which is initially available to everyone in the US and will be expanded to more countries soon.

  • How does Gemini assist in recognizing and identifying a user's car in a parking station?

    -Gemini uses AI to recognize cars that appear often, triangulates which one is the user's, and provides the license plate number.

  • What does the term 'multimodality' refer to in the context of Gemini's capabilities?

    -Multimodality in Gemini refers to the ability to recognize and process different types of data and contexts, such as text, images, audio, and video, to provide more comprehensive answers.

  • What is the significance of the 1 million token context window in Gemini 1.5 Pro?

    -The 1 million token context window in Gemini 1.5 Pro allows for the processing of long contexts, such as hundreds of pages of text, hours of audio, or a full hour of video, which is a significant step towards handling infinite context.

  • How does Gemini help in summarizing a long meeting recording?

    -If the meeting is recorded using Google Meet, Gemini can be asked to provide highlights of the meeting, summarizing the key points without the need to listen to the entire recording.

  • What is the purpose of the 'flash' model in Gemini 1.5?

    -Gemini 1.5 Flash is a lighter-weight model than Pro, with a lower starting price per 1 million tokens. Both 1.5 Flash and 1.5 Pro can be used with up to 1 million tokens in Google AI Studio and Vertex AI.

  • What is the new generative media tool introduced by Google called?

    -The new generative media tool introduced by Google is called Imagen 3, which is more photorealistic and produces high-quality images with richer details and fewer visual artifacts.

  • What is the name of the new generative video model announced by Google?

    -The new generative video model announced by Google is called Veo, which can create high-quality 1080p videos from text, image, and video prompts.

  • What is the name of the sixth generation of TPUs developed by Google?

    -The sixth generation of TPUs developed by Google is called Trillium, which offers a significant improvement in compute performance per chip.

  • How does Gemini enhance Gmail mobile?

    -In Gmail mobile, a summarize option opens a Gemini mobile card with a summary of the salient information in an email thread. Users can also type questions directly into the card and get quick answers from their inbox without searching for and opening individual emails.

  • What is the purpose of the 'gems' feature in the Gemini app?

    -The 'gems' feature in the Gemini app allows users to create personalized experts on any topic. These gems can be customized with specific instructions and used whenever the user needs information or assistance on that topic.

  • What is the new capability that allows users to interact with Gemini using voice?

    -The new capability is called 'live', which enables users to have in-depth conversations with Gemini using their voice and allows Gemini to see what the user sees through the camera and respond to the surroundings in real time.

Outlines

00:00

🚀 Launch of AI Overviews and Gemini Features

The script opens Google I/O and announces a fully revamped AI Overviews experience, launching to everyone in the US this week with more countries to follow. It highlights how Gemini simplifies tasks in Google Photos, such as identifying the user's car and providing its license plate number at a parking station. The script also covers Gemini's ability to recognize different contexts and handle complex queries through multimodality and long context. Gemini 1.5 Pro is introduced with a 1 million token context window, and an expansion to 2 million tokens is announced. Gemini's ability to provide meeting highlights and draft replies is also mentioned, along with the introduction of Gemini 1.5 Flash, a lighter-weight model than Pro.

05:01

🎨 New AI Tools and Project Astra

The paragraph discusses the introduction of Imagen 3, an image generation model that produces more photorealistic images with richer details. It also covers Music AI Sandbox, a suite of professional music AI tools built together with YouTube, and the unveiling of Veo, a generative video model that can create 1080p videos from text, image, and video prompts. The importance of keeping objects and subjects consistent in space and over time in videos is emphasized. The paragraph also mentions the sixth generation of TPUs, Trillium, which delivers a 4.7x improvement in compute performance per chip. Additionally, it covers updates to Google Search, including multi-step reasoning and the ability to ask questions with video. New features for Gmail mobile are also announced, such as the summarize option and Q&A features, along with a workflow for organizing and tracking receipts.

10:01

🤖 Virtual Teammate Chip and Personalized AI Tools

The script introduces Chip, a virtual Gemini-powered teammate designed to monitor and track projects, organize information, and provide context. Chip is shown to flag potential issues and create documentation to address them. The paragraph also discusses the upcoming feature 'live' which allows for real-time interaction with Gemini using voice and camera input. The concept of 'gems' is introduced, which are personalized AI experts on various topics, created by users for their specific needs. The trip planning experience in Gemini Advanced is detailed, showcasing how it gathers information to create a personalized vacation plan. The ability to upload and analyze academic and business-related documents with Gemini is also highlighted, along with the upcoming expansion of the long context window and context-aware features.

15:03

📈 Pricing, Accessibility, and Future AI Developments

The final paragraph covers pricing for Gemini 1.5 Pro and Flash, including a 50% lower price for prompts up to 128k tokens. It introduces PaliGemma, Google's first vision-language open model, and teases the upcoming release of Gemma 2. The expansion of SynthID to text and video is announced, with plans to open source SynthID text watermarking. LearnLM, a new family of models based on Gemini and fine-tuned for learning, is introduced, with pre-made gems being developed for various educational needs. The script ends with a light-hearted note on the frequent mention of AI during the presentation and a forward-looking statement on future possibilities.

Keywords

💡Google IO

Google IO is Google's annual developer conference where the company announces new products, updates to existing services, and discusses the future of technology. In the script, it's the event where the speaker introduces various AI advancements and updates, indicating its significance in the tech industry and the video's theme.

💡AI Overviews

AI Overviews refer to a feature that uses artificial intelligence to provide a comprehensive summary or analysis. In the context of the video, it's mentioned as a new or revamped experience being launched, suggesting that it plays a central role in the narrative of enhancing user experience through AI.

💡Gemini

Gemini is referenced as an AI model or system that is capable of performing advanced tasks such as recognizing contexts, searching, and generating responses. It is integral to the video's theme of showcasing AI's evolving capabilities, with mentions of its ability to understand different contexts and perform complex operations.

💡Multimodality

Multimodality in AI refers to the ability of a system to process and understand information from multiple modes of input, such as text, audio, and video. The script highlights this concept as a feature that expands the types of questions users can ask and the richness of the answers they receive, emphasizing the theme of AI's growing versatility.

💡Gemini 1.5 Pro

Gemini 1.5 Pro is presented as an improved version of an AI system with an expanded context window, allowing it to process more information and provide more nuanced responses. It represents a step towards the video's overarching message of continuous AI development and its potential to transform various aspects of life and work.

💡Project Astra

Project Astra is introduced as a future initiative related to AI assistance. Although not elaborated upon in detail in the script, its mention contributes to the video's portrayal of ongoing innovation in AI, suggesting further advancements to come.

💡TPUs (Tensor Processing Units)

TPUs are specialized hardware accelerators developed by Google that are used to speed up machine learning tasks. The script mentions the sixth generation of TPUs, called Trillium, which offers significant improvements in compute performance. This ties into the video's emphasis on the infrastructure that supports advanced AI capabilities.

💡Google Search Updates

The script discusses upcoming updates to Google Search that incorporate multi-step reasoning and the ability to process video queries. These updates are part of the video's focus on making AI more accessible and useful in everyday tasks, showcasing how search technology is becoming more integrated with AI.

💡Gmail Mobile

Gmail Mobile is highlighted for its new capabilities powered by Gemini, such as summarizing emails and providing quick answers without opening them. This feature is tied to the video's theme of enhancing productivity and efficiency through AI-driven tools.

💡Gemini Nano

Gemini Nano is mentioned as the latest model in the Gemini series, with a focus on multimodality and accessibility. It represents the video's narrative of AI becoming more integrated and helpful in various aspects of life, including for those with accessibility needs.

💡Gems

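Gems are personalized AI tools that users can create to meet specific needs or interests, such as a personal writing coach. In the video, they show how AI can be customized to enhance individual experiences, underscoring the potential of personalized AI tools to improve user satisfaction.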

Highlights

Google IO introduces a fully revamped AI experience with a launch of AI overviews in the US, with plans for expansion to more countries.

Gemini's AI capabilities are enhanced to recognize different contexts and provide more complex answers, such as identifying a user's car in a parking station.

The rollout of Gemini 1.5 Pro with a 1 million token context window, available globally for developers and consumers across 35 languages.

Expansion of the context window to 2 million tokens, marking a step towards the goal of infinite context.

Google Meet integration allows Gemini to provide meeting highlights from hour-long recordings.

Introduction of Gemini 1.5 Flash, a lighter model compared to Pro, available for use in Google AI studio and Vertex AI.

Project Astra aims to advance the future of AI assistance with new capabilities in sound and code analysis.

Imagen 3, a new generative media tool, offers more photorealistic images with richer details and fewer visual artifacts.

Music AI Sandbox by Google and YouTube allows creation of new instrumental sections and style transfers between tracks.

Veo, a generative video model, creates high-quality 1080p videos from text, image, and video prompts in various visual and cinematic styles.

Sixth generation TPUs, called Trillium, offer a 4.7x improvement in compute performance per chip.

Google search will introduce multi-step reasoning to answer complex questions, such as finding the best yoga studios in Boston.

A new feature in Gmail mobile allows for quick summarization of emails and a Q&A feature for quick answers within the inbox.

Gemini's context awareness enables it to generate images based on text prompts, such as creating an image of tennis with pickles.

TalkBack, an accessibility feature, will be enhanced with the multimodal capabilities of Gemini Nano for richer descriptions, even without a network connection.

Gemini 1.5 Pro is priced at $7 per 1 million tokens, with a 50% discount for prompts up to 128k tokens.

PaliGemma, Google's first vision-language open model, is now available, and Gemma 2, the next generation of Gemma, will be available in June.

SynthID is being expanded to two new modalities, text and video, with plans to open source SynthID text watermarking in the coming months.

LearnLM, a new family of models based on Gemini and fine-tuned for learning, is introduced, and pre-made gems such as Learning Coach are being developed for the Gemini app and web experience.
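
The pricing highlight above lends itself to a quick back-of-the-envelope calculation. The Python sketch below is illustrative only: it uses the per-1-million-token prices quoted in the keynote ($7 for 1.5 Pro, $3.50 for prompts up to 128k, and a starting price of $0.35 for 1.5 Flash). The function names are hypothetical, and two points are assumptions the keynote does not spell out: that "128k" means 128,000 tokens, and that the quoted rate applies uniformly to every token in the prompt.

```python
from decimal import Decimal

# Prices quoted in the keynote, in US dollars per 1 million tokens.
PRO_PRICE_PER_MILLION = Decimal("7.00")
PRO_DISCOUNTED_PRICE_PER_MILLION = Decimal("3.50")  # prompts up to 128k tokens
FLASH_START_PRICE_PER_MILLION = Decimal("0.35")     # "will start at 35 cents"

# Assumption: "128k" is read as 128,000 tokens (it could also mean 131,072).
DISCOUNT_THRESHOLD_TOKENS = 128_000

TOKENS_PER_MILLION = Decimal(1_000_000)


def _check_tokens(prompt_tokens: int) -> None:
    """Reject negative token counts, which have no meaningful cost."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")


def estimate_pro_cost(prompt_tokens: int) -> Decimal:
    """Estimate the 1.5 Pro cost of a prompt (hypothetical helper).

    Assumption: the whole prompt is billed at a single rate, chosen by
    whether the prompt fits within the 128k discount threshold.
    """
    _check_tokens(prompt_tokens)
    if prompt_tokens <= DISCOUNT_THRESHOLD_TOKENS:
        rate = PRO_DISCOUNTED_PRICE_PER_MILLION
    else:
        rate = PRO_PRICE_PER_MILLION
    return Decimal(prompt_tokens) / TOKENS_PER_MILLION * rate


def estimate_flash_cost(prompt_tokens: int) -> Decimal:
    """Estimate the 1.5 Flash cost at its quoted starting price (hypothetical helper)."""
    _check_tokens(prompt_tokens)
    return Decimal(prompt_tokens) / TOKENS_PER_MILLION * FLASH_START_PRICE_PER_MILLION


if __name__ == "__main__":
    for tokens in (100_000, 128_000, 1_000_000):
        print(tokens, estimate_pro_cost(tokens), estimate_flash_cost(tokens))
```

Under these assumptions, a 100,000-token prompt to 1.5 Pro comes to about $0.35, while a 1,000,000-token prompt comes to $7.00. `Decimal` is used instead of floats so the dollar amounts stay exact.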

Transcripts

play00:00

[Applause]

play00:02

[Music]

play00:06

Google we all ready to do a little

play00:09

Googling welcome to Google IO it's great

play00:11

to have all of you with us we'll begin

play00:13

launching this fully revamped experience

play00:16

AI overviews to everyone in the US this

play00:19

week and we'll bring it to more

play00:21

countries soon with Gemini you're making

play00:24

that a whole lot easier say you're at a

play00:26

parking station ready to pay now you can

play00:30

simply ask photos it knows the cars that

play00:33

appear often it triangulates which one

play00:35

is yours and just tells you the license

play00:38

plate number you can even follow up with

play00:41

something more complex show me how Lucia's

play00:44

swimming has progressed here Gemini goes

play00:48

beyond a simple search recognizing

play00:50

different contexts from doing laps in

play00:53

the pool to snorkeling in the ocean we

play00:56

are rolling out Ask Photos this

play00:58

summer with more capabilities to come

play01:01

multimodality radically expands the

play01:03

questions we can ask and the answers we

play01:04

will get back long context takes this a

play01:08

step further enabling us to bring in

play01:10

even more information hundreds of pages

play01:13

of text hours of audio a full hour of

play01:17

video or entire code repos you need a 1

play01:20

million token context window now

play01:22

possible with Gemini 1.5 Pro I'm excited

play01:25

to announce that we are bringing this

play01:26

improved version of Gemini 1.5 Pro to

play01:30

all developers globally Gemini 1.5 Pro

play01:34

with 1 million contexts is now directly

play01:37

available for consumers in Gemini

play01:39

Advanced and can be used across 35

play01:42

languages so today we are expanding the

play01:45

context window to 2 million

play01:49

tokens this represents the next step on

play01:51

our journey towards the ultimate goal of

play01:54

infinite context and you couldn't make

play01:55

the PTA meeting the recording of the

play01:58

meeting is an hour long if it's from

play02:01

Google meet you can ask Gemini to give

play02:03

you the

play02:04

highlights there's a parents group

play02:06

looking for volunteers you're free that

play02:08

day of course Gemini can draft a reply

play02:12

Gemini 1.5 Pro is available today in

play02:14

Workspace Labs NotebookLM is going to

play02:17

take all the materials on the left as

play02:19

input and output them into a lively

play02:23

science discussion personalized for him

play02:26

so let's uh let's dive into physics

play02:27

what's on deck for today well uh we're

play02:30

starting with the basics force and

play02:31

motion okay and that of course means we

play02:33

have to talk about Sir Isaac Newton and

play02:35

his three laws of motion and what's

play02:37

amazing is that my son and I can join

play02:39

into the conversation and steer it

play02:42

whichever direction we want when I tap

play02:46

join hold on we have a question what's

play02:48

up

play02:49

Josh yeah can you give my son Jimmy a

play02:53

basketball

play02:57

example hey Jimmy that's a fantastic

play03:00

idea basketball is actually a great way

play03:03

to visualize force and motion let's

play03:05

break it down okay so first imagine a

play03:07

basketball just sitting there on the

play03:09

court it's not moving right that's

play03:11

because all the forces acting on it are

play03:13

balanced the downward pull of grav it

play03:16

connected the dots and created that age

play03:18

appropriate example for him making AI

play03:22

helpful for everyone last year we

play03:24

reached a milestone on that path when we

play03:26

formed Google Deep Mind So today we're

play03:29

introducing

play03:30

Gemini 1.5 flash flash is a lighter

play03:33

weight model compared to Pro starting

play03:35

today you can use 1.5 Flash and 1.5 Pro

play03:39

with up to 1 million tokens in Google AI

play03:41

studio and vertex AI today we have some

play03:44

exciting new progress to share about the

play03:47

future of AI assistance that we're

play03:49

calling project Astra tell me when you

play03:52

see something that makes

play03:54

sound I see a speaker which makes sound

play04:00

what is that part of the speaker

play04:03

called that is the Tweeter it produces

play04:06

high frequency

play04:08

sounds what does that part of the code

play04:13

do this code defines encryption and

play04:16

decryption functions it seems to use AES

play04:20

CBC encryption to encode and decode data

play04:23

based on a key and an initialization

play04:25

Vector

play04:27

IV what can I add here here to make this

play04:30

system

play04:33

faster adding a cache between the server

play04:36

and database could improve speed today

play04:39

we're introducing a series of updates

play04:41

across our generative media tools with

play04:43

new models covering image music and

play04:46

video today I'm so excited to introduce

play04:49

Imagen 3 Imagen 3 is more

play04:52

photorealistic you can literally count

play04:54

the whiskers on its snout with richer

play04:55

details like this incredible sunlight in

play04:58

the shot and fewer visual artifacts or

play05:00

distorted images you can sign up today

play05:02

to try Imagen 3 in ImageFX part of our

play05:05

suite of AI tools at labs.google

play05:08

together with YouTube we've been

play05:09

building music AI sandbox a suite of

play05:13

professional music AI tools that can

play05:15

create new instrumental sections from

play05:17

scratch transfer Styles between tracks

play05:20

and more today I'm excited to announce

play05:22

our newest most capable generative video

play05:25

model called

play05:27

Veo Veo creates high quality 1080p videos

play05:31

from text image and video prompts it can

play05:35

capture the details of your instructions

play05:36

in different Visual and cinematic Styles

play05:39

you can prompt for things like aerial

play05:41

shots of a landscape or time lapse and

play05:43

further edit your videos using

play05:45

additional prompts you can use Veo in our

play05:48

new experimental tool called VideoFX

play05:51

we're exploring features like

play05:52

storyboarding and generating longer

play05:54

scenes not only is it important to

play05:57

understand where an object or subject

play05:58

should be in space it needs to maintain

play06:00

this consistency over time just like the

play06:03

car in this video over the coming weeks

play06:06

some of these features will be available

play06:08

to select creators through VideoFX

play06:10

at labs.google and the waitlist is

play06:13

open now today we are excited to announce

play06:16

the sixth generation of tpus called

play06:19

Trillium Trillium delivers a 4.7x

play06:23

Improvement in compute performance per

play06:25

chip over the previous generation we'll

play06:28

make Trillium available to our Cloud

play06:30

customers in late 2024 we're making AI

play06:33

overviews even more helpful for your

play06:35

most complex questions to make this

play06:37

possible we're introducing multi-step

play06:39

reasoning in Google search soon you'll

play06:41

be able to ask search to find the best

play06:43

yoga or Pilates studios in Boston and

play06:46

show you details on their intro offers

play06:48

and the walking time from Beacon Hill

play06:50

you get some studios with great ratings

play06:52

and their introductory offers and you

play06:54

can see the distance for each like this

play06:57

one it's just a 10-minute walk away

play07:00

right below you see where they're

play07:01

located laid out visually it breaks your

play07:04

bigger question down into all its parts

play07:07

and it figures out which problems it

play07:09

needs to solve and in what

play07:11

order next take planning for example now

play07:15

you can ask search to create a 3-day

play07:16

meal plan for a group that's easy to

play07:19

prepare and here you get a plan with a

play07:22

wide range of recipes from across the

play07:24

web if you want to get more veggies in

play07:26

you can simply ask search to swap in a

play07:28

vegetarian dish and you can export your

play07:30

meal plan or get the ingredients as a

play07:32

list just by tapping here soon you'll be

play07:35

able to ask questions with video right

play07:38

in Google search I'm going to take a

play07:40

video and ask

play07:42

Google why will this not stay in

play07:46

place and in a near instant Google gives me

play07:50

an AI overview I get some reasons this

play07:53

might be happening and steps I can take

play07:55

to troubleshoot you'll start to see

play07:57

these features rolling out in search in

play07:59

the coming weeks and now we're really

play08:02

excited that the new Gemini powered side

play08:05

panel will be generally available next

play08:10

month three new capabilities coming to

play08:13

Gmail mobile it looks like there's an

play08:17

email thread on this with lots of emails

play08:19

that I haven't read and luckily for me I

play08:22

can simply tap the summarize option up

play08:26

top and Skip reading this long back and

play08:28

forth now Gemini pulls up this helpful

play08:32

Mobile card as an overlay and this is

play08:35

where I can read a nice summary of all

play08:38

the Salient information that I need to

play08:40

know now I can simply type out my

play08:43

question right here in the Mobile card

play08:45

and say something like compare my roof

play08:48

repair bids by price and availability

play08:50

this new Q&A feature makes it so easy to

play08:53

get quick answers on anything in my

play08:55

inbox without having to First search

play08:56

Gmail then open the email and then look

play08:58

for the specific information and

play09:00

attachments and so on I see some

play09:02

suggested replies from Gemini now here I

play09:04

see I have declined the service

play09:06

suggested new time these new

play09:09

capabilities in Gemini and Gmail will

play09:11

start rolling out this month to Labs

play09:14

users it's got a PDF that's an

play09:16

attachment from a hotel as a receipt and

play09:19

I see a suggestion in the side panel

play09:21

help me organize and track my receipts

play09:24

step one create a drive folder and put

play09:27

this receipt and 37 others it's found

play09:30

into that folder step two extract the

play09:33

relevant information from those receipts

play09:35

in that folder into a new spreadsheet

play09:37

Gemini offers you the option to automate

play09:40

this so that this particular workflow is

play09:43

run on all future emails Gemini does the

play09:46

hard work of extracting all the right

play09:48

information from all the files and in

play09:50

that folder and generates this sheet for

play09:52

you show me where the money is

play09:54

spent Gemini not only analyzes the data

play09:57

from the sheet but also creates a nice

play10:01

visual to help me see the complete

play10:03

breakdown by category this particular

play10:06

ability will be rolling out to Labs

play10:08

users this September we're prototyping a

play10:12

virtual Gemini powered teammate Chip's

play10:16

been given a specific job role with a

play10:18

set of descriptions on how to be helpful

play10:20

for the team you can see that here and

play10:22

some of the jobs are to Monitor and

play10:23

track projects we've listed a few out to

play10:25

organize information and provide context

play10:27

and a few more things are we on

play10:31

track for

play10:34

launch chip gets to work not only

play10:36

searching through everything it has

play10:38

access to but also synthesizing what's

play10:40

found and coming back with an up-to-date

play10:44

response there it is a clear timeline a

play10:47

nice summary and notice even in this

play10:48

first message here chip Flags a

play10:51

potential issue the team should be aware

play10:52

of because we're in a group space

play10:54

everyone can follow along anyone can

play10:56

jump in at any time as you see someone

play10:59

just did asking chip to help create a

play11:01

doc to help address the issue and this

play11:04

summer you can have an in-depth

play11:06

conversation with Gemini using your voice

play11:09

we're calling this new experience live

play11:12

when you go live you'll be able to open

play11:15

your camera so Gemini can see what you

play11:17

see and respond to your surroundings in

play11:20

real time so we're rolling out a new

play11:22

feature that lets you customize it for

play11:25

your own needs and create personal

play11:27

experts on any topic you want we're

play11:30

calling these gems just tap to create a

play11:34

gem write your instructions once and

play11:36

come back whenever you need it for

play11:39

example here's a gem that I created that

play11:41

acts as a personal writing coach it

play11:44

specializes in short stories with

play11:46

mysterious twists and it even Builds on

play11:48

the story drafts in my Google Drive gems

play11:52

will roll out in the coming months that

play11:54

reasoning and intelligence all come

play11:56

together in the new trip planning

play11:58

experience in in Gemini Advanced we're

play12:01

going to Miami my son loves art my

play12:04

husband loves seafood and our flight and

play12:06

hotel details are already in my Gmail

play12:09

inbox to make sense of these variables

play12:12

Gemini starts by gathering all kinds of

play12:15

information from search and helpful

play12:17

extensions like maps and Gmail the end

play12:20

result is a personalized vacation plan

play12:23

presented in Gemini's new Dynamic UI I

play12:27

like these recommendations but my family

play12:29

likes to sleep in so I tap to change the

play12:33

start time and just like that Gemini

play12:37

adjusted my itinerary for the rest of

play12:39

the trip this new trip planning

play12:41

experience will be rolling out to Gemini

play12:43

Advanced this summer you can upload your

play12:45

entire thesis your sources your notes

play12:48

your research and soon interview audio

play12:51

recordings and videos too it can dissect

play12:54

your main points identify improvements

play12:57

and even role-play as your professor

play13:00

maybe you have a side hustle selling

play13:01

handcrafted products simply upload all

play13:04

of your spreadsheets and ask Gemini to

play13:06

visualize your

play13:08

earnings Gemini goes to work calculating

play13:11

your returns and pulling its analysis

play13:13

together into a single chart and of

play13:15

course your files are not used to train

play13:17

our models later this year we'll be

play13:20

doubling the long context window to two

play13:23

million tokens we're putting AI powered

play13:26

search right at your fingertips create

play13:29

let's say my son needs help with a

play13:30

tricky physics word problem like this

play13:33

one if he's stumped on this question

play13:36

instead of putting me on the spot he can

play13:38

Circle the exact part he's stuck on and

play13:41

get step-by-step

play13:42

instructions right where he's already

play13:44

doing the work this new capability is

play13:47

available today now we're making Gemini

play13:51

context aware so my friend Pete is

play13:55

asking if I want to play pickleball

play13:56

this weekend so I'm going to reply and

play13:58

try to be funny and I'll say uh is that

play14:00

like tennis but with uh pickles and I'll

play14:04

say uh create image of tennis with

play14:08

Pickles now one new thing you'll notice

play14:10

is that the Gemini window now hovers in

play14:12

place above the app so I stay in the

play14:15

flow okay so that generated some pretty

play14:17

good images uh what's nice is I can then

play14:19

drag and drop any of these directly into

play14:22

the messages app below so like so cool

play14:25

let me send that and because it's

play14:27

context aware Gemini knows I'm looking

play14:30

at a video so it proactively shows me an

play14:33

ask this video chip what is is can't

play14:38

type the two bounce rule by the way this

play14:41

uses signals like YouTube's captions

play14:43

which means you can use it on billions

play14:45

of videos so give it a moment and there

play14:49

starting with pixel later this year

play14:51

we'll be expanding what's possible with

play14:53

our latest model Gemini Nano with

play14:56

multimodality so several years ago we

play14:58

developed TalkBack an accessibility

play15:01

feature that helps people navigate their

play15:03

phone through touch and spoken feedback

play15:06

and now we're taking that to the next

play15:07

level with the multimodal capabilities

play15:09

of Gemini Nano so when someone sends

play15:12

Cara a photo she'll get a richer and

play15:14

clearer description of what's happening

play15:17

and the model even works when there's no

play15:18

network connection these improvements to

play15:21

TalkBack are coming later this year 1.5

play15:24

Pro is $7 per 1 million tokens and I'm

play15:29

excited to share that for prompts up to

play15:31

128k it'll be 50% less for

play15:36

$3.50 and 1.5 flash will start at 35

play15:41

cents per 1 million tokens and today's

play15:45

newest member PaliGemma our first

play15:49

Vision language open model and it's

play15:51

available right now I'm also too excited

play15:55

to announce that we have Gemma 2 coming

play15:59

it's the next generation of Gemma and it

play16:01

will be available in June today we're

play16:04

expanding SynthID to two new

play16:07

modalities text and

play16:09

video and in the coming months we'll be

play16:12

open sourcing SynthID text

play16:15

watermarking I'm excited to introduce

play16:18

LearnLM our new family of models based on

play16:22

Gemini and fine-tuned for learning we're

play16:25

developing some pre-made gems which will

play16:28

be available in the Gemini app and web

play16:30

experience including one called learning

play16:33

coach I have a feeling that someone out

play16:35

there might be

play16:36

counting how many times we have

play16:38

mentioned AI today we went ahead and

play16:42

counted so that you don't have

play16:45

[Applause]

play16:48

to that might be a record in how many

play16:50

times someone has said

play16:54

AI here's to the possibilities ahead and

play16:57

creating them together thank you
