Google I/O 2024 keynote in 17 minutes
Summary
TL;DR: At this Google I/O keynote, Google announced a series of technology upgrades and launches. The improved Gemini 1.5 Pro, with a 1 million token context window, is available to developers globally and to consumers in Gemini Advanced, across 35 languages. Google plans to expand the context window to 2 million tokens, a step toward its ultimate goal of infinite context. New generative media tools, including Imagen 3, Music AI Sandbox, and the Veo video model, aim to offer more immersive and personalized experiences. Trillium, the sixth generation of TPUs, delivers a 4.7x improvement in compute performance per chip over the previous generation. New Google Search features such as multi-step reasoning and asking questions with video will make finding information more efficient. Gmail mobile users will get new capabilities such as summarization and question answering directly in the app. Gemini Nano, with its multimodal capabilities, will improve accessibility for users. Gemini 1.5 Flash and Pro are competitively priced, and PaliGemma, Google's first open vision-language model, opens up new possibilities for developers. Taken together, the announcements reflect Google's focus on innovation and on continually improving the user experience.
Takeaways
- 🚀 Launch of a fully revamped search experience, AI Overviews, for everyone in the US this week, with the improved Gemini 1.5 Pro available to all developers globally.
- 📈 Gemini 1.5 Pro, with a 1 million token context window, is directly available to consumers in Gemini Advanced and can be used across 35 languages.
- 🔍 Expansion of the context window to 2 million tokens, the next step toward the ultimate goal of infinite context.
- 📱 Progress in AI assistance with Project Astra, alongside new generative tools for image, music, and video.
- 🎥 Introduction of Imagen 3, a more photorealistic image model with richer details and fewer visual artifacts.
- 🎼 Music AI Sandbox, a suite of professional music AI tools that can create new instrumental sections and transfer styles between tracks.
- 📹 Introduction of Veo, Google's most capable generative video model, which creates high-quality 1080p videos from text, image, and video prompts.
- 🧠 Multi-step reasoning in Google Search to answer more complex questions.
- 📧 New Gmail mobile features, including email summaries and question answering directly in a mobile card.
- 📊 Upcoming launch of Trillium, the sixth generation of TPUs, with a 4.7x improvement in compute performance per chip over the previous generation.
- 💬 A new Gemini-powered side panel will soon be available for Gmail, with capabilities such as creating documents and visualizing spending data.
Q & A
What new experience is Google launching for US users?
-Google is launching a fully revamped AI Overviews experience, available to everyone in the US this week and expanding to more countries soon.
How does Gemini make paying at a parking station easier?
-With Ask Photos, Gemini uses image recognition to identify the cars that appear often in your photos, works out which one is yours, and tells you the license plate number directly, which simplifies payment.
What kinds of queries can Gemini 1.5 Pro handle?
-Gemini 1.5 Pro handles multimodal queries that recognize different contexts, from doing laps in a pool to snorkeling in the ocean, and its long context can take in hundreds of pages of text, hours of audio, a full hour of video, or entire code repositories.
What is the context capacity of Gemini 1.5 Pro?
-Gemini 1.5 Pro can handle up to 1 million tokens of context, which allows richer and more detailed answers.
How can Gemini help plan a trip?
-Gemini gathers information from Search and helpful extensions such as Maps and Gmail to create a personalized vacation plan, presented in Gemini's new dynamic UI.
What are the benefits of using Gemini for schoolwork?
-Gemini can help with difficult school problems by giving step-by-step instructions right where the student is working, for example when the student circles the exact part of a problem they are stuck on.
Which Gemini feature answers complex questions directly in Gmail mobile?
-Gemini's new mobile card shows a summary of the key information as an overlay. From there you can ask questions directly and get quick answers about anything in your inbox without having to search for or open the emails.
How does Gemini help organize and track receipts?
-Gemini offers to create a Drive folder, move the receipts into it, extract the relevant information, and automatically generate a spreadsheet to track expenses, with the option to visualize the breakdown by category.
Which new Gemini feature lets users customize the AI to create personal experts on any topic?
-Gems are a Gemini feature that lets users create personal experts on specific topics by writing instructions once and coming back whenever they need them. Gems will roll out in the coming months.
What are the benefits of Gemini Nano for accessibility?
-Thanks to its multimodal capabilities, Gemini Nano improves accessibility by giving users richer and clearer descriptions of the photos they receive, even without a network connection.
What are the prices for Gemini 1.5 Pro and Gemini 1.5 Flash?
-Gemini 1.5 Pro costs $7 per 1 million tokens, with a 50% discount for prompts up to 128k tokens, bringing the rate to $3.50. Gemini 1.5 Flash starts at 35 cents per 1 million tokens.
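The pricing above can be turned into a rough cost estimate. The sketch below is only an illustration: the helper name `estimate_cost`, the reading of "128k" as 128,000 tokens, and the assumption that the discounted rate applies to the whole prompt whenever it fits within that limit are not stated in the keynote.

```python
from fractions import Fraction

# Prices quoted in the keynote, in US dollars per 1 million tokens.
PRO_PRICE_PER_M = Fraction(7)            # Gemini 1.5 Pro, standard rate
PRO_SHORT_PRICE_PER_M = Fraction(7, 2)   # 50% less for prompts up to 128k
FLASH_PRICE_PER_M = Fraction(35, 100)    # Gemini 1.5 Flash starting rate

# Assumption: "128k" is read as 128,000 tokens.
SHORT_PROMPT_LIMIT = 128_000


def estimate_cost(model: str, prompt_tokens: int) -> float:
    """Return an estimated prompt cost in dollars (hypothetical helper)."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if model == "pro":
        # Assumption: the discounted rate covers the whole prompt
        # whenever the prompt fits within the 128k limit.
        if prompt_tokens <= SHORT_PROMPT_LIMIT:
            rate = PRO_SHORT_PRICE_PER_M
        else:
            rate = PRO_PRICE_PER_M
    elif model == "flash":
        rate = FLASH_PRICE_PER_M
    else:
        raise ValueError(f"unknown model: {model}")
    # Exact rational arithmetic avoids rounding drift before the final float.
    return float(rate * prompt_tokens / 1_000_000)


if __name__ == "__main__":
    for model, tokens in [("pro", 100_000), ("pro", 1_000_000), ("flash", 1_000_000)]:
        print(f"{model:5s} {tokens:>9,d} tokens -> ${estimate_cost(model, tokens):.2f}")
```

Under these assumptions, a 100,000-token prompt on 1.5 Pro costs the same as a 1 million token prompt on 1.5 Flash, which shows how much the short-prompt discount and the lighter model each reduce cost.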
Outlines
🚀 Launch of Gemini 1.5 Pro and new capabilities
This section covers the launch of the improved Gemini 1.5 Pro with a 1 million token context window, available to all developers worldwide. Long context makes it possible to query large inputs such as hundreds of pages of text, hours of audio or video, and code repositories, so users can ask complex questions and get detailed answers. Multimodality broadens the questions that can be asked and enriches the answers that come back. The section also mentions the expansion of the context window to 2 million tokens and the introduction of Gemini 1.5 Flash.
🎥 Project Astra and media generation
This section announces Project Astra, Google's vision for the future of AI assistance. It also introduces new generative tools for image, music, and video. Imagen 3 is presented as more photorealistic, with richer details and fewer visual artifacts. The Veo model creates high-quality video from text, image, and video prompts. An experimental tool called VideoFX is also mentioned, with features being explored for generating longer scenes and creating storyboards.
🧠 TPU improvements and new Search features
This section covers Trillium, the sixth generation of TPUs, which delivers a 4.7x improvement in compute performance per chip over the previous generation. It also announces multi-step reasoning in Google Search, which lets users ask complex questions and get detailed, structured answers. New Gmail mobile capabilities are described as well, including summarizing emails and asking questions directly from the mobile card.
🤖 AI in productivity tools
This section describes AI integration across productivity tools, including a prototype virtual teammate named Chip that can carry out specific tasks such as tracking projects and organizing information. It also mentions in-depth voice conversations with Gemini and the creation of gems, personal experts on specific topics. A new trip planning experience builds personalized itineraries from Search results and helpful extensions such as Maps and Gmail.
📈 Data analysis and new Gemini features
This section explains how Gemini can analyze data and generate visualizations, for example to chart the earnings of a side hustle. It also announces that the context window will be doubled to two million tokens later this year. Gemini is becoming context aware, so users can interact with their apps more intuitively. Accessibility also improves with Gemini Nano, which will provide richer and clearer photo descriptions even without a network connection.
📉 Pricing and new models
This section gives the pricing for Gemini 1.5 Pro and Gemini 1.5 Flash and announces Gemma 2, the next generation of Gemma, which will be available in June. It also covers the expansion of SynthID to new modalities and the plan to open source SynthID text watermarking. Finally, it introduces LearnLM, a new family of models based on Gemini and fine-tuned for learning, along with pre-made gems that will be available in the Gemini app and web experience.
Keywords
💡Google IO
💡AI overviews
💡Gemini
💡Multimodality
💡1 million token context window
💡Project Astra
💡Imagen 3
💡TPUs (Tensor Processing Units)
💡Google Search updates
💡Gmail mobile
💡Gemini Advanced
Highlights
Google I/O 2024 introduced a fully revamped AI Overviews experience in Search, rolling out to everyone in the US this week.
Gemini 1.5 Pro, with a 1 million token context window, is now available to consumers in Gemini Advanced across 35 languages.
The context window for Gemini 1.5 Pro is being expanded to 2 million tokens, a significant step towards infinite context.
Google is introducing a lighter weight model called Gemini 1.5 Flash, offering more accessible AI capabilities.
Starting today, both 1.5 Flash and 1.5 Pro can be used with up to 1 million tokens in Google AI Studio and Vertex AI.
Project Astra is Google's vision for the future of AI assistance; in the demo it identifies objects that make sound and explains code it sees.
Imagen 3, a new generative image model, offers more photorealistic images with richer details and fewer visual artifacts.
Google and YouTube are building Music AI Sandbox, a suite of professional music AI tools for creating and transforming music.
A new generative video model called Veo is capable of creating high-quality 1080p videos from text, image, and video prompts.
TPUs are advancing with the sixth generation, Trillium, offering a 4.7x improvement in compute performance per chip.
Google search will soon feature multi-step reasoning to help users find the best solutions to complex questions.
Google is working on integrating video into search queries, allowing users to ask questions directly with a video.
The new Gemini-powered side panel will be generally available next month, and three new capabilities are coming to Gmail mobile.
Gmail mobile will receive updates including a summarize option and a Q&A feature for quick answers within emails.
Google is prototyping a virtual Gemini powered teammate, Chip, designed to monitor and track projects, organize information, and provide context.
Live, a new Gemini feature, will allow users to have in-depth conversations with Gemini using voice and visual input.
Gems, personalized AI experts on any topic, will be introduced, allowing users to create personal experts for various subjects.
Gemini Advanced is introducing a new trip planning experience that gathers information from various sources to create a personalized vacation plan.
Google is expanding the capabilities of its AI models, including doubling the long context window to two million tokens and introducing context awareness.
TalkBack, an accessibility feature, is being enhanced with the multimodal capabilities of Gemini Nano for richer and clearer descriptions.
Google is introducing new pricing models for Gemini 1.5 Pro and 1.5 Flash, making them more accessible to developers.
PaliGemma, Google's first open vision-language model, is available now, and Gemma 2, the next generation of Gemma, will be available in June.
SynthID is expanding to two new modalities, text and video, with plans to open source SynthID text watermarking in the coming months.
LearnLM, a new family of models based on Gemini and fine-tuned for learning, is being developed along with pre-made gems for educational purposes.
Transcripts
[Applause]
[Music]
Google we all ready to do a little
Googling welcome to Google IO it's great
to have all of you with us we'll begin
launching this fully revamped experience
AI overviews to everyone in the US this
week and we'll bring it to more
countries soon with Gemini you're making
that a whole lot easier say you're at a
parking station ready to pay now you can
simply ask photos it knows the cars that
appear often it triangulates which one
is yours and just tells you the license
plate number you can even follow up with
something more complex show me how Lucia's
swimming has progressed here Gemini goes
beyond a simple search recognizing
different contexts from doing laps in
the pool to snorkeling in the ocean we
are rolling out Ask Photos this
summer with more capabilities to come
multimodality radically expands the
questions we can ask and the answers we
will get back long context takes this a
step further enabling us to bring in
even more information hundreds of pages
of text hours of audio a full hour of
video or entire code repos you need a 1
million token context window now
possible with Gemini 1.5 Pro I'm excited
to announce that we are bringing this
improved version of Gemini 1.5 Pro to
all developers globally Gemini 1.5 Pro
with 1 million contexts is now directly
available for consumers in Gemini
Advanced and can be used across 35
languages so today we are expanding the
context window to 2 million
tokens this represents the next step on
our journey towards the ultimate goal of
infinite context and you couldn't make
the PTA meeting the recording of the
meeting is an hour long if it's from
Google meet you can ask Gemini to give
you the
highlights there's a parents group
looking for volunteers you're free that
day of course Gemini can draft a reply
Gemini 1.5 Pro is available today in
Workspace Labs NotebookLM is going to
take all the materials on the left as
input and output them into a lively
science discussion personalized for him
so let's uh let's dive into physics
what's on deck for today well uh we're
starting with the basics force and
motion okay and that of course means we
have to talk about Sir Isaac Newton and
his three laws of motion and what's
amazing is that my son and I can join
into the conversation and steer it
whichever direction we want when I tap
join hold on we have a question what's
up
Josh yeah can you give my son Jimmy a
basketball
example hey Jimmy that's a fantastic
idea basketball is actually a great way
to visualize force and motion let's
break it down okay so first imagine a
basketball just sitting there on the
court it's not moving right that's
because all the forces acting on it are
balanced the downward pull of grav it
connected the dots and created that age
appropriate example for him making AI
helpful for everyone last year we
reached a milestone on that path when we
formed Google DeepMind so today we're
introducing
Gemini 1.5 flash flash is a lighter
weight model compared to Pro starting
today you can use 1.5 Flash and 1.5 Pro
with up to 1 million tokens in Google AI
studio and vertex AI today we have some
exciting new progress to share about the
future of AI assistance that we're
calling project Astra tell me when you
see something that makes
sound I see a speaker which makes sound
what is that part of the speaker
called that is the Tweeter it produces
high frequency
sounds what does that part of the code
do this code defines encryption and
decryption functions it seems to use AES
CBC encryption to encode and decode data
based on a key and an initialization
Vector
IV what can I add here to make this
system
faster adding a cache between the server
and database could improve speed today
we're introducing a series of updates
across our generative media tools with
new models covering image music and
video today I'm so excited to introduce
Imagen 3 Imagen 3 is more
photorealistic you can literally count
the whiskers on its snout with richer
details like this incredible sunlight in
the shot and fewer visual artifacts or
distorted images you can sign up today
to try Imagen 3 in ImageFX part of our
suite of AI tools at labs.google
together with YouTube we've been
building music AI sandbox a suite of
professional music AI tools that can
create new instrumental sections from
scratch transfer Styles between tracks
and more today I'm excited to announce
our newest most capable generative video
model called
Veo Veo creates high quality 1080p videos
from text image and video prompts it can
capture the details of your instructions
in different Visual and cinematic Styles
you can prompt for things like aerial
shots of a landscape or time lapse and
further edit your videos using
additional prompts you can use Veo in our
new experimental tool called VideoFX
we're exploring features like
storyboarding and generating longer
scenes not only is it important to
understand where an object or subject
should be in space it needs to maintain
this consistency over time just like the
car in this video over the coming weeks
some of these features will be available
to select creators through VideoFX
at labs.google and the waitlist is
open now today we are excited to announce
the sixth generation of TPUs called
Trillium Trillium delivers a 4.7x
Improvement in compute performance per
chip over the previous generation we'll
make Trillium available to our Cloud
customers in late 2024 we're making AI
overviews even more helpful for your
most complex questions to make this
possible we're introducing multi-step
reasoning in Google search soon you'll
be able to ask search to find the best
yoga or Pilates studios in Boston and
show you details on their intro offers
and the walking time from Beacon Hill
you get some studios with great ratings
and their introductory offers and you
can see the distance for each like this
one it's just a 10-minute walk away
right below you see where they're
located laid out visually it breaks your
bigger question down into all its parts
and it figures out which problems it
needs to solve and in what
order next take planning for example now
you can ask search to create a 3-day
meal plan for a group that's easy to
prepare and here you get a plan with a
wide range of recipes from across the
web if you want to get more veggies in
you can simply ask search to swap in a
vegetarian dish and you can export your
meal plan or get the ingredients as a
list just by tapping here soon you'll be
able to ask questions with video right
in Google search I'm going to take a
video and ask
Google why will this not stay in
place and in a near instant Google gives me
an AI overview I get some reasons this
might be happening and steps I can take
to troubleshoot you'll start to see
these features rolling out in search in
the coming weeks and now we're really
excited that the new Gemini powered side
panel will be generally available next
month three new capabilities coming to
Gmail mobile it looks like there's an
email thread on this with lots of emails
that I haven't read and luckily for me I
can simply tap the summarize option up
top and Skip reading this long back and
forth now Gemini pulls up this helpful
Mobile card as an overlay and this is
where I can read a nice summary of all
the Salient information that I need to
know now I can simply type out my
question right here in the Mobile card
and say something like compare my roof
repair bids by price and availability
this new Q&A feature makes it so easy to
get quick answers on anything in my
inbox without having to First search
Gmail then open the email and then look
for the specific information and
attachments and so on I see some
suggested replies from Gemini now here I
see I have decline the service
suggest new time these new
capabilities in Gemini and Gmail will
start rolling out this month to Labs
users it's got a PDF that's an
attachment from a hotel as a receipt and
I see a suggestion in the side panel
help me organize and track my receipts
step one create a drive folder and put
this receipt and 37 others it's found
into that folder step two extract the
relevant information from those receipts
in that folder into a new spreadsheet
Gemini offers you the option to automate
this so that this particular workflow is
run on all future emails Gemini does the
hard work of extracting all the right
information from all the files in
that folder and generates this sheet for
you show me where the money is
spent Gemini not only analyzes the data
from the sheet but also creates a nice
visual to help me see the complete
breakdown by category this particular
ability will be rolling out to Labs
users this September we're prototyping a
virtual Gemini powered teammate Chip's
been given a specific job role with a
set of descriptions on how to be helpful
for the team you can see that here and
some of the jobs are to Monitor and
track projects we've listed a few out to
organize information and provide context
and a few more things are we on
track for
launch chip gets to work not only
searching through everything it has
access to but also synthesizing what's
found and coming back with an up-to-date
response there it is a clear timeline a
nice summary and notice even in this
first message here chip Flags a
potential issue the team should be aware
of because we're in a group space
everyone can follow along anyone can
jump in at any time as you see someone
just did asking chip to help create a
doc to help address the issue and this
summer you can have an in-depth
conversation with Gemini using your voice
we're calling this new experience live
when you go live you'll be able to open
your camera so Gemini can see what you
see and respond to your surroundings in
real time so we're rolling out a new
feature that lets you customize it for
your own needs and create personal
experts on any topic you want we're
calling these gems just tap to create a
gem write your instructions once and
come back whenever you need it for
example here's a gem that I created that
acts as a personal writing coach it
specializes in short stories with
mysterious twists and it even Builds on
the story drafts in my Google Drive gems
will roll out in the coming months that
reasoning and intelligence all come
together in the new trip planning
experience in Gemini Advanced we're
going to Miami my son loves art my
husband loves seafood and our flight and
hotel details are already in my Gmail
inbox to make sense of these variables
Gemini starts by gathering all kinds of
information from search and helpful
extensions like maps and Gmail the end
result is a personalized vacation plan
presented in Gemini's new Dynamic UI I
like these recommendations but my family
likes to sleep in so I tap to change the
start time and just like that Gemini
adjusted my itinerary for the rest of
the trip this new trip planning
experience will be rolling out to Gemini
Advanced this summer you can upload your
entire thesis your sources your notes
your research and soon interview audio
recordings and videos too it can dissect
your main points identify improvements
and even role-play as your professor
maybe you have a side hustle selling
handcrafted products simply upload all
of your spreadsheets and ask Gemini to
visualize your
earnings Gemini goes to work calculating
your returns and pulling its analysis
together into a single chart and of
course your files are not used to train
our models later this year we'll be
doubling the long context window to two
million tokens we're putting AI powered
search right at your fingertips create
let's say my son needs help with a
tricky physics word problem like this
one if he's stumped on this question
instead of putting me on the spot he can
Circle the exact part he's stuck on and
get step-by-step
instructions right where he's already
doing the work this new capability is
available today now we're making Gemini
context aware so my friend Pete is
asking if I want to play pickle ball
this weekend so I'm going to reply and
try to be funny and I'll say uh is that
like tennis but with uh pickles and I'll
say uh create image of tennis with
Pickles now one new thing you'll notice
is that the Gemini window now hovers in
place above the app so I stay in the
flow okay so that generated some pretty
good images uh what's nice is I can then
drag and drop any of these directly into
the messages app below so like so cool
let me send that and because it's
context aware Gemini knows I'm looking
at a video so it proactively shows me an
ask this video chip what is is can't
type the two bounce rule by the way this
uses signals like YouTube's captions
which means you can use it on billions
of videos so give it a moment and there
starting with pixel later this year
we'll be expanding what's possible with
our latest model Gemini Nano with
multimodality so several years ago we
developed TalkBack an accessibility
feature that helps people navigate their
phone through touch and spoken feedback
and now we're taking that to the next
level with the multimodal capabilities
of Gemini Nano so when someone sends
Cara a photo she'll get a richer and
clearer description of what's happening
and the model even works when there's no
network connection these improvements to
TalkBack are coming later this year 1.5
Pro is $7 per 1 million tokens and I'm
excited to share that for prompts up to
128k it'll be 50% less for
$3.50 and 1.5 flash will start at 35
cents per 1 million tokens and today's
newest member PaliGemma our first
Vision language open model and it's
available right now I'm also excited
to announce that we have Gemma 2 coming
it's the next generation of Gemma and it
will be available in June today we're
expanding SynthID to two new
modalities text and
video and in the coming months we'll be
open sourcing SynthID text
watermarking I'm excited to introduce
LearnLM our new family of models based on
Gemini and fine-tuned for learning we're
developing some pre-made gems which will
be available in the Gemini app and web
experience including one called learning
coach I have a feeling that someone out
there might be
counting how many times we have
mentioned AI today we went ahead and
counted so that you don't have
[Applause]
to that might be a record in how many
times someone has said
AI here's to the possibilities ahead and
creating them together thank you