Sergey Brin on Gemini 1.5 Pro (AGI House; March 2, 2024)

AttentionX

3 Mar 2024, 36:59

Summary

TLDR: The speaker introduces Gemini 1.5 Pro, an AI model that performed much better than expected during training. He invites the audience to try the model and takes questions. Asked about problematic image generation, he admits the team messed up, mainly due to insufficient testing. He acknowledges that text models can also say peculiar things when prompted aggressively enough, and claims that Gemini 1.5 Pro's text capabilities should not have such issues beyond the general quirks all AI models exhibit. Overall, he is excited about Gemini's potential for long-context understanding and multimodal applications.

Takeaways

😊 The talk shares behind-the-scenes details about the AI model Gemini 1.5 Pro, including that it performed better than expected during training.
🤓 The Gemini team is experimenting with feeding images and video to the models frame by frame so that they can talk about the visual input.
😟 The speaker acknowledges issues with problematic image generation and text outputs from AI models.
🧐 Efforts are ongoing to understand why models sometimes generate concerning outputs when prompted in certain ways.
👩‍💻 The speaker personally writes a little bit of code to debug models or analyze performance, but says it is probably not impressive.
🤔 In response to a question, the speaker says today's AI models likely can't recursively self-improve sophisticated systems without human guidance.
😊 The speaker is excited about using AI to summarize lengthy personalized information like medical history to potentially enable better health diagnoses.
😕 The speaker says detecting AI-generated content is an important capability to combat misinformation.
🤔 When asked if programming careers are under threat, the speaker responds that AI's impact across many careers over the coming decades is difficult to predict.
😀 The speaker expresses optimism about AI advancing healthcare through better understanding biology and personalizing patient information.

Q & A

What model was the team testing when they created the 'goldfish' model?
-The team was experimenting with scaling up models as part of a 'scaling ladder' when they created the 1.5 Pro model they internally referred to as 'goldfish'. It was not specifically intended to be released.
Why was the 1.5 Pro model named 'goldfish' internally?
-The name 'goldfish' was meant ironically: goldfish are known for their short memory, whereas the speaker highlights the 1.5 Pro model's long-context understanding.
What issues did the speaker acknowledge with the image generation capabilities?
-The speaker acknowledged that they 'definitely messed up' on image generation, mainly due to insufficient testing. The problematic images that were generated upset many people.
What two issues did the speaker identify with the text models?
-The speaker identified two issues with text models. First, weird or inappropriate content can emerge when any text model is tested deeply. Second, the Gemini models specifically still had bias issues that had not been fully resolved.
How does the speaker explain the model's ability to connect code snippets and bug videos?
-The speaker admits they do not fully understand how the model can connect code and video to identify bugs. They state that while it works, it requires a lot of time and study to deeply analyze why models can accomplish complex tasks.
What are the speaker's thoughts on training models on-device?
-The speaker is very positive about on-device model training and deployment. They mention Google has shipped models to Android, Chrome, and Pixel phones. Smaller models trained on-device can also call larger cloud models.
What healthcare applications seem most promising to the speaker?
-The speaker highlights AI applications for understanding biological processes and summarizing complex medical literature. The speaker also points to personalized patient diagnosis, history analysis, and treatment recommendations, all mediated by a doctor.
How does the speaker explain constraints around self-improving AI systems?
-The speaker says self-improving AI could work in very limited domains with human guidance. Complex codebases, however, require more than long context: they also need retrieval and augmentation. So far, fully automated improvement remains limited.
What lessons did the speaker learn from the early Google Glass rollout?
-The speaker feels Google Glass was released too early, as an incomplete prototype rather than a thoroughly tested product. Lacking consumer hardware expertise at the time, the speaker wishes expectations had been set properly around an early prototype.
Despite business model shifts, why is the speaker optimistic?
-The speaker feels that as long as AI generates tremendous value and productivity gains that displace human labor time and effort, innovative business models for monetizing it will emerge.