Everything Google just announced

Theo - t3.gg

21 May 2025 (27:47)

Summary

TLDR: The video explores the latest AI announcements, focusing on Google's Gemini models, Flow for video generation, and OpenAI's Sora. It reviews video generation by comparing Google's Flow with OpenAI's Sora, highlighting strengths as well as limitations such as rendering glitches, audio that is out of sync with the picture, and slow processing. The speaker is excited about AI and VR/AR hardware, particularly Android XR, but cautious about its future. Throughout, the video stresses how hard it is to deliver a good user experience and shows how quickly video models are evolving, giving a candid look at both the successes and the frustrations.

Takeaways

😀 Gemini Diffusion, Google's experimental model, is fast and promising but does not yet match the output quality of traditional LLMs.
😀 It generates roughly 1,000 tokens per second, but it is still an early experiment that needs further refinement.
😀 Flow, Google's video generation tool, produces animated videos but often suffers from unrealistic lighting and background issues.
😀 Kling V2, a video model from Kuaishou rather than Google, shows improvements but still produces poor results at times, including audio that doesn't sync with the visuals.
😀 Higher-tier Google plans give access to Veo 3, which offers better video and audio quality, although the price is steep.
😀 Audio syncing issues are common, as seen in a video where typing sounds are not in sync with the corgi's actions.
😀 Google's video generation can be slow, with some tests taking several minutes to complete, so there is clear room for improvement.
😀 The speaker is excited about the potential of Extended Reality (XR) but remains skeptical due to past disappointments in the field.
😀 Apple's Vision Pro offers stronger XR features, especially persistent placement of objects in the real world, which Google's XR glasses lack.
😀 Google's floating window concept in XR is a promising development, though current demos have faced technical issues like poor connectivity.
😀 Despite UX and video quality challenges, there's optimism around AI tools like Flow and Sora, which may improve with time and further development.

Q & A

What is Gemini, and how does it compare to traditional LLMs?
-The model discussed, Gemini Diffusion, is still experimental. Traditional LLMs generate text one token at a time, predicting each next token from all the tokens before it; Gemini Diffusion instead refines a whole block of output over several passes. Its output is not yet as good as that of traditional LLMs, but it shows promising results and is notably fast, generating about 1,000 tokens per second.
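The speed difference comes from how many sequential model calls each approach needs. The toy sketch below is a generic illustration, not Google's implementation: `autoregressive_decode` makes one call per token, while `diffusion_decode` starts from a fully masked sequence and updates every position in a fixed number of passes. The stub "models" (`toy_next_token`, `make_toy_refine`) and the unmasking schedule are hypothetical and only emit placeholder words so the passes can be counted.

```python
from typing import Callable, List, Tuple

MASK = "<mask>"  # placeholder for a position that has not been decided yet


def autoregressive_decode(
    next_token: Callable[[List[str]], str], length: int
) -> Tuple[List[str], int]:
    """Left-to-right decoding: one sequential model call per output token."""
    tokens: List[str] = []
    calls = 0
    for _ in range(length):
        tokens.append(next_token(tokens))  # each call sees every earlier token
        calls += 1
    return tokens, calls


def diffusion_decode(
    refine: Callable[[List[str], int], List[str]], length: int, steps: int
) -> Tuple[List[str], int]:
    """Diffusion-style decoding: start fully masked, refine all positions per pass."""
    tokens = [MASK] * length
    calls = 0
    for step in range(steps):
        tokens = refine(tokens, step)  # one call may update the whole sequence
        calls += 1
    return tokens, calls


# Hypothetical toy "models" that emit placeholder words instead of real text.
def toy_next_token(prefix: List[str]) -> str:
    return f"w{len(prefix)}"


def make_toy_refine(total_steps: int) -> Callable[[List[str], int], List[str]]:
    def refine(tokens: List[str], step: int) -> List[str]:
        # Illustrative schedule: each pass unmasks positions i with i % total_steps == step.
        return [
            f"w{i}" if tok == MASK and i % total_steps == step else tok
            for i, tok in enumerate(tokens)
        ]
    return refine


ar_tokens, ar_calls = autoregressive_decode(toy_next_token, length=64)
df_tokens, df_calls = diffusion_decode(make_toy_refine(8), length=64, steps=8)
print(ar_calls, df_calls)  # 64 8
```

Both paths end with the same 64 placeholder tokens, but the diffusion-style loop needs 8 sequential passes instead of 64. Real diffusion models may need more passes, and each pass does more work, so this only conveys the intuition and does not predict actual throughput.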
What are the key features of Flow in video generation?
-Flow is Google's AI tool for generating videos, such as the test clip of a corgi writing code. The results are not high-end CGI quality, but they are decent. Flow runs on Google's Veo models, including Veo 2 and the newer Veo 3, which adds audio generation. Users can create individual scenes and link them together into a full video.
How does Google's subscription tier impact the video generation quality?
-Google's subscription tiers determine which models you can use. The Ultra plan provides access to Veo 3, the latest video generation model, which also generates audio. Lower tiers are limited to Veo 2, which still works but produces lower-quality video without generated audio.
What issue was encountered with audio generation in Flow?
-In Flow, the generated audio often didn't match the visuals. For example, typing sounds played even when the corgi wasn't touching the keyboard, because the generated audio wasn't synchronized with the on-screen action.
How does the video generation time compare across different models?
-Generation time varies with the model and the subscription tier. For example, generating a video with Kling took over 5 minutes, while Google's Veo 3 took about 2 minutes and 30 seconds, a significant speed difference based on the model used.
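The two reported times translate into a rough speed ratio. This is a worked calculation on the numbers quoted in the answer, each from a single run, so it says nothing about typical performance; the names `slow_run` and `fast_run` are illustrative.

```python
from datetime import timedelta


def speedup(slower: timedelta, faster: timedelta) -> float:
    """Ratio of two wall-clock times; timedelta / timedelta yields a float."""
    return slower / faster


# Single reported runs, not a controlled benchmark.
slow_run = timedelta(minutes=5)              # "over 5 minutes", so treat as a lower bound
fast_run = timedelta(minutes=2, seconds=30)  # "about 2 minutes and 30 seconds"

print(f"at least {speedup(slow_run, fast_run):.1f}x faster")  # at least 2.0x faster
```

Because the slower run took "over" 5 minutes, 2x is a floor on the difference in that test, not an exact figure.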
What are the UX issues mentioned in the video generation process?
-Several UX issues came up: poor sync between video and audio, video player glitches, and an interface that couldn't be resized or interacted with properly. Generation itself also caused frustrations, such as random background screens and scenes that failed to render.
What is the significance of the Vision Pro in AR/VR technology?
-The Vision Pro is significant because it keeps floating windows fixed in place in the user's surroundings, so objects and windows stay put when the user moves around. This creates a more immersive and seamless experience than other devices, such as Google's XR glasses, which do not yet offer this.
What distinguishes Google's XR glasses from the Vision Pro?
-The main difference between Google's XR glasses and the Vision Pro is that the Vision Pro allows virtual objects to stay fixed in the real world as the user moves around, which provides a more immersive experience. Google's XR glasses, on the other hand, have limited spatial persistence, and objects move with the user's gaze.
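The difference described here is between world-locked content (placed once in room coordinates) and head-locked content (defined relative to the viewer and recomputed every frame). The sketch below is a generic 2D illustration under simplifying assumptions, not Apple's or Google's API: `Pose2D`, `world_locked_position`, and `head_locked_position` are hypothetical names, and real headsets track full 6-DoF poses.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Pose2D:
    x: float
    y: float
    heading: float  # radians; direction the user is facing

    def to_world(self, local_x: float, local_y: float) -> Point:
        """Transform a point from this pose's local frame into world coordinates."""
        c, s = math.cos(self.heading), math.sin(self.heading)
        return (self.x + c * local_x - s * local_y, self.y + s * local_x + c * local_y)


def world_locked_position(anchor: Point, head: Pose2D) -> Point:
    # Anchored once in world space; the user's current pose is deliberately ignored.
    return anchor


def head_locked_position(offset: Point, head: Pose2D) -> Point:
    # Defined relative to the head, so it is recomputed from the current pose.
    return head.to_world(*offset)


start = Pose2D(0.0, 0.0, 0.0)
moved = Pose2D(2.0, 0.0, math.pi / 2)  # user walked 2 m and turned 90 degrees
window_offset = (1.0, 0.0)             # window placed 1 m in front of the user
anchor = start.to_world(*window_offset)

print(world_locked_position(anchor, moved))         # stays where it was placed
print(head_locked_position(window_offset, moved))   # follows the user
```

After the user moves, the world-locked window is still at its original spot, while the head-locked one has moved to stay 1 m in front of the user, which is the behavior the answer attributes to Google's glasses.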
Why is the author skeptical about VR and AR technologies?
-The author expresses skepticism due to past experiences where excitement for new VR and AR technologies led to disappointment. Despite some promising demos, the author remains cautious, as they have seen many VR and AR projects fail to live up to expectations.
What was the result of testing OpenAI's Sora for video generation?
-The author tested OpenAI's Sora and found it slower than expected: it took about 3 minutes to generate a video of a corgi coding, and the output was somewhat inconsistent. However, the higher-tier subscription let the author generate several videos concurrently, which was a useful advantage.
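Concurrent generation matters because total wait time drops from the sum of all jobs to roughly the longest single job. The sketch below is a generic client-side illustration with `asyncio`, assuming each job takes about the same time; `generate_video` is a hypothetical stub that only sleeps (it is not OpenAI's API), the extra prompts are placeholders, and the duration is scaled down from the ~3 minutes reported.

```python
import asyncio
import time
from typing import List, Tuple


async def generate_video(prompt: str, seconds: float) -> str:
    """Hypothetical stand-in for a remote generation job; it only waits."""
    await asyncio.sleep(seconds)
    return f"video for {prompt!r}"


async def run_sequential(prompts: List[str], seconds: float) -> List[str]:
    # Each job starts after the previous one finishes: total is about the sum.
    return [await generate_video(p, seconds) for p in prompts]


async def run_concurrent(prompts: List[str], seconds: float) -> List[str]:
    # All jobs are in flight at once: total is about the longest single job.
    return list(await asyncio.gather(*(generate_video(p, seconds) for p in prompts)))


def timed(runner, prompts: List[str], seconds: float) -> Tuple[List[str], float]:
    start = time.perf_counter()
    results = asyncio.run(runner(prompts, seconds))
    return results, time.perf_counter() - start


prompts = ["corgi writing code", "corgi fixing a bug", "corgi deploying"]  # placeholders
JOB_SECONDS = 0.2  # scaled-down stand-in for the ~3 minutes reported per video

seq_results, seq_elapsed = timed(run_sequential, prompts, JOB_SECONDS)
con_results, con_elapsed = timed(run_concurrent, prompts, JOB_SECONDS)
print(f"sequential: {seq_elapsed:.2f}s, concurrent: {con_elapsed:.2f}s")
```

With three equal jobs, the sequential run takes about three job-lengths and the concurrent run about one. How many jobs a service actually lets run in parallel depends on the plan, which this sketch does not model.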