New AI Video Goes Hard At OpenAI!
TLDR
The video discusses a new AI video generator called Vidu, which is being compared to OpenAI's anticipated Sora model. Developed by Shengshu Technology and Tsinghua University, Vidu can produce high-quality 16-second clips at 1080p. The video shows a sizzle reel and longer examples of Vidu's output, highlighting its temporal coherence and its U-ViT (Universal Video Transformer) architecture. Although its output is less detailed than Sora's, Vidu delivers impressive results, especially in camera movement and background consistency. The video also touches on the challenges of creating realistic AI-generated video and the post-production work it still requires. Vidu's website has a sign-up link, but it appears to be temporarily broken due to high demand.
Takeaways
- The new AI video generator, Vidu, is being compared to Sora, which has yet to be publicly released.
- Vidu can generate video clips of up to 16 seconds at 1080p resolution, as showcased in its sizzle reel.
- Developed by Shengshu Technology and Tsinghua University, Vidu aims to compete with Sora on video generation quality.
- Vidu's architecture is based on U-ViT (Universal Video Transformer), which combines Vision Transformers with a U-Net for improved image generation.
- U-ViT treats all inputs as tokens and uses long skip connections between shallow and deep layers; the presenter suggests this lets it chart a path between the first and last frames of a video.
- Example outputs from Vidu show temporal coherence and detailed background elements, though they are less detailed than Sora's.
- In a side-by-side comparison, Sora's videos tend to have more action and clearer definition, but Vidu also renders a convincingly real environment.
- Both Vidu and Sora can create compelling imagery, though Sora may have a slight edge in realism.
- Achieving even semi-consistency in AI-generated video still requires a human-driven production process, as demonstrated in the short film 'Airhead'.
- For those interested, Vidu's website has a sign-up link, although it may be temporarily unavailable due to high demand.
- Sora's integration into Adobe Premiere and Adobe's future plans for After Effects are discussed in an exclusive interview with Adobe.
Q & A
What is the name of the new AI video generator discussed in the transcript?
-The new AI video generator is Vidu, developed by Shengshu Technology and Tsinghua University.
What is the maximum duration and resolution that the AI video generator can produce?
-The AI video generator can produce clips up to 16 seconds at 1080p resolution.
What is the architecture of the AI video generator based on?
-The architecture is based on U-ViT, which the video calls the Universal Video Transformer. It draws on two separate papers: DPM-Solver and 'All Are Worth Words'.
How does the Universal Video Transformer (U-ViT) treat different elements in video generation?
-U-ViT treats everything, from the time step to specific conditions, as tokens and uses long skip connections, which, according to the presenter, allow it to chart a path between the first and last frames of the video.
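The token-and-skip-connection idea can be sketched in a few lines. The toy model below is a minimal, hypothetical illustration of the U-ViT design from the 'All Are Worth Words' paper: the time step, the condition, and the image patches are fed in as one token sequence, and each deep block receives a long skip connection from its matching shallow block (concatenation followed by a linear projection). The names `ToyUViT`, `toy_block`, and `make_linear` are made up for this example, the blocks are simplified stand-ins rather than real attention layers, and nothing here reflects Vidu's unpublished video model or the first-to-last-frame behavior the presenter describes.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Tokens = List[Vector]


def make_linear(in_dim: int, out_dim: int, rng: random.Random) -> Callable[[Vector], Vector]:
    """Hypothetical helper: a random linear map from in_dim to out_dim (no bias)."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def apply(v: Vector) -> Vector:
        return [sum(w * x for w, x in zip(row, v)) for row in weights]

    return apply


def toy_block(dim: int, rng: random.Random) -> Callable[[Tokens], Tokens]:
    """Hypothetical stand-in for a transformer block: mixes information across
    tokens via the sequence mean, then applies a per-token map with a residual."""
    proj = make_linear(dim, dim, rng)

    def apply(tokens: Tokens) -> Tokens:
        mean = [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]
        out = []
        for t in tokens:
            h = proj([a + b for a, b in zip(t, mean)])
            out.append([a + math.tanh(b) for a, b in zip(t, h)])
        return out

    return apply


class ToyUViT:
    """Toy U-ViT-style backbone: shallow blocks, a middle block, deep blocks,
    with a long skip connection from each shallow block to its deep partner."""

    def __init__(self, dim: int, depth: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.in_blocks = [toy_block(dim, rng) for _ in range(depth)]
        self.mid_block = toy_block(dim, rng)
        self.out_blocks = [toy_block(dim, rng) for _ in range(depth)]
        # Each long skip is merged by concatenation (2*dim) and projected back to dim.
        self.skip_proj = [make_linear(2 * dim, dim, rng) for _ in range(depth)]

    def __call__(self, time_token: Vector, cond_token: Vector, patch_tokens: Tokens) -> Tokens:
        # "Everything is a token": time step, condition, and patches share one sequence.
        x = [time_token, cond_token] + patch_tokens
        skips = []
        for block in self.in_blocks:
            x = block(x)
            skips.append(x)
        x = self.mid_block(x)
        for block, proj in zip(self.out_blocks, self.skip_proj):
            skip = skips.pop()  # the most recent shallow output pairs with the first deep block
            x = [proj(a + b) for a, b in zip(x, skip)]  # list + list = concatenation
            x = block(x)
        return x[2:]  # drop the time and condition tokens; keep the patch tokens


if __name__ == "__main__":
    model = ToyUViT(dim=4, depth=2)
    patches = [[0.1 * i + 0.01 * j for j in range(4)] for i in range(6)]
    out = model([0.5] * 4, [1.0, 0.0, 0.0, 0.0], patches)
    print(len(out), len(out[0]))  # prints: 6 4
```

The concatenate-then-project step keeps the token width constant, so the deep blocks can be stacked exactly like the shallow ones while still seeing the early-layer features directly.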
What is the difference between the video generation approaches of Sora and U-ViT?
-According to the presenter, Sora creates videos by generating temporal spaces, whereas U-ViT has an in point and an out point and works out the transitions between them, which helps it avoid the warping, hallucinatory effects seen in traditional AI video generators.
What is the significance of the longer-runtime examples of Vidu's output?
-The longer run time examples demonstrate Vidu's ability to maintain temporal coherence and generate detailed visuals, showcasing its potential as a competitive AI video generator.
How does the video output of Vidu compare to Sora in terms of realism and aesthetics?
-While Vidu's output looks very good and maintains temporal coherence, it may not be as detailed or cinematically realistic as Sora's. However, the presenter appreciates Vidu's aesthetic, particularly its Midjourney V4 look, for its surreal quality.
What are some of the challenges faced by AI video generators like Sora and Vidu?
-Challenges include maintaining temporal coherence, generating detailed and realistic visuals, and avoiding hallucinatory effects. Additionally, post-production work is often required to clean up and refine the generated footage for a polished final product.
How was Sora utilized in the short film 'Airhead'?
-Sora was used to generate the initial video footage for 'Airhead', which then required significant work, including cleaning up the footage, scriptwriting, editing, voiceover, music, sound design, color correction, and other typical post-production processes.
What is the current status of the sign-up link for Vidu on their website?
-At the time of recording, the sign-up link on Vidu's website appears to be broken, possibly due to high traffic. The presenter suggests trying again in a day or two if it does not work.
What is the significance of the 'Tokyo walk' sequence in the comparison between Vidu and Sora?
-The 'Tokyo walk' sequence is used to illustrate the comparative quality of the video outputs from Vidu and Sora. Despite the short clip length, it shows that both models can produce fairly comparable results, although there are inherent challenges in the gait and realism of the generated footage.
What does the transcript suggest about the future of AI video generation technology?
-The transcript suggests that AI video generation technology, even in its current state, can be used to create compelling imagery. It also highlights the potential for integration into professional tools like Adobe Premiere and future plans for enhancements in post-production software like After Effects.
Outlines
Introduction to a Potential Sora Rival: Vidu
The video introduces a new AI video generator, Vidu, as a potential competitor to Sora, even though Sora has not been released yet. The presenter remarks on the irony and dives into Vidu's features, including its ability to generate 16-second clips at 1080p. The video shows Vidu's sizzle reel, which directly references Sora's initial video release. Vidu's architecture is based on U-ViT (Universal Video Transformer), which combines Vision Transformers for image analysis with a U-Net model for image generation. According to the presenter, this gives Vidu a clear start and end frame, potentially avoiding the hallucinatory effects seen in some AI video generators. The presenter also shares a few longer video outputs from Vidu, noting their quality and temporal coherence.
Analysis of Longer Vidu Outputs and Comparison with Sora
The presenter provides an in-depth analysis of longer video outputs from the Vidu model, comparing them with Sora. Vidu's outputs are described as high-quality, with temporal coherence and detailed visuals, although less detailed than Sora's. The presenter appreciates the Midjourney V4 look of the Vidu outputs. In the comparison, Sora's videos are more action-packed and detailed, but Vidu's outputs are still impressive and convey a sense of a real place. The presenter acknowledges that both models have their strengths and that the examples shown are cherry-picked. They also discuss the effort required to clean up Sora footage for a finished film, highlighting the post-production process involved in using AI-generated video.
Utilizing AI in Filmmaking and the Future of Vidu
The video concludes with a discussion of the practical application of AI in filmmaking, referencing Paul Trillo's VFX breakdown of his short film 'Notes to My Future Self', which uses AI imagery. The presenter describes how AI-generated images are integrated with live-action footage and the various tools used to enhance the scenes. The presenter also provides a sign-up link for Vidu, noting that it may have temporary issues due to high demand. The video ends with a teaser for an upcoming interview with Adobe about Sora's integration into Premiere and future plans for After Effects.
Keywords
Sora
Vidu
Universal Video Transformer (U-ViT)
DPM-Solver
All Are Worth Words
Vision Transformers
U-Net
Temporal Coherence
Sizzle Reel
Freebird
Tokyo Walk Sequence
Highlights
A new AI video generator, potentially rivaling Sora, has been released.
The AI can generate clips up to 16 seconds at 1080p resolution.
The model is developed by Shengshu Technology and Tsinghua University.
Vidu's architecture is based on the Universal Video Transformer (U-ViT).
U-ViT combines Vision Transformers with a U-Net model for image generation.
U-ViT treats all inputs, from the time step to specific conditions, as tokens.
According to the presenter, long skip connections let U-ViT maintain coherence between the first and last frames of a video.
Vidu's output is compared to Sora, with a focus on temporal coherence and detail.
A full 16-second clip showcases Vidu's ability to maintain temporal coherence in visuals.
Vidu's generated panda playing guitar demonstrates impressive background coherence and shadow reactivity.
A beach vacation villa clip from Vidu shows an interesting dissolve effect between shots.
Vidu's imaginative output includes a ship in a bedroom, reacting correctly to water movements.
A side-by-side comparison with Sora reveals differences in action clarity and environment realism.
Despite the comparison, both Vidu and Sora are noted as very good video generators.
The Tokyo walk sequence from Vidu shows the model's capability in generating realistic movements.
Sora's video generation process requires significant post-production work for consistency.
AI tools are being used to create compelling imagery, as demonstrated by Paul Trillo's VFX breakdown.
Vidu has a sign-up link on its website, but the submit button may be temporarily broken due to high traffic.
The integration of Sora into Adobe Premiere and future plans for After Effects are discussed in an exclusive interview.