Chinese Company Unveils SORA Competitor - "Vidu" AI Video Generator

AI Search
28 Apr 2024 11:37

TLDR

A Chinese company, Shengshu Technology, has announced a new AI video generator called Vidu, positioned as a competitor to OpenAI's Sora. Vidu is claimed to generate high-quality 16-second 1080p video clips with a single click, using a self-developed model architecture called Universal Vision Transformer (U-ViT). This architecture combines the diffusion and Transformer models, the two approaches that underpin most current generative AI. The company's research team reportedly proposed U-ViT earlier than Sora's model architecture, aiming to move beyond the limitations of existing models. The video shows Vidu generating realistic clips, although some inconsistencies appear when its output is compared with Sora's. Interested users can apply to use Vidu through the company's website, shanguai.com. The announcement highlights growing competition in AI, with Chinese advances challenging the dominance of American tech giants.

Takeaways

  • 📢 Chinese company Shengshu Technology has announced a new AI video generator named 'Vidu', which is positioned as a competitor to Sora.
  • 🚀 Vidu is claimed to generate a 16-second 1080p video clip with a single click, using a self-developed architecture called Universal Vision Transformer (U-ViT).
  • 🔍 The U-ViT architecture integrates two AI models, diffusion and Transformer, which is seen as the next step in generative AI given the limitations of models such as Stable Diffusion.
  • 🤖 The Transformer model, known for understanding context, is combined with the diffusion model to potentially produce more coherent and accurate video or image outputs.
  • 🏆 Zhu Jun, of Tsinghua University and Shengshu Technology, says that U-ViT's core technology was proposed before Sora's model architecture, which the company presents as a significant milestone in AI video generation.
  • 🎥 Vidu's showcase reel demonstrates high-quality video generation, including realistic hands and other elements, although it has some inconsistencies compared to Sora.
  • 📈 A side-by-side comparison reveals that while Vidu's results are impressive, Sora's output appears to be of higher quality and more realistic.
  • 📸 The resolution of Vidu's showcased videos is 720p, which is lower than Sora's full HD, affecting the crispness and sharpness of the details.
  • 📝 The Global Times article mentions that Vidu can output 1080P videos, suggesting that the showcased videos may not represent its full potential.
  • 🌐 Interested users can apply to use Vidu through the website shanguai.com, where they are asked to leave their contact details for a marketing consultant to follow up.
  • 📈 China has been making significant strides in the AI space, with recent advancements in language models and robotics, indicating a competitive edge in the global AI race.
  • 💬 The speaker encourages viewers to share their thoughts on Vidu, whether they believe it's on par with or superior to Sora, and to apply for access if interested.

Q & A

  • What is the name of the AI video generator announced by the Chinese company Shengshu Technology?

    -The name of the AI video generator is 'Vidu'.

  • What is the core technology behind Vidu's AI video generator?

    -The core technology behind Vidu's AI video generator is the Universal Vision Transformer (U-ViT), which integrates two text-to-video AI models: the diffusion model and the Transformer.

  • How does Vidu's technology compare to Sora's in terms of video generation capabilities?

    -Vidu's technology is claimed to be on par with Sora's, being able to generate a 16-second 1080p video clip with one click. The presenter suggests that combining the diffusion and Transformer models in U-ViT could produce more coherent and accurate video than a diffusion model alone.

  • What are some of the limitations of the Stable Diffusion model that Vidu's technology aims to overcome?

    -Stable Diffusion has limitations such as rendering text poorly and not fully understanding context or following more complicated prompts. Vidu's technology aims to address these by merging the Transformer model, which is good at understanding context, with the diffusion model.

  • Who is Zhu Jun and what is his role in the development of Vidu's technology?

    -Zhu Jun is the vice dean of the Institute for AI at Tsinghua University and the chief scientist at Shengshu Technology. He states that the release of Sora, which closely aligned with the team's technical roadmap, further motivated them to advance their research.

  • How can one apply to use Vidu's AI video generator?

    -To apply to use Vidu's AI video generator, one can visit the website shanguai.com, scroll down to the video generation section, and fill out the form with their name, phone number, company name, and an inquiry message.

  • What is the significance of the recent advancements in AI from China?

    -The recent advancements in AI from China, including the new language model, the S1 robot, and the Vidu AI video generator, indicate that China is making significant strides in the AI race. These developments show that other countries might not be far behind the tech giants in America and are actively contributing to the field of AI.

  • What are some of the other AI advancements from China that have been mentioned in the transcript?

    -Apart from Vidu's AI video generator, the transcript mentions the launch of 'SenseNova 5.0' by a Chinese company, which reportedly beats GPT-4 Turbo on nearly all benchmarks, and the unveiling of the S1 robot by a company called Astribot, which is noted for its very high speed.

  • How does the video quality of Vidu's AI video generator compare to Sora's?

    -While Vidu's show reel seems to have good quality, the resolution of the provided examples is 720p, which is lower than Sora's full HD videos. However, the Global Times article mentions that Vidu can output 1080p, suggesting that the quality could be comparable when viewed at the same resolution.

  • What are some of the observations made about the consistency and realism of Vidu's generated videos?

    -The observations include that Vidu generates hands very well with normal-looking fingers. However, there are noted inconsistencies such as a hair transforming into a red ribbon and a green leaf disappearing, which do not seem to occur in Sora's generated videos.

  • How does the AI video generator Vidu's performance compare to other video generators like Runway and Pika?

    -Vidu's showcased results appear to be notably better than Pika and Runway, with more realistic and coherent outputs. However, it is not yet clear if it is on par with OpenAI's Sora, which has not been released for public use.

  • What is the significance of the Transformer model in the context of AI video generation?

    -The Transformer model, which is the backbone of many large language models, is known for its ability to understand context. When merged with the diffusion model, it is expected to produce more coherent and accurate video or image generation, potentially surpassing the capabilities of the diffusion model alone.

Outlines

00:00

🚀 Introduction to Shengshu's AI Video Generator

The video script begins with an announcement from a Chinese company, Shengshu Technology, which has developed an AI video generator called Vidu. The presenter expresses a sense of urgency to cover this development, suggesting it's a significant advancement in the field. The tool is claimed to be on par with OpenAI's Sora, and the presenter intends to play a showreel to provide more details on its capabilities and how one can apply to use it. The script also references an article from globaltimes.cn, indicating that the new video generator is built on a self-developed architecture called Universal Vision Transformer (U-ViT), which integrates two AI models: diffusion and Transformer. This merger is seen as a potential next step in generative AI, overcoming limitations of previous models like Stable Diffusion. The presenter also mentions the contributions of Zhu Jun, a key figure in the development of U-ViT, and compares the new tool with existing video generators like Runway and Pika, noting that while the results are promising, they may not yet match the unreleased OpenAI Sora.

05:01

📊 Comparative Analysis of AI Video Generators

The second paragraph of the script presents a side-by-side comparison between Vidu and OpenAI's Sora. The presenter plays videos to showcase the quality and realism of the generated content from both tools. Notably, Vidu's video is criticized for inconsistencies, such as a hair transforming into a red ribbon and a green leaf disappearing, which are not present in Sora's output. The script also mentions a comparison of the two tools on a specific video prompt, with the presenter inviting viewers to decide which one they prefer. It is highlighted that Vidu's showreel is in 720p, whereas Sora's videos are in full HD, which affects the perceived quality. The presenter guides viewers on how to apply for access to Vidu through shanguai.com and shares excitement about recent advancements in AI from China, including a new language model and a robot from different Chinese companies.

10:03

🌟 Global AI Competition and Future Prospects

The final paragraph emphasizes the global nature of the AI competition and the recent advancements from China. The presenter mentions the unveiling of a new language model and a robot by Chinese companies, suggesting that these developments are noteworthy and indicate a close race in the field of AI. The presenter expresses enthusiasm for the unveiling of Vidu, seeing it as a positive development for the industry because of the increased competition it brings. The presenter also encourages viewers to share their thoughts on Vidu and whether they plan to apply for access. The script concludes with a call to action for viewers to engage with the content by liking, sharing, and subscribing for more updates.

Keywords

AI Video Generator

An AI video generator is a technology that uses artificial intelligence to automatically create videos based on given inputs or prompts. In the context of the video, the AI video generator named 'Vidu' by the Chinese company Shengshu Technology is presented as a competitor to Sora. It is claimed to generate video clips with a single click, showcasing the advancement in generative AI.

Universal Vision Transformer (U-ViT)

U-ViT refers to a self-developed visual transformation model architecture that integrates two AI models: the diffusion model and the Transformer model. It is mentioned in the video as the core technology behind Vidu, and it is presented as a significant advancement in generative AI for creating more coherent and accurate videos or images.

Diffusion Model

A diffusion model is a type of generative model used in machine learning to generate data samples, such as images or videos, that are similar to a given dataset. In the video, it is discussed as one of the components merged with the Transformer model to enhance the capabilities of the AI video generator Vidu.
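To make the idea concrete, here is a minimal sketch of the forward (noising) half of a diffusion process, using the widely used DDPM-style formulation. It is an illustration only, not Vidu's implementation: the function names, the linear schedule, and its default values are assumptions chosen for the example. Training data is gradually mixed with Gaussian noise, and a model is then trained to reverse that corruption step by step.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances that grow linearly from beta_start to beta_end (assumed defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] is the product of (1 - beta) up to step t: the share of signal variance kept."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * value + noise_scale * rng.gauss(0.0, 1.0) for value in x0]


if __name__ == "__main__":
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    pixels = [0.5, -0.5, 0.25]  # a toy "image" of three pixel values
    rng = random.Random(0)
    for t in (0, 499, 999):
        print(t, add_noise(pixels, t, alpha_bars, rng))
```

At early steps the sample stays close to the original values; by the last step it is almost pure noise. Generation runs the learned reverse process, starting from noise and denoising toward a plausible image or video frame.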

Transformer Model

The Transformer is a deep learning architecture introduced by Google researchers in the 2017 paper 'Attention Is All You Need'. It is known for its ability to model context and is the backbone of many language models such as GPT and Claude. In the video, it is highlighted as a key component of the U-ViT architecture, contributing to the advanced capabilities of Vidu.
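The "understanding context" property comes from self-attention, in which every token's output is a weighted mix of all tokens in the sequence. The sketch below shows scaled dot-product self-attention in plain Python. It is a simplified illustration, not code from Vidu or any specific model: the learned query, key, and value projections of a real Transformer are replaced by the identity, and the helper names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.

    Real Transformers learn separate query/key/value projections; here the
    tokens are used directly so the context-mixing step itself is visible.
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [dot(query, key) / math.sqrt(dim) for key in tokens]
        weights = softmax(scores)
        # Each output coordinate is a weighted average over all tokens.
        mixed = [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        outputs.append(mixed)
    return outputs


if __name__ == "__main__":
    for row in self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]):
        print(row)
```

In a video model the tokens would typically be patches of frames rather than words, so attention lets each patch take the rest of the clip into account.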

Stable Diffusion

Stable Diffusion is a generative model that has been noted for its ability to create images and videos. However, it has limitations, such as not generating text very well and struggling with complex prompts. The video discusses it in comparison to Vidu, indicating that Vidu aims to overcome these limitations.

Generative AI

Generative AI refers to the branch of artificial intelligence that focuses on creating new content, such as images, videos, or text, that is similar to existing data but not identical. The video's main theme revolves around the advancements in generative AI through the introduction of Vidu.

Shengshu Technology

Shengshu Technology is the Chinese company that has announced the development of Vidu, positioning it as a competitor to Sora. The company's announcement is significant as it suggests a step forward in the global race for advanced AI video generation technology.

Runway and Pika

Runway and Pika are mentioned in the video as two of the best existing video generators currently available. They are compared to the capabilities of Vidu, with the suggestion that Vidu may outperform these platforms in terms of video quality and realism.

Resolution

Resolution refers to the clarity and sharpness of a video or image, determined by the number of pixels in the display. The video discusses the resolution of Vidu's output, noting that while the examples shown were in 720p, the company claims the capability to output 1080p, which is considered full HD.
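The gap between the two resolutions can be quantified with simple arithmetic. The short sketch below assumes the standard 16:9 frame sizes (1280x720 for 720p and 1920x1080 for 1080p); the function names are illustrative.

```python
# Standard 16:9 frame sizes for the two resolutions discussed.
RESOLUTIONS = {
    "720p": (1280, 720),
    "1080p": (1920, 1080),
}


def pixels_per_frame(name):
    """Return the number of pixels in one frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def pixel_ratio(high, low):
    """Return how many times more pixels `high` has per frame than `low`."""
    return pixels_per_frame(high) / pixels_per_frame(low)


if __name__ == "__main__":
    for name in RESOLUTIONS:
        print(name, pixels_per_frame(name))
    print("1080p / 720p:", pixel_ratio("1080p", "720p"))
```

A 1080p frame holds 2,073,600 pixels against 921,600 for 720p, which is 2.25 times as many. That difference is one reason a 720p showreel can look softer than full HD footage even if the underlying model is equally capable.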

WeChat Page

The WeChat Page is mentioned as the source of the Vidu showreel. WeChat is a popular Chinese multi-purpose messaging, social media, and mobile payment app, and its page feature is used by companies to share information and engage with users. In the context of the video, it serves as a platform for showcasing Vidu's capabilities.

AI Race

The term 'AI Race' is used in the video to describe the global competition among different countries and companies to develop and implement advanced artificial intelligence technologies. The video highlights recent advancements from China, suggesting that the country is a significant player in this race.

Highlights

Chinese company Shengshu Technology announces a new AI video generator called Vidu, which is a competitor to Sora.

Vidu is claimed to generate a 16-second 1080p video clip with one click.

The AI is built on a self-developed visual transformation model architecture called Universal Vision Transformer (U-ViT).

U-ViT merges the diffusion and Transformer models, which is considered the next step in generative AI.

The Transformer model is known for understanding context, which should improve the coherence of generated content.

Vidu's core technology was first proposed by its research team in September 2022, earlier than Sora's model architecture.

Vidu showcased its AI's ability to generate realistic hands with five fingers.

A side-by-side comparison with Sora's video generation shows Vidu's potential to compete.

Vidu's video generation has some inconsistencies, such as transforming hair into a red ribbon.

Vidu's representation of a wooden toy ship navigating on a carpet was criticized for lack of realism.

The resolution of Vidu's videos is lower than Sora's, with the former being 720p and the latter in full HD.

Vidu can output 1080p videos according to the Global Times article.

The website shanguai.com is where users can apply to use Vidu's AI video generator.

The application process for Vidu does not specify eligibility and requires basic contact information.

Chinese companies have been making significant strides in the AI space with recent advancements.

SenseNova 5.0, a Chinese AI language model, reportedly beats GPT-4 Turbo on nearly all benchmarks.

The unveiling of Vidu provides more competition in the AI video generation market.

Competition in the AI space is seen as beneficial for driving innovation and improvement.