Expressive AI Avatars Launch Event | Synthesia
Summary
TL;DR: Synthesia introduces Expressive Avatars, a significant upgrade to their AI video technology. These avatars can now understand and reflect emotions in speech, providing a more natural and engaging experience. The launch includes a real-time video preview feature and improved lip-sync in multiple languages. Synthesia's platform offers a full suite for video creation, editing, and sharing, emphasizing the importance of video content in today's economy, where engagement is key.
Takeaways
- 🌐 The event is global, with participants from various time zones.
- 🎉 The launch is anticipated to be the most significant of the year, focusing on a major upgrade to Avatar technology.
- 🆓 Attendees are offered the chance to be among the first to try expressive avatars without needing to sign up.
- 🚀 Synthesia's mission is to simplify video content creation for everyone through a comprehensive AI video communication platform.
- 🎭 The platform includes avatars, voices, a video editor, and collaborative features for real-time teamwork on videos.
- 📈 Synthesia aims to boost user engagement through video, which is crucial for marketing, learning, and training due to increased retention and viewer attention.
- 📊 There's a significant shift in online consumption habits, with a preference for video over text, aligning with our biological wiring for visual content.
- 🤖 The evolution of Avatar technology at Synthesia began with looped videos, moved to fully synthetic generation, and now includes expressive avatars that understand and react to the text they're speaking.
- 💬 Expressive avatars are a breakthrough, allowing for sentiment prediction, improved lip sync, and more natural voice modulations that match the avatar's expressions.
- 🔍 The new technology is not just an upgrade but a transformation that enables avatars to perform like actors, making them more engaging and suitable for a wider range of content, including sensitive topics.
- 🔮 Synthesia is also working on additional features like multilingual voice cloning, an AI screen recorder, and an improved animation system called 'triggers'.
Q & A
What is Synthesia's mission?
-Synthesia's mission is to make it easy for everyone in the world to make video content by providing a platform for all AI Video Communications needs.
How does Synthesia help users create videos?
-Synthesia assists users in creating videos by offering avatars and voices, a full video editor for adding screen recordings, background images, text, animations, and a collaborative platform for team collaboration.
What is the significance of the new avatar technology launch mentioned in the script?
-The new avatar technology launch is significant as it represents a major upgrade to Synthesia's core avatar capabilities, promising more expressive and engaging avatars for users.
Why are expressive avatars considered an upgrade from previous avatar technologies?
-Expressive avatars are an upgrade because they understand and respond to the sentiment of the text, allowing for more natural and emotive performances, unlike previous avatars that were limited to lip-syncing and basic movements.
How does the Express one AI model enable avatars to be more expressive?
-The Express one AI model enables avatars to be more expressive by analyzing the text and predicting the sentiment, which then influences the avatar's facial expressions, lip-sync, and voice to match the intended emotion.
What are the three components of the new expressive avatar technology?
-The three components of the new expressive avatar technology are sentiment prediction and facial expressions, better lip-sync, and updated voices to match body language and emotion.
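To make the three components concrete, here is a toy sketch of how a sentiment-aware avatar pipeline could be structured. This is purely illustrative: Express one is a learned generative model trained on recorded performances, not a keyword lookup, and every name below (`CUE_WORDS`, `predict_sentiment`, `render_performance`, the `Performance` fields) is invented for this sketch.

```python
from dataclasses import dataclass

# Toy emotive-cue lexicon; the real Express one model learns these
# relationships from hours of recorded acting performances.
CUE_WORDS = {
    "happy": "joyful", "excited": "joyful",
    "upset": "sad", "sad": "sad",
    "frustrated": "annoyed", "annoying": "annoyed",
}

@dataclass
class Performance:
    sentiment: str       # component 1: drives facial expressions
    lip_sync: str        # component 2: phoneme-aligned mouth shapes
    voice_style: str     # component 3: prosody matched to the sentiment

def predict_sentiment(script: str) -> str:
    """Stage 1: guess the sentiment from emotive words, punctuation, emoji."""
    text = script.lower()
    for word, sentiment in CUE_WORDS.items():
        if word in text:
            return sentiment
    if "!" in script or "😊" in script:
        return "joyful"
    return "neutral"

def render_performance(script: str) -> Performance:
    """Combine the three components so they all reflect the same sentiment."""
    sentiment = predict_sentiment(script)
    return Performance(
        sentiment=sentiment,
        lip_sync=f"visemes for: {script}",
        voice_style=f"{sentiment} prosody",
    )

print(render_performance("I am very happy! 😊").sentiment)  # joyful
print(render_performance("I am frustrated").sentiment)      # annoyed
```

The point of the sketch is only the data flow: a single script drives all three outputs, so expression, lip-sync, and voice stay consistent with each other.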
How does Synthesia plan to improve the avatar technology further in the future?
-Synthesia plans to improve avatar technology by adding features like multilingual voice cloning, more expressive avatars, an AI screen recorder, and an enhanced triggers system for animating content into videos.
What is Avatar Preview and how does it assist users?
-Avatar Preview is a feature that allows users to see a real-time, low-resolution preview of their video before rendering it out, which is helpful for iterating on the script and content to achieve the desired expression with the avatar.
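The preview-then-render workflow can be sketched as a simple loop. All function names here are hypothetical stand-ins for the Studio UI buttons, not real Synthesia API calls; the point is that cheap, low-resolution previews let you iterate on the script many times before paying for a single full render.

```python
def request_preview(script: str) -> str:
    """Stand-in for the Avatar Preview button: a fast, low-res cut
    (the real preview reportedly takes around 10 seconds)."""
    return f"low-res preview of: {script!r}"

def render_final(script: str) -> str:
    """Stand-in for the full render: slower, but full resolution."""
    return f"full-res video of: {script!r}"

def iterate_on_script(drafts: list[str]) -> str:
    """Preview every draft cheaply; only the approved (here: last)
    draft gets the expensive full render."""
    for draft in drafts:
        print(request_preview(draft))  # author checks expression and pacing
    return render_final(drafts[-1])

final = iterate_on_script([
    "I am happy.",
    "I am very happy!",  # punctuation strengthens the emotive cue
])
print(final)
```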
How can users try out the new expressive avatars?
-Users can try out the new expressive avatars by visiting Synthesia's website and accessing the avatars feature without needing to sign up or provide an email address.
What are some use cases where expressive avatars can make a significant difference?
-Expressive avatars can make a significant difference in use cases such as healthcare education, product and marketing sales content, and customer support, where emotional connection and engagement are crucial.
Outlines
🌐 Introduction to Synthesia's Global Launch Event
The speaker welcomes the global audience to a live event, acknowledging their punctuality and inviting them to share their locations in the comments. The event is held in London, but the audience spans across different time zones, with some having breakfast and others dinner. The speaker hints at an exciting product launch, which is a significant upgrade to the core Avatar technology. They promise a link for early access to expressive avatars without the need for registration, encouraging viewers to stay until the end of the event. The speaker provides a brief overview of Synthesia, emphasizing its mission to simplify video content creation for everyone through AI-powered video communication tools. Synthesia offers a platform that includes avatars, voices, video editing, and collaborative features, aiming to be user-friendly for those without video editing experience.
🚀 Evolution of Avatar Technology at Synthesia
The speaker discusses the evolution of avatar technology since its inception in 2020. The initial avatars were simple, looping videos of real people with lip-syncing. However, these early avatars lacked body language and facial expressions that matched the voiceover, leading to unnatural movements. In 2022, Synthesia introduced fully synthetically generated avatars, which improved the avatar's movements but still lacked the expressiveness of a human. The speaker highlights the limitations of previous avatars, which could appear robotic and dull due to their lack of understanding of the content they were delivering. The new 'expressive avatars' are introduced as a significant leap forward, capable of understanding the sentiment behind the words and delivering performances that are more engaging and natural.
🎭 Demonstrating the Expressiveness of New Avatars
The speaker demonstrates the new 'expressive avatars' by showing a video of an avatar named Julia. The avatar is programmed to respond to emotionally charged words and emojis, adjusting its facial expressions and body language accordingly. The AI system, called Express one, has been trained to understand the nuances of language and perform accordingly. The speaker emphasizes the improved lip-sync technology and the updated voices that match the avatar's body language and emotional state. The new avatars are shown to be capable of a range of human-like expressions, from happiness to frustration, making them more engaging and relatable.
🤖 The Future of Avatar Technology and Customer Experience
Jonathan Stark, the CTO of Synthesia, discusses the new 'Express one' model, which has learned to understand text and express non-verbal cues for more natural communication. The model is trained with data from professional actors to create more lifelike avatar performances. The speaker highlights the potential for future enhancements, such as placing avatars in various locations and utilizing more body language. The new avatars are expected to open up new use cases, particularly in sensitive areas like healthcare, where empathy is crucial. The avatars can now connect with viewers on an emotional level, which was not possible with previous technology. Examples are provided to illustrate how expressive avatars can be used in different contexts, such as healthcare education, product marketing, and customer support, each with a distinct tone and style.
📈 Upcoming Features and Closing Remarks
The speaker outlines upcoming features for Synthesia, including multilingual voice cloning, additional expressive avatars, an AI screen recorder, and an improved triggers system. They invite the audience to test the new expressive avatars on the Synthesia website without needing to provide an email address. The speaker expresses excitement for the audience's feedback and future developments, hinting at an even more impressive 'Next Generation' of avatar technology to be revealed later in the year. The event concludes with gratitude for the audience's participation and anticipation for the creative uses of the new expressive avatars.
Keywords
💡Synthesia
💡Avatar Technology
💡Expressive Avatars
💡AI Video Assistant
💡Sentiment Prediction
💡Lip Sync
💡Engagement
💡Express One AI Model
💡Multimodal Communication
💡Video Life Cycle
Highlights
Introduction to the anticipated launch of a major upgrade to Avatar technology.
Invitation for participants to share their locations, showcasing the global reach of the audience.
Overview of Synthesia's mission to simplify video content creation for everyone.
Description of the all-in-one platform for AI Video Communications provided by Synthesia.
Emphasis on the ease of use for those without video editing experience.
Introduction of AI video assistant for content generation from various sources.
Explanation of one-click translation feature for global content reach.
Discussion on the importance of video content in the online economy.
Analysis of how visual content engages audiences more effectively.
Historical context of Avatar technology development since 2020.
Introduction of the new Expressive Avatars with improved emotive capabilities.
Explanation of the Express one AI model that powers the new avatars.
Demonstration of sentiment prediction and its impact on avatar performance.
Showcasing improved lip sync technology across multiple languages.
Preview of the new human-like voices that enhance avatar expressiveness.
CTO Jonathan Stark's discussion on the generative model behind Expressive Avatars.
Market feedback and customer anticipation for more lifelike avatars.
Examples of how expressive avatars can be used in healthcare, marketing, and customer support.
Tips and tricks for using expressive avatars effectively.
Announcement of upcoming features like multilingual voice cloning and AI screen recorder.
Invitation to test the new expressive avatars and provide feedback.
Transcripts
all right we're live thanks to all of you for being here on time that's amazing um we're going
to give people just a few more moments uh to trickle in so let me know in the comments where
you're calling in from I'm here in the London HQ but I know the Synthesia fam is global so probably
some of you are eating breakfast some of you having dinner um let us know where you are it's always
fun to see where people are are calling in from um in just a minute we'll get started this is
going to be I think it's the most anticipated launch we've had this year so far um and that
makes sense this is a huge upgrade to the core Avatar technology uh and I can't wait to show
you all the cool things that uh that you'll be able to do with it if you stick around to the
end we'll give you a link where we can go in and you can be one of the first people in the world
to try out expressive avatars um you don't even have to sign up so make sure you stay until the
end and we'll drop that link for all of you here that said let's get started um as we always do
for those of you new to Synthesia and who haven't heard about Synthesia before I want to spend just
a few moments talking a little bit about what we do so at Synthesia our mission is pretty simple we
want to make it easy for everyone in the world to make video content and we do that by giving you
one platform for all your AI Video Communications we help you make the video with our avatars and
voices we're going to be talking more about that today we give you an entire video editor
where you can add in your screen recordings your background images text animations all the things
that you need to make a video end to end we also give you a collaborative platform so you can
have a team on Synthesia you can invite your colleagues you can leave comments you can even work together
in videos in real time like you know it from Google Docs where you can see each other's cursors
and once you're done with the video we also help you share the video with the world that could be
through our sharing Pages or via our video player which you can put into your app or your website or
where you want to have it so it's one platform for the entire video life cycle and it's really easy
to use being really easy to use especially for people who don't have any video editing experience
but may come more from a background where they have created content maybe they've you've written
documents maybe you've made PowerPoints we try to make these as easy to use as possible so
we do that in a few different ways uh we have a huge Bank of templates where you can go in and
you can find um templates for different use cases different visual looks and feels get you started
really quickly we have our AI video assistant a very very popular feature that we launched
recently and with the AI video assistant we can help take some of your existing content that could be
a PDF document it could be a PowerPoint it could be a blog post on your website so a URL or it could
also be just an entirely free form prompt idea and we can take one of those and
we can generate a video for you so we'll take the content we'll write the script we'll give you some
basic visuals and we have something which you can finalize yourself and once you've done all that we
also help you with one click translation so take your video and make it into as many different
languages as you want why is it important to make videos well for us the way we think of this is
that if you look at the online economy today it's just very very obvious that people want to watch
and listen to that content and they don't want to read that much anymore right in our private lives
most of us I'm guessing a lot of you out there as well um when you want to learn something new
you probably start on YouTube maybe you listen to a podcast you go on TikTok and maybe also buy
the book but usually that's like step five down the line right and that is definitely the pattern
most people have and it makes sense because biologically we're just hardwired to better
understand and remember visual content right it stimulates more senses feels more like
how we consume information in the real world and the kind of byproduct of this is that engagement
goes up significantly and for all of you out there who are making videos for a variety of
different stakeholders engagement is super super super important right if you're doing sales and
marketing content you want engagement because it translates to better conversion rates if you're doing
learning and training content you want engagement because it translates into higher information
retention so engagement is really really important and engagement is only getting more important and
video is only going to get more important in terms of how we communicate if you look at the
average attention span it has shortened by 69% since 2004 that is a staggering number right and
what this means for everyone who makes content is you really need to be good at grabbing people's
attention and keeping it and of course you don't do that with a huge wall of text you do that with
really awesome video and audio content and that's what we help you do this is of course what led to
Synthesia um we developed the Avatar technology back in 2020 and the Insight we had back then
was that humans just respond so much better to human faces and voices than anything else when
it comes to communications um since then the Avatar technology has come quite a long way and
today we're going to be showcasing the latest and greatest but I wanted to just take a moment
to explain the progression of Avatar Technologies and what are the limitations and what are all the
cool new things you're going to be able to do with expressive avatars so back in 2020 we invented the
first avatar platform and this is the product that you'll see most most other Avatar products are
still using this technology today it's actually pretty simple what you do is you take a real video
of a real person you Loop it in a smart way and then you just change the lips so that they match
a new voice track um this illusion can work pretty well you've probably all seen really good results
of this but it does kind of break down especially when you're using it for creating a lot of content
because ultimately the body language the facial expressions and everything else just doesn't
match what the Avatar is saying because it's just a video playing in the background so you might get
avatars that do weird things with their head like this maybe hands which are clearly not
in beat with what they're saying and that's because it's actually just a video playing in the background
right in 2022 we launched the first version of avatars that are fully synthetically generated and
what that means is that everything in the video is generated by AI not just the lips and what
this enabled us to do back in 2022 was to take out those weird things you don't want in video
right you don't want avatars doing weird things with their head and their hands in 2023 we kind
of took that technology and we began using it to build back the performance into the avatars so
you can add in gestures we improved the voices and overall avatars just kind of perform better and
look more real right but there's a big problem with Avatar Technologies up until this day and
is that even though the results today are really good avatars have no idea what they're actually
saying what does that mean well it means that if you look at what humans do and the way we
talk we change our tone of voice we change our facial expressions we emote in our body language
differently depending on what we say so when I'm talking right now you'll see that my eyebrows
going up I have a lot of micro Expressions my hands are kind of in tune with what I'm actually
saying that's not something I'm conscious about right it's just that's just the way that's what
we do when we talk and all of you out there do the exact same thing right it's a lot of very small
things that makes us human essentially but avatars don't have this understanding today and that's why
a lot of avatars can look a bit robotic and a bit dull and just to some extent a little bit
unengaging right because they're very robotic in the delivery of their lines and one way of thinking
of this is like when you give a line or a script to a real actor they will perform it right they
won't just read out what's on the paper today avatars are just kind of reading out what's on the
paper we um with expressive avatars have actually transformed them into actors that understand what
they're saying and can deliver their lines in a way that is much more engaging and natural
than before what you're seeing right here in the background is the kind of emotions and
micro expressions in the avatars they can be happy they can be a bit sad they could even sort
of laugh a little bit um and this is because we've built this new uh expressive Avatar product on top
of what we call our Express one AI model which is essentially an AI model that has watched hours and
hours and hours of people talking and kind of decoded what the language and the relationship
between what we say and how we say it so that we can make the avatars perform like that the system
has three components that you'll notice as a user of expressive avatars the first one is sentiment
prediction and facial expressions this is all in our Express one AI model and the second one
is better lip sync so this has of course been a continuous progress for us a project for us
to make the lip sync better and better you should see a significant uplift with this new release and
last but not least we've also given the voices an update so that they also match the body language
that they're emotive and have a high dynamic range of intonation rhythm and so on and so forth the
first thing here is around automatic sentiment prediction so this really is all about figuring
out the relationship between a specific sentence and a performance of the Avatar um I'd love to
just show you kind of like how this this actually works so if we jump into Studio this is Julia this
is one of our new expressive avatars she's on a nice basic background and I'm going to put in
um a few sentences here so the first one I'm going to put is I am very happy exclamation mark happy
smiley we'll get back to that in a minute but I am frustrated and I am so upset as you can see here
these are of course all emotionally laden words and um the reason I'm using emotionally loaded words
here to demonstrate how this works is because what the AI system picks up on what Express one picks
up on is the nuances in the language right it's trying to figure out how this should be performed
and so you can even do kind of small fun things like adding in an emoticon or an emoji there
just to kind of re-emphasize for the AI model that this should be like a happy sentence and this
way there's lots of like fun experimentation to be had to get the right performance out of your avatars
we've made a video um with the script let's watch what this looks like I am very
happy I am so
upset I am frustrated so that's what it looks like with um new expressive avatars as you can
see it's a huge difference from what we had before that and this is all because of our automatic uh
sentiment prediction Tech the next one here is lip sync so lip sync you're all very well aware
of and we've given it um a big Improvement in this version and it's not just in English this
is in any language let's see how some of our new avatars do with uh with some tongue twisters six
Sleek swans swam swiftly southwards Peter Piper picked a peck of pickled peppers how
can a clam cram in a clean cream can third is the human like voice so the voice is of course
an incredibly important part of your video both when you have an avatar but of course
even more so if you're doing slides or scenes that don't have an avatar in them um
and here again much more lifelike much more natural much more interesting to listen to
let's just take a quick uh listen here to what traditional AI Avatar voices sound like and then
we'll have a listen to the expressive voices after can you hear that I didn't realize it
would transform the video so much new voices really capture your attention can you hear
that I didn't realize it would transform the video so much new voices really capture your
attention so as you can hear here right it's like it's it's just two completely different ways of
saying the same thing um the latter one of course being much more natural and interesting um to
listen to I can talk all day about Express one uh because it's such an exciting piece of technology
but I'd love to just bring it over to Jonathan Stark who's our CTO to talk a bit more about the
model how it was made and what it means for you as our customers both today but also in the future
as we continue to develop this this technology hi I'm Jonathan Stark I'm CTO at Synthesia um and
today we're here at our Wapping press studio in London so Express one is a new type of model that
has learned how to understand the text that's spoken so that it can express non-verbal cues
for communication so when we speak it's not just the words that we say it's how we say it
it's a generative model that's learned to distill knowledge about the world from the data it's trained
on and we work with actors in London and New York at our production studios I think it's something
like 50 or 60 actors a week that we work with and this is kind of an ongoing program to record the
best performances in the world so we can kind of distill that into the performances of our avatars
today with Avatar technology a lot of what's been done is looping and replay you've got an existing
video and you reuse it with Express one what we're doing is we're generating performances
it's a purely generative performance everything you see is completely new every single time so
when we think about the future of this type of Technology we think about um unlocking richer and
richer content for users so being able to place avatars in locations being able to use more body
language and the way we communicate this is this is all the sorts of things we'd like to bring to
our users awesome so what does this actually mean for you as customers now that the avatars are
much better and much more lifelike uh probably for a lot of you I don't need to tell you that
because it's what everyone has been asking for for a really long time and I think the result
sort of like speak for themselves um but we did go out to the market we did talk to a lot of our
customers we looked at how we use it internally um and showed it to just a lot of people and I
think um what it really kind of comes down to is that it opens up a lot of new use cases now that
the avatars are more expressive and more natural and the way it does that is because for videos
where you want to connect with the viewer on some sort of emotional level we actually can
do that now with these avatars but that wasn't really possible with the previous generation so
if you're talking about things that are sensitive like healthcare for example you want to have
empathy in the voice you want it to be pleasant to listen to if you're doing product and marketing
sales content you probably want a bit more kind of upbeat excitement in the voice um to really drive
the storytelling for customer support you want to make sure that what is probably a frustrated customer
you know feels like they're being listened to that it's a friendly video that they're watching
with a degree of understanding not just like a robot kind of giving them tips on how to solve
their problem there's of course a million more use cases where this makes sense but for us it's
really about now you can actually begin to connect with the viewer in a very different way than you
could before so we prepared a few examples here of what that looks and sounds like so first
off we got healthcare education and I think when you watch this video right it's just
such a nice and pleasant experience to watch hi James I'm Paloma I'm here to help
with your medication and share some comforting tips it's normal to feel overwhelmed and anxious
but I promise you'll start feeling better soon my first piece of advice keep your medication next
to something you use daily have you considered involving your friends or family they can provide
great support during this time I hope these tips are helpful let's go through treatment
together let me know if you have any questions yeah so as you can hear much more natural it has
that kind of empathetic voice right as a patient you'd feel kind of in safe hands if you
watch this video now product and marketing is at a bit of a different end of the spectrum you want
excitement you want optimism um and again all this is actually deduced by our model right from just the
text it understands that for healthcare it should sound a bit more empathetic than if you're putting
in the script of someone selling you something or making an ad so all this is automatically determined by the
system um let's listen to the product marketing one which should be quite different hello there let
me introduce to you Spend Smartly's exciting new feature Auto Savings with it saving money
becomes a seamless and enjoyable part of your life let me explain we round up to the nearest
dollar and boost your savings whenever you shop coffee groceries every swipe adds up I could buy
a house already join Spend Smartly now and turn your everyday purchases into future savings so as
you can see here this ad is much more upbeat much more excited optimistic um and that's again like
inferred from just the script the third example we're going to see here is customer support so
you're dealing with someone who's probably a bit unhappy with your product they run into a problem
and they're trying to solve it so you want to meet them with a friendly face a nice video
calm them down and help them solve that problem so let's watch this one as well hello there I'm
Jazz seems like you are having some trouble with the internet connection that's annoying but no
worries we'll have it up and running again quickly first let's find your router you'll notice some
lights on it go ahead and try unplugging it for a moment all right next up let's take a quick
look at your devices just make sure they are all connected to the right network okay next let's
give your connection a quick test can you open a website or watch a video how's it looking hope
this was helpful and easy to follow if not please let us know via email so that's a few examples of
what expressive avatars can do and how they're like different from um the avatars that you know
today a few tips and tricks for how to use these avatars um along with uh with expressive avatars
we're also launching Avatar preview so Avatar preview is real-time previewing of your
video before you've actually rendered it and this is quite powerful when you're putting together
your performance you're testing out different types of scripts and emotions the way it works
is just you hit the video preview button and once you do that in something like 10
seconds you actually get uh to see the video and what it looks like before you render
it out it's a bit lower resolution than what the final result is but it's fantastic for this kind
of iteration um on the script and the content that you have to get expression out of the avatar
use emotive words and there's a few here on the slide that you can see but in general you
want to use language that kind of provokes emotions right you can also use punctuation
so exclamation marks question marks you can even try putting smileys in there like everything you
can do to sort of guide the system towards the emotion that you want to have in your video and
helps the system understand you uh better so to sum up the difference between kind of old
school AI avatars and expressive avatars for old school avatars right you all know them you've all
seen the videos and they work really well for some types of content especially if it's more
how-to a very kind of practical content like how to change your password works pretty fine but there is
a robotic demeanor a bit of a robotic voice the script and the facial expressions and the body language
are not really in sync and that can give you these kind of weird moments but again it works for some
use cases um on the other hand expressive avatars actually understand what they're saying the
avatars are performing like an actor would and this means that it's much more pleasant to
look at and longer form videos are even better and you can now begin to build a bit more of
an emotional connection with your viewer right so if you're doing sensitive content content where you
want to drive a bit more engagement you want to drive that connection then expressive avatars are
really really superior in this regard expressive avatars are very exciting but we're of course working
on a bunch of other things as well and I just wanted to mention a few of them before we we jump
off first one we got multilingual voice cloning so that means your voice in a lot of different
languages very very very cool feature you can hear yourself speak with a different accent
which just in itself is a really fun experience um we're adding more expressive avatars of course
so keep an eye out in the coming weeks more of them will drop into the product we have our AI
screen recorder and this is going to be a really really amazing addition to the product for those
of you who work with screen recordings today we'll soon be ready to lift the veil on what people are
working on um but this will make your life a lot easier the last one is triggers we're renaming
markers which is a system we use for animating content into videos we just simplified it made
it a bit more powerful and for those of you who are super users I think you'll really enjoy kind
of the direction that this feature is taking up next but of course last but not least test out
the expressive avatars and all of you here in the call have a chance to be the first in the
world to actually play with this technology if you go to synthesia.io/avatars you can try that out
for free you don't even have to put your email in um we also have a premium so if you want to
try out the entire product you can definitely do that as well let us know what you think of
the avatars what do you want how can we make them better and then I'll be very excited to talk about
the Next Generation Um sometime later this year which will be even more mind-blowing than what
we're seeing today thank you so much for being here I really really really appreciate it and I
cannot wait to see all the cool things you'll be creating with expressive avatars thank you