4th HLF – Hot Topic: Artificial Intelligence – Presentation Raj Reddy
Summary
TLDR: The speaker discusses the transformative potential of voice computing, particularly for the semi-literate population of over 3 billion people who cannot read or write effectively. Voice computing allows these individuals to interact with technology solely through speech, offering access to services like reading newspapers, shopping, education, and voting. Existing AI technologies, such as speech recognition, translation, and synthesis, can support this. The speaker highlights how voice computing could bridge the digital divide, empower illiterate populations, and create a massive market for speech-based apps, transforming access to information globally.
Takeaways
- 🎤 Voice computing has the potential to create a profound societal impact by removing the need for keyboards and touch interfaces.
- 🌍 The focus is on the 3 billion semi-literate or illiterate people, who can only interact with technology through voice.
- 🗣️ Voice computing can empower semi-literate populations to access services like reading newspapers, voting, online shopping, and education.
- 🎥 Technologies like real-time speech-to-speech translation, synthesis, and dynamic dubbing already exist, but are not yet fully developed for everyday use.
- 📱 Simple, speech-based mobile apps tailored to semi-literate users could revolutionize how they interact with technology.
- 💡 These apps could enable users to learn to read by listening and following highlighted text, essentially teaching them to read without formal education.
- 🛍️ Voice computing could allow rural and semi-literate populations to benefit from market economies, shopping online, and accessing educational content.
- 🎓 Learning without a teacher through Socratic dialogue with intelligent agents would help overcome the shortage of teachers in developing countries.
- 💸 The semi-literate market for voice-based apps is estimated to be worth $1 trillion annually, as 3 billion people could use such technology daily.
- 📚 Big tech companies like Microsoft, Google, and Amazon should collaborate to develop voice technologies for orphan languages, benefiting millions of people.
Q & A
What is the main focus of the speaker's passion in the provided transcript?
-The speaker's main focus is on voice computing, a technology that enables people to interact with computers using only voice commands, without the need for keyboards or touch interfaces. The speaker believes this technology can have a profound impact on society, especially for semi-literate populations.
Who does the speaker identify as the primary beneficiaries of voice computing technology?
-The speaker identifies semi-literate and illiterate people, particularly those in developing countries, as the primary beneficiaries of voice computing. These individuals may not be able to read or use traditional interfaces but can interact with technology through voice commands.
How could voice computing help semi-literate populations, according to the speaker?
-Voice computing could help semi-literate populations by enabling them to use technology in various ways, such as reading newspapers, watching foreign language movies, attending online lectures, voting, and shopping online. It would also empower them to access educational resources and market opportunities they are currently deprived of.
What is the estimated market potential of speech-based apps for semi-literate populations?
-The speaker estimates the market potential of speech-based apps for semi-literate populations to be worth a trillion dollars per year. This estimate is based on three billion people with an average disposable income of $1 a day.
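The arithmetic behind that estimate can be checked in a few lines. This is a minimal sketch: the figures are the ones the speaker quotes (three billion people, $1 of disposable income per day, 365 days), while the constant and function names are illustrative assumptions.

```python
# Figures quoted by the speaker; the names below are illustrative only.
PEOPLE = 3_000_000_000           # semi-literate population
DOLLARS_PER_PERSON_PER_DAY = 1   # disposable income per person per day
DAYS_PER_YEAR = 365


def annual_market_usd(people, dollars_per_day, days):
    """Return the yearly market size in US dollars."""
    return people * dollars_per_day * days


market = annual_market_usd(PEOPLE, DOLLARS_PER_PERSON_PER_DAY, DAYS_PER_YEAR)
print(f"${market / 1e12:.3f} trillion per year")  # prints "$1.095 trillion per year"
```

The product is about $1.1 trillion, which the speaker rounds to "a trillion dollars a year".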
What are the key technologies required to make voice computing a reality for semi-literate populations?
-The key technologies required include speech recognition, translation, speech synthesis, and spoken dialogue systems. These technologies already exist in some form, as demonstrated by platforms like Microsoft's, Facebook's, and Google's translation tools.
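One way to picture how these technologies fit together is as a chain of three stages: recognize, translate, synthesize. The sketch below is an assumption about structure only, not any vendor's API; every function name is hypothetical, and the stand-in stages merely pass bytes and strings around so the example runs without a speech library.

```python
from typing import Callable

# Each stage is a plain function so a real engine could be swapped in later.
Recognizer = Callable[[bytes], str]    # audio -> text in the source language
Translator = Callable[[str], str]      # source text -> target text
Synthesizer = Callable[[str], bytes]   # target text -> audio


def speech_to_speech(audio: bytes, recognize: Recognizer,
                     translate: Translator, synthesize: Synthesizer) -> bytes:
    """Chain recognition, translation, and synthesis in that order."""
    text = recognize(audio)
    translated = translate(text)
    return synthesize(translated)


# Hypothetical stand-ins: no real speech processing happens here.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    toy_dictionary = {"hello": "namaste"}
    return " ".join(toy_dictionary.get(word, word) for word in text.split())


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


result = speech_to_speech(b"hello world", fake_recognize,
                          fake_translate, fake_synthesize)
print(result)  # prints b'namaste world'
```

Keeping the stages separate reflects the speaker's point that the pieces already exist individually; supporting a new language would mean supplying new stage implementations rather than a new pipeline.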
Why does the speaker think commercial languages are not the main market for voice computing?
-The speaker believes that commercial languages like French, German, English, and Chinese are not the main market because most people who speak these languages are literate and do not need voice computing to access information. The true potential lies in less commercially viable languages spoken by semi-literate populations who can benefit from speech-based technology.
What is the significance of 'one-click' user interfaces for semi-literate populations, according to the speaker?
-For semi-literate populations, the user interface needs to be extremely simple and intuitive, ideally requiring only one minute to learn and one click to operate. This reduces the barrier to entry and allows users to quickly and easily interact with the technology using voice commands.
What are some potential applications of voice computing mentioned by the speaker?
-Potential applications include reading newspapers aloud, real-time translation and dubbing of foreign language media, online shopping, and education through Socratic dialogue with AI agents. The technology can also facilitate online voting and access to market economies.
What challenges does the speaker mention regarding the development of voice computing for less commercially viable languages?
-One challenge is the lack of investment in developing speech technology for 'orphan languages,' which are languages not considered commercially viable. Although the necessary tools and technology exist, there is a lack of willingness from companies to invest in these languages.
What role does the speaker believe governments and major tech companies should play in promoting voice computing?
-The speaker believes that local governments and major tech companies like Amazon, Google, and Microsoft should collaborate and share data to promote voice computing for semi-literate populations. Governments may support the development of technology for orphan languages, while companies should work together to avoid duplicating efforts and reduce development costs.
Outlines
🎙️ The Rise of Voice Computing
In this opening paragraph, the speaker discusses their passion for voice computing, a technology that eliminates the need for keyboards and touch interfaces, relying solely on voice interaction. They explain how this technology can have a profound impact on society, especially for the semi-literate or illiterate population, which numbers around three billion people globally. Voice computing could provide access to technologies that are currently inaccessible due to literacy barriers. The speaker outlines how these people could perform tasks such as reading newspapers, watching educational content, and shopping online through voice commands.
📰 Voice-Enabled Apps for Literacy and Accessibility
This section explores how voice computing can transform the lives of semi-literate populations by enabling them to access newspapers and other digital content in their local languages. Through real-time speech recognition and translation, apps can help users comprehend and interact with information, potentially helping them learn to read by following highlighted text. The speaker also touches on digital democracy, predicting a future where voting and other democratic processes happen online. They highlight the potential for voice-based apps to help people in rural or underdeveloped regions access global markets and online education.
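The read-along idea, highlighting each word as it is spoken, can be sketched as a generator of display frames. This is an illustrative assumption about how such an interface might work, not the speaker's implementation; the function name and the bracket notation for the highlight are hypothetical.

```python
def read_along_frames(sentence: str):
    """Yield one display frame per word, with the spoken word in brackets."""
    words = sentence.split()
    for index in range(len(words)):
        frame = [f"[{w}]" if i == index else w for i, w in enumerate(words)]
        yield " ".join(frame)


frames = list(read_along_frames("voice opens the news"))
for frame in frames:
    print(frame)
```

A real app would show each frame in step with the synthesized audio, so the listener sees which written word matches the sound being played.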
🛍️ Socratic Dialogue and the Power of Voice-Based Learning
Here, the focus shifts to the educational applications of voice computing. The speaker introduces the idea of learning without a teacher, where a digital assistant engages users in a Socratic dialogue, helping them learn through conversation. They emphasize the importance of such technology in regions lacking qualified teachers, allowing users to engage in self-directed learning. This has massive market potential, especially among semi-literate populations, which the speaker estimates to be a trillion-dollar opportunity. They reiterate the need for seamless speech-based user interfaces for the widespread adoption of such apps.
📱 Existing Technologies and the Future of Voice Interfaces
The speaker discusses the current state of speech-to-speech technologies and their presence in platforms like Microsoft, Facebook, and Google. They express disappointment that the biggest market for these tools—semi-literate populations—remains untapped, as companies focus on commercial languages like English and French. The speaker believes that voice-driven apps could become a daily necessity for billions of people, offering opportunities in underserved languages. They stress that the success of such apps hinges on intuitive design, requiring one-click interfaces and minimal learning curves, similar to how one might interact with an assistant or postdoc.
🤖 Voice Computing as the Future of Communication
In this final section, the speaker expands on the architecture and usability challenges of voice computing, emphasizing the need for always-on, autonomic systems that don't drain battery life. They highlight the potential for voice computing to evolve through constant learning and adaptation, much like how social networks and error-correction systems operate today. The speaker underscores the importance of building speech recognition systems for 'orphan' languages, which aren't currently commercially viable but have millions of speakers. They conclude by calling for collaboration between tech giants to support voice computing for these underserved populations, as it could redefine daily life for billions.
Keywords
💡Voice Computing
💡Semi-literate Population
💡Speech Recognition
💡Speech-to-Speech Translation
💡Digital Democracy
💡Killer App
💡Learning without a Teacher
💡Socratic Dialogue
💡User Interface
💡Orphan Languages
Highlights
Voice computing enables interaction without keyboards or touch, similar to human conversation.
Three billion people live on less than 2.5 dollars a day and are semi-literate or illiterate.
Voice computing can empower semi-literate individuals by allowing them to speak to computers.
Technology like speech recognition, translation, and synthesis can profoundly impact society.
Voice computing can assist in reading newspapers, understanding foreign languages, and accessing educational content.
Real-time translation and dubbing of speeches, as in a 2012 talk given in English in China and rendered in Chinese, is already possible.
Newspaper reading apps could read articles aloud, helping users learn to sight-read.
Digital democracy will likely involve online voting, requiring authentication and authorization technologies.
Online shopping can be revolutionized for rural, illiterate populations through voice computing.
Learning without a teacher is facilitated by AI agents that can explain complex concepts.
The potential market for speech-based apps among the illiterate could be a trillion dollars a year.
Existing speech-to-speech, speech-to-translate, and speech-synthesis technologies are underutilized.
User interface for voice computing must be intuitive, with minimal learning time and one-click actions.
Amazon Echo, Siri, Cortana, and Alexa are examples of existing voice computing technologies.
Orphan languages, not commercially viable, need investment for translation technologies.
Architectural issues for voice computing include non-intrusiveness, autonomic operation, and always-on learning.
Deep learning is essential for voice computing, but it's not the only technology needed.
Collaboration between tech giants like Amazon, Google, and Microsoft could unlock the potential of voice computing for all.
The semi-literate population is a major untapped market for AI technology and voice computing.
Transcripts
[Music]
so the first part of the first hour is
supposed to cover the most recent
interesting aspects of AI technology and
I'd like to tell you about my passion
which is which is now reaching the point
where I think it can have a profound
impact what what I call voice computing
and voice computing is when you cannot
you can no longer use keyboards can no
longer use touch the only thing you have
is voice just like when you are
interacting with other people like your
postdocs and your assistants that's all
you have and in which case how does that
proceed and what can happen and it turns
out we now have all the relevant
technologies and I like to go through
that and if we do do the create such a
voice computing it can have profound
impact on the society as a whole so let
me take you through it so the bottom of
the pyramid represents about three
billion people who have less than 2.5
dollars of income every day and these
are what I call semi-literate many of
them are illiterate they can't read any
language that's part of what is also
called
literacy divide and there are others who
can read but they really cannot
understand comprehend and so I call both
of these groups together semi-literate
if you're a semi-literate person on this
planet the only way you can use the
computers is to speak to them you can't
read what is on the screen
you can't you know understand you can't
follow but but on the other hand these
three billion people lead perfectly
normal lives you know and so the
question is should they be deprived of
our technology and if not how
can they use it and the only way they
can use it is through voice computing it
turns out and by voice computing imagine
for you in your lab you're you know
you're a professor and you have a
postdoc and you have a secretary you
just tell them what to do and they go do
it if they can't do it they come back
and say I'm sorry I can't do it did you
mean this or not they have a
clarification dialogue and I imagine
computing and computer a intelligent
agents being exactly like that you don't
tell them all sorts of unnecessary
information and the question is are we
there the other questions what do we
have and what do we need to do to get
there so voice computing where there's
no keyboard attached can help these
semi-literate people in all kinds of
ways that you may not have thought about
they can help you to read newspapers
they can help you read foreign language
movies listen to Khan Academy lectures
vote online and do shopping online and
the interesting aspect of the all of
these is to have a speech recognition
translation and synthesis technologies
speech to speech and having apps that
can enable different aspects of it so
for example a mobile app for
entertainment and education it would have
to be able to understand speech and and
for example YouTube you can actually get
transcriptions of what is being said
then you have to translate it and then
do the speech synthesis all in real-time
you so basically you've heard about
dubbing
in movies automatic dynamic real-time
dubbing is possible today in fact in
2012 Rick Rashid gave a talk in
English in in in China and it got
translated into Chinese and synthesized
into Chinese all in real time in that
particular case it was kind of a domain
that was understood and they've tried it
and it was but I it's not yet ready for
primetime it should be I don't know why
not and but I think it will be there
soon enough and the same thing if you
have such technology if I can't read a
newspaper I can say I want to read this
newspaper in my local language I see
some picture I can't read this screen
I can't touch on it and say but I can
touch and say there's a nice picture
what is this headline about it will read
the headline and now you can go and say
can you now read me the rest of the
story as they go to the next so the
whole issue of designing an app and a
newspaper reading app that will read to
illiterate people so suddenly they all
can not only the one of the interesting
side effects of it is they will learn to
sight-read they will see the words
and if you do the proper interface it
highlights the words as we are reading
then you essentially learn to read
without having to be taught how to read
in subsets so one of the areas I work in
is digital democracy
I believe we are entering a phase where
almost all countries in the next 50
years will be voting online and
different countries have different
constitutional requirements so that may
or may not happen soon but
authentication authorization audit
technologies we understand fully well
and it will be done and the same is true
with respect to online shopping
essentially it empowers a rural
illiterate person to get the same
benefits of market economies that the
economies now and benefit from from
Amazon and so on you're getting the best
product at the lowest possible price and
Amazon is getting to one day and four
hour delivery times and I believe they
will solve that same problem in rural
India and rural Africa so that you'll be
able to do the same thing and learning
without a teacher is perhaps the most
interesting app that happens with this
because we're getting more and more
online content and if I can have a
Socratic dialogue with an agent which
knows a lot more than I do and which
would explain to me when I don't
understand otherwise I'm reading so the
whole idea of learning without a teacher
because there are not enough good
teachers especially in the third world
countries the two you know kind of give
you the opportunity so given if all of
these things are real
I claim they're not only real they're near
term then the illiterate populations
will be the biggest source of customers
for speech-based apps in the future
and I estimate that to be a trillion
dollars a year how did I get that number
three billion times $1 a day which is
the disposable income is essentially
three billion dollars a day times 365
days is where you get the trillion
dollars from ok so that is not to be
sneezed at so the question is what
technology exists speech to speech
speech in translate speech out into
another language already exists in
Microsoft Facebook Google I haven't seen
the Google version yet but they have the
best language translation capability of
anybody but I'm sure all of them have it
this point there but the unfortunate
thing is most of them mistakenly think
the biggest market are the commercial
languages like French German English
Japanese and Chinese but there most
people are literate they don't need the
technology that I have a Translate app
on my iPhone I never use it
because I what if we can't read it right
then you use it every day if you use it
every day that is the killer app if
three billion people are using it you
can't think of any better killer app
than that right so there is a mistaken
understanding of the market
opportunities but I think it will
fix itself so the apps tailored to
semi-literate populations of the kind
that I told you will have a profound
impact but the user interface is very
important it has to be one minute
learning time one-click not three clicks
to do something and you know I said two
clicks here sometimes you have to say
what do you want to do and then do it
so you may need to install spoken
dialogue there's no keyboard there's no
text no you know everything has to be
speech only imagine your assistant or
imagine your postdoc and that's what
you'll be doing here if you can tell
that your postdoc to do something and
then this app should do exactly the
kinda same behave the same way and all
these apps will require speech
recognition spoken dialogue speech to
speech translation and all of those
technologies exist already question
answering dialogue already exists so if
I take you know I don't know how many of
you have used Amazon echo Alexa agent I
created another agent called Asha it
turns out in order for the systems we
create to be you know quickly recognized it
has to be a two syllable name there
should be a fricative in between
within with that it's easier to reliably
recognize
so Asha is a common name in many of
the many cultures so you can say Asha
play me Shakespeare Hamlet and it'll
play it in local language you can say Asha
read me the Eenadu newspaper it'll
do that or you can say Asha order me
eggs and bread and whatever it'll get
delivered to you say you know the same
day or next day and Asha you know I
want to talk to my grandson in Seattle
and it'll do that and Asha my mobile
device with a thousand rupees or
whatever and again all these things are
routine and they are already being done
you know if you take Siri or Cortana
Alexa they're all doing it and so the
question is why can't we do it for the
orphan languages what we need our orphan
you know languages there are many
languages that are not considered
commercially viable
therefore it turns out to build a
translation afternoon to Marathi
you need to understand Marathi you need
to have a database all of that requires
certain amount of investment the tools
exist the technology exists they just
haven't been done for these languages
and when I talk to them everybody agrees
is a good idea but nobody wants to step
up to the plate right so anyway there's
a whole set of architectural issues I
won't go into here but it has to be non
intrusive autonomic that is if you have
an app on your iPhone you have to tap
it this has to be always on and always
working and always learning and if it's
always on you run out of battery very
quickly therefore you need either a
desktop one like Amazon echo or some
other new technology we haven't invented
yet and it monitors analyzes and learns
from experience and it shares data
suitably anonymized
so that it's learning preferences by
observing other similar people's choice
through social networks learning tasks
similarity and uses similarity learning
by error correction and simply learning
through clarification dialogue so none
of these are what you we now call deep
learning which we'll talk about
but deep learning is an essential part
of the Cystic knowledge e that has been
invented for speech so I think what
we'll see is that the implications of
all of these so no single company seems
to want to make investments in these
orphan languages but there are many
languages that more than twenty million
people speak so the market size is
reasonably big so we ought to be able to
look at these languages and so for every
for illiterate population it will become
a lifeline and used every day
that's what is the definition of a
killer app if you touch it every day
then you have an app that makes sense I
have a lot of apps on my iPhone I have
not touched even one thing ever in a
year so then it doesn't matter
in conclusion two billion semi-literate
population two billion plus in the world
are a major untapped market and AI
technology stretched pressure especially
effective use of speech technology and
voice computing is the only option only
option to support their needs
and we have all the needed tools and
technology and if there's an issue of
orphan languages costs associated with
it local governments will probably
be willing to support and here is one
area where Amazon Google Microsoft
should all collaborate share the data
there's no nothing magical quick they'll
all collect the same stupid data and I
know they're all kind of proprietary we
should not keep them proprietary we
should kind of share it
so with that thank you
[Applause]
[Music]