4th HLF – Hot Topic: Artificial Intelligence – Presentation Raj Reddy

Heidelberg Laureate Forum
13 Jan 2017 · 15:38

Summary

TLDR: The speaker discusses the transformative potential of voice computing, particularly for the roughly 3 billion semi-literate people who cannot read, or cannot comprehend what they read, well enough to use text-based interfaces. Voice computing allows these individuals to interact with technology solely through speech, offering access to services like reading newspapers, shopping, education, and voting. Existing AI technologies, such as speech recognition, translation, and synthesis, can support this. The speaker argues that voice computing could bridge the digital divide, empower illiterate populations, and open a market for speech-based apps that the speaker estimates at a trillion dollars a year.

Takeaways

  • 🎤 Voice computing has the potential to create a profound societal impact by removing the need for keyboards and touch interfaces.
  • 🌍 The focus is on the 3 billion semi-literate or illiterate people, who can only interact with technology through voice.
  • 🗣️ Voice computing can empower semi-literate populations to access services like reading newspapers, voting, online shopping, and education.
  • 🎥 Technologies like real-time speech-to-speech translation, synthesis, and dynamic dubbing already exist, but are not yet fully developed for everyday use.
  • 📱 Simple, speech-based mobile apps tailored to semi-literate users could revolutionize how they interact with technology.
  • 💡 These apps could enable users to learn to read by listening and following highlighted text, essentially teaching them to read without formal education.
  • 🛍️ Voice computing could allow rural and semi-literate populations to benefit from market economies, shopping online, and accessing educational content.
  • 🎓 Learning without a teacher through Socratic dialogue with intelligent agents would help overcome the shortage of teachers in developing countries.
  • 💸 The semi-literate market for voice-based apps is estimated at roughly $1 trillion a year: 3 billion people × $1 of daily disposable income × 365 days.
  • 📚 Big tech companies like Microsoft, Google, and Amazon should collaborate and share data to develop voice technologies for orphan languages, many of which have more than 20 million speakers.

Q & A

  • What is the main focus of the speaker's passion in the provided transcript?

    -The speaker's main focus is on voice computing, a technology that enables people to interact with computers using only voice commands, without the need for keyboards or touch interfaces. The speaker believes this technology can have a profound impact on society, especially for semi-literate populations.

  • Who does the speaker identify as the primary beneficiaries of voice computing technology?

    -The speaker identifies semi-literate and illiterate people, particularly those in developing countries, as the primary beneficiaries of voice computing. These individuals may not be able to read or use traditional interfaces but can interact with technology through voice commands.

  • How could voice computing help semi-literate populations, according to the speaker?

    -Voice computing could help semi-literate populations by enabling them to use technology in various ways, such as reading newspapers, watching foreign language movies, attending online lectures, voting, and shopping online. It would also empower them to access educational resources and market opportunities they are currently deprived of.

  • What is the estimated market potential of speech-based apps for semi-literate populations?

    -The speaker estimates the market potential of speech-based apps for semi-literate populations to be worth a trillion dollars per year. This estimate is based on three billion people with an average disposable income of $1 a day.

  • What are the key technologies required to make voice computing a reality for semi-literate populations?

    -The key technologies required include speech recognition, translation, speech synthesis, and spoken dialogue systems. These technologies already exist in some form, as demonstrated by platforms like Microsoft's, Facebook's, and Google's translation tools.

  • Why does the speaker think commercial languages are not the main market for voice computing?

    -The speaker believes that commercial languages like French, German, English, Japanese, and Chinese are not the main market because most people who speak these languages are literate and do not need voice computing to access information. The true potential lies in less commercially viable languages spoken by semi-literate populations who can benefit from speech-based technology.

  • What is the significance of 'one-click' user interfaces for semi-literate populations, according to the speaker?

    -For semi-literate populations, the user interface needs to be extremely simple and intuitive, ideally requiring only one minute to learn and one click to operate. This reduces the barrier to entry and allows users to quickly and easily interact with the technology using voice commands.

  • What are some potential applications of voice computing mentioned by the speaker?

    -Potential applications include reading newspapers aloud, real-time translation and dubbing of foreign language media, online shopping, and education through Socratic dialogue with AI agents. The technology can also facilitate online voting and access to market economies.

  • What challenges does the speaker mention regarding the development of voice computing for less commercially viable languages?

    -One challenge is the lack of investment in developing speech technology for 'orphan languages,' which are languages not considered commercially viable. Although the necessary tools and technology exist, there is a lack of willingness from companies to invest in these languages.

  • What role does the speaker believe governments and major tech companies should play in promoting voice computing?

    -The speaker believes that local governments would probably be willing to cover the costs of supporting orphan languages, and that major tech companies like Amazon, Google, and Microsoft should collaborate and share their data rather than keep it proprietary. Since the companies all collect essentially the same data, sharing it would avoid duplicated effort and reduce development costs.

Outlines

00:00

🎙️ The Rise of Voice Computing

In this opening paragraph, the speaker discusses their passion for voice computing, a technology that eliminates the need for keyboards and touch interfaces, relying solely on voice interaction. They explain how this technology can have a profound impact on society, especially for the semi-literate or illiterate population, which numbers around three billion people globally. Voice computing could provide access to technologies that are currently inaccessible due to literacy barriers. The speaker outlines how these people could perform tasks such as reading newspapers, watching educational content, and shopping online through voice commands.

05:01

📰 Voice-Enabled Apps for Literacy and Accessibility

This section explores how voice computing can transform the lives of semi-literate populations by enabling them to access newspapers and other digital content in their local languages. Through real-time speech recognition and translation, apps can help users comprehend and interact with information, potentially helping them learn to read by following highlighted text. The speaker also touches on digital democracy, predicting a future where voting and other democratic processes happen online. They highlight the potential for voice-based apps to help people in rural or underdeveloped regions access global markets and online education.
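
The read-along idea in this section (the app highlights each word as it is spoken, so listeners gradually learn to sight-read) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speaker's design: it assumes the speech synthesizer reports a start time for every word (the hypothetical `word_starts` list), and it only decides which word to highlight at a given playback time.

```python
from bisect import bisect_right


def highlighted_index(word_starts: list[float], t: float) -> int:
    """Index of the word being spoken at playback time t (seconds).

    Returns -1 before the first word starts. word_starts must be sorted.
    """
    return bisect_right(word_starts, t) - 1


def render(words: list[str], index: int) -> str:
    """Wrap the current word in brackets, standing in for on-screen highlighting."""
    return " ".join(f"[{w}]" if i == index else w for i, w in enumerate(words))


if __name__ == "__main__":
    words = ["Rain", "expected", "in", "Hyderabad", "today"]
    # Hypothetical per-word start times (seconds) reported by a TTS engine.
    word_starts = [0.0, 0.4, 1.0, 1.2, 1.9]
    for t in (0.1, 1.1, 2.5):
        print(f"{t:.1f}s: {render(words, highlighted_index(word_starts, t))}")
```

Because `bisect_right` is a binary search, the lookup stays cheap even for a long article; once the last word has started, it simply stays highlighted.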

10:03

🛍️ Socratic Dialogue and the Power of Voice-Based Learning

Here, the focus shifts to the educational applications of voice computing. The speaker introduces the idea of learning without a teacher, where a digital assistant engages users in a Socratic dialogue, helping them learn through conversation. They emphasize the importance of such technology in regions lacking qualified teachers, allowing users to engage in self-directed learning. This has massive market potential, especially among semi-literate populations, which the speaker estimates to be a trillion-dollar opportunity. They reiterate the need for seamless speech-based user interfaces for the widespread adoption of such apps.
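
The trillion-dollar figure comes from simple arithmetic that the speaker spells out in the talk: about 3 billion people, each with roughly $1 a day of disposable income, over 365 days. A minimal Python sketch reproducing it (the constant and function names are illustrative, not from the talk):

```python
# The speaker's back-of-the-envelope market estimate.
PEOPLE = 3_000_000_000        # semi-literate population cited in the talk
USD_PER_DAY = 1.0             # disposable income per person per day, per the talk
DAYS_PER_YEAR = 365


def annual_market_usd(people: int, usd_per_day: float, days: int = DAYS_PER_YEAR) -> float:
    """Yearly total if each person's daily disposable income is counted as market."""
    return people * usd_per_day * days


if __name__ == "__main__":
    total = annual_market_usd(PEOPLE, USD_PER_DAY)
    print(f"${total / 1e12:.3f} trillion per year")  # prints "$1.095 trillion per year"
```

The result, about $1.1 trillion, treats the population's entire disposable income as the addressable market, which is how the talk frames the estimate.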

15:04

📱 Existing Technologies and the Future of Voice Interfaces

The speaker discusses the current state of speech-to-speech technologies and their presence in platforms like Microsoft, Facebook, and Google. They express disappointment that the biggest market for these tools—semi-literate populations—remains untapped, as companies focus on commercial languages like English and French. The speaker believes that voice-driven apps could become a daily necessity for billions of people, offering opportunities in underserved languages. They stress that the success of such apps hinges on intuitive design, requiring one-click interfaces and minimal learning curves, similar to how one might interact with an assistant or postdoc.

🤖 Voice Computing as the Future of Communication

In this final section, the speaker expands on the architecture and usability challenges of voice computing, emphasizing the need for always-on, autonomic systems that don't drain battery life. They describe systems that learn continuously: from suitably anonymized shared data, from the choices of similar people observed through social networks, from task and user similarity, from error correction, and from clarification dialogue. The speaker underscores the importance of building speech recognition systems for 'orphan' languages, which aren't currently commercially viable but have millions of speakers. They conclude by calling for collaboration between tech giants to support voice computing for these underserved populations, as it could redefine daily life for billions.
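
The talk says the system should learn preferences by observing the choices of similar people (using suitably anonymized shared data) and by user similarity, without naming an algorithm. One common way to do this is user-based collaborative filtering with cosine similarity; the Python sketch below uses that technique as an assumed stand-in, with hypothetical anonymous user IDs and item names.

```python
import math
from collections import defaultdict

# Anonymized choice counts: anonymous user ID -> {item: times chosen}.
Choices = dict[str, dict[str, int]]


def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    """Cosine similarity between two sparse choice-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user: str, data: Choices, k: int = 2) -> list[str]:
    """Rank items the user has not chosen yet by the similarity-weighted
    choices of the k most similar other users."""
    mine = data[user]
    neighbours = sorted(
        ((cosine(mine, theirs), uid) for uid, theirs in data.items() if uid != user),
        reverse=True,
    )[:k]
    scores: dict[str, float] = defaultdict(float)
    for sim, uid in neighbours:
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, count in data[uid].items():
            if item not in mine:
                scores[item] += sim * count
    return sorted(scores, key=lambda item: (-scores[item], item))


if __name__ == "__main__":
    data: Choices = {
        "anon-17": {"news": 5, "crop_prices": 3},
        "anon-42": {"news": 4, "crop_prices": 2, "weather": 3},
        "anon-90": {"cricket": 6},
    }
    print(recommend("anon-17", data))  # ['weather']
```

Neighbours with zero similarity are skipped, so an unrelated user's habits never leak into the suggestions.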

Keywords

💡Voice Computing

Voice computing refers to the technology that allows users to interact with computers and digital systems using voice commands instead of traditional input methods like keyboards or touchscreens. In the video, voice computing is seen as a way to bridge the gap for semi-literate populations who cannot use text-based systems, allowing them to access various services and information.

💡Semi-literate Population

Semi-literate populations are individuals who either cannot read or have limited comprehension skills, often due to lack of education. The video highlights the importance of creating voice-based technologies for this demographic, as traditional text-based interfaces are inaccessible to them. Voice computing can empower these individuals to use digital platforms for education, shopping, and communication.

💡Speech Recognition

Speech recognition is the technology that enables a computer or device to identify and process spoken language. In the context of the video, speech recognition is a key component of voice computing, allowing semi-literate users to interact with systems by speaking rather than reading or typing. This technology is crucial for providing access to digital services for people who struggle with literacy.

💡Speech-to-Speech Translation

Speech-to-speech translation involves converting spoken words from one language to another in real time. The video explains that this technology could allow semi-literate users to access content, like news or educational materials, in their own language even if the original content is in a different language. Real-time dubbing of videos and lectures is a specific example given in the transcript.
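
The speech-to-speech chain described here (recognize speech, translate the text, synthesize speech in the target language) can be sketched as a pipeline of three interchangeable stages. The `Recognizer`, `Translator`, and `Synthesizer` interfaces and the toy stand-ins are hypothetical and do not correspond to any vendor's API. Real-time dubbing would also run the stages incrementally on short audio chunks rather than on a whole utterance, as this sketch does.

```python
from dataclasses import dataclass
from typing import Protocol


class Recognizer(Protocol):
    def transcribe(self, audio: bytes, lang: str) -> str: ...


class Translator(Protocol):
    def translate(self, text: str, src: str, dst: str) -> str: ...


class Synthesizer(Protocol):
    def speak(self, text: str, lang: str) -> bytes: ...


@dataclass
class SpeechToSpeech:
    """Speech in -> text -> translated text -> speech out."""
    asr: Recognizer
    mt: Translator
    tts: Synthesizer

    def run(self, audio: bytes, src: str, dst: str) -> bytes:
        text = self.asr.transcribe(audio, src)
        translated = self.mt.translate(text, src, dst)
        return self.tts.speak(translated, dst)


# Toy stand-ins so the sketch runs; real ASR, MT, and TTS engines would replace them.
class FakeRecognizer:
    def transcribe(self, audio: bytes, lang: str) -> str:
        return audio.decode("utf-8")  # pretend the "audio" already holds the words


class WordTableTranslator:
    def __init__(self, table: dict[str, str]) -> None:
        self.table = table

    def translate(self, text: str, src: str, dst: str) -> str:
        # Word-by-word lookup; unknown words pass through unchanged.
        return " ".join(self.table.get(w, w) for w in text.lower().split())


class FakeSynthesizer:
    def speak(self, text: str, lang: str) -> bytes:
        return f"<{lang}> {text}".encode("utf-8")  # a tagged string instead of audio


if __name__ == "__main__":
    pipeline = SpeechToSpeech(
        FakeRecognizer(),
        WordTableTranslator({"good": "buenos", "morning": "días"}),
        FakeSynthesizer(),
    )
    print(pipeline.run(b"Good morning", "en", "es").decode("utf-8"))  # <es> buenos días
```

Keeping each stage behind a small interface means a recognizer or translator for a new language can be swapped in without touching the rest of the pipeline.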

💡Digital Democracy

Digital democracy refers to the use of technology to enhance democratic processes, such as voting or public participation. The video discusses how voice computing could allow semi-literate populations to participate in online voting or engage with government services, thus empowering them within the digital age. The speaker predicts that almost all countries will vote online within the next 50 years, noting that constitutional requirements differ but that the necessary authentication, authorization, and audit technologies are already well understood.

💡Killer App

A 'killer app' is a software application so essential that it drives the adoption of the underlying technology. In the video, the speaker argues that voice-based apps for semi-literate populations could be a 'killer app' by being indispensable in everyday life. Examples include apps for reading newspapers, ordering products online, and interacting with virtual assistants, all through voice commands.

💡Learning without a Teacher

Learning without a teacher refers to the concept of self-education facilitated by digital tools and AI-driven systems. The video suggests that voice computing could enable users in underdeveloped regions to learn independently through Socratic dialogues with intelligent agents, making education accessible even in areas with a shortage of teachers. This technology could help semi-literate populations gain knowledge through interactive conversations.

💡Socratic Dialogue

A Socratic dialogue is a method of teaching through asking and answering questions to stimulate critical thinking. In the video, the speaker envisions intelligent agents using this approach to help users learn without formal instruction. The AI would engage in dialogues with users, helping them understand complex subjects by breaking them down into manageable discussions, tailored to the learner's pace and needs.

💡User Interface

A user interface (UI) is the space where interactions between humans and computers occur. The video stresses the importance of designing UIs that are intuitive and easy for semi-literate users to navigate with voice commands. The speaker mentions that these interfaces should allow for 'one-minute learning time' and 'one-click' actions, ensuring that users can quickly learn to use the system without confusion.
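
The talk compares a good voice interface to a postdoc or secretary: given a spoken request, it either acts or comes back with a clarification question ("did you mean this or that?"), and one utterance plays the role of one click. A minimal Python sketch of that behaviour follows. The intent names and trigger words are hypothetical, keyword overlap is an assumed stand-in for real spoken-language understanding, and only the wake word "Asha" comes from the talk.

```python
import re

# Hypothetical intents and trigger words; a real assistant would use a trained
# spoken-language-understanding model rather than keyword overlap.
INTENTS: dict[str, set[str]] = {
    "read_newspaper": {"read", "newspaper", "news", "headline"},
    "order_groceries": {"order", "eggs", "bread", "milk"},
    "call_family": {"call", "talk", "grandson", "daughter"},
}


def handle(utterance: str, wake_word: str = "asha") -> str:
    """Act on a clear request, ask for clarification on a tie, and stay
    silent unless the utterance starts with the wake word."""
    words = re.findall(r"[a-z]+", utterance.lower())
    if not words or words[0] != wake_word:
        return ""  # not addressed to the assistant
    request = set(words[1:])
    ranked = sorted(
        ((len(keys & request), name) for name, keys in INTENTS.items()),
        key=lambda pair: -pair[0],
    )
    (top, best), (runner_up, second) = ranked[0], ranked[1]
    if top == 0:
        return "Sorry, I can't do that yet. What would you like me to do?"
    if top == runner_up:
        return f"Did you mean {best.replace('_', ' ')} or {second.replace('_', ' ')}?"
    return f"OK: {best.replace('_', ' ')}"


if __name__ == "__main__":
    for said in ("Asha, read me the Eenadu newspaper",
                 "Asha, read the order",
                 "Asha, play Hamlet"):
        print(said, "->", handle(said))
```

A tie produces a question instead of a guess, mirroring the clarification dialogue the speaker describes, and the whole exchange happens in spoken turns with no screen or keyboard.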

💡Orphan Languages

Orphan languages are languages that lack support in commercial technology because they are not considered commercially viable. The video highlights the need for investment in translation and voice technologies for these languages, many of which have more than 20 million speakers, often in semi-literate populations. The speaker argues that despite the commercial focus on major languages, developing technologies for orphan languages could have a profound social impact.

Highlights

Voice computing enables interaction without keyboards or touch, similar to human conversation.

About three billion people live on less than $2.50 a day and are semi-literate or illiterate.

Voice computing can empower semi-literate individuals by allowing them to speak to computers.

Technology like speech recognition, translation, and synthesis can profoundly impact society.

Voice computing can assist in reading newspapers, understanding foreign languages, and accessing educational content.

Real-time translation and dubbing of speeches is already possible: in 2012, Rick Rashid's English talk in China was translated and synthesized into Chinese in real time.

Newspaper reading apps could read articles aloud, helping users learn to sight-read.

Digital democracy will likely involve online voting, requiring authentication, authorization, and audit technologies.

Online shopping can be revolutionized for rural, illiterate populations through voice computing.

Learning without a teacher is facilitated by AI agents that can explain complex concepts.

The potential market for speech-based apps among the illiterate could be a trillion dollars a year.

Existing speech-to-speech translation and speech-synthesis technologies are underutilized by the populations that need them most.

User interface for voice computing must be intuitive, with minimal learning time and one-click actions.

Siri, Cortana, and Alexa (on the Amazon Echo) are examples of existing voice computing technologies.

Orphan languages, not commercially viable, need investment for translation technologies.

Architectural issues for voice computing include non-intrusiveness, autonomic operation, and always-on learning.

Deep learning is essential for voice computing, but it's not the only technology needed.

Collaboration between tech giants like Amazon, Google, and Microsoft could unlock the potential of voice computing for all.

The semi-literate population is a major untapped market for AI technology and voice computing.

Transcripts

00:00

[Music]

00:19

So the first part of the first hour is supposed to cover the most recent interesting aspects of AI technology, and I'd like to tell you about my passion, which is now reaching the point where I think it can have a profound impact: what I call voice computing. Voice computing is when you can no longer use keyboards and can no longer use touch; the only thing you have is voice, just like when you are interacting with other people, like your postdocs and your assistants. That's all you have, so how does that proceed, and what can happen? It turns out we now have all the relevant technologies, and I'd like to go through them. If we do create such voice computing, it can have a profound impact on society as a whole. So let me take you through it.

01:26

The bottom of the pyramid represents about three billion people who have less than $2.50 of income every day, and these are what I call semi-literate. Many of them are illiterate: they can't read any language. That's part of what is also called the literacy divide. And there are others who can read but cannot really understand or comprehend, so I call both of these groups together semi-literate. If you're a semi-literate person on this planet, the only way you can use computers is to speak to them. You can't read what is on the screen.

02:15

You can't understand, you can't follow. But on the other hand, these three billion people lead perfectly normal lives, and so the question is: should they be deprived of our technology? If not, how can they use it? It turns out the only way they can use it is through voice computing. By voice computing, imagine you in your lab: you're a professor, and you have a postdoc and a secretary. You just tell them what to do and they go do it. If they can't do it, they come back and say, "I'm sorry, I can't do it. Did you mean this or not?" They have a clarification dialogue, and I imagine computers and intelligent agents being exactly like that: you don't tell them all sorts of unnecessary information. The question is, are we there? The other questions are: what do we have, and what do we need to do to get there? Voice computing, where there's no keyboard attached, can help these semi-literate people in all kinds of ways that you may not have thought about.

03:28

They can help you read newspapers, watch foreign-language movies, listen to Khan Academy lectures, vote online, and do shopping online. The interesting aspect of all of these is that they need speech recognition, translation, and synthesis technologies, speech to speech, and apps that enable different aspects of it. For example, a mobile app for entertainment and education would have to be able to understand speech. On YouTube, for example, you can actually get transcriptions of what is being said; then you have to translate it and then do the speech synthesis, all in real time. You've heard about dubbing in movies: automatic, dynamic, real-time dubbing is possible today. In fact, in 2012 Rick Rashid gave a talk in English in China, and it got translated into Chinese and synthesized into Chinese, all in real time. In that particular case it was a domain that was well understood, and they tried it, but it's not yet ready for prime time. It should be, I don't know why not, but I think it will be there soon enough.

05:06

The same thing applies if you have such technology and I can't read a newspaper. I can say, "I want to read this newspaper in my local language." I can't read the screen, but I can touch it and say, "There's a nice picture; what is this headline about?" It will read the headline, and then I can say, "Can you now read me the rest of the story?" and go on to the next one. So the whole idea is to design a newspaper-reading app that will read to illiterate people, so suddenly they all can access it. Not only that: one of the interesting side effects is that they will learn to sight-read. They see the words, and if you design the proper interface, it highlights the words as it reads, so you essentially learn to read without having to be taught how to read, in a sense. One of the areas I work in is digital democracy.

06:07

I believe we are entering a phase where almost all countries will be voting online within the next 50 years. Different countries have different constitutional requirements, so that may or may not happen soon, but authentication, authorization, and audit technologies we understand fully well, and it will be done. The same is true with respect to online shopping.

06:31

Essentially, it empowers a rural, illiterate person to get the same benefits of market economies that others now get from Amazon and so on: you get the best product at the lowest possible price, and Amazon is getting delivery times down to one day and even a few hours. I believe they will solve that same problem in rural India and rural Africa, so that people there will be able to do the same thing.

07:01

Learning without a teacher is perhaps the most interesting app that comes out of this, because we're getting more and more online content. Instead of just reading, I can have a Socratic dialogue with an agent that knows a lot more than I do and that explains things to me when I don't understand. There are not enough good teachers, especially in third-world countries, so the whole idea of learning without a teacher gives you that opportunity. So if all of these things are real, and I claim they're not only real but near term, then the illiterate populations will be the biggest source of customers for speech-based apps in the future, and I estimate that to be a trillion dollars a year. How did I get that number?

08:01

Three billion times $1 a day, which is the disposable income, is essentially three billion dollars a day, and times 365 days is where you get the trillion dollars from. OK, so that is not to be sneezed at. So the question is, what technology exists? Speech to speech (speech in, translate, speech out in another language) already exists at Microsoft, Facebook, and Google. I haven't seen the Google version yet, but they have the best language translation capability of anybody, and I'm sure all of them have it at this point. The unfortunate thing is that most of them mistakenly think the biggest market is the commercial languages like French, German, English, Japanese, and Chinese. But most of those people are literate; they don't need the technology. I have a Translate app on my iPhone, and I never use it. But what if you can't read? Then you use it every day, and if you use it every day, that is the killer app. If three billion people are using it, you can't think of any better killer app than that, right? So there is a mistaken understanding of the market opportunities, but I think it will fix itself.

09:25

So apps tailored to semi-literate populations, of the kind I told you about, will have a profound impact, but the user interface is very important. It has to be one minute of learning time and one click, not three clicks, to do something. I said two clicks here because sometimes you have to say what you want to do and then do it.

09:51

So you may need spoken dialogue. There's no keyboard, there's no text; everything has to be speech only. Imagine your assistant or your postdoc: that's what you'll be doing here. If you can tell your postdoc to do something, then this app should behave exactly the same way. All these apps will require speech recognition, spoken dialogue, and speech-to-speech translation, and all of those technologies exist already; question-answering dialogue already exists. I don't know how many of you have used the Amazon Echo Alexa agent. I created another agent called Asha. It turns out that for the systems we create to recognize it quickly, it has to be a two-syllable name with a fricative in between; with that, it's easier to recognize reliably.

10:53

Asha is a common name in many cultures. So you can say, "Asha, play me Shakespeare's Hamlet," and it'll play it in the local language. You can say, "Asha, read me the Eenadu newspaper," and it'll do that. Or you can say, "Asha, order me eggs and bread," and it'll get delivered to you the same day or the next day. "Asha, I want to talk to my grandson in Seattle," and it'll do that. "Asha, [recharge] my mobile device with a thousand rupees." Again, all these things are routine and are already being done: if you take Siri or Cortana or Alexa, they're all doing it. So the question is, why can't we do it for what we call orphan languages? There are many languages that are not considered commercially viable. To build a translation app for Marathi, you need to understand Marathi and you need to have a database, and all of that requires a certain amount of investment. The tools exist and the technology exists; it just hasn't been done for these languages.

12:13

When I talk to them, everybody agrees it is a good idea, but nobody wants to step up to the plate. Anyway, there's a whole set of architectural issues I won't go into here, but it has to be non-intrusive and autonomic. That is, if you have an app on your iPhone, you have to tap it; this has to be always on, always working, and always learning. If it's always on, you run out of battery very quickly, so you need either a desktop device like the Amazon Echo or some other new technology we haven't invented yet. It monitors, analyzes, and learns from experience, and it shares data, suitably anonymized, so that it learns preferences by observing similar people's choices through social networks, learns by task similarity and user similarity, learns by error correction, and simply learns through clarification dialogue. None of these are what we now call deep learning, which we'll talk about, but deep learning is an essential part of the statistical technology that has been invented for speech.

13:30

So I think what we'll see... No single company seems to want to make investments in these orphan languages, but there are many languages that more than twenty million people speak, so the market size is reasonably big, and we ought to be able to look at these languages. For the illiterate population it will become a lifeline, used every day. That's the definition of a killer app: if you touch it every day, then you have an app that makes sense. I have a lot of apps on my iPhone that I have not touched even once in a year, so those don't matter.

14:15

In conclusion, the semi-literate population, two billion plus in the world, is a major untapped market, and AI technology, especially effective use of speech technology and voice computing, is the only option to support their needs. We have all the needed tools and technology, and if there's an issue of the costs associated with orphan languages, local governments will probably be willing to support it. And here is one area where Amazon, Google, and Microsoft should all collaborate and share the data. There's nothing magical; they all collect the same stupid data. I know it's all kind of proprietary, but we should not keep it proprietary; we should share it. So with that, thank you.

15:12

[Applause]

15:18

[Music]
