Natural Language Processing: Crash Course Computer Science #36

CrashCourse
22 Nov 2017 · 11:49

Summary

TLDR: This video introduces Natural Language Processing (NLP), an interdisciplinary field at the intersection of computer science and linguistics that aims to let computers understand and generate human language. It first explains how natural languages differ from programming languages, emphasizing the complexity and diversity of natural language. It then shows how part-of-speech tagging, phrase structure rules, and parse trees are used to analyze sentence structure, and how these techniques let computers process and respond to information. The video also traces the evolution of chatbots from rule-based systems to modern machine-learning-based ones, and mentions their use in customer service. It further covers speech recognition, including early systems, algorithmic advances, and the role of deep neural networks in improving accuracy. Finally, it introduces speech synthesis, showing the progression from early mechanical devices to today's more natural, fluent output, and predicts that voice may soon become as common a form of interaction as screens, keyboards, and other traditional input/output devices.

Takeaways

  • 📚 Natural language processing (NLP) is an interdisciplinary field combining computer science and linguistics, aimed at enabling computers to understand language.
  • 🤖 Computer languages differ from human natural languages, which have far larger vocabularies and more complex grammatical structure.
  • 🌐 One of the early, fundamental NLP problems was breaking sentences into smaller pieces that computers could process more easily.
  • 📝 Sentence structure can be understood by building parse trees, which help computers identify parts of speech and see how a sentence is constructed.
  • 🔍 Voice search and command processing rely on parsing and generating language; computers handle natural language tasks in a Lego-like way (see the sketch after this list).
  • 🚫 Computers can fail on overly complex or ambiguous sentences, no longer parsing them correctly or capturing the speaker's intent.
  • 📈 When generating natural language text, computers can use entity relationships stored in a semantic network to build informational sentences.
  • 🤖 Early chatbots were rule-based; modern approaches use machine learning, trained on large amounts of real conversation data.
  • 👥 Chatbots and dialog systems have come a long way over the past fifty years and can be quite convincing today.
  • 🎤 Speech recognition, the field of getting computers to extract words from sound, has been researched for decades.
  • 📊 Deep neural networks are currently the most accurate technique used in speech recognition systems; they work by analyzing the frequency content of sound waves.
  • 🔊 Speech synthesis, the reverse of speech recognition, gives computers the ability to output speech by breaking text into phonemes and playing those sounds back to back.
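As a rough illustration of that Lego-like decomposition (a toy sketch with an invented word list, not the video's own method), the snippet below pulls the question type, target noun, and qualifying dimension out of a query like "where's the nearest pizza":

```python
# Toy illustration of decomposing a voice query into question type,
# target noun, and qualifying dimension, as the video sketches for
# "where's the nearest pizza". The word lists are made up for this example.

QUESTION_WORDS = {"where", "what", "who", "when"}
DIMENSIONS = {"nearest", "biggest", "cheapest", "best"}

def decompose_query(query: str) -> dict:
    words = query.lower().replace("?", "").replace("'s", " is").split()
    q_type = next((w for w in words if w in QUESTION_WORDS), None)
    dimension = next((w for w in words if w in DIMENSIONS), None)
    # crude guess: the last word that is neither a question word,
    # a dimension, nor a function word is treated as the target noun
    function_words = {"is", "the", "a", "an", "on", "in"}
    noun = next((w for w in reversed(words)
                 if w not in QUESTION_WORDS | DIMENSIONS | function_words), None)
    return {"question": q_type, "noun": noun, "dimension": dimension}

print(decompose_query("Where's the nearest pizza?"))
# {'question': 'where', 'noun': 'pizza', 'dimension': 'nearest'}
```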

Q & A

  • How does computer vision differ from natural language processing (NLP)?

    -Computer vision gives computers the ability to see and understand visual information, while NLP lets computers understand language. Computer vision works with visual data such as images and video; NLP works with text and speech.

  • What is one of the fundamental problems in NLP?

    -A fundamental NLP problem is breaking sentences into smaller, more manageable pieces, which typically involves part-of-speech tagging and building parse trees to help the computer understand a sentence's structure and meaning.

  • What is part-of-speech tagging, and what role does it play in NLP?

    -Part-of-speech tagging is the process of identifying the grammatical category of each word in a text (noun, verb, and so on). It helps the computer understand each word's grammatical function in a sentence, making it easier to parse and understand.

  • How are phrase structure rules used to build a parse tree?

    -By applying phrase structure rules, the grammatical role of each constituent in a sentence can be identified and organized into a hierarchy, forming a parse tree. The parse tree not only tags each word with a likely part of speech but also reveals how the sentence is constructed.

  • Why is natural language processing challenging for computers?

    -Human languages contain large, diverse vocabularies, words with multiple meanings, speakers with different accents, and all sorts of word play. People also make mistakes when writing and speaking, such as slurring words, leaving out key details so things become ambiguous, or mispronouncing words, and computers need to cope with all of this complexity.

  • How did early chatbots work?

    -Early chatbots were primarily rule-based: experts encoded hundreds of rules mapping what a user might say to how the program should reply. A famous example is ELIZA, created at MIT in the mid-1960s, a chatbot that used basic syntactic rules to identify content in written exchanges and then turn it back around to the user.

  • How have modern chatbots improved over early versions?

    -Modern chatbots use machine-learning approaches trained on large amounts of real human-to-human conversation. This makes them far more capable and convincing in areas such as customer service, and better at understanding and responding to user input.

  • Why do computers use semantic networks when generating natural language text?

    -Because a semantic network stores data as a web of meaningful relationships between entities, it provides all the ingredients needed to craft informational sentences, making text generation more effective and accurate.

  • How does speech recognition extract words from sound?

    -Speech recognition analyzes the sound signal's waveform and converts it into a spectrogram, revealing the frequency components that correspond to resonances in the voice. By recognizing these patterns, the computer can identify the phonemes that make up words and convert speech into text.

  • What is a language model, and what role does it play in speech recognition?

    -A language model contains statistics about sequences of words, and it helps improve transcription accuracy. For example, if the recognizer is unsure between "happy" and "harpy", the language model favors "happy", because "she was" is far more likely to be followed by an adjective than a noun.

  • How does speech synthesis work?

    -Speech synthesis breaks a sentence of text into its phonetic components and then plays those sounds back to back through the computer's speaker. Modern speech synthesis has become very advanced and can sound quite natural, though it is still not quite human.

  • Why might speech technology soon become as common a form of interaction as screens, keyboards, and other physical input/output devices?

    -The spread of voice technology is creating a positive feedback loop: people use voice interaction more often, which gives companies like Google, Amazon, and Microsoft more data to train their systems on. Better accuracy leads people to use voice even more, which improves accuracy further. Many predict that speech technology will soon be as common as the other physical input/output devices we use today.

Outlines

00:00

📚 Introduction to Natural Language Processing (NLP)

This segment introduces the basic ideas of natural language processing (NLP), including how computers can understand human language. It highlights the differences between computer languages and natural languages, and the complexity of natural language: large vocabularies, words with multiple meanings, diverse accents, and so on. It explains that an early NLP problem was breaking sentences into smaller, more manageable pieces, which involves part-of-speech tagging and grammar rules. It also covers building parse trees and how they reveal a sentence's structure. Finally, it shows how NLP is used in voice search and command processing, along with its limitations when language gets too complex or ambiguous.

05:01

🤖 The development of chatbots and speech recognition

The second segment covers the history of chatbots, from rule-based systems such as ELIZA to modern machine-learning-based chatbots trained on large amounts of real conversation. It mentions Google's Knowledge Graph, a database storing a huge number of facts and relationships between entities. It also surveys the history of speech recognition, from early digit-recognition systems to today's real-time systems built on deep neural networks. It explains how sound waves are converted into spectrograms and how resonance patterns (formants) are used to distinguish vowels and words. It also touches on speech synthesis, the process of converting text into spoken output, and how language models improve speech recognition accuracy.

10:02

🔊 The outlook and impact of voice technology

The final segment looks ahead to the future of voice technology, predicting it will become as common as screens, keyboards, and other physical input/output devices. It emphasizes the spread of voice user interfaces on phones, in cars, and in homes, and how that spread forms a positive feedback loop that keeps improving speech recognition. It notes how much synthesized computer voices have improved and how close they are to human speech. Finally, it points out how important voice technology is for robots and other devices that need to communicate with humans without physical keyboards, and previews next week's episode on robots.


Keywords

💡 Natural Language Processing (NLP)

Natural language processing is an interdisciplinary field combining computer science and linguistics, aimed at enabling computers to understand and use human language. NLP is the core topic of the video: it involves breaking language into pieces a computer can process and using grammar rules to parse and generate language. For example, the video explains that an early NLP problem was deconstructing sentences into more manageable pieces, and that phrase structure rules can be used to build parse trees.

💡 Parts of Speech

Part-of-speech tagging means identifying the grammatical category of each word in a sentence, such as noun, verb, or adjective. The video mentions nine fundamental types of English words, which are essential for understanding sentence structure. For example, by identifying "noun phrases" and "verb phrases", a computer can build a parse tree and work out the subject of a sentence and the action being performed.
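As a quick, hands-on illustration (not code from the video), the sketch below uses the NLTK library's off-the-shelf tagger to label each word with a part of speech; it assumes NLTK is installed and its tokenizer and tagger models have been downloaded.

```python
# Minimal part-of-speech tagging sketch with NLTK (not the video's own code).
# Assumes: pip install nltk, plus nltk.download("punkt") and
# nltk.download("averaged_perceptron_tagger") have been run once
# (exact resource names can vary by NLTK version).
import nltk

tokens = nltk.word_tokenize("The Mongols rose from the leaves")
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('Mongols', 'NNPS'), ('rose', 'VBD'),
#       ('from', 'IN'), ('the', 'DT'), ('leaves', 'NNS')]
```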

💡 Parse Tree

A parse tree is a data structure that represents the grammatical structure of a sentence. Using phrase structure rules, a computer can build a parse tree that not only tags each word with a likely part of speech but also reveals how the sentence is constructed. The video notes that, from the parse tree, we know the noun focus of the example sentence is "the mongols" and that the action they perform is "rising" from something, in this case "leaves".
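The sketch below shows phrase structure rules producing a parse tree, using NLTK's chart parser; the miniature grammar is invented for this example and is not the video's exact rule set.

```python
# Toy phrase-structure grammar and parser (NLTK), echoing the video's
# rule "a sentence is a noun phrase followed by a verb phrase".
# The grammar below is a made-up miniature, not the video's exact rules.
import nltk

grammar = nltk.CFG.fromstring("""
  S  -> NP VP
  NP -> Det N
  VP -> V PP
  PP -> P NP
  Det -> 'the'
  N  -> 'mongols' | 'leaves'
  V  -> 'rose'
  P  -> 'from'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the mongols rose from the leaves".split()):
    tree.pretty_print()   # prints the tree: (S (NP the mongols) (VP rose (PP from (NP the leaves))))
```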

💡 Speech Recognition

Speech recognition is the process of converting human speech into computer-readable text. The video traces its development, from the Audrey system of the 1950s, which could only recognize digits, to today's highly accurate systems built on deep neural networks. Speech recognition analyzes the waveform and frequency content of sound to identify the phonemes that make up words, and converts them into text.
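To make the "fancy pattern matching" idea from the video concrete, here is a toy sketch that classifies an unknown spectrum by picking the closest stored phoneme template; the feature vectors are entirely made-up numbers, not real acoustic data.

```python
# Toy sketch of pattern matching over frequency features: compare an unknown
# sound's spectrum to stored phoneme templates and pick the closest one.
# The template vectors are invented, purely to illustrate the matching step.
import numpy as np

phoneme_templates = {           # hypothetical average spectra per phoneme
    "EE": np.array([0.1, 0.2, 0.9, 0.8, 0.1]),
    "AA": np.array([0.9, 0.8, 0.2, 0.1, 0.1]),
    "SH": np.array([0.1, 0.1, 0.2, 0.7, 0.9]),
}

def classify(spectrum: np.ndarray) -> str:
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(phoneme_templates, key=lambda p: cosine(spectrum, phoneme_templates[p]))

unknown = np.array([0.2, 0.25, 0.85, 0.75, 0.15])   # made-up observation
print(classify(unknown))   # -> "EE"
```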

💡 Speech Synthesis

Speech synthesis is the process of converting text into spoken output, the reverse of speech recognition. By breaking text into phonemes and playing those sounds back to back, a computer can produce speech. The video notes how far the technology has come, from early mechanical devices to today's synthesized voices, which sound far more natural and come much closer to human speech.
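A minimal sketch of the "chain phonemes back to back" idea, assuming NumPy is available; the phoneme "clips" here are stand-in sine tones rather than recorded speech.

```python
# Sketch of the "chain phonemes back to back" idea behind early speech
# synthesis: look up a short audio clip per phoneme and concatenate them.
# Phoneme clips are simulated with sine tones of different pitches;
# a real system would use recorded or generated phoneme audio.
import numpy as np

RATE = 16000                      # samples per second

def tone(freq_hz: float, dur_s: float = 0.15) -> np.ndarray:
    t = np.linspace(0, dur_s, int(RATE * dur_s), endpoint=False)
    return 0.3 * np.sin(2 * np.pi * freq_hz * t)

# stand-in "phoneme" clips (made-up pitches, not real phoneme recordings)
phoneme_audio = {"SH": tone(2000), "IY": tone(600), "S": tone(2500),
                 "AO": tone(400), "M": tone(250)}

def synthesize(phonemes: list[str]) -> np.ndarray:
    return np.concatenate([phoneme_audio[p] for p in phonemes])

audio = synthesize(["SH", "IY", "S", "AO", "M", "IY"])   # "she saw me", roughly
print(audio.shape)   # (14400,) -> could be written to a WAV file and played
```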

💡 Machine Learning

Machine learning is a set of techniques that let computer systems improve their performance using data. In the video, machine learning is used to train chatbots and improve speech recognition. By analyzing large amounts of human conversation data, machine-learning models can learn the patterns and structure of language, making chatbots more natural and speech recognition more accurate.

💡 Chatbots

A chatbot is a computer program that converses with humans via text or speech. The video traces their development from rule-based systems such as ELIZA to today's more advanced dialog systems built with machine learning. Modern chatbots are used in customer service, where they learn from existing example conversations to make their replies more natural and accurate.
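A tiny ELIZA-flavored sketch of the rule-based approach: a couple of invented regex rules map what the user says to a templated reply that turns the content back into a question. (The real ELIZA had many more rules and also swapped pronouns.)

```python
# A tiny ELIZA-flavored, rule-based exchange: regex patterns map what the
# user says to a templated reply. These two rules are invented for
# illustration; the real ELIZA had far more, plus pronoun swapping.
import re

RULES = [
    (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"i am (.*)", re.I),   "How long have you been {0}?"),
]

def reply(user_input: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(user_input)
        if match:
            return template.format(match.group(1).rstrip("."))
    return "Tell me more."

print(reply("I feel anxious about my homework."))
# Why do you feel anxious about my homework?
# (note the un-swapped "my" -- the kind of simple mistake the video mentions)
```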

💡 Deep Neural Networks

A deep neural network is a machine-learning model loosely inspired by how the brain processes information. In the video, deep neural networks are credited with today's most accurate speech recognition. By learning from large amounts of speech data, these networks can recognize complex patterns in speech and improve the conversion of speech into text.

💡 Spectrogram

A spectrogram is a visualization showing how the frequencies in a sound signal change over time. The video explains that a spectrogram makes the frequency content of different sounds much easier to see, which helps a computer identify different phonemes. Brighter colors indicate louder frequency components, revealing the resonance patterns of speech.
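A minimal sketch of turning a waveform into a spectrogram, assuming NumPy, SciPy, and Matplotlib are installed; the "recording" is a synthetic two-tone signal standing in for a microphone capture.

```python
# Minimal waveform -> spectrogram sketch (not the video's code).
import numpy as np
from scipy.signal import spectrogram
import matplotlib.pyplot as plt

rate = 16000
t = np.linspace(0, 1.0, rate, endpoint=False)
# synthetic stand-in for speech: a 300 Hz tone that switches to 800 Hz
wave = np.where(t < 0.5, np.sin(2*np.pi*300*t), np.sin(2*np.pi*800*t))

freqs, times, power = spectrogram(wave, fs=rate)   # short-time FFTs under the hood
plt.pcolormesh(times, freqs, 10*np.log10(power + 1e-12), shading="auto")  # brighter = louder frequency
plt.xlabel("time (s)"); plt.ylabel("frequency (Hz)")
plt.show()
```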

💡 Knowledge Graph

A knowledge graph is a structured semantic knowledge base that encodes entities (people, places, things) and the relationships between them. The video uses Google's Knowledge Graph as an example: it contains a huge number of facts and relationships that can be used to generate natural language text. Such a web of information gives a system the ingredients it needs to produce informative, coherent answers.
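A toy sketch of generating an informational sentence from a small web of entity relationships, in the spirit of the "Thriller" reply from the video; the triples and template are invented here and are not how Google's Knowledge Graph is actually structured.

```python
# Sketch of generating a sentence from entity-relationship triples.
triples = {
    ("Thriller", "released_in"): "1983",
    ("Thriller", "sung_by"): "Michael Jackson",
}

def describe(entity: str) -> str:
    year = triples[(entity, "released_in")]
    artist = triples[(entity, "sung_by")]
    return f"{entity} was released in {year} and sung by {artist}."

print(describe("Thriller"))
# Thriller was released in 1983 and sung by Michael Jackson.
```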

💡 Language Model

A language model is a statistical model that predicts how likely a sequence of words is. In the video, a language model is used to improve speech recognition accuracy. By taking word-sequence statistics into account, it helps the recognizer choose the more likely word when it is unsure, for example preferring the adjective "happy" over the noun "harpy" after "she was".
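A toy bigram model resolving the "happy" vs. "harpy" example: count word pairs in a tiny, made-up corpus and prefer whichever candidate follows "was" more often. Real language models use vastly more data.

```python
# Toy bigram language model for the "happy" vs. "harpy" tie-break.
from collections import Counter

corpus = ("she was happy . he was happy . she was tired . "
          "the harpy was angry .").split()
bigrams = Counter(zip(corpus, corpus[1:]))   # counts of adjacent word pairs

def pick(previous: str, candidates: list[str]) -> str:
    return max(candidates, key=lambda w: bigrams[(previous, w)])

print(pick("was", ["happy", "harpy"]))   # -> "happy"
```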

Highlights

Computer vision gives computers the ability to see and understand visual information.

This episode covers how to give computers the ability to understand language, a goal that has been around since computers were first conceived.

Natural language processing (NLP) is an interdisciplinary field combining computer science and linguistics.

Breaking sentences into smaller pieces that computers can process more easily was one of the early, fundamental NLP problems.

Phrase structure rules were developed to capture the grammar of a language for computers.

Using phrase structure rules, a parse tree can be built that reveals a sentence's structure.

By treating language almost like Lego, computers can answer questions and process commands.

Computers can fail on overly complex language, no longer parsing sentences correctly or capturing intent.

Computers can generate natural language text using semantic networks, particularly when data is stored as meaningful relationships between entities.

Early chatbots were rule-based; modern approaches are based on machine learning.

Chatbots and more advanced dialog systems have come a long way over the past fifty years.

Speech recognition, the field of getting computers to extract words from sound, has been researched for decades.

The most accurate speech recognition systems today use deep neural networks.

Spectrograms let computers pick out the different frequency components in a sound wave.

Speech recognition software identifies the sound pieces that make up words, called phonemes, through pattern matching.

A language model, which contains statistics about sequences of words, improves transcription accuracy.

Speech synthesis, the reverse of speech recognition, gives computers the ability to output speech.

The spread of voice user interfaces is creating a positive feedback loop that keeps improving the accuracy of voice interaction.

Many predict that speech technology will become as common a form of interaction as screens, keyboards, trackpads, and other physical input/output devices.

Transcripts

play00:03

Hi, I’m Carrie Anne, and welcome to Crash Course Computer Science!

play00:06

Last episode we talked about computer vision – giving computers the ability to see and

play00:09

understand visual information.

play00:11

Today we’re going to talk about how to give computers the ability to understand language.

play00:15

You might argue they’ve always had this capability.

play00:17

Back in Episodes 9 and 12, we talked about machine language instructions, as well as

play00:21

higher-level programming languages.

play00:23

While these certainly meet the definition of a language, they also tend to have small

play00:26

vocabularies and follow highly structured conventions.

play00:30

Code will only compile and run if it’s 100 percent free of spelling and syntactic errors.

play00:35

Of course, this is quite different from human languages – what are called natural languages

play00:38

– containing large, diverse vocabularies, words with several different meanings, speakers

play00:43

with different accents, and all sorts of interesting word play.

play00:46

People also make linguistic faux pas when writing and speaking, like slurring words

play00:51

together, leaving out key details so things are ambiguous, and mispronouncing things.

play00:54

But, for the most part, humans can roll right through these challenges.

play00:58

The skillful use of language is a major part of what makes us human.

play01:01

And for this reason, the desire for computers to understand and speak our language has been

play01:05

around since they were first conceived.

play01:08

This led to the creation of Natural Language Processing, or NLP, an interdisciplinary field

play01:13

combining computer science and linguistics.

play01:15

INTRO

play01:24

There’s an essentially infinite number of ways to arrange words in a sentence.

play01:28

We can’t give computers a dictionary of all possible sentences to help them understand

play01:32

what humans are blabbing on about.

play01:34

So an early and fundamental NLP problem was deconstructing sentences into bite-sized pieces,

play01:39

which could be more easily processed.

play01:41

In school, you learned about nine fundamental types of English words: nouns, pronouns, articles,

play01:46

verbs, adjectives, adverbs, prepositions, conjunctions, and interjections.

play01:50

These are called parts of speech.

play01:52

There are all sorts of subcategories too, like singular vs. plural nouns and superlative

play01:56

vs. comparative adverbs, but we’re not going to get into that.

play01:59

Knowing a word’s type is definitely useful, but unfortunately, there are a lot of words that

play02:03

have multiple meanings – like “rose” and “leaves”, which can be used as nouns

play02:07

or verbs.

play02:08

A digital dictionary alone isn’t enough to resolve this ambiguity, so computers also

play02:12

need to know some grammar.

play02:14

For this, phrase structure rules were developed, which encapsulate the grammar of a language.

play02:18

For example, in English there’s a rule that says a sentence can be comprised of a noun

play02:22

phrase followed by a verb phrase.

play02:24

Noun phrases can be an article, like “the”, followed by a noun or they can be an adjective

play02:28

followed by a noun.

play02:30

And you can make rules like this for an entire language.

play02:32

Then, using these rules, it’s fairly easy to construct what’s called a parse tree,

play02:36

which not only tags every word with a likely part of speech, but also reveals how the sentence

play02:40

is constructed.

play02:41

We now know, for example, that the noun focus of this sentence is “the mongols”, and

play02:46

we know it’s about them doing the action of “rising” from something, in this case,

play02:50

“leaves”.

play02:51

These smaller chunks of data allow computers to more easily access, process and respond

play02:55

to information.

play02:56

Equivalent processes are happening every time you do a voice search, like: “where’s

play02:59

the nearest pizza”.

play03:01

The computer can recognize that this is a “where” question, knows you want the noun

play03:04

“pizza”, and the dimension you care about is “nearest”.

play03:07

The same process applies to “what is the biggest giraffe?” or “who sang thriller?”

play03:12

By treating language almost like lego, computers can be quite adept at natural language tasks.

play03:17

They can answer questions and also process commands, like “set an alarm for 2:20”

play03:21

or “play T-Swizzle on Spotify”.

play03:23

But, as you’ve probably experienced, they fail when you start getting too fancy, and

play03:27

they can no longer parse the sentence correctly, or capture your intent.

play03:31

Hey Siri... methinks the mongols doth roam too much, what think ye on this most gentle

play03:37

mid-summer’s day?

play03:38

Siri: I’m not sure I got that.

play03:40

I should also note that phrase structure rules, and similar methods that codify language,

play03:44

can be used by computers to generate natural language text.

play03:47

This works particularly well when data is stored in a web of semantic information, where

play03:52

entities are linked to one another in meaningful relationships, providing all the ingredients

play03:56

you need to craft informational sentences.

play03:59

Siri: Thriller was released in 1983 and sung by Michael Jackson

play04:02

Google’s version of this is called Knowledge Graph.

play04:05

At the end of 2016, it contained roughly seventy billion facts about, and relationships between,

play04:11

different entities.

play04:12

These two processes, parsing and generating text, are fundamental components of natural

play04:17

language chatbots - computer programs that chat with you.

play04:19

Early chatbots were primarily rule-based, where experts would encode hundreds of rules

play04:23

mapping what a user might say, to how a program should reply.

play04:27

Obviously this was unwieldy to maintain and limited the possible sophistication.

play04:30

A famous early example was ELIZA, created in the mid-1960s at MIT.

play04:35

This was a chatbot that took on the role of a therapist, and used basic syntactic rules

play04:39

to identify content in written exchanges, which it would turn around and ask the user

play04:44

about.

play04:44

Sometimes, it felt very much like human-human communication, but other times it would make

play04:49

simple and even comical mistakes.

play04:50

Chatbots, and more advanced dialog systems, have come a long way in the last fifty years,

play04:55

and can be quite convincing today!

play04:57

Modern approaches are based on machine learning, where gigabytes of real human-to-human chats

play05:01

are used to train chatbots.

play05:02

Today, the technology is finding use in customer service applications, where there’s already

play05:07

heaps of example conversations to learn from.

play05:09

People have also been getting chatbots to talk with one another, and in a Facebook experiment,

play05:14

chatbots even started to evolve their own language.

play05:17

This experiment got a bunch of scary-sounding press, but it was just the computers crafting

play05:21

a simplified protocol to negotiate with one another.

play05:24

It wasn’t evil, it was efficient.

play05:26

But what about if something is spoken – how does a computer get words from the sound?

play05:30

That’s the domain of speech recognition, which has been the focus of research for many

play05:34

decades.

play05:35

Bell Labs debuted the first speech recognition system in 1952, nicknamed Audrey – the automatic

play05:41

digit recognizer.

play05:42

It could recognize all ten numerical digits, if you said them slowly enough.

play05:46

5…

play05:48

9…

play05:49

7?

play05:50

The project didn’t go anywhere because it was much faster to enter telephone numbers

play05:54

with a finger.

play05:55

Ten years later, at the 1962 World's Fair, IBM demonstrated a shoebox-sized machine capable

play06:00

of recognizing sixteen words.

play06:02

To boost research in the area, DARPA kicked off an ambitious five-year funding initiative

play06:06

in 1971, which led to the development of Harpy at Carnegie Mellon University.

play06:11

Harpy was the first system to recognize over a thousand words.

play06:15

But, on computers of the era, transcription was often ten or more times slower than the

play06:19

rate of natural speech.

play06:20

Fortunately, thanks to huge advances in computing performance in the 1980s and 90s, continuous,

play06:25

real-time speech recognition became practical.

play06:27

There was simultaneous innovation in the algorithms for processing natural language, moving from

play06:32

hand-crafted rules, to machine learning techniques that could learn automatically from existing

play06:36

datasets of human language.

play06:38

Today, the speech recognition systems with the best accuracy are using deep neural networks,

play06:43

which we touched on in Episode 34.

play06:45

To get a sense of how these techniques work, let’s look at some speech, specifically,

play06:49

the acoustic signal.

play06:50

Let’s start by looking at vowel sounds, like aaaaa… and Eeeeeee.

play06:54

These are the waveforms of those two sounds, as captured by a computer’s microphone.

play06:58

As we discussed in Episode 21 – on Files and File Formats – this signal is the magnitude

play07:02

of displacement, of a diaphragm inside of a microphone, as sound waves cause it to oscillate.

play07:07

In this view of sound data, the horizontal axis is time, and the vertical axis is the

play07:12

magnitude of displacement, or amplitude.

play07:14

Although we can see there are differences between the waveforms, it’s not super obvious

play07:18

what you would point at to say, “ah ha! this is definitely an eeee sound”.

play07:22

To really make this pop out, we need to view the data in a totally different way: a spectrogram.

play07:27

In this view of the data, we still have time along the horizontal axis, but now instead

play07:31

of amplitude on the vertical axis, we plot the magnitude of the different frequencies

play07:35

that make up each sound.

play07:37

The brighter the color, the louder that frequency component.

play07:40

This conversion from waveform to frequencies is done with a very cool algorithm called

play07:44

a Fast Fourier Transform.

play07:46

If you’ve ever stared at a stereo system’s EQ visualizer, it’s pretty much the same

play07:50

thing.

play07:51

A spectrogram is plotting that information over time.

play07:54

You might have noticed that the signals have a sort of ribbed pattern to them – that’s

play07:57

all the resonances of my vocal tract.

play08:00

To make different sounds, I squeeze my vocal cords, mouth and tongue into different shapes,

play08:04

which amplifies or dampens different resonances.

play08:06

We can see this in the signal, with areas that are brighter, and areas that are darker.

play08:10

If we work our way up from the bottom, labeling where we see peaks in the spectrum – what

play08:14

are called formants – we can see the two sounds have quite different arrangements.

play08:18

And this is true for all vowel sounds.

play08:20

It’s exactly this type of information that lets computers recognize spoken vowels, and

play08:24

indeed, whole words.

play08:25

Let’s see a more complicated example, like when I say: “she.. was.. happy”

play08:31

We can see our “eee” sound here, and “aaa” sound here.

play08:34

We can also see a bunch of other distinctive sounds, like the “shh” sound in “she”,

play08:38

the “wah” and “sss” in “was”, and so on.

play08:40

These sound pieces, that make up words, are called phonemes.

play08:43

Speech recognition software knows what all these phonemes look like.

play08:46

In English, there are roughly forty-four, so it mostly boils down to fancy pattern matching.

play08:51

Then you have to separate words from one another, figure out when sentences begin and end...

play08:55

and ultimately, you end up with speech converted into text, allowing for techniques like we

play08:59

discussed at the beginning of the episode.

play09:02

Because people say words in slightly different ways, due to things like accents and mispronunciations,

play09:06

transcription accuracy is greatly improved when combined with a language model, which

play09:11

contains statistics about sequences of words.

play09:13

For example “she was” is most likely to be followed by an adjective, like “happy”.

play09:17

It’s uncommon for “she was” to be followed immediately by a noun.

play09:21

So if the speech recognizer was unsure between, “happy” and “harpy”, it’d pick “happy”,

play09:26

since the language model would report that as a more likely choice.

play09:29

Finally, we need to talk about Speech Synthesis, that is, giving computers the ability to output

play09:34

speech.

play09:35

This is very much like speech recognition, but in reverse.

play09:38

We can take a sentence of text, and break it down into its phonetic components, and

play09:42

then play those sounds back to back, out of a computer speaker.

play09:45

You can hear this chaining of phonemes very clearly with older speech synthesis technologies,

play09:50

like this 1937 hand-operated machine from Bell Labs.

play09:53

Say, "she saw me" with no expression.

play09:56

She saw me.

play09:59

Now say it in answer to these questions.

play10:01

Who saw you?

play10:02

She saw me.

play10:04

Who did she see?

play10:05

She saw me.

play10:07

Did she see you or hear you?

play10:09

She saw me.

play10:11

By the 1980s, this had improved a lot, but that discontinuous and awkward blending of

play10:15

phonemes still created that signature, robotic sound.

play10:18

Thriller was released in 1983 and sung by Michael Jackson.

play10:23

Today, synthesized computer voices, like Siri, Cortana and Alexa, have gotten much better,

play10:27

but they’re still not quite human.

play10:29

But we’re so, so close, and it’s likely to be a solved problem pretty soon.

play10:33

Especially because we’re now seeing an explosion of voice user interfaces on our phones, in

play10:37

our cars and homes, and maybe soon, plugged right into our ears.

play10:41

This ubiquity is creating a positive feedback loop, where people are using voice interaction

play10:45

more often, which in turn, is giving companies like Google, Amazon and Microsoft more data

play10:50

to train their systems on...

play10:52

Which is enabling better accuracy, which is leading to people using voice more, which

play10:56

is enabling even better accuracy… and the loop continues!

play10:59

Many predict that speech technologies will become as common a form of interaction as

play11:03

screens, keyboards, trackpads and other physical input-output devices that we use today.

play11:07

That’s particularly good news for robots, who don’t want to have to walk around with

play11:11

keyboards in order to communicate with humans.

play11:13

But, we’ll talk more about them next week.

play11:16

See you then.
