How we teach computers to understand pictures | Fei-Fei Li

TED
23 Mar 2015 · 18:02

Summary

TL;DR: Fei-Fei Li discusses advancements in computer vision and artificial intelligence, highlighting the challenges of teaching machines to interpret visual information like humans. Through her work with Stanford's Vision Lab and the ImageNet project, Li illustrates how vast data sets help train computers to recognize objects, generate sentences, and understand complex visual scenes. Despite progress, machines still struggle with deeper comprehension. Li envisions a future where computers assist in healthcare, safety, and exploration, emphasizing the potential of AI to improve human life by augmenting our ability to see and understand the world.

Takeaways

  • 👶 A three-year-old child can easily describe what they see in photos, demonstrating how natural it is for humans to interpret visual information.
  • 🧠 Despite technological advancements, computers still struggle to interpret visual data in the way humans do because they lack true understanding.
  • 🚗 Computer vision is essential for applications like self-driving cars, which need to differentiate between various objects to function safely.
  • 👁️ Vision is not just about the eyes but involves complex brain processing, which has evolved over millions of years.
  • 🔬 Fei-Fei Li's research at Stanford's Vision Lab focuses on teaching computers to see and understand like humans through computer vision and machine learning.
  • 🐱 Simple object recognition for computers is challenging due to the infinite variations in appearance, positioning, and context of objects like cats.
  • 📊 The ImageNet project, launched in 2007, created a massive dataset of 15 million labeled images across 22,000 categories to help train computer vision algorithms, culled from nearly a billion candidate images sourced from the internet.
  • 💡 The combination of big data (ImageNet) and convolutional neural networks (a type of machine learning algorithm) has led to significant progress in object recognition.
  • 🧩 Computer vision algorithms have evolved from recognizing individual objects to generating human-like sentences that describe entire scenes.
  • 🤖 Although there have been advancements, current AI still struggles with more nuanced understanding, like context, emotions, or cultural significance in images.

Q & A

  • What is the main task that a three-year-old child is an expert at, according to Fei-Fei Li?

    - A three-year-old child is an expert at making sense of what they see, describing the world based on visual perception.

  • What is the current limitation of advanced machines and computers, despite technological progress?

    - Despite technological progress, advanced machines and computers still struggle with understanding and interpreting visual information like humans do.

  • Why is it difficult for computers to interpret visual information, such as distinguishing a crumpled paper bag from a rock on the road?

    - It's difficult because computers do not naturally understand the meaning behind visual data. Cameras capture pixels, but those pixels lack the semantic meaning needed to interpret complex situations accurately.

  • How does Fei-Fei Li's research aim to improve computer vision?

    - Fei-Fei Li's research aims to teach computers to see and understand visual information by leveraging large datasets and machine learning algorithms, similar to how a child learns from real-world experiences.

  • What was the significance of the ImageNet project in advancing computer vision?

    - The ImageNet project provided an extensive dataset of 15 million labeled images, enabling computers to learn from a vast range of visual examples and significantly improving the accuracy of object recognition algorithms.

  • Why did Fei-Fei Li emphasize the importance of providing computers with 'training data' similar to what a child experiences?

    - She emphasized that instead of focusing solely on improving algorithms, it's crucial to expose computers to large quantities of real-world examples, just like a child who learns by seeing hundreds of millions of images throughout early development.

  • What role did convolutional neural networks play in advancing computer vision?

    - Convolutional neural networks, which mimic the structure of the human brain with layers of interconnected neurons, became a breakthrough architecture in computer vision, enabling better object recognition when trained with the massive data from ImageNet.

  • What limitations still exist in current computer vision systems, as demonstrated in the TED talk?

    - Current computer vision systems still make mistakes, such as confusing objects like a toothbrush for a baseball bat or misinterpreting artistic images. These limitations show that computers are far from understanding the world with the nuance and depth of human perception.

  • How does Fei-Fei Li envision the future of visual intelligence in machines?

    - She envisions a future where machines collaborate with humans, assisting in tasks like diagnosing patients, navigating disaster zones, and discovering new materials. Machines with visual intelligence will enhance human capabilities in ways previously unimaginable.

  • What example does Fei-Fei Li give to illustrate the deeper understanding that computers currently lack in visual perception?

    - She gives the example of her son Leo's birthday cake picture. While a computer can identify objects like 'a person and a cake,' it lacks the deeper context—such as knowing the cake is an Italian Easter cake or understanding the boy's emotional connection to his shirt, which was a gift.

Outlines

00:00

👀 The Challenge of Teaching Computers to See

The paragraph introduces the concept of computer vision as a frontier in computer science, comparing the human ability to make sense of visual information with the struggle that advanced machines face in performing the same task. It highlights the importance of computer vision in various applications such as self-driving cars, environmental monitoring, and security but points out the limitations in current technology. The speaker, Fei-Fei Li, gives an overview of her research journey in computer vision and machine learning, emphasizing the need to teach computers to see objects as a foundational step towards achieving artificial intelligence that can understand and interpret visual data like humans.

05:04

🐱 The Complexity of Object Recognition

This paragraph delves into the complexity of recognizing objects, using the example of a cat to illustrate the challenge. It discusses how early attempts to model objects were simplistic and failed to account for the vast variations in appearance and perspective. The speaker then shares a pivotal realization that children learn to see through experience and exposure to a vast number of real-world examples. This insight led to the creation of the ImageNet project, which aimed to amass a large dataset of labeled images to train computer vision algorithms. The project's success in collecting and labeling millions of images is detailed, along with the challenges faced in securing funding and support for this novel approach.
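
To make the collection step concrete, here is a minimal sketch of the kind of download pipeline the project implies: fetch candidate images from the web and save them for later crowd labeling. The URL list and output directory are hypothetical stand-ins, not ImageNet's actual tooling.

    # Minimal sketch: download candidate images for later labeling.
    # URLs and paths below are hypothetical placeholders.
    import pathlib
    import requests

    candidate_urls = [
        "https://example.com/images/cat_001.jpg",
        "https://example.com/images/cat_002.jpg",
    ]
    out_dir = pathlib.Path("candidates")
    out_dir.mkdir(exist_ok=True)

    for i, url in enumerate(candidate_urls):
        resp = requests.get(url, timeout=10)
        if resp.ok:  # keep only successfully fetched images
            (out_dir / f"img_{i:07d}.jpg").write_bytes(resp.content)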

10:07

🧠 The Neural Network Revolution in Computer Vision

The paragraph explains the architecture of neural networks, drawing an analogy between the brain's neurons and the nodes in a neural network. It discusses how these networks are organized in layers, similar to the brain's structure, and how they are trained using massive datasets like ImageNet. The paragraph highlights the breakthroughs in object recognition that were achieved through the use of convolutional neural networks (CNNs), which were fed by the extensive data from ImageNet. The speaker describes how these algorithms can now identify objects in images with a high degree of accuracy and even generate descriptions of scenes, marking a significant advancement in computer vision.
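
As a rough illustration of how figures like "140 million parameters" are tallied, the sketch below counts the learnable weights of a network by summing the elements of every parameter tensor. AlexNet, the well-known ImageNet-era CNN available in torchvision, is used as a convenient stand-in; it is not the exact model whose node and connection counts the talk quotes.

    import torchvision

    # Count learnable weights: sum the elements of every parameter tensor.
    # AlexNet is a stand-in, not the talk's exact 24M-node network.
    model = torchvision.models.alexnet()
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{n_params:,} parameters")  # roughly 61 million for AlexNet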

15:08

🚀 Advancing from Object Recognition to Scene Understanding

In this paragraph, the speaker discusses the next steps in computer vision: teaching computers not just to recognize objects but to understand the context and narrative of a scene. The paragraph describes the integration of visual data with natural language processing to generate descriptive sentences about images. The speaker shares examples of the computer's progress, including both its successes and its humorous mistakes. The paragraph concludes with a vision for the future where computers with visual intelligence can collaborate with humans, enhancing our capabilities in various fields such as medicine, transportation, and exploration.
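
A minimal sketch of such a vision-plus-language pairing follows: a CNN's image features condition a recurrent decoder that scores the next word at each position. All dimensions, the vocabulary size, and the class name are hypothetical; the talk does not specify the model's internals.

    import torch
    import torch.nn as nn

    # Sketch of an encoder-decoder captioner: an image feature vector is
    # prepended to the word sequence, and an LSTM predicts the next word.
    class CaptionDecoder(nn.Module):  # hypothetical illustration
        def __init__(self, feat_dim=512, embed_dim=256, vocab_size=10000):
            super().__init__()
            self.project = nn.Linear(feat_dim, embed_dim)  # image -> first "token"
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, 256, batch_first=True)
            self.out = nn.Linear(256, vocab_size)          # scores over the vocabulary

        def forward(self, image_feat, tokens):
            start = self.project(image_feat).unsqueeze(1)
            seq = torch.cat([start, self.embed(tokens)], dim=1)
            hidden, _ = self.lstm(seq)
            return self.out(hidden)

    decoder = CaptionDecoder()
    feat = torch.randn(2, 512)                # e.g. CNN features for 2 images
    tokens = torch.randint(0, 10000, (2, 7))  # partial captions, 7 words each
    print(decoder(feat, tokens).shape)        # torch.Size([2, 8, 10000])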

Keywords

💡Computer Vision

Computer vision is a field of artificial intelligence that enables computers to interpret and understand the visual world. In the video, it is central to the theme as the speaker discusses the challenges and advancements in teaching computers to see and make sense of images, much like humans do. The script mentions how computer vision is crucial for applications such as self-driving cars and drones, but also highlights the difficulties in achieving accurate object recognition.

💡Machine Learning

Machine learning is a subset of artificial intelligence that allows computers to learn from data and improve their performance over time without being explicitly programmed. The video emphasizes the role of machine learning in advancing computer vision, particularly through the use of large datasets and algorithms like convolutional neural networks. The speaker's work with ImageNet is a prime example of how machine learning is applied to teach computers to recognize objects in images.
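
A minimal sketch of this learn-from-labeled-examples loop, using a toy linear classifier and random stand-in data in place of real images and labels:

    import torch
    import torch.nn as nn

    # Toy supervised loop: show labeled examples, measure the error,
    # and adjust parameters. Model and data are stand-ins.
    model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    images = torch.randn(8, 3, 32, 32)     # hypothetical batch of 8 RGB images
    labels = torch.randint(0, 10, (8,))    # hypothetical class labels

    for step in range(100):                # in practice, loop over many batches
        logits = model(images)             # forward pass: predict class scores
        loss = loss_fn(logits, labels)     # compare predictions to human labels
        optimizer.zero_grad()
        loss.backward()                    # compute gradients of the error
        optimizer.step()                   # nudge parameters to reduce the error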

💡ImageNet

ImageNet is a large visual database designed for use in visual object recognition software research. The video describes the ImageNet project, which the speaker co-founded, as a significant step in computer vision. It involved collecting and labeling millions of images to create a comprehensive dataset that could be used to train machine learning models to recognize a vast array of objects.
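
For illustration, ImageNet-style datasets are commonly laid out as one directory per labeled class, which standard tooling can load directly. A sketch with torchvision (the path is hypothetical; ImageFolder expects e.g. data/cat/xxx.jpg, data/dog/yyy.jpg):

    from torchvision import datasets, transforms

    # Standard preprocessing for ImageNet-scale images.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])

    # Hypothetical directory with one subfolder per class.
    dataset = datasets.ImageFolder("data/imagenet_subset", transform=preprocess)
    print(len(dataset), dataset.classes[:5])  # image count, first class names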

💡Convolutional Neural Networks (CNNs)

CNNs are a class of deep learning algorithms particularly good at analyzing visual imagery. They are mentioned in the video as the winning architecture for object recognition tasks, thanks to their ability to process hierarchical layers of visual information. The speaker explains how CNNs, when trained on the massive ImageNet dataset, have achieved remarkable results in recognizing and classifying objects in images.
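
A deliberately tiny CNN sketch showing the layered structure described above; the layer sizes are illustrative and far smaller than the 24-million-node model mentioned in the talk:

    import torch
    import torch.nn as nn

    # Stacked convolutions form the hierarchical layers of neuron-like
    # nodes: early layers respond to edges/colors, later ones to parts.
    class TinyConvNet(nn.Module):
        def __init__(self, num_classes: int = 10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # assumes 32x32 input

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = self.features(x)
            return self.classifier(x.flatten(1))

    model = TinyConvNet()
    print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])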

💡Big Data

Big data refers to the large volume of structured and unstructured data that is too complex to be processed by traditional data management tools. In the context of the video, big data is crucial for training computer vision models. The speaker discusses how the ImageNet project leveraged big data by collecting nearly a billion images from the internet to create a dataset that could effectively train machine learning algorithms.

💡Object Recognition

Object recognition is the ability of a computer to identify and classify objects in an image or video. It is a key aspect of computer vision and is central to the video's narrative. The speaker discusses the evolution of object recognition from simple shape-based models to sophisticated algorithms capable of recognizing objects in various poses and contexts, as demonstrated by the ImageNet dataset.

💡Neural Networks

Neural networks are computing systems inspired by the human brain that are used in machine learning applications. The video explains how neural networks, with their interconnected nodes organized in layers, mimic the structure of the human brain to process information. They are fundamental to the development of computer vision systems, as they enable the recognition and understanding of complex patterns in visual data.
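
A one-function sketch of the neuron-like node described here: weight each input, sum with a bias, and pass the result through a nonlinearity before it is sent on to other nodes (the numbers are arbitrary):

    import numpy as np

    # One neuron-like node: weighted sum of inputs plus bias, then ReLU.
    def node(inputs, weights, bias):
        return max(0.0, float(np.dot(inputs, weights) + bias))

    print(node(np.array([0.5, -1.0, 2.0]), np.array([0.1, 0.4, 0.2]), 0.05))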

💡Deep Learning

Deep learning is a subset of machine learning that focuses on neural networks with many layers, or 'deep' architectures. The video touches on deep learning as the driving force behind the success of CNNs in computer vision tasks. Deep learning allows for the creation of models that can learn increasingly complex patterns from large datasets, such as ImageNet, leading to significant advancements in object recognition.

💡Algorithms

Algorithms are step-by-step procedures for calculations, typically used in computing. In the video, algorithms are discussed as the mathematical models that drive machine learning and computer vision. The speaker's work involves designing and refining algorithms to enable computers to learn from data and improve their ability to recognize and understand visual information.

💡Artificial Intelligence (AI)

AI refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. The video is centered around AI, particularly in the context of computer vision and machine learning. The speaker's quest to give computers visual intelligence is a prime example of AI development, aiming to enhance machines' ability to process and understand visual information on par with human capabilities.

💡Data Annotation

Data annotation is the process of labeling data with information that describes or categorizes it. In the video, data annotation is highlighted as a critical step in the ImageNet project, where nearly a billion images were labeled by crowdsourced workers. This process is essential for creating datasets that can effectively train machine learning models in computer vision tasks.
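
One plausible flavor of the "clean, sort and label" step is aggregating several workers' answers for the same image; the sketch below uses a simple majority vote (the votes and agreement threshold are hypothetical, not ImageNet's documented procedure):

    from collections import Counter

    def consensus_label(votes, min_agreement=0.6):
        """Return the majority label, or None if workers disagree too much."""
        label, count = Counter(votes).most_common(1)[0]
        return label if count / len(votes) >= min_agreement else None

    print(consensus_label(["cat", "cat", "dog"]))          # cat (2/3 agree)
    print(consensus_label(["cat", "dog", "rock", "dog"]))  # None (only 50% agree)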

Highlights

A three-year-old child describes images, highlighting the innate ability of humans to make sense of visual information.

Despite technological advancements, machines still struggle with basic visual understanding tasks that even young children can perform.

Fei-Fei Li discusses the challenges of computer vision, emphasizing the complexity of visual processing in machines compared to human brains.

Computer vision involves teaching machines to recognize objects and people, and to understand relationships, emotions, and actions from visual data.

The ImageNet project was launched in 2007 to provide a large dataset of labeled images to improve machine learning algorithms for object recognition.

ImageNet collected nearly a billion images from the internet and used crowdsourcing to label these images, creating one of the largest datasets of its kind.

The convolutional neural network (CNN), an algorithm inspired by the human brain, became a successful approach for object recognition when combined with ImageNet data.

CNNs consist of millions of interconnected nodes organized in hierarchical layers, which process visual data similarly to how the human brain functions.

The ImageNet dataset allowed CNNs to achieve remarkable results in identifying objects in images, leading to significant advances in computer vision.

Computer vision models can now generate sentences that describe images, showing progress towards integrating vision and language in machines.

Despite advances, current computer vision models still make mistakes, such as misidentifying objects due to insufficient or biased training data.

The next step in computer vision is to move beyond object recognition to understanding context, stories, and complex scenes as humans do.

Fei-Fei Li envisions a future where machines with visual intelligence assist in healthcare, transportation, disaster response, and exploration.

The goal of computer vision is not just to create intelligent machines, but to collaborate with them to enhance human capabilities and explore new possibilities.

Fei-Fei Li emphasizes her personal motivation to advance computer vision: to create a better future for the next generation, represented by her son Leo.

Transcripts

00:14  Let me show you something.
00:18  (Video) Girl: Okay, that's a cat sitting in a bed.
00:22  The boy is petting the elephant.
00:26  Those are people that are going on an airplane.
00:30  That's a big airplane.
00:33  Fei-Fei Li: This is a three-year-old child
00:35  describing what she sees in a series of photos.
00:39  She might still have a lot to learn about this world,
00:42  but she's already an expert at one very important task:
00:46  to make sense of what she sees.
00:50  Our society is more technologically advanced than ever.
00:54  We send people to the moon, we make phones that talk to us
00:58  or customize radio stations that can play only music we like.
01:03  Yet, our most advanced machines and computers
01:07  still struggle at this task.
01:09  So I'm here today to give you a progress report
01:13  on the latest advances in our research in computer vision,
01:17  one of the most frontier and potentially revolutionary
01:21  technologies in computer science.
01:24  Yes, we have prototyped cars that can drive by themselves,
01:29  but without smart vision, they cannot really tell the difference
01:33  between a crumpled paper bag on the road, which can be run over,
01:37  and a rock that size, which should be avoided.
01:41  We have made fabulous megapixel cameras,
01:44  but we have not delivered sight to the blind.
01:48  Drones can fly over massive land,
01:51  but don't have enough vision technology
01:53  to help us to track the changes of the rainforests.
01:57  Security cameras are everywhere,
02:00  but they do not alert us when a child is drowning in a swimming pool.
02:06  Photos and videos are becoming an integral part of global life.
02:11  They're being generated at a pace that's far beyond what any human,
02:15  or teams of humans, could hope to view,
02:18  and you and I are contributing to that at this TED.
02:22  Yet our most advanced software is still struggling at understanding
02:27  and managing this enormous content.
02:31  So in other words, collectively as a society,
02:36  we're very much blind,
02:38  because our smartest machines are still blind.
02:43  "Why is this so hard?" you may ask.
02:46  Cameras can take pictures like this one
02:49  by converting lights into a two-dimensional array of numbers
02:53  known as pixels,
02:54  but these are just lifeless numbers.
02:57  They do not carry meaning in themselves.
03:00  Just like to hear is not the same as to listen,
03:04  to take pictures is not the same as to see,
03:08  and by seeing, we really mean understanding.
03:13  In fact, it took Mother Nature 540 million years of hard work
03:19  to do this task,
03:21  and much of that effort
03:23  went into developing the visual processing apparatus of our brains,
03:28  not the eyes themselves.
03:31  So vision begins with the eyes,
03:33  but it truly takes place in the brain.
03:38  So for 15 years now, starting from my Ph.D. at Caltech
03:43  and then leading Stanford's Vision Lab,
03:46  I've been working with my mentors, collaborators and students
03:50  to teach computers to see.
03:54  Our research field is called computer vision and machine learning.
03:57  It's part of the general field of artificial intelligence.
04:03  So ultimately, we want to teach the machines to see just like we do:
04:08  naming objects, identifying people, inferring 3D geometry of things,
04:13  understanding relations, emotions, actions and intentions.
04:19  You and I weave together entire stories of people, places and things
04:25  the moment we lay our gaze on them.
04:28  The first step towards this goal is to teach a computer to see objects,
04:34  the building block of the visual world.
04:37  In its simplest terms, imagine this teaching process
04:42  as showing the computers some training images
04:45  of a particular object, let's say cats,
04:48  and designing a model that learns from these training images.
04:53  How hard can this be?
04:55  After all, a cat is just a collection of shapes and colors,
04:59  and this is what we did in the early days of object modeling.
05:03  We'd tell the computer algorithm in a mathematical language
05:07  that a cat has a round face, a chubby body,
05:10  two pointy ears, and a long tail,
05:12  and that looked all fine.
05:14  But what about this cat?
05:16  (Laughter)
05:18  It's all curled up.
05:19  Now you have to add another shape and viewpoint to the object model.
05:24  But what if cats are hidden?
05:27  What about these silly cats?
05:31  Now you get my point.
05:33  Even something as simple as a household pet
05:36  can present an infinite number of variations to the object model,
05:41  and that's just one object.
05:44  So about eight years ago,
05:47  a very simple and profound observation changed my thinking.
05:53  No one tells a child how to see,
05:56  especially in the early years.
05:58  They learn this through real-world experiences and examples.
06:03  If you consider a child's eyes
06:06  as a pair of biological cameras,
06:08  they take one picture about every 200 milliseconds,
06:12  the average time an eye movement is made.
06:15  So by age three, a child would have seen hundreds of millions of pictures
06:21  of the real world.
06:23  That's a lot of training examples.
06:26  So instead of focusing solely on better and better algorithms,
06:32  my insight was to give the algorithms the kind of training data
06:37  that a child was given through experiences
06:40  in both quantity and quality.
06:44  Once we know this,
06:46  we knew we needed to collect a data set
06:49  that has far more images than we have ever had before,
06:54  perhaps thousands of times more,
06:56  and together with Professor Kai Li at Princeton University,
07:00  we launched the ImageNet project in 2007.
07:05  Luckily, we didn't have to mount a camera on our head
07:09  and wait for many years.
07:11  We went to the Internet,
07:12  the biggest treasure trove of pictures that humans have ever created.
07:17  We downloaded nearly a billion images
07:20  and used crowdsourcing technology like the Amazon Mechanical Turk platform
07:25  to help us to label these images.
07:28  At its peak, ImageNet was one of the biggest employers
07:33  of the Amazon Mechanical Turk workers:
07:36  together, almost 50,000 workers
07:40  from 167 countries around the world
07:44  helped us to clean, sort and label
07:48  nearly a billion candidate images.
07:52  That was how much effort it took
07:55  to capture even a fraction of the imagery
07:59  a child's mind takes in in the early developmental years.
08:04  In hindsight, this idea of using big data
08:08  to train computer algorithms may seem obvious now,
08:12  but back in 2007, it was not so obvious.
08:16  We were fairly alone on this journey for quite a while.
08:20  Some very friendly colleagues advised me to do something more useful for my tenure,
08:25  and we were constantly struggling for research funding.
08:29  Once, I even joked to my graduate students
08:32  that I would just reopen my dry cleaner's shop to fund ImageNet.
08:36  After all, that's how I funded my college years.
08:41  So we carried on.
08:43  In 2009, the ImageNet project delivered
08:46  a database of 15 million images
08:50  across 22,000 classes of objects and things
08:55  organized by everyday English words.
08:58  In both quantity and quality,
09:01  this was an unprecedented scale.
09:04  As an example, in the case of cats,
09:08  we have more than 62,000 cats
09:11  of all kinds of looks and poses
09:15  and across all species of domestic and wild cats.
09:20  We were thrilled to have put together ImageNet,
09:23  and we wanted the whole research world to benefit from it,
09:27  so in the TED fashion, we opened up the entire data set
09:31  to the worldwide research community for free.
09:36  (Applause)
09:41  Now that we have the data to nourish our computer brain,
09:45  we're ready to come back to the algorithms themselves.
09:49  As it turned out, the wealth of information provided by ImageNet
09:54  was a perfect match to a particular class of machine learning algorithms
09:59  called convolutional neural network,
10:02  pioneered by Kunihiko Fukushima, Geoff Hinton, and Yann LeCun
10:07  back in the 1970s and '80s.
10:10  Just like the brain consists of billions of highly connected neurons,
10:16  a basic operating unit in a neural network
10:20  is a neuron-like node.
10:22  It takes input from other nodes
10:25  and sends output to others.
10:28  Moreover, these hundreds of thousands or even millions of nodes
10:32  are organized in hierarchical layers,
10:36  also similar to the brain.
10:38  In a typical neural network we use to train our object recognition model,
10:43  it has 24 million nodes,
10:46  140 million parameters,
10:49  and 15 billion connections.
10:52  That's an enormous model.
10:55  Powered by the massive data from ImageNet
10:58  and the modern CPUs and GPUs to train such a humongous model,
11:04  the convolutional neural network
11:06  blossomed in a way that no one expected.
11:10  It became the winning architecture
11:12  to generate exciting new results in object recognition.
11:18  This is a computer telling us
11:20  this picture contains a cat
11:23  and where the cat is.
11:25  Of course there are more things than cats,
11:27  so here's a computer algorithm telling us
11:29  the picture contains a boy and a teddy bear;
11:32  a dog, a person, and a small kite in the background;
11:37  or a picture of very busy things
11:40  like a man, a skateboard, railings, a lamppost, and so on.
11:45  Sometimes, when the computer is not so confident about what it sees,
11:51  we have taught it to be smart enough
11:53  to give us a safe answer instead of committing too much,
11:57  just like we would do,
12:00  but other times our computer algorithm is remarkable at telling us
12:05  what exactly the objects are,
12:07  like the make, model, year of the cars.
12:10  We applied this algorithm to millions of Google Street View images
12:16  across hundreds of American cities,
12:19  and we have learned something really interesting:
12:22  first, it confirmed our common wisdom
12:25  that car prices correlate very well
12:28  with household incomes.
12:31  But surprisingly, car prices also correlate well
12:35  with crime rates in cities,
12:39  or voting patterns by zip codes.
12:44  So wait a minute. Is that it?
12:46  Has the computer already matched or even surpassed human capabilities?
12:51  Not so fast.
12:53  So far, we have just taught the computer to see objects.
12:58  This is like a small child learning to utter a few nouns.
13:03  It's an incredible accomplishment,
13:05  but it's only the first step.
13:08  Soon, another developmental milestone will be hit,
13:12  and children begin to communicate in sentences.
13:15  So instead of saying this is a cat in the picture,
13:19  you already heard the little girl telling us this is a cat lying on a bed.
13:24  So to teach a computer to see a picture and generate sentences,
13:30  the marriage between big data and machine learning algorithm
13:34  has to take another step.
13:36  Now, the computer has to learn from both pictures
13:40  as well as natural language sentences
13:43  generated by humans.
13:47  Just like the brain integrates vision and language,
13:50  we developed a model that connects parts of visual things
13:56  like visual snippets
13:58  with words and phrases in sentences.
14:02  About four months ago,
14:04  we finally tied all this together
14:07  and produced one of the first computer vision models
14:11  that is capable of generating a human-like sentence
14:15  when it sees a picture for the first time.
14:18  Now, I'm ready to show you what the computer says
14:23  when it sees the picture
14:25  that the little girl saw at the beginning of this talk.
14:31  (Video) Computer: A man is standing next to an elephant.
14:36  A large airplane sitting on top of an airport runway.
14:41  FFL: Of course, we're still working hard to improve our algorithms,
14:45  and it still has a lot to learn.
14:47  (Applause)
14:51  And the computer still makes mistakes.
14:54  (Video) Computer: A cat lying on a bed in a blanket.
14:58  FFL: So of course, when it sees too many cats,
15:00  it thinks everything might look like a cat.
15:05  (Video) Computer: A young boy is holding a baseball bat.
15:08  (Laughter)
15:09  FFL: Or, if it hasn't seen a toothbrush, it confuses it with a baseball bat.
15:15  (Video) Computer: A man riding a horse down a street next to a building.
15:18  (Laughter)
15:20  FFL: We haven't taught Art 101 to the computers.
15:25  (Video) Computer: A zebra standing in a field of grass.
15:28  FFL: And it hasn't learned to appreciate the stunning beauty of nature
15:32  like you and I do.
15:34  So it has been a long journey.
15:37  To get from age zero to three was hard.
15:41  The real challenge is to go from three to 13 and far beyond.
15:47  Let me remind you with this picture of the boy and the cake again.
15:51  So far, we have taught the computer to see objects
15:55  or even tell us a simple story when seeing a picture.
15:59  (Video) Computer: A person sitting at a table with a cake.
16:03  FFL: But there's so much more to this picture
16:06  than just a person and a cake.
16:08  What the computer doesn't see is that this is a special Italian cake
16:12  that's only served during Easter time.
16:16  The boy is wearing his favorite t-shirt
16:19  given to him as a gift by his father after a trip to Sydney,
16:23  and you and I can all tell how happy he is
16:27  and what's exactly on his mind at that moment.
16:31  This is my son Leo.
16:34  On my quest for visual intelligence,
16:36  I think of Leo constantly
16:39  and the future world he will live in.
16:42  When machines can see,
16:44  doctors and nurses will have extra pairs of tireless eyes
16:48  to help them to diagnose and take care of patients.
16:53  Cars will run smarter and safer on the road.
16:57  Robots, not just humans,
17:00  will help us to brave the disaster zones to save the trapped and wounded.
17:05  We will discover new species, better materials,
17:09  and explore unseen frontiers with the help of the machines.
17:15  Little by little, we're giving sight to the machines.
17:19  First, we teach them to see.
17:22  Then, they help us to see better.
17:24  For the first time, human eyes won't be the only ones
17:29  pondering and exploring our world.
17:31  We will not only use the machines for their intelligence,
17:35  we will also collaborate with them in ways that we cannot even imagine.
17:41  This is my quest:
17:43  to give computers visual intelligence
17:46  and to create a better future for Leo and for the world.
17:51  Thank you.
17:53  (Applause)
