Geoffrey Hinton in conversation with Fei-Fei Li — Responsible AI development
Summary
TLDR In a lively conversation at the MaRS Discovery District in Toronto, Professor Geoffrey Hinton of the University of Toronto and Professor Fei-Fei Li of Stanford University explored the future of artificial intelligence (AI) and its impact on society. As pioneers of deep learning, Hinton and Li shared their views on the rapid development of AI technology, in particular how chatbots became a focus of public attention in a very short time. They discussed in depth AI's potential applications in fields such as education, healthcare, and agriculture, and also analyzed the risks AI brings, including effects on employment, the spread of misinformation, and autonomous weapons. Finally, they called for the responsible development and deployment of AI to ensure that technological progress benefits all of humanity.
Takeaways
- 🧠 Geoffrey Hinton and Fei-Fei Li discussed the current state and future prospects of artificial intelligence.
- 🔍 They discussed the potential and challenges of large language models (LLMs) and foundation models.
- 🌐 Fei-Fei Li emphasized the importance of humanism in AI development, calling for human well-being to be considered in technological innovation.
- 🤖 Geoffrey Hinton expressed concern about the potential risks of superintelligence, particularly the threat it could pose to humanity.
- 🏛️ They discussed the roles of the public sector and academia in AI research and development, and the enormous influence of the private sector.
- 📚 They emphasized the importance of data, and how big data has driven progress in deep learning and AI.
- 💡 They discussed AI's potential for addressing global problems such as health, education, and climate change.
- 🤝 They emphasized the importance of interdisciplinary collaboration, and how combining expertise from multiple fields can overcome challenges in AI.
- 📈 They explored future trends in AI technology and how to balance innovation with ethics and social responsibility.
- 🚀 They discussed the importance of inspiring the next generation of scientists and engineers to pursue bold and meaningful work in AI.
Q & A
Which traditional custodians of the land did Meric Gertler acknowledge in his opening remarks?
-Meric Gertler acknowledged the Huron-Wendat, the Seneca, and the Mississaugas of the Credit as the traditional custodians of the land on which the University of Toronto operates.
What is Geoffrey Hinton known as?
-Geoffrey Hinton is known as the Godfather of Deep Learning.
What role does Fei-Fei Li hold at Stanford University?
-Fei-Fei Li is the inaugural Sequoia Professor in Computer Science and Co-Director of the Human-Centered AI Institute.
Who made the founding of the Schwartz Reisman Innovation Campus possible?
-The Schwartz Reisman Innovation Campus was made possible by the generosity and vision of Heather Reisman and Gerry Schwartz.
Who created the ImageNet dataset?
-The ImageNet dataset was created by Fei-Fei Li.
Who were the main contributors behind AlexNet's ImageNet challenge win?
-The main contributors behind AlexNet's ImageNet challenge win were Ilya Sutskever, Alex Krizhevsky, and Geoffrey Hinton.
What impact did the ImageNet challenge have on the field of computer vision?
-The ImageNet challenge greatly advanced the field of computer vision, demonstrating the extraordinary effectiveness of deep neural networks on image recognition tasks.
How did the 2012 ImageNet challenge victory affect the acceptance of neural networks?
-The 2012 ImageNet challenge victory significantly increased the acceptance of neural networks, leading researchers who had previously criticized and doubted them to adopt the technique.
What is Fei-Fei Li's book "The Worlds I See" mainly about?
-Fei-Fei Li's book "The Worlds I See" is mainly about her understanding of big data, the potential of deep learning, and the moral responsibilities of AI technology.
What is Geoffrey Hinton's view of the future of artificial intelligence?
-Geoffrey Hinton is worried about the future of artificial intelligence, particularly the existential risks and moral challenges posed by superintelligence.
Outlines
😀 Opening remarks
Meric Gertler delivers opening remarks at the Radical AI founders event held at the MaRS Discovery District, welcoming guests and briefly introducing the background of the event. He highlights the University of Toronto's leadership in artificial intelligence, particularly the contributions of Geoffrey Hinton and his students to deep learning. Gertler then introduces the forthcoming Schwartz Reisman Innovation Campus, emphasizing how it will accelerate innovation and technological development.
🤖 Advances in AI and their impact
The discussion covers Geoffrey Hinton's and Fei-Fei Li's contributions to artificial intelligence, particularly in deep learning and computer vision. It touches on the influence of their work on academia and industry, and their roles in driving technological innovation and understanding AI's potential. It also emphasizes attention to technology ethics, social responsibility, and AI's impact on the future of human society.
🌐 The birth and impact of ImageNet
Fei-Fei Li recounts the creation of the ImageNet database, including the initial challenges, its goals, and its far-reaching impact on machine learning. By providing a vast set of labeled images, ImageNet greatly advanced computer vision and deep learning. This section also covers key related events, such as the ImageNet Challenge, and how they spurred breakthroughs by deep learning models, particularly convolutional neural networks, on image recognition tasks.
💡 A turning point for AI
This section reviews AlexNet's victory in the ImageNet Challenge and its impact on the field of computer vision. It discusses how the event marked a turning point at which deep learning and neural networks took the lead in image recognition, and how it triggered widespread attention and investment in deep learning from academia and industry.
📚 From academia to practice
A deeper discussion of AI's transition from academic research to practical application, particularly in speech recognition and image recognition, with concrete examples such as the work of Geoffrey Hinton and his students on speech recognition, and how those techniques were later applied in real products such as Google's Android system.
🚀 The rise of ChatGPT
A discussion of OpenAI's release of ChatGPT and its impact on the public, particularly how it let a broad user base directly experience the power of AI. ChatGPT is seen as a key moment in the popularization of AI, sparking wide discussion of AI's potential and risks.
🤝 The social responsibility of AI
This section emphasizes the social responsibility and ethical considerations of AI development. Geoffrey Hinton and Fei-Fei Li discuss their views on the positive and negative effects AI may bring, including issues of employment, privacy, discrimination, and bias. They stress the need for measures to ensure AI develops in humanity's best interests.
📈 The outlook for AI
In the final part of the conversation, Geoffrey Hinton and Fei-Fei Li share their outlook on AI's future development, including its technical potential, societal impact, and the key challenges that must be addressed. They emphasize the importance of interdisciplinary collaboration, and that future researchers must attend to AI's societal effects while advancing the technology.
Keywords
💡Artificial intelligence
💡Deep learning
💡ImageNet
💡Benchmarking
💡Chatbots
💡Societal impact
💡Ethics
💡Responsibility
💡Explainability
💡Transformative risk
Highlights
Geoffrey Hinton and Fei-Fei Li discuss the evolution and impact of AI, highlighting its potential to transform various sectors.
Acknowledgement of the land by Meric Gertler, emphasizing the University of Toronto's commitment to recognizing its history.
The significance of the MaRS Discovery District and the University of Toronto's role in the AI community is highlighted.
Geoffrey Hinton's pivotal role in deep learning and AI, earning him the nickname 'Godfather of Deep Learning'.
Fei-Fei Li's contributions to AI, particularly in computer vision and her leadership at the Human-Centered AI Institute.
The collaboration between academia and industry in advancing AI research and innovation.
The introduction of the Schwartz Reisman Innovation campus as a hub for AI thought leadership.
Discussion on the dual nature of AI: its incredible potential for innovation and the growing concerns over its societal impact.
The story of the ImageNet database creation by Fei-Fei Li and its monumental impact on AI research.
The breakthrough moment of AlexNet in the ImageNet competition, showcasing the power of deep neural networks.
The transformative effect of AI and deep learning on fields such as drug discovery, medical diagnostics, and more.
Concerns and ethical considerations about AI shaping the future of humanity.
The conversation shifts towards addressing the urgency of responsible AI development and regulation.
Both speakers reflect on their personal journeys into AI research, highlighting their motivations and challenges.
The discussion emphasizes the importance of collaboration between academia, industry, and government in shaping the future of AI.
The audience engagement through questions reflects the community's interest in understanding and contributing to responsible AI.
Transcripts
Well, good afternoon everyone.
Got to love the buzz in the room here today.
Welcome to the MaRS Discovery District,
this wonderful complex
for this very special radical AI founders event,
co-hosted by the University of Toronto.
My name is Meric Gertler
and it's my great privilege to serve as president
of the University of Toronto.
Before we begin, I want to acknowledge the land
on which the University of Toronto operates.
For thousands of years, it has been the traditional land
of the Huron-Wendat, the Seneca,
and the Mississaugas of the Credit.
Today, this meeting place is still the home
to many Indigenous people from across Turtle Island,
and we are very grateful to have the opportunity
to work and to gather on this land.
Well, I'm truly delighted to welcome you all
to this discussion between Geoffrey Hinton,
university professor Emeritus at the University of Toronto,
known to many as the Godfather of Deep Learning,
and Fei-Fei Li the inaugural Sequoia Professor
in Computer Science at Stanford University,
where she's Co-Director of Human-centered AI Institute.
I want to thank Radical Ventures
and the other event partners for joining with U of T
to create this rare and special opportunity.
Thanks in large part to the groundbreaking work
of Professor Hinton and his colleagues,
the University of Toronto has been at the forefront
of the academic AI community for decades.
Deep learning is one of the primary breakthroughs
propelling the AI boom,
and many of its key developments were pioneered
by Professor Hinton and his students at U of T.
This tradition of excellence,
this long tradition continues into the present.
Our faculty, students and graduates,
together with partners at the Vector Institute
and at universities around the world
are advancing machine learning and driving innovation.
Later this fall, our faculty, staff, students,
and partners will begin moving into phase one
of the beautiful new Schwartz Reisman Innovation campus
just across the street.
You may have noticed a rather striking building
at the corner, with the official opening planned
for early next year.
This facility will accelerate innovation and discovery
by creating Canada's largest
university based innovation hub,
made possible by a generous and visionary gift
from Heather Reisman and Gerry Schwartz.
The innovation campus will be a focal point
for AI thought leadership,
hosting both the Schwartz Reisman Institute
for Technology and Society,
led by Professor Gillian Hadfield
and the Vector Institute.
It is already clear that artificial intelligence
and machine learning are driving innovation
and value creation across the economy.
They're also transforming research
in fields like drug discovery, medical diagnostics,
and the search for advanced materials.
Of course, at the same time,
there are growing concerns over the role
that AI will play in shaping humanity's future.
So today's conversation clearly addresses
a timely and important topic,
and I am so pleased that you have all joined us
on this momentous occasion.
So without further ado,
let me now introduce today's moderator, Jordan Jacobs.
Jordan is managing partner
and co-founder of Radical Ventures,
a leading venture capital firm
supporting AI-based ventures here in Toronto
and around the world.
Earlier he co-founded Layer 6 AI
and served as Co-CEO prior to its acquisition
by TD Bank Group, which he joined as Chief AI Officer.
Jordan serves as a Director of the Canadian Institute
for Advanced Research,
and he was among the founders of the Vector Institute,
a concept that he dreamed up
with Tomi Poutanen, Geoff Hinton,
Ed Clark, and a few others.
So distinguished guests,
please join me in welcoming Jordan Jacobs.
(audience applauding)
- Come on up.
Thanks very much, Meric.
I wanted to start by thanking a number of people
who've helped to make this possible today,
University of Toronto and Meric,
Melanie Woodin, Dean of Arts and Science,
and a number of partners that have brought this to fruition.
So this is the first in our annual four part series
of founder AI masterclasses that we run at Radical.
This is the third year we've done it,
and today's the first one of this year.
We do it in person and online.
So we've got thousands of people watching this online.
So if you decide you need to start coughing,
maybe head outside.
We do that in partnership with the Vector Institute
and thank them very much
for their participation and support
with the Alberta Machine Intelligence Institute in Alberta,
and with Stanford AI, thanks to Fei-Fei.
So thank you all of you for being excellent partners.
We're hoping that this is gonna be
a really interesting discussion.
This is the first time that Geoff and Fei-Fei,
who I like to think of as friends, and I get to talk to,
but this is the first time
they're doing this publicly together.
So it's, I think, gonna be
a really interesting conversation.
Let me quickly do some deeper explanations
of their background.
Geoff is often called the Godfather
of Artificial Intelligence.
He's won the Turing Award.
He is a Professor Emeritus University of Toronto,
co-founder of the Vector Institute,
also mentored a lot of the people
who have gone on to be leaders in AI globally,
including at the big companies
and many of the top research labs in the world in academia.
So when we say godfather, it really is,
there are many kinds of children and grandchildren of Geoff
who are leading the world in AI
and that all comes back to Toronto.
Fei-Fei is the founding Director of the Stanford Institute
for Human-Centered AI,
Professor at Stanford.
She's an elected member of the National Academy
of Engineering in the US,
the National Academy of Medicine,
and the American Academy of Arts and Sciences.
During a sabbatical from Stanford in 2017/18,
she stepped in for a role as a Vice-President at Google
as Chief Scientist of AI/ML at Google Cloud.
There's many, many other things we could say about Fei-Fei
but she also has an amazing number of students
who have gone on to be leaders in the field globally.
And really importantly,
and so for those of you who haven't heard yet,
Fei-Fei has a book coming out in a couple of weeks.
It is called, it's coming out on November 7th,
it's called "The Worlds I See,
Curiosity, Exploration, and Discovery at the Dawn of AI."
I've read it, it's fantastic.
You should all go out and buy it.
I'll read you the back cover blurb that Geoff wrote
'cause it's much better than what I could say about it.
So here's Geoff's description.
"Fei-Fei Li was the first computer vision researcher
to truly understand the power of big data,
and her work opened the floodgates for deep learning.
She delivers an urgent, clear-eyed account
of the awesome potential
and danger of the AI technology that she helped to unleash.
And her call for action and collective responsibility
is desperately needed at this pivotal moment in history."
So I urge you all to go and pre-order the book
and read it as soon as it comes out.
With that, thanks Fei-Fei and Geoff for joining us.
- Thank you, Jordan.
(audience applauding)
- Okay, so I think it's not an exaggeration
to say that without these two people,
the modern age of AI does not exist,
certainly not in the way that it's played out.
So let's go back to what I think is the big bang moment.
AlexNet ImageNet, maybe Geoff,
do you want to take us through from your perspective
that moment which is 11 years ago now?
- Okay, so in 2012, two of my very smart graduate students
won a competition, a public competition,
and showed that deep neural networks
could do much better than the existing technology.
Now, this wouldn't have been possible
without a big data set that you could train them on.
Up to that point, there hadn't been a big data set
of labeled images,
and Fei-Fei was responsible for that data set.
And I'd like to start by asking Fei-Fei
whether there were any problems
in putting together that data set?
(audience laughing)
- Well, thank you Geoff, and thank you Jordan,
and thank you University Toronto for this,
it's really fun to be here.
So yes, the data set that Geoff you're mentioning
is called ImageNet.
And I began building it in 2007
and spent the next three years pretty much
with my graduate students building it.
And you asked me was there a problem building it,
where do I even begin?
(Fei-Fei laughing)
Even at the conception of this project
I was told that it really was a bad idea.
I was a young Assistant Professor.
I remember it was my first year actually
as an Assistant Professor at Princeton
and for example, a very respected mentor
of mine in the field,
if you know the academic jargon,
these are the people who will be writing
my tenure evaluations, actually told me
really out of their good heart
that please don't do this after I told them
what this plan is back in 2007.
- So that would've been Jitendra right?
(audience laughing)
- The advice was that,
"You might have trouble getting tenure if you do this."
And then I also tried to invite other collaborators
and nobody in machine learning or AI
wanted to even go close to this project,
and of course no funding. - Sorry.
(audience laughing)
- Just describe ImageNet to us
for the people who are not familiar with what it was.
- Yeah, so ImageNet was conceived around 2006, 2007,
and the reason I conceived ImageNet was actually twofold.
One is that,
and Geoff, I think we share similar background,
I was trained as a scientist,
to me, doing science is chasing after North Stars.
And in the field of AI, especially visual intelligence,
for me, object recognition,
the ability for computers
to recognize there's a table in the picture
or there's a chair is called object recognition,
has to be a North star problem in our field.
And I feel that we need to really put a dent
in this problem.
So I want to define that North Star problem,
that was one aspect of ImageNet.
Second aspect of ImageNet was recognizing
that machine learning was really going in circles
a little bit at that time,
that we were making really intricate models
without the kind of data to drive the machine learning.
Of course, in our jargon,
it's really the generalization problem, right?
And I recognize that we really need to hit a reset,
and rethink about machine learning
from a data driven point of view.
So I wanted to go crazy
and make a data set that no one has ever seen
in terms of its quantity and diversity and everything.
So ImageNet after three years was a curated data set
of internet images that's totaled
15 million images across 22,000 concepts,
object category concepts.
And that was the data set
- Just for comparison,
at the same time in Toronto
we were making a data set called CIFAR-10
that had 10 different classes and 60,000 images,
and it was a lot of work,
generously paid for by CIFAR
at five cents an image.
- And so you turn the data set into a competition,
just walk us through a little bit
of what that meant,
and then we'll kind of fast forward to 2012.
- Right.
So we made the data set in 2009.
We barely made it into a poster at an academic conference.
And no one paid attention.
So it was a little desperate at that time.
And I believe this is the way to go.
And we open sourced it,
but even with open source,
it wasn't really picking up.
So my students and I thought,
well, let's drum up a little more interest with a competition.
Let's create a competition
to invite the worldwide research community
to participate in this problem of object recognition
through ImageNet.
So we made a ImageNet competition
and the first feedback we got
from our friends and colleagues is, it's too big.
And at that time you could not fit it onto a hard drive,
let alone memory.
So we actually created a smaller data set
called the ImageNet challenge data set,
which is only 1 million images
across 1,000 categories instead of 22,000 categories,
and that was unleashed in 2010, I think.
You guys noticed it in 2011, right?
- Yes. - Yeah.
- And so in my lab we already had deep neural networks
working quite well for speech recognition.
And then Ilya Sutskever said,
"What we've got really ought to be able
to win the ImageNet competition."
And he tried to convince me that we should do that.
And I said, well, you know, it's an awful lot of data.
And he tried to convince his friend Alex Krizhevsky,
and Alex wasn't really interested.
So Ilya actually pre-processed all the data
to put it in just the form Alex needed it in.
- You shrunk the size of the images.
- Yes. - Yeah.
- He shrunk the images a bit.
- Yeah, I remember.
- And got it pre-processed just right for Alex,
and then Alex eventually agreed to do it.
Meanwhile, in Yann LeCun's lab in New York,
Yann was desperately trying to get his students
and postdocs to work on this data set.
'Cause he said, "The first person
to apply convolutional nets to this data that's gonna win."
And none of his students were interested.
They were all busy doing other things.
And so Alex and Ilya got on with it,
and we discovered
by running on the previous year's competition
that we were doing much better than the other techniques.
And so we knew we were gonna win the 2012 competition.
And then there was this political problem,
which is we thought if we showed that neural networks
win this competition,
the Computer Vision people,
Jitendra in particular will say,
well that just shows it's not a very good data set.
So we had to get them to agree ahead of time
that if we won the competition,
we'd proved that neural networks worked.
So actually called up Jitendra
and we talked about data sets we might run on.
And my objective was to get Jitendra to agree
that if we could do ImageNet,
then neural nets really worked.
And after some discussion
and him telling me to do other data sets,
we eventually agreed, okay, if we could do ImageNet
then we'd have shown neural nets work.
Jitendra remembers it as he suggested ImageNet
and he was the one who told us to do it,
but it was actually a bit the other way round.
And we did it and it was amazing.
We got just over half the error rate
of the standard techniques.
And the standard techniques have been tuned for many years
by very good researchers.
- I remember standard technique at that time,
the previous year is support vector machine
with sparsification. - Right.
- That was, so you guys submitted your competition results,
I think it was late August or early September.
And I remember either getting a phone call,
or getting an email late one evening from my students
who was running this
because we hold the test data
we were running on the server side.
The goal is that we have to process all the entries
so that we select the winners,
and then by, I think it was the beginning of October that year,
the computer vision field's international conference,
ICCV 2012, was happening in Florence, Italy.
We already booked a workshop,
annual workshop at the conference.
We will be announcing the winner,
it's the third year.
So a couple of weeks before we have to process the teams.
Because it was the third year
and frankly the previous two years
results didn't excite me,
and I was a nursing mother at that time.
So I decided not to go to the third year,
so I didn't book any tickets.
I'm just like, too far from me.
And then the results came in,
that evening, phone call or email,
I really don't remember, came in.
And I remember saying to myself, darn it Geoff,
now I have to get a ticket to Italy.
Because I knew that was a very significant moment,
especially with a convolutional neural network,
which I learned as a graduate student,
as a classic algorithm.
And of course by that time
there were only middle seats in economy class
flying from San Francisco to Florence
with a one-stop layover.
So it was a grueling trip to go to Florence-
- I'm sorry. - But I wanted to be there.
(audience laughing)
Yeah, but you didn't come.
- No
(audience laughing)
Well, it was a grueling trip.
- But did you know that would be a historical moment?
- Yes, I did actually.
- You did, and you still didn't come.
But you sent Alex.
- Alex, yes. - Yeah.
- Who ignored all your advice?
- Who ignored my email for multiple times,
'cause I was like, Alex, this is so cool,
please do this visualization, this visualization.
He ignored me.
But Yann LeCun came and it was because,
for those of you who have attended
these academic conference workshops
tend to book these smaller rooms.
We booked a very small room,
probably just the middle section here.
And I remember Yann had to stand in the back of the room
because it was really packed,
and Alex eventually showed up
'cause I was really nervous
that he wasn't even gonna show up.
But as you predicted at that workshop
ImageNet was being attacked.
At that workshop there were people vocally attacking,
this is a bad dataset.
- In the room?
- In the room .
- During the presentation? - In the room.
- But not Jitendra,
'cause Jitendra has already agreed that it counted.
- Yeah, I don't think Jitendra was in the room,
I don't remember.
But I remember it was such a strange moment for me
because as a machine learning researcher,
I knew history was in the making,
yet ImageNet was being attacked.
It was just a very strange,
it was exciting moment.
And then I had to hop in the middle seat
to get back to San Francisco
because then the next morning.
- So you mentioned a few people
that I want to come back to later.
So Ilya who's founder and chief scientist at OpenAI,
and Yann LeCun who subsequently went on
to be head of AI at Facebook now Meta,
and there's a number of other interesting people in the mix.
But before we go forward
and kind of see what that boom moment created,
let's just go back for a little bit.
Both of you started in this
with kind of a very specific goal in mind
that is an individual and I think a iconoclastic,
and you had to persevere through the moments
that you just described,
but kind of throughout your careers.
Can you just go back, Geoff maybe and start,
give us a background
to why did you want to get into AI in the first place?
- I did psychology as an undergraduate.
I didn't do very well at it.
And I decided they were never going to figure out
how the mind worked
unless they figured out how the brain worked.
And so I wanted to figure out how the brain worked
and I wanted to have an actual model that worked.
So you can think of understanding the brain
as building a bridge.
There's experimental data
and things you can learn from experimental data,
and there's things that will do the computations you want,
things that will recognize objects.
And they were very different.
And I think of it as you want to build this bridge
between the data and the competence,
the ability to do the task.
And I always saw myself
as starting at the end of things that work,
but trying to make them more and more like the brain,
but still work.
Other people tried to stay with things
justified by empirical data,
and try and have theories that might work.
But we're trying to build that bridge
and not many people were trying to build a bridge.
Terry Sejnowski was trying to build a bridge
from the other end,
and so we got along very well.
A lot of people doing,
trying to do computer vision,
just wanted something that worked,
they didn't care about the brain.
And a lot of people who care about the brain
wanted to understand how neurons work and so on,
but didn't want to think much
about the nature of the computations.
And I still see it as we have to build this bridge
by getting people who know about the data
and people who know about what works to connect.
So my aim was always to make things that could do vision,
but do vision in the way that people do it.
- Okay, so we're gonna come back to that
'cause I want to ask you about
the most recent developments
and how you think that they relate to the brain.
Fei-Fei, so Geoff just to kind of put a framework
on where you started, UK to the US to Canada,
by mid to late '80, you come to Canada in '87,
along that route, funding and interest in neural nets,
and the way the approach that you're taking
kind of goes like this,
but I'd say mostly like this-
- It went up and down.
- Fei-Fei you started your life in a very different place.
Like can you walk us through
a little bit of how you came to AI?
- Yeah, so I started my life in China,
and when I was 15-year-old,
my parents and I came to Parsippany, New Jersey.
So I became a new immigrant
and where I started was first English
as second language classes,
'cause I didn't speak the language,
and just working in laundries,
and restaurants and and so on.
But I had a passion for physics.
I don't know how it got into my head.
And I wanted to go to Princeton
because all I know was Einstein was there,
and I got into Princeton,
he wasn't there by the time I got into Princeton.
- You're not that old. - Yeah.
But there was a statue of him.
And the one thing I learned in physics,
beyond all the math and all that
is really the audacity to ask the craziest questions,
like the smallest particles of the atomic world,
or the boundary of space time and beginning of universe.
And along the way I discovered the brain
as a third-year, reading Roger Penrose and those books.
Yeah, you might have opinions,
but at least I've read those books.
- It was probably better that you didn't.
(audience laughing)
- Well it at least got me interested in brain.
And by the time I was graduating
I wanted to ask the most audacious question as a scientist.
And to me the absolute most fascinating audacious question
of my generation that was 2000 was intelligence.
So I went to Caltech to get a dual,
pretty much a dual PhD
in neuroscience with Christof Koch,
and in AI with Pietro Perona.
So I so echo Geoff, what you said about bridge
because that five years allow me
to work on computational neuroscience
and look at how the mind works,
as well as to work on the computational side,
and try to build that computer program
that can mimic the human brain.
So that's my journey, it starts from physics.
- Okay, so your journeys intersect at ImageNet 2012.
- By the way, I met Geoff
when I was a graduate student. - Right, I remember,
I used to go visit Pietro's lab.
- Yeah.
- In fact he actually offered me a job
at Caltech when I was 17.
- You would've been my advisor.
- No, I would not, not when I was 17.
- Oh, okay.
- Okay, so we intersected at ImageNet,
I mean in the field everyone knows
that ImageNet is this big bang moment
and subsequent to that first the big tech companies come in
and basically start buying up your students and you,
and to get them into the companies.
I think they were the first ones
to realize the potential of this.
I would like to talk about that for a moment,
but kind of fast forwarding,
I think it's only now since ChatGPT
that the rest of the world is catching up
to the power of AI.
Because finally you can play with it.
You can experience it,
in the boardroom they can talk about it,
and then go home,
and then the 10-year-old kid
has just written a dinosaur essay
for fifth grade with ChatGPT.
So that kind of transcendent experience
of everyone being able to play with it,
I think has been a huge shift.
But in the period in between which is 10 years,
there is kind of this explosive growth of AI
inside the big tech companies,
and everyone else is not really noticing what's going on.
Can can you just talk us through your own experience?
Because you experienced
a kind of a ground zero post ImageNet.
- It's difficult for us to get into the frame
of everybody else not realizing what was going on,
'cause we realized what was going on.
So a lot of the universities
you'd have thought would be right at the forefront
were very slow in picking up on it.
So MIT for example, and Berkeley,
I remember going even talking in Berkeley
in I think 2013
when already AI was being very successful
in Computer Vision.
And afterwards a graduate student came up to me
and he said, "I've been here like four years
and this is the first talk
I've heard about neural networks.
They're really interesting."
- Well, they should have gone to Stanford.
- Probably, probably.
But the same with MIT,
they were rigidly against having neural nets.
And the ImageNet moment started to wear them down
and now they're big proponents of neural nets.
But it's hard to imagine now,
but around 2010 or 2011
there was the Computer Vision people,
very good Computer Vision people
who were really adamantly against neural nets.
They were so against it that, for example,
one of the main journals, IEEE PAMI-
- PAMI? - PAMI.
Had a policy not to referee papers
on neural nets at one point.
Just send them back, don't referee them,
it's a waste of time, it shouldn't be in PAMI.
And Yann LeCun sent a paper to a conference
where he had a neural net that was better at identifying,
at doing segmentation of pedestrians
than the state of the art.
And it was rejected.
And it was one of the reasons it was rejected
was one of the referees said,
"This tells us nothing about vision."
'Cause they had this view of how computer vision works,
which is you study the nature of the problem of vision,
you formulate an algorithm that'll solve it,
you figure out how to implement that algorithm,
and then you publish a paper.
In fact, it doesn't work like that.
- I have to defend my field, not everybody,
- Not everybody.
- So there are people who are-
- But most of them were adamantly against neural nets.
And then something remarkable happened
after the ImageNet competition, which is,
they all changed within about a year.
All the people who have been the biggest critics
of neural nets started doing neural nets,
much to our chagrin,
and some of them did it better than us.
So this (indistinct) in Oxford, for example,
made a better neural net very quickly.
But they behaved like scientists ought to behave,
which is that they strongly believed this stuff was rubbish,
but because of ImageNet we could eventually show
that it wasn't, and then they changed.
So that was very comforting.
- And just to carry it forward,
so what you're trying to show,
you're trying to label using the neural nets,
these 15 million images accurately,
you've got them all labeled in the background
so you can measure it.
The error rate when you did it
dropped from 26% the year before,
I think to 16% or so.
- Yep. - I think it's 15.3.
- Okay. And then it subsequently keeps-
- 15.32.
(audience laughing)
- I knew you'd remember. - Which randomization?
- Geoff doesn't forget.
And then in subsequent years
people are using more powerful neural nets
and it continues to drop
to the point where it surpasses-
- 2015.
So there's a Canadian,
very smart Canadian undergrad who joined my lab,
his name is Andrej Karpathy.
And he got bored one summer and said,
"I want to measure how humans do."
So you should go read his blog.
So he had all these, like, humans doing ImageNet
test parties,
he had to bribe them with pizza I think,
with my students in the lab.
And they got to an error rate of about 5%, and that-
Was it five or 3.5?
- Three. - Three.
3.5 I think.
- So humans basically make mistakes about
3% of the time? - Right, right.
And then I think 2016, I think a ResNet passed it.
- Yeah. - Right, it was ResNet,
that year's winning algorithm
passed the human performance.
- And then ultimately you had to retire the competition
because it was so much better than humans that had-
- We had to retire 'cause we ran out of funding.
- Okay, alright.
It's a different reason.
- A bad reason. - Still ran outta funding
- Interestingly, that student started life
at the University of Toronto. - Yes.
- Where he went to your lab,
and then he went to be head of research at Tesla.
- Okay, first of all, he came to Stanford
to be a PhD student.
And yesterday night we were talking,
actually there was a breakthrough dissertation,
in the middle of this.
And then he became part of the founding team of OpenAI.
- But then he went to Tesla. - And then he went to Tesla.
- And then he thought better of it.
- He's back.
But I do want to answer your question of that 10 years.
- Well there's a couple of developments along the way.
- Right. - Transformers.
- Right.
- So the transformer paper is written, the research done,
paper written inside Google,
another Canadian is a co-author there, Aidan Gomez,
who's now the CEO and co-founder of Cohere,
who I think was a 20-year-old intern at Google Brain
when he co-authored the paper.
So there's a tradition
of Canadians being involved in these breakthroughs.
But Geoff, you were at Google when the paper was written,
was there an awareness inside Google
of how important this would be?
- I don't think there was, maybe the authors knew,
but it took me several years
to realize how important it was.
And at Google people didn't realize
how important it was until BERT.
BERT used transformers,
and BERT then became a lot better
at a lot of natural language processing benchmarks
for a lot of different tasks.
And that's when people realized transformers were special.
- So 2017 the transformer paper was published.
I also joined Google,
and I think you and I actually met
on my first week. - Right.
- I think most of 2017 and 2018
was neural architecture search. - Right.
- I think that was Google's bet.
- Yep. - And there was a lot
of GPUs being used.
So it was a different bet.
- So just to explain that neural architecture search
essentially means this,
you get yourself a whole lot of GPUs,
and you just try lots of different architectures
to see which works best and you automate that.
It's basically automated evolution
for neural net architectures.
- It's like hyperparameter tuning.
- Yeah. - Yeah.
- And it led to some- - Good way.
- Quite big improvements. - Yeah.
- But nothing like transformers.
And transformers were a huge improvement
for natural language
- Neural architecture search was mostly on ImageNet.
- Yeah. - Yeah.
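To make the architecture search idea concrete, here is a minimal sketch in its simplest random-search form, as just described: sample an architecture, train and score it, keep the best. The toy search space and the scoring stub are illustrative assumptions, not the system Google actually ran.

```python
import random

# Toy search space: each "architecture" is a (depth, width, activation) choice.
# Both this space and the scoring stub below are illustrative assumptions.
SEARCH_SPACE = {
    "depth": [2, 4, 8, 16],
    "width": [64, 128, 256, 512],
    "activation": ["relu", "tanh", "gelu"],
}

def sample_architecture():
    """Draw one random architecture from the search space."""
    return {name: random.choice(options) for name, options in SEARCH_SPACE.items()}

def train_and_score(arch):
    """Stand-in for 'train this architecture on GPUs and return
    validation accuracy'; here it just returns a fake score."""
    return random.random()

def random_architecture_search(n_trials=100):
    """Try many architectures, keep whichever scores best."""
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture()
        score = train_and_score(arch)  # in practice: an expensive training run
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

print(random_architecture_search())
```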
- So I'll tell you our experience of transformers.
So we were doing our company Layer 6 at the time,
I think we saw a pre-read of the paper
and we were in the middle of a fundraising
and a bunch of acquisition offers and read the paper.
And I mean, not just me,
but my partner Tomi, who had studied with you,
and Maksims Volkovs, who came out of the U of T lab.
And we thought this is the next iteration of neural nets,
we should sell the company,
start a venture fund and invest in these companies
that are gonna be using transformers.
So we figured it would take five years
to get adopted beyond Google.
And then from that moment forward,
it would be 10 years for all the software
in the world to get replaced
or embedded with this technology.
We made that decision five years and two weeks
before ChatGPT came out.
So I'm glad to see we were good at predicting,
but I have to give credit to my co-founders;
I thought I understood what the paper was,
but they were able to explain it fully.
- I should just correct you,
I don't think Tomi ever studied with me.
He wanted to come study with me,
but a colleague in my department told him
if he came to work with me,
that would be the end of his career
and he should go do something else.
- So he took the classes,
and this is my partner who in the late '90s
was doing a master's at U of T,
and he wanted to go study with Geoff, studied neural nets.
And his girlfriend, now wife's father,
who was an engineering professor, said, "Don't do that,
neural nets are a dead end."
So instead he took the classes
but wrote his thesis in cryptography.
(audience laughing)
Okay, so-
- Are you still gonna talk about the 10 years?
Because I think there's something important.
- Yeah, so go ahead.
- So I do think there's something important the world
overlooked in these 10 years between ImageNet, AlexNet,
and ChatGPT.
Most of the world sees this as a tech 10 years,
or we see it as a tech 10 years,
in the big tech there's things brewing.
I mean, it took sequence-to-sequence, transformers,
but things were brewing.
But I do think for me personally
and for the world, it's also a transformation
from tech to society.
I actually think personally, I grew from a scientist
to a humanist in these 10 years.
Because having joined Google for those two years
in the middle of the transformer papers,
I begin to see the societal implication of this technology.
It was post AlphaGo moment
and very quickly we got to the AlphaFold moment.
It was when bias was creeping out,
there were privacy issues.
And then we're starting to see the beginning
of disinformation and misinformation.
And then we're starting to see the talk
of jobs within a small circle,
not within a big public discourse.
It was when I grew personally anxious,
I feel, you know 2018-
Oh, oh, it was also right after Cambridge Analytica.
So that huge implication of technology, not AI per se,
but it's algorithm driven technology on election,
that's when I had to make a personal decision
of staying at Google or come back to Stanford.
And I knew the only reason I would come back
to Stanford was starting this human-centered AI institute
to really, really understand the human
side of this technology.
So I think this is a very important 10 years,
even though it's kind of not in the eyes of the public,
but this technology is starting to really creep into
the rest of our lives.
And of course 2022,
it's all shown in the daylight
how profound this is.
- There's an interesting footnote
to what happened during that period as well,
which is ultimately you and Ilya and Alex joined Google,
but before that there was a big Canadian company
that had the opportunity to get access to this technology.
Do you want us, I've heard this story
but I don't think it's ever been shared publicly.
Maybe do you want to share that story for a second?
- Okay, so the technology that we were using
for the ImageNet,
we developed it in 2009 for doing speech recognition,
for doing the acoustic modeling, bit of speech recognition.
So you can take the sound wave
and you can make a thing called a spectrogram,
which just tells you at each time
how much energy that is at each frequency.
So you're probably used to seeing in spectrograms.
And what you'd like to do is look at a spectrogram
and make guesses about which part
of which phoneme is being expressed
by the middle frame of the spectrogram.
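As a concrete illustration of the pipeline being described, here is a minimal sketch using SciPy's spectrogram routine; the synthetic waveform and the 25 ms window / 10 ms hop framing are illustrative assumptions, not the original system.

```python
import numpy as np
from scipy.signal import spectrogram

# Illustrative input: one second of a 440 Hz tone sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
wave = np.sin(2 * np.pi * 440 * t)

# Energy at each frequency over time: 25 ms windows with a 10 ms hop
# (400 and 160 samples at 16 kHz), a common framing for speech models.
freqs, times, Sxx = spectrogram(wave, fs=fs, nperseg=400, noverlap=240)

# An acoustic model looks at a window of frames and guesses which
# phoneme the middle frame belongs to; here is that middle slice.
middle_frame = Sxx[:, Sxx.shape[1] // 2]
print(Sxx.shape, middle_frame.shape)  # (frequencies, frames), (frequencies,)
```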
And two students, George Dahl
and another student
who I shared with Gerald Penn called Abdo,
he had a longer name, we all called him Abdo,
who was a speech expert, George was a learning expert.
Over the summer of 2009,
they made a model that was better
than what 30 years of speech research
had been able to produce,
and big, big teams working on speech research.
And the model was slightly better,
not as big as the ImageNet gap, but it was better.
And that model was then ported to IBM and to Microsoft:
George went to Microsoft
and Abdo went to IBM,
and those big speech groups started using neural nets then.
And I had a third student
who'd been working on something else,
called Navdeep, Navdeep Jaitly.
And he wanted to take this speech technology
to a big company,
but he wanted to stay in Canada
for complicated visa reasons.
And so we got in touch with Blackberry, RIM,
and we said we've got this new way
of doing speech recognition
and it works better than the existing technology
and we'd like a student to come to you over the summer
and show you how to use it,
and then you can have the best speech recognition
in your cell phone.
And they said after some discussions,
a fairly senior guy at Blackberry said,
"We are not interested."
So our attempt to give it to Canadian industry failed.
And so then Navdeep took it to Google,
and Google were the first to get it into a product.
So in 2012, around the same time
as we won the ImageNet competition,
George and Abdo's speech recognition acoustic model,
the acoustic model was in,
there was a lot of work making it a good product
and making it have low latency and so on,
that came out in the Android.
And there was a moment when the Android suddenly became
as good as Siri at speech recognition
and that was a neural net.
And I think for the people high up in the big companies,
that was another ingredient.
They saw it get this dramatic result for vision,
but they also saw that it was already out in a product
for speech recognition was working very well there too.
So I think that combination of it does speech,
it does vision, clearly it's gonna do everything.
- We won't say anymore
about Blackberry. - It was a shame.
It was a shame that Canadian industry didn't-
I think we might have still had Blackberries
if that happened.
(audience laughing)
- Alright, we'll leave that one there.
(audience laughing)
I thought it was a story, I've heard this story before,
but I thought it was important for the rest of the world
to know some of what went on behind the scenes,
why this technology didn't stay in Canada
even though it was offered for free.
Okay, so let's advance forward.
We now have post transformers,
Google is starting to use this
and develop it in a number of different ways.
OpenAI, where your former student Ilya had left Google,
been a founder of OpenAI with Elon Musk and Sam Altman,
Greg Brockman and a few others.
Ilya is the chief scientist,
and Andrej your student as a co-founder.
So they are working together a very small team
to basically, well, initially the idea was we're gonna build AGI,
artificial general intelligence,
ultimately the transformer paper comes out,
they start to adopt at some point transformers,
and they start to make extraordinary gains internally,
they're not really sharing publicly
in what they're able to do in language understanding
and a number of other things.
They had efforts going on in robotics that spun out.
Pieter Abbeel ended up spinning out Covariant,
a company we subsequently invested in and other things.
But so the language part of it advances,
and advances and advances.
People outside OpenAI don't really know
to the extent what's going on.
And then ChatGPT comes out November 30th last year.
So 10 months ago. - Well, GPT-2
caught the attention of some of us.
I think actually, I think by the time GPT-2 came out,
my colleague Percy Liang, an NLP Professor at Stanford,
I remember he came to me and say,
"Fei-Fei I have a whole different realization
of how important this technology is."
So to the credit of Percy,
he immediately asked HAI to set up a center to study this.
And I don't know if this is contentious in Toronto,
Stanford is the university that coined the term
foundation models,
and some people call it LLM- large language model.
But going beyond language, we call it a foundation model.
We created the Center for Research on Foundation Models
before, I think before GPT-3.5 came out.
So definitely before ChatGPT. - Just describe
what a foundation model is
just for those who are not familiar.
- That's actually a great question.
Foundation model, some people feel
it has to have transformer in it.
I don't know if you use- - No, it just has to be
a very big huge amount of data.
- Very large, pretrained with huge amount of data.
And I think one of the most important things
about a foundation model is its generalizability
across multiple tasks.
You're not training it for, for example, machine translation.
So in NLP, machine translation is a very important task,
but the kind of foundation model
like GPT is able to do machine translation,
is able to do conversation, summarization,
and blah blah blah.
So that's a foundation model
and we're seeing that now in multimodality.
We're seeing a vision, in robotics, in video and so on.
So we created that.
But you're right, the public sees this in the-
- 10 months ago. - What did you say?
- October 30th. - November, I think.
- November.
- One other very important thing about foundation models,
which is for a long time in cognitive science,
the general opinion was that these neural nets,
if you give 'em enough training data,
they can do complicated things,
but they need an awful lot of training data.
They need to see thousands of cats.
And people are much more statistically efficient.
That is they can learn to do these things on much less data.
And people don't say that so much anymore
because what they were really doing was comparing
what an MIT undergraduate can learn to do
on the limited amount of data
with what a neural net that starts with random weights
can learn to do on a limited amount of data.
- Yeah,
that's an unfair comparison. - And if you want to make
a fair comparison,
you take a foundation model that is a neural net
that's been trained on lots and lots of stuff
and then you give it a completely new task,
and you ask how much data does it need
to learn this completely new task?
And that's called few shot learning
'cause it doesn't take much.
And then you discover these things
are statistically efficient.
That is, they compare quite favorably with people
in how much data they need to learn to do a new task.
So the old kind of innatist idea
that we come with lots of innate knowledge,
and that makes us far superior to these things,
you just learn everything from data.
People have pretty much given up on that now
because you take a foundation model
that had no innate knowledge but a lot of experience
and then you give it a new task,
it learns pretty efficiently.
It doesn't need huge amounts of data.
- You know, my PhD is in one-shot learning,
but it's very interesting,
even in a Bayesian framework you could pre-train,
but it's only in the neural network
kind of pre-training that really can get you this multitask.
- Right.
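To make the few-shot measurement just described concrete, here is a minimal sketch, assuming a toy sentiment task and a stand-in `call_model` API (both illustrative, not from the conversation): a handful of labelled examples go straight into the prompt, and the pretrained model picks the task up from them.

```python
# Few-shot learning in its simplest modern form: put a handful of labelled
# examples in the prompt and let a pretrained model infer the task.
# `call_model` stands in for whatever foundation-model API you are probing.
def call_model(prompt):
    raise NotImplementedError("plug a real foundation-model call in here")

# Three examples; a network starting from random weights would need thousands.
EXAMPLES = [
    ("The movie was wonderful", "positive"),
    ("I want my money back", "negative"),
    ("An instant classic", "positive"),
]

def few_shot_prompt(query):
    """Build a prompt that teaches the task purely by example."""
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in EXAMPLES]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = few_shot_prompt("Two hours I will never get back")
print(prompt)
# With a real model plugged in: label = call_model(prompt)
```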
- Okay, so this basically gets productized in ChatGPT,
the world experiences it, which is only 10 months ago,
although for some of us
it feels like- - Seems longer.
- Much longer. - It feels like forever.
- Because you suddenly you have this,
you had this big bang that happened a long time ago
that I think for a long time
no one really saw the results of it,
suddenly, I mean my comparison would be
there's planets that are formed,
and stars that are visible,
and everyone can experience
the results of what happened 10 years before,
and then transformers, etc.
So the world suddenly becomes very excited about
what I think feels to a lot of people like magic.
Something that they can touch and they can experience
and gives them back a feedback
in whatever way they're asking for it.
Whether they're putting in text prompts
and asking for an image to be created,
or video, or texts,
and asking for more texts to come back
and answer things that you would never be able to expect
and getting those unexpected answers.
So it feels a little bit like magic.
My personal view is that,
we've always moved the goal line in AI.
AI is always the thing that we couldn't do,
it's always the magic.
And as soon as we get there
then we say that's not AI at all,
or there's people around that say, that's not AI at all.
We move the goal line.
In this case what was your reaction when it came out?
I know part of your reaction is you quit Google
and decided to do different things,
but when you first saw it, what did you think?
- Well, like Fei-Fei said,
GPT-2 made a big impression on us all.
And then there was a steady progression,
also I'd seen things within Google before GPT-4
and GPT-3.5 that were just as good, like PaLM.
So that in itself didn't make a big impression.
It was more PaLM made an impression on me within Google
'cause PaLM could explain why a joke was funny,
and I'd always just use that as a,
we'll know that it really gets it
when it can explain why a joke is funny.
And PaLM could do that.
Not for every joke but for a lot of jokes.
- And so- - Incidentally these things
are quite good now at explaining why jokes are funny
but they're terrible at telling jokes,
and there's a reason which is
they generate text one word at a time.
So if you ask them to tell a joke,
what they do is they're trying to tell a joke.
So they're gonna try and tell stuff that sounds like a joke.
So they say, a priest and a badger went into a bar
and that sounds a bit like the beginning of a joke
and they keep going telling stuff
that sounds like the beginning of a joke.
But then they get to the point
where they need the punchline.
And of course they haven't thought ahead,
they haven't thought what's going to be the punchline.
They're just trying to make it sound like
they lead into a joke,
and then they give you a pathetically weak punchline,
'cause they have to come up with some punchline.
So although they can explain jokes
'cause they get to see the whole joke
before they say anything,
they can't tell jokes, but we'll fix that.
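A toy sketch of the word-at-a-time generation being described: each step greedily commits to the locally most likely next word, and nothing looks ahead to where a punchline must land. The stub "language model" and its probabilities are illustrative assumptions; a real model conditions richly on context.

```python
# A stand-in for a real language model: given the text so far, score a few
# candidate next words. The words and numbers are made up for illustration.
def next_word_probs(context):
    n = len(context.split())
    if n >= 12:                   # out of runway: must end somehow,
        return {"nothing.": 1.0}  # the pathetically weak punchline
    return {"bar": 0.4, "and": 0.3, "said": 0.2, "nothing.": 0.1}

def generate(prompt, max_words=20):
    words = prompt.split()
    for _ in range(max_words):
        probs = next_word_probs(" ".join(words))
        # Greedy decoding: commit to the locally most likely word.
        # Nothing here plans ahead to a punchline.
        word = max(probs, key=probs.get)
        words.append(word)
        if word.endswith("."):
            break
    return " ".join(words)

print(generate("A priest and a badger went into a"))
```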
- Okay, so I was going to ask you
if comedian is a job of the future or not.
You think soon?
- Probably not.
- All right. - So anyway-
- So what was your reaction to it?
And again, you've seen things
behind the scenes along the way.
- A couple of reaction.
My first reaction is of all people I thought I knew
the power of data,
and I was still awed by the power of data.
That was a technical reaction.
I was like, darn it, I should have made a bigger ImageNet.
No, but maybe not,
but that was really- - You still could.
- Funding is the problem.
Yeah, so that was first.
Second, when I saw the public awakening moment to AI
with ChatGPT,
not just the GPT-2 technology moment,
I generally thought,
thank goodness we've invested in human centered AI
for the past four years.
Thank goodness we have built a bridge
with the policy makers, with the public sector,
with the civil society.
We have not done enough,
but thank goodness that that conversation had started.
We were participating in it,
we were leading some part of it.
For example, we as a institute at Stanford,
we're leading a critical national AI research cloud bill
that is still going through Congress right now.
- [Geoff] Not right now actually.
- Senate, Senate, it's bicameral,
so at least it's moving in the Senate,
because we predicted the societal moment
for this tech.
We don't know when it would come,
but we knew it would come,
and it was just a sense of urgency honestly.
I feel that this is the moment
we really have to rise to,
not only our passion as technologist,
but responsibility as humanists.
- And so you both,
I think the common reaction of you both has been,
we have to think about both the opportunities of this,
but also the negative consequences of it.
- So for me, there was something I realized
and didn't realize until very late,
and what got me much more interested in the societal impact
was like Fei-Fei said, the power of data.
These big chatbots have seen thousands
of times more data than any person could possibly see.
And the reason they can do that
is 'cause you can make thousands
of copies of the same model,
and each copy can look at a different subset of the data,
and they can get a gradient from that
of how to change their parameters,
and they can then share all those gradients.
So every copy can benefit
from what all the other copies extracted from data,
and we can't do that.
Suppose you had 10,000 people
and they went out and they read 10,000 different books,
and after they've each read one book,
all of them know what's in all the books.
We could get to be very smart that way,
and that's what these things are doing
and so it makes them
far superior to us. - And there is education.
There's some schooling where we're trying to do that,
but not in the same way. - Yes.
But education's just hopeless,
I mean hardly worth paying for.
(audience laughing)
- Except University of Toronto and Stanford.
(audience laughing)
- I've tried to explain to friends
that Geoff has a very sarcastic sense of humor
and if you spend enough time around it, you'll get it.
But I'll leave it to you to decide
whether that was sarcastic.
- So the way we exchange knowledge, roughly speaking,
this is something of a simplification,
but I produce a sentence
and you figure out what you have to change in your brain,
so you might have said that,
that is if you trust me.
We can do that with these models too.
If you want one neural net architecture
to know what another architecture knows,
which is a completely different architecture,
you can't just give it the weights.
So you get one to mimic the output of the other,
that's called distillation
and that's how we learn from each other.
But it's very inefficient,
it's limited by the bandwidth of a sentence,
which is a few hundred bits.
Whereas if you have these models,
these digital agents which have a trillion parameters,
each of them looks at different bits of data
and then they share the gradients,
they're sharing a trillion numbers.
So you are comparing an ability to share knowledge
that's in trillions of numbers
with something that's hundreds of bits.
They're just much, much better than us at sharing.
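A toy numerical sketch of the two sharing channels being contrasted here: identical copies averaging full gradients, versus distillation, where a student only mimics a teacher's output distribution. All sizes and values are illustrative assumptions, scaled down from the trillion-parameter systems described.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Channel 1: identical copies sharing gradients.
# Each copy sees a different shard of data and computes a gradient over
# ALL parameters (a trillion in the real systems; 6 here), then the
# copies average those gradients so every copy learns from all the data.
n_copies, n_params = 4, 6
weights = rng.normal(size=n_params)                     # shared by every copy
per_copy_grads = rng.normal(size=(n_copies, n_params))  # stand-in gradients
weights -= 0.01 * per_copy_grads.mean(axis=0)           # trillions of numbers exchanged

# --- Channel 2: distillation between different architectures.
# The student never sees the teacher's weights, only its outputs --
# a channel on the order of a sentence's few hundred bits.
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

teacher_probs = softmax(rng.normal(size=10))
student_probs = softmax(rng.normal(size=10))
# Training the student to mimic the teacher = minimising this cross-entropy.
distill_loss = -(teacher_probs * np.log(student_probs)).sum()
print(weights.shape, round(float(distill_loss), 3))
```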
- So I guess Geoff that-
So I agree with you at the technology level,
but it sounded like for you
that's the moment that got you feeling very negative.
- That's the moment I thought, we are history, yeah.
- Yeah, I'm less negative than you.
I'll explain later,
but I think that's where we- - Well, wait one sec actually,
let's talk about that.
Explain why you are optimistic
and let's understand why you are more pessimistic.
- I'm pessimistic 'cause the pessimists are usually right.
(audience laughing)
- I thought I was a pessimist too.
We have this conversation.
So I don't know if I should be called an optimist.
I think I'm-
Look, when you come to a country when you're 15,
not speaking a single bit of the language, and starting from $0,
there's something very pragmatic in my thinking.
I think technology,
our human relationship with technology is a lot messier
than an academia typically would predict,
'cause we come to academia in the ivory tower,
we want to make a discovery,
we want to build a piece of technology,
but we tend to be purist.
But when the technology like AI hit the ground
and reach the societal level,
it is inevitably messily entangled with what humans do.
And this is where maybe you call it optimism
is my sense of humanity.
I believe in humanity.
I believe in the, not only the resilience of humanity,
but also the collective will,
the arc of history is dicey sometimes.
But if we do the right thing,
we have a chance, we have a fighting chance
of creating a future that's better.
So what I really feel is not
delusional optimism at this point,