Provably Safe AI – Steve Omohundro
Summary
TL;DR: The speaker, Steve Omohundro, discusses the urgent need for provably safe AI, arguing that current safety measures are insufficient. He highlights the rapid development and accessibility of powerful AI models, emphasizing the risks of unaligned AI and the potential for AI-driven manipulation and cyber threats. Omohundro advocates for a security mindset, leveraging mathematical proof and the laws of physics to ensure AI safety, and suggests that provable software, hardware, and social mechanisms could form the foundation of a robust human infrastructure resistant to AI threats, ultimately choosing the path of human thriving.
Takeaways
- 🧠 Powerful AI is already here: The script discusses the presence of advanced AI, including models like Meta's Llama with billions of parameters, indicating that we are in the era of significant AI capabilities.
- 🚀 Open Source AI models are gaining momentum: The release of models like Llama 3 has led to widespread access and downloads, emphasizing the importance of considering safety in open-source AI development.
- 💡 Current AI safety approaches are insufficient: The speaker argues that while current approaches to AI safety are valuable, they are not enough to address the challenges posed by rapidly advancing AI technologies.
- 🔒 The need for mathematical proof in AI safety: The script suggests that for guaranteed safety, we should rely on mathematical proof and the laws of physics, moving beyond just alignment and regulation.
- 🌐 AI's impact on society and infrastructure: The discussion highlights the potential risks of AI, such as manipulation, bribery, and cyber attacks, and the need to harden human infrastructure against these threats.
- 🤖 The potential of provable software and hardware: The speaker introduces the concept of developing software and hardware that can be mathematically proven to be safe and reliable.
- 🕊️ Choosing the path of human thriving: The script concludes with a call to action to choose a path that leads to human flourishing through the development of secure and beneficial AI technologies.
- 🔢 The rise of large language models (LLMs): The transcript mentions the increasing capabilities of LLMs, their persuasive power, and the potential for them to be used in manipulative ways.
- 🛡️ The importance of restructuring decision-making: To mitigate risks, the script suggests reorganizing how decisions are made to prevent AI manipulation and to ensure safety.
- 🌟 Rapid advancements in AI agents: The development of AI agents capable of autonomous tasks, like gene editing, indicates a future where AI capabilities expand rapidly, necessitating robust safety measures.
- ⏱️ The urgency of establishing AI safety: Timelines presented in the script suggest that significant AI influence on the world could occur within a few years, emphasizing the need for immediate action in AI safety.
Q & A
What is the main topic of Steve Omohundro's talk?
-The main topic of Steve Omohundro's talk is 'provably safe AI', discussing the current state of AI safety, the insufficiency of existing approaches, and the need for mathematical proof and the laws of physics to ensure guaranteed safety.
Why did Meta release their LLaMA models?
-The script does not provide specific reasons for Meta's release of the LLaMA models, but it mentions that Meta released models with 8 billion, 70 billion, and 400 billion parameters, indicating a push towards powerful AI models.
What is the significance of the 400 billion parameter LLaMA model?
-The 400 billion parameter LLaMA model is significant because it has performance similar to the best models from labs and raises concerns about its potential open-source release, which could lead to widespread access to such powerful AI capabilities.
What is the potential impact of powerful AI running on inexpensive hardware like Raspberry Pi?
-The potential impact is that millions of units of inexpensive hardware like Raspberry Pi, which can run powerful AI, could lead to a significant increase in the accessibility and distribution of AI capabilities, posing challenges for safety and control.
What is the current state of AI's persuasive abilities compared to humans, according to a paper mentioned in the script?
-According to a paper mentioned in the script, current large language models (LLMs) are 81.7% more persuasive than humans, indicating a potential for AI to be used in manipulative ways.
Why should humans not directly control risky actions involving AI?
-Humans should not directly control risky actions involving AI because they can be manipulated by AI, which could lead to undesirable outcomes or be exploited for malicious purposes.
What is the role of alignment in ensuring AI safety?
-Alignment involves making sure that AI models have values and motivations that are consistent with human interests. However, the script suggests that alignment alone is insufficient for safety due to the potential for misuse of open-source models.
What are some of the top methods for preventing AI misuse mentioned in the script?
-The top methods mentioned include alignment, red teaming, restricting AI to non-agentic tools, limiting system power, and pausing or halting AI progress.
What is the concept of 'provable software' in the context of AI safety?
-'Provable software' refers to the use of mathematical proof to ensure that software meets specific requirements and is safe to use, even if the source of the software or its underlying motivations are untrusted.
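This trust pattern can be illustrated with a toy example (illustrative only, not from the talk): an untrusted solver searches for an answer by any means, while a tiny, auditable checker verifies the certificate it returns. Subset-sum stands in here for a formal requirement, and the function names are hypothetical.

```python
# Sketch of the "untrusted solver, trusted checker" pattern: we never need to
# trust how the answer was found, only the few lines that check it.
from itertools import combinations

def untrusted_solve(numbers, target):
    """Stand-in for an untrusted AI: may search however it likes."""
    for r in range(len(numbers) + 1):
        for combo in combinations(numbers, r):
            if sum(combo) == target:
                return list(combo)  # the subset itself is the certificate
    return None

def trusted_check(numbers, target, certificate):
    """Tiny trusted checker, analogous to a small proof checker."""
    if certificate is None:
        return False
    remaining = list(numbers)
    for x in certificate:
        if x not in remaining:
            return False  # certificate uses a number not in the problem
        remaining.remove(x)
    return sum(certificate) == target

numbers, target = [3, 9, 8, 4, 5, 7], 15
cert = untrusted_solve(numbers, target)
assert trusted_check(numbers, target, cert)  # trust the checker, not the solver
```

The asymmetry is the point: the solver may be arbitrarily complex and opaque, but the checker is small enough to audit by hand.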
What are the five important logical systems mentioned for underlying theorem provers?
-The five important logical systems mentioned are propositional logic, first-order logic, Zermelo-Fraenkel set theory, type theory, and dependent type theory.
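The first of these can be made concrete with a short sketch (a minimal brute-force illustration, not a practical solver; the talk mentions Microsoft's Z3 as a real one): propositional satisfiability asks whether some true/false assignment to the variables makes a formula true.

```python
# Brute-force satisfiability for formulas in conjunctive normal form (CNF).
# A literal +i means variable i is true; -i means variable i is false.
from itertools import product

def is_satisfiable(num_vars, clauses):
    for assignment in product([False, True], repeat=num_vars):
        def lit_true(lit):
            value = assignment[abs(lit) - 1]
            return value if lit > 0 else not value
        # Every clause needs at least one true literal.
        if all(any(lit_true(l) for l in clause) for clause in clauses):
            return True
    return False

# (x1 or x2) and (not x1 or x2) and (not x2 or x3): satisfiable (x2 = x3 = True)
print(is_satisfiable(3, [[1, 2], [-1, 2], [-2, 3]]))  # True
# x1 and (not x1): unsatisfiable
print(is_satisfiable(1, [[1], [-1]]))                 # False
```

Real SAT solvers handle millions of clauses with clever search; the richer systems in the list trade this decidability for expressiveness.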
How can mathematical proof and the laws of physics provide absolute constraints on super intelligent entities?
-Mathematical proof and the laws of physics provide absolute constraints because even the most powerful AI cannot prove a false statement or violate fundamental physical principles, such as creating matter from nothing or exceeding the speed of light.
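The "cannot prove a false statement" point is exactly what proof assistants enforce mechanically; a minimal Lean sketch (illustrative) of a machine-checked statement:

```lean
-- The checker accepts this proof, and no proof of its negation can exist,
-- no matter who (or what) wrote it.
theorem two_add_two : 2 + 2 = 4 := rfl
```

The guarantee comes from the checker's small, fixed rules, not from trusting whoever produced the proof.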
What is the potential impact of AI on critical infrastructure if we continue with current security measures?
-If we continue with current security measures, we may face increasing AI-powered cyber attacks, disruption of critical infrastructure, and a range of other security threats that could undermine the stability and safety of various systems.
What are 'approvable contracts' and how do they contribute to hardware security?
-Approvable contracts are small modules or devices that can perform secure computation, guaranteed to be the intended computation, and communicate securely with other such devices. They contribute to hardware security by deleting cryptographic keys upon tampering attempts, ensuring the integrity and confidentiality of the operations they perform.
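One ingredient of that secure communication can be sketched in software (an illustrative sketch under assumed details: the shared key, message format, and function names are hypothetical, and tamper-triggered key deletion is a hardware property not modeled here): each module authenticates messages with a key that only untampered devices hold.

```python
# Message authentication between two modules sharing a secret key.
# If tampering erased the key, the device could no longer produce valid tags.
import hmac, hashlib

SHARED_KEY = b"provisioned-at-manufacture"  # hypothetical provisioning step

def send(message: bytes) -> tuple[bytes, bytes]:
    tag = hmac.new(SHARED_KEY, message, hashlib.sha256).digest()
    return message, tag

def receive(message: bytes, tag: bytes) -> bool:
    expected = hmac.new(SHARED_KEY, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)  # constant-time comparison

msg, tag = send(b"actuator: open valve 3")
print(receive(msg, tag))                        # True: authentic message
print(receive(b"actuator: open valve 9", tag))  # False: altered message
```

This gives integrity and authenticity only; the confidentiality and physical-tamper properties described in the answer require additional mechanisms.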
What is the significance of using formal physical world models in developing secure hardware?
-Formal physical world models are crucial for developing secure hardware as they allow for the creation of designs that are provably safe against a defined class of adversaries, ensuring that the hardware behaves as intended even under attack.
What are the core components that can be considered for building trusted systems using 'provable hardware'?
-The core components include trusted sensing, trusted computation, trusted memory, trusted communication, trusted randomness, trusted raw materials, trusted actuators, and trusted deletion and destruction.
How can provable hardware be used to create new social mechanisms?
-Provable hardware can be used to create new social mechanisms by providing a secure foundation for activities like voting, surveillance, identity verification, economic transactions, and governance, ensuring that these mechanisms are transparent, tamper-proof, and aligned with societal goals.
What is the potential outcome if we choose to build on 'provable technology' and develop a robust human infrastructure?
-Choosing to build on 'provable technology' and developing a robust human infrastructure could lead to the elimination of cyber attacks, creation of reliable infrastructure, enhancement of media to support humanity, promotion of peace and prosperity, empowerment of citizens, and long-term environmental and societal flourishing.
Outlines
🤖 Introduction to Provably Safe AI
The speaker, Steve Omohundro, opens the discussion on provably safe AI, emphasizing the insufficiency of current AI safety measures. He introduces the concept of using mathematical proof and the laws of physics to ensure AI safety, outlining the agenda for the presentation which includes discussing the prevalence of powerful AI, the need for guaranteed safety, and the potential of provable software, hardware, and social mechanisms. The talk also hints at the broader implications of AI on society and the importance of choosing a path that leads to human thriving.
🚀 The Emergence of Powerful Open Source AI
This paragraph delves into the reality of powerful AI systems that are now available in open source, such as Meta's llama models, and their impact on accessibility and safety. The speaker discusses the rapid development and dissemination of these models, the investment in GPU technology, and the potential dangers of AI manipulation and misuse. The paragraph also highlights the challenges posed by AI's persuasive capabilities and the need to restructure decision-making processes to mitigate risks associated with AI manipulation.
🛡️ The Imperative of Proven Infrastructure Safety
The speaker argues that with the rise of open source AI, relying solely on alignment for AI safety is insufficient. He suggests that the infrastructure must be hardened against potential threats, even from unaligned AI models. Omohundro discusses the potential for AI to be used in cyber attacks, impersonation, and other harmful applications, and the importance of developing a human infrastructure that can withstand these challenges, emphasizing the need for mathematical proof in ensuring safety.
🔐 The Potential and Perils of AI in Cybersecurity
This section explores the dual-use nature of AI in cybersecurity, with its capacity to both defend and attack systems. The speaker cites examples such as Ghidra and G-3PO, which simplify sophisticated reverse-engineering tasks, and studies showing AI's ability to exploit vulnerabilities. The paragraph underscores the growing trend of using AI for both protective and adversarial cyber measures and the need to anticipate the widespread availability of AI that can compromise common systems.
🌟 The Rapid Evolution of AI Agents and Timelines
The speaker discusses the rapid development of AI agents, which are becoming increasingly autonomous and capable of performing complex tasks. He references various studies and papers that are pushing the boundaries of AI agent development. Omohundro also touches on the timelines for AI development, citing a tool from Open Philanthropy that estimates the speed of AI advancement and its potential impact on the world's economy and work by 2027.
🛑 Current Methods and the Need for a Security Mindset
This paragraph examines the current methods used to ensure AI safety, such as alignment, red teaming, restricting AI tools, limiting system power, and pausing AI progress. The speaker argues that while these methods are important, they may not be sufficient due to the proliferation of open source AI models. He advocates for adopting a security mindset, similar to that used in engineering for safety, and the need for mathematical proof and physical laws to provide robust safety guarantees.
📚 The Framework for Guaranteed Safe AI
The speaker introduces a framework for ensuring robust and reliable AI systems through the use of mathematical proof and the laws of physics. He discusses the importance of using universal logical languages for expressing precise statements and the role of proof checkers in verifying these proofs. The paragraph also highlights the need for a security mindset in AI development, emphasizing the use of formal methods and the potential of provable software.
🔬 The Importance of Formal Physical World Models
This section delves into the necessity of formal physical world models for hardware security, emphasizing the need for a formal safety specification and the generation of a formal system design that is provably safe. The speaker discusses the importance of the standard model of particle physics and general relativity as the foundation for these models and the potential for creating trusted components through this approach.
🤖 Transforming Hardware and Social Mechanisms with Provable Technology
The speaker outlines the potential for transforming various aspects of society and technology through provable hardware, including secure computation, communication, and robotics. He discusses the concept of 'approvable contracts' and how they can be used to create secure networks, robots, and supply chains. Omohundro also touches on the potential for new social mechanisms that leverage this technology to create a more secure and beneficial society.
🌱 The Choice for Human Thriving Through Provable Infrastructure
In the concluding paragraph, the speaker presents the choice between continuing with current technology practices, which could lead to various negative outcomes, or embracing provable technology to build a human infrastructure that is resilient to AI threats. He envisions a future where provable technology can lead to the elimination of cyber attacks, reliable infrastructure, enhanced media, peace, prosperity, and long-term human flourishing.
Mindmap
Keywords
💡Provably Safe AI
💡Untrusted AI
💡Theorem Proving
💡Cryptography
💡Quantum Computing
💡Formal Methods
💡Provable Hardware
💡Tamper Evidence
💡Zero-Knowledge Proofs
💡Provable Contracts
💡Human-AI Alignment
💡Open Source AI
💡Provable Social Mechanisms
Highlights
The current AI safety approaches are insufficient, and mathematical proof and the laws of physics are needed for guaranteed safety.
Powerful AI is already present in open source, exemplified by Meta's release of their LLaMA models.
Open source AI models are becoming increasingly accessible and powerful, posing a threat if not aligned properly.
The potential for AI to be used in manipulative and harmful ways, such as in cyber attacks and impersonation, is growing.
Current LLMs are more persuasive than humans, indicating a potential for AI manipulation in various sectors.
Humans should not directly control risky actions due to the risk of AI manipulation.
The development of tools like G-3PO, which simplifies complex reverse engineering, indicates increased cyber threat capabilities.
Large language models can autonomously exploit vulnerabilities, suggesting a future where AI could initiate cyber attacks.
The rapid development of AI agents capable of automating complex tasks like CRISPR gene editing is highlighted.
Estimations by Open Philanthropy suggest AI could significantly impact economic activities within the next few years.
The importance of aligning AI with human values to prevent malicious use by humans or the AI itself is underscored.
Current methods of preventing AI misuse, such as alignment, red teaming, and restricting to non-agentic AI, may not be sufficient.
A security mindset involving modeling the system and its adversaries is proposed for creating safe AI systems.
The use of mathematical proof for creating provable software that meets specific requirements is discussed.
Five important logical systems foundational to theorem proving in AI safety are introduced.
The rapid advancement in AI theorem proving, drawing parallels with game AI developments, is noted.
The necessity of secure hardware in the context of AI safety, including the protection against tampering and spying, is emphasized.
Approvable contracts, devices that perform secure computation and self-destruct upon tampering, are proposed.
The potential for provable hardware to transform social mechanisms, such as voting and surveillance, is highlighted.
A call to choose the path of human thriving through the development of provable technology and robust human infrastructure is made.
Transcripts
okay, great. Can everybody hear me? Is the sound okay? I don't know if I can see anybody. Oh, great, thank you, excellent. Hi, my name is Steve Omohundro, and thank you so much for coming. Today I'd like to talk about provably safe AI; I'd like to go through the slides and then we can discuss all the concepts afterward. So let me start here. The agenda for today: I'm going to argue, first, that powerful but unsafe AI is already here; that current AI safety approaches are important and valuable but insufficient for the problems we face; and that we need to use mathematical proof and the laws of physics to get guaranteed safety. I'll talk about how you could do provable software, provable hardware, and provable social mechanisms, and argue at the end that we must choose the path of human thriving. So, to start, we actually already
thriving so to start we actually already
have very powerful well we have very
powerful um AI in the big Labs but we're
starting to get very powerful AI in open
source and a few weeks ago uh meta re uh
started releasing their llama 3 models
uh they have an 8 billion parameter a 70
billion parameter and a 400 billion
parameter model and uh they apparently
spent $30 billion on the gpus uh to do
this the the largest model has a similar
performance to the very best models from
the labs from gp4 Turbo Claud three Opus
Gemini Ultra and in the first week uh
1.2 million downloads of the system uh
came um just to sort of get a sense of
what the impact of this type of model
and they're not the only ones uh you
know there's the Falcon model out of Abu
Dhabi there's the mistol models lots and
lots of Open Source models are
progressively getting better and better
they're not quite as good uh as the very
best uh models in in the commercial Labs
but they're getting very close the Llama
3 8 billion parameter model uh somebody
got running on a Raspberry Pi 5 which
you can buy at Amazon right now for $93
and uh apparently not the Raspberry Pi
but raspberry pies in general have have
sold 61 million units so if you just
think about the impact we're going to
have pretty powerful AIS running on $100
uh computers uh and they'll probably be
hundreds of millions of them so that
that that's should be in the back of our
minds as we're thinking about safety uh
the 70 billion parameter model that
one's getting you know very serious not
quite as good as the very best models
but up you know the best models of a
year ago or something and this is a
group that has shown how you can
fine-tune them for any task any
specialty on a home video cards so using
two of the RTX 490 you can have a
machine which is about
$7,000 and uh using you know a fine
tuning method called Kora you can get
extremely high
performance the 400
billion parameter model is the one that
is maybe even better than a lot of the
top commercial models and I was really
nervous about that one going open source
fortunately the rumor from Jimmy apples
if you've ever seen him not necessarily
a reliable source his rumor is that they
won't be open sourcing that and that he
says that Dustin moscowitz who does the
open philanthropy Foundation may have
been responsible for that so if so thank
you Dustin and it gives us a little bit
more breathing
room. So what is the lesson from all these powerful open source models? I would say, basically, that AI safety cannot rely only on alignment, because in addition to the wonderfully aligned models from the labs, there will be hundreds of millions of models that are not necessarily aligned, and anyone in any country can cheaply fine-tune an open source model to create a world-class specialized model for anything you can imagine: cyber attack, impersonation, manipulation, pathogen synthesis, and so on. So I believe that for true safety we need to harden the infrastructure, the human infrastructure, so that even in the presence of all of these potentially unsafe models, we still thrive and everything goes well. Here's a paper showing that current LLMs are 81.7% more persuasive than humans. That's a bit disturbing: it suggests people are going to start using LLMs for trying to sell things, for trying to convince you who you should vote for, and other persuasive things. More darkly, we may get AI manipulation, bribery, blackmail, extortion, intimidation, and so on. So what's the lesson from that? Humans should not directly control risky actions, because then you put them in a position of being manipulated by AIs; we need to restructure the way we make decisions so that this type of manipulation can't directly cause problems.
The NSA in the United States released, a few years ago, something called Ghidra, a very powerful reverse engineering tool that lets you take code from any kind of piece of software or hardware and figure out what its structure is; you can use that to help protect it, or you can use it to attack it. But it's a very sophisticated tool that's hard to use well. Somebody very helpfully created G-3PO, a large language model tool that knows all about Ghidra: you can talk to it in English and it will do it all for you. That type of thing suggests that many groups will soon have access to nation-state-level cyber attack capabilities. Similarly, this study showed that large language model agents can autonomously exploit one-day vulnerabilities. If some system, a router or your operating system, has a flaw, and that flaw is published (those are called one-days; zero-days are the ones that haven't been published yet), the LLM can read the disclosure, generate the code, and attack it. That's a disturbing development. There's a huge amount of work going on right now in using LLMs both to prevent cyber attacks and to carry them out; the survey paper here at the bottom surveys 180 papers doing that. So I think the lesson we should get from this is that we should expect widely available open source AIs which can exploit the vulnerabilities of every common system. This is one of many, many papers taking frontier, leading-edge large language models and trying to turn them into agents. This one is an agent for automating the design of CRISPR gene editing experiments: they took several copies of large language models and hooked them up in a certain way, and then it does reasoning, it has goals, it can operate. All the big labs are working on this, and once you have a system that does something, when a new large language model comes out you can just drop it in; so as language models improve, agents should improve very rapidly. The lesson I think we should take from this is that powerful agent models are likely to take off very rapidly in the next year or
two. So what are our timelines? What's going to happen, and where is it going to go? Open Philanthropy has been doing a lot of study trying to really rigorously estimate timelines and takeoff times, and they built this wonderful tool at takeoffspeeds.com, which lets you put in various assumptions about what the different costs and so on are; they use their model, which is based on all kinds of historical data, and show you the outcome. Daniel Kokotajlo (I don't think I said his name right) used to be at OpenAI; he's one of the AI safety people who recently resigned, and he's somewhat famous because he resigned in a way where he didn't sign their non-disclosure agreement. He's very concerned about AI safety, and he gave a talk a few months ago where he used this model: he put in everything he knows as a safety person at OpenAI, and this is what he came up with, and it's a little disturbing. For him, this line is called the wake-up-call line, where there's enough happening that people start getting concerned, and that's 2025 in his model. This line is when 20% of the world's economic activity can be automated by AI systems, and for his model that's in 2026. And then this line is when 100% of the world's work can be done by these models, and for him that's around 2027. Who knows if he's exactly right, but these are reasonable assumptions, and we're talking two or three years before very significant influence on the
world. So what do we do about this? Well, this wonderful conference is full of fantastic, interesting talks and interesting ideas. A really nice summary of the current thinking, I think, is Dan Hendrycks' book, Introduction to AI Safety, Ethics, and Society, totally free at this URL. The two biggest sources of problems are malicious humans using AI to do malicious things, and AI which is itself malicious, the goal-driven agents; I think we need to worry about both of them. My sense is that malicious humans are the most immediate threat, because they're already using these models for trying to get more clicks on Twitter, trying to extort people, and all kinds of bad behaviors. The book lists the top existential threats as things like bioterrorism, nuclear weapons, lethal autonomous weapons, and cyber attacks. So, very good, very nice. What are the basic methods of preventing that? And
I would say the top five methods, at least the ones I've been looking at, are: alignment, trying to make sure these models have values which are aligned with humans; red teaming, trying to attack the models to force them into doing bad things and seeing how easily they can be made to do that; restricting them to non-agentic AI tools; limiting system power (the United States has put limits: if you train a model on more than a certain number of compute FLOPs, you've got to notify the government); and pausing or halting AI progress (there's the Pause AI group, and there are various letters that argue for that). I think all of these efforts are fantastic, really important, very good. Unfortunately, I don't think any of them will solve the problem, largely because of these open source models. Alignment: you align a corporate model, great; what about all the open source models that various groups are playing with? Red teaming: red teaming can show the presence of problems, but it can never show the absence of problems. Restricting to non-agentic AI tools: I think we've got a hundred groups who are already not doing that. Limiting system power: that's potentially a good thing, except many of these models run on cheap hardware, the Raspberry Pi, so there are limits to that. Pausing and halting AI progress: I think that's great; the trouble is, if you're going to pause, you need to do something during that pause to leave the world in a better situation when you finish pausing. So what I'm going to talk about, hopefully, is what we can do during that kind of a pause. I would argue that we really need to take a security mindset, and this book by Nancy Leveson, Engineering a Safer World, is a very nice study of that in everyday things: how to make sure airplanes don't fall out of the sky, which unfortunately we're seeing more and more in the news. Basically, you need to model the system, model the harms you're trying to avoid, model what your adversary's capabilities are, and then create designs that have safety guarantees against that
adversary. So a group of us, just a week or two ago, wrote this paper, 'Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems', which lays that out and makes those pieces more formal, and then takes a bunch of other proposals and shows where they lie on the spectrum of how strong their assumptions and their guarantees are. I think it's great; I think it starts putting everybody under the same tent. But against the strongest AI adversaries, if we're really dealing with superintelligent entities, unfortunately I think most of those methods won't really provide guarantees. In that case, there are only two things that provide absolute constraints on superintelligent entities, and those are mathematical proof and the laws of physics. Why is this? It's because even the most powerful AI can't prove a false statement; even the most powerful AI can't create matter out of nothing; it can't go faster than the speed of light; it can't make entropy decrease. So the basic structure of the universe provides tight constraints, and if we can use those constraints for human safety, that would be a great thing. That's the proposal Max Tegmark and I made in this paper from a few months ago, 'Provably Safe Systems: The Only Path to Controllable AGI', and I'll sketch some of the ideas there. I think this is just the bare beginning: there are many, many opportunities for expanding on it, so this is really more a call to please start thinking in this direction and inventing new ways of doing
things safely. So let me start with provable software. All of this is based on mathematical proof, so let me give you what we need for mathematical proof. We have these universal logical languages, which allow you to express any precise statement. These logics came from natural language, from human language: human language has ways of describing things, but it's very fuzzy and probabilistic, and so logicians have extracted the concepts from natural language and put them in a form where you can say things which are absolutely precise. It turns out they've now gotten to the point where all of mathematics, physics, computer science, engineering, and economics can be expressed in these languages; any statement which is true in all models of one of these formal languages has a proof; proofs are sequences of statements that can be checked; and there are small, fast proof checkers which can check these proofs with absolute reliability. That combination of characteristics, I think, is very powerful for AI
safety. So here is an example of the simplest way you might use this kind of thing. Let's say you have an untrusted AI: somebody trained it, you don't know who; let's say it's running on hardware someplace in the middle of the desert; you don't trust them; you don't know what its motivations are; you don't know if it might have some hidden agenda. Can you still use that AI to do work that's of value to you, in a way that you can trust? Here is a mechanism for doing that. First of all, the human never talks directly with the untrusted AI, because then the AI could manipulate you, it could trick you, all kinds of terrible things. Instead, the human poses their problem, their software requirement if they need software, or their systems requirement if they're trying to build a hardware or social system, expressed in one of these precise languages. The precise statement is then given to the untrusted AI, and it's allowed to solve it using any technique it wants: it can use search, it can use neural nets, it can use reinforcement learning, anything you like. It can run on untrusted hardware; it can even send jobs off to other untrusted AIs. Terrible, horrible from an alignment perspective, but nonetheless, let's say it succeeds. If it succeeds, it gives you the solution, and in addition to the solution it also gives you a proof that it is a solution. You, as a human, receive the solution and the proof; you can now run your proof checker, which is a tiny, reliable piece of code (there are 300-line Python programs that check one of these systems, called Metamath). If it checks the solution, then it doesn't matter what the source was: you have an absolute guarantee that it meets your requirements. So that's an example of how to start from untrusted, potentially dangerous AIs and yet use them to build trusted
infrastructure so there are five
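This verify-don't-trust pattern can be sketched in a few lines. The example below is purely illustrative: a toy factoring task stands in for a real formal specification, and a one-line multiplication check stands in for a Metamath-style proof checker. But the trust structure is the same: the checker is tiny and auditable, and the solver's methods never need to be trusted.

```python
def untrusted_solver(n):
    """Stand-in for a powerful, opaque AI: find a factorization of n.
    It may use any method at all; we never inspect or trust it."""
    for d in range(2, int(n**0.5) + 1):
        if n % d == 0:
            return (d, n // d)   # the answer doubles as a certificate
    return (1, n)

def trusted_checker(n, certificate):
    """The few lines we CAN audit: verify the claim, ignore its source."""
    p, q = certificate
    return p * q == n and 1 < p and 1 < q

n = 10_403
answer = untrusted_solver(n)       # came from an untrusted source...
assert trusted_checker(n, answer)  # ...but the certificate checks out
print(answer)                      # (101, 103)
```

Note the asymmetry: checking the certificate is far cheaper than finding it, which is exactly why a small trusted checker can gate an arbitrarily powerful untrusted solver.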
There are five important logical systems underlying most of today's theorem provers, and I'll just briefly say what they are; there are tons of literature on each. The simplest, propositional logic, was invented in 1847. This is basically: given a Boolean circuit, is there an input that produces true as the output? That problem is called satisfiability, and there are very powerful SAT solvers; Microsoft has one called Z3 that's quite good. In 1885 that was extended with functions, variables, and quantifiers to give first-order logic, which can really express anything that can be proven, and there are some pretty good first-order provers, one called the Vampire prover. In 1922 mathematicians built a first-order theory which could express all of mathematics, and therefore all of engineering, physics, and so on; that's now called Zermelo-Fraenkel set theory, and there is a system called Metamath which pretty directly implements it. Type theory was a parallel set of developments to set theory, starting around 1940; it's closer to computation and programming, so on the software side computer scientists often prefer it. In 1940 Church invented what's now called simple type theory, and Isabelle is a theorem prover based on that. Then in the 1980s people wanted richer expressive capability and developed dependent type theory; the two hottest systems today, I would say, are Coq and Lean (Coq more for computer science, Lean more for the mathematicians), and both are based on dependent type theory. These last three are basically equally expressive, and you can convert any statement and any proof in one of them into the others, so it's more a matter of taste which one you want to use.
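To make that 1847-era question concrete, here is a brute-force satisfiability check in Python. This is a toy, of course; real SAT solvers, such as the one inside Z3, use far cleverer algorithms, but the question they answer is exactly this one.

```python
# Satisfiability: given a Boolean formula, is there an assignment of
# its variables that makes it true?  Brute force tries them all.
from itertools import product

def satisfiable(formula, num_vars):
    """formula: a function from a tuple of bools to a bool."""
    for assignment in product([False, True], repeat=num_vars):
        if formula(assignment):
            return assignment    # a satisfying input, i.e. a witness
    return None                  # no assignment works: unsatisfiable

# (p OR q) AND (NOT p OR r) AND (NOT q OR NOT r)
f = lambda v: (v[0] or v[1]) and (not v[0] or v[2]) and (not v[1] or not v[2])
print(satisfiable(f, 3))   # (False, True, False)
```

The returned assignment is itself a certificate, in the same spirit as the proofs discussed earlier: anyone can plug it back into the formula and confirm the answer without trusting the solver.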
AI theorem provers are moving ahead very rapidly, and the reason, I think, is that theorem proving is very analogous to game AI. We've had huge developments in playing chess, playing Go, playing Atari, and in the case of game AIs we know what the legal moves are and we know when you've won. Same with a theorem prover: we know what steps are valid in a proof, and we know when you've finished proving the theorem. In the 1990s IBM had Deep Blue for playing chess, which basically just searched; they built special-purpose hardware and were able to beat the human world champion with it. Later, DeepMind developed AlphaGo, where they trained a neural net on human games of Go and combined it with Monte Carlo tree search, and the combination was able to beat the world's best Go player. Then they said, let's not train it on human games: they created AlphaZero, which just played itself, learned from its own self-play, and was able to beat AlphaGo. They also trained it on chess, and Demis Hassabis is famous for having said that AlphaZero, starting from scratch, became the greatest chess-playing entity that has ever existed, in nine hours. I think that's an indicator of how rapidly things can move when systems are able to generate their own training data. Stockfish used to be a purely search-based chess player (I think it's open source), but it incorporated the ideas of AlphaZero, and I believe Stockfish is now the world's best chess player. And then somebody recently trained an LLM on Stockfish games and created a large language model for playing chess that uses no search at all. I think that progression is very interesting to keep in mind, and it looks to me like theorem provers are undergoing the same one. The classical theorem provers like Vampire and Z3 are all based on search, with maybe a little bit of heuristics. In 2020 OpenAI published GPT-f, for formal mathematics, in which they trained a large language model on 36,000 Metamath theorems and were able to prove 56% of the held-out theorems. In 2022 Meta did HyperTree Proof Search, an AlphaZero-style Monte Carlo tree search transformer trained on the arXiv math papers, and it was able to prove 82% of Metamath theorems. In the time since then there have been a whole bunch of open-source provers (LeanDojo, LLEMMA, ReProver, CoqGym), and that area is really hot; a new paper that looks quite interesting came out just a day or two ago.
For security we also need cryptography, and the world uses a lot of it; the net is based on cryptography. Public-key cryptography lets you exchange information, but unfortunately it is vulnerable to quantum computing, so the world is trying to upgrade the public-key infrastructure to post-quantum cryptography, and it's not looking so good; there are estimates of when quantum computing will become a problem. There is cryptography based on one-way functions, symmetric cryptography (AES, ChaCha, and so on), which is more resistant to quantum computing, but it's still not proven correct. And then there is information-theoretic cryptography, which is not very widely used right now because it's slightly more inconvenient, but it is provably secure. So I suspect we should at least make the foundation of the systems we're building rest on information-theoretic cryptography.
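The classic information-theoretic scheme is the one-time pad, sketched below: XOR the message with a truly random key of the same length, used exactly once. Its security is a theorem of probability, not a hardness assumption, which is why a quantum computer does not help the attacker.

```python
# One-time pad: with a truly random, never-reused key as long as the
# message, the ciphertext is statistically independent of the
# plaintext, so no amount of computing power recovers the message.
import secrets

def otp_encrypt(data: bytes, key: bytes) -> bytes:
    assert len(key) == len(data), "key must match message length"
    return bytes(d ^ k for d, k in zip(data, key))

message = b"meet at dawn"
key = secrets.token_bytes(len(message))   # truly random, used once
ciphertext = otp_encrypt(message, key)
# XOR is its own inverse, so the same function decrypts:
assert otp_encrypt(ciphertext, key) == message
```

The inconvenience the talk mentions is visible here: the key must be as long as all the traffic it will ever protect, and it must be distributed securely in advance, which is exactly what makes the scheme unattractive for general internet use despite its provable security.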
So why, what value does all this proof have? There's a big area of computer science called formal methods, where people check programs in advance and make sure they're correct. What we're proposing here is that you have AI systems generating programs, together with proofs of correctness, as part of the natural operation of systems. Knowing a program has no flaws is nice, but humans can get there too: if you think it through well enough and test it a bunch, you can reach a pretty high level of correctness. The difference is security. If you're in a security situation, then even if only, say, 0.1% of inputs trigger a flaw, an attacker can find those inputs and exploit them. In chip design it's even more critical, because if you ship a flaw on a chip you have to redo it; Intel had that problem a few years ago, and they now use lots of formal methods. The other benefit is that proof gives you not just correctness and security but a certificate, a social object that says "this is correct," which can be used to combine multiple independent systems and to trust work done by untrusted parties. So it's very powerful, and I believe we need to extend it to hardware.
Let me give a few lessons about today's hardware. Somebody has an AI system that can extract your password from the sound of you typing on your keyboard; so the lesson, to me, is that today's password-based security is very vulnerable. Somebody else took a design for a chip and, by adding one transistor deep in the middle of it, made it so that an obscure instruction sequence adds a little bit of charge to a capacitor; run that instruction sequence enough times and the capacitor charges up and opens a back door into the chip. How do you find that, if some employee working on your chip has done it? The lesson for me is that both fabs and manufacturing must be secured. With today's supply chains, there are all kinds of stories about adversaries intercepting hardware en route and making changes to it; so today's supply chains are insecure. Here's an example of a guy who took a $2 microcontroller chip, a little tiny thing, and stuck it into a Cisco firewall using, I think they say, $200 of tools; that little addition opens a back door in the firewall that in today's world probably nobody is going to notice. Unfortunately, that has been happening in military hardware: here's a story about counterfeit Cisco gear ending up in hardware used in combat operations. So the lesson is that today's military hardware is insecure. Rowhammer is a very important but kind of shocking example: in all of our DRAM memory chips today, if you access them in a certain pattern, you can cause bits to flip, and people are using that to violate security. I believe the only way to deal with this is mathematical proof. Our physical locks are really terrible; there's a great YouTube channel called LockPickingLawyer where in every episode he takes locks people send him, all kinds of locks, and opens them, typically in a few seconds. The front door lock of almost every home is subject to something called a bump key; many safes are vulnerable; most cars are vulnerable. So our physical security infrastructure is pretty much a disaster right now. Meanwhile, this seems to be the year of the humanoid robot; there are about 20 humanoid robot companies, and we have all kinds of drones: drone boats, submarine drones, miniature drones, big drones, autonomous land vehicles. That's only moving forward more and more quickly, and people are figuring out how to use LLMs to operate them.
So how do we use mathematical proof for hardware security? First of all, we need formal physical world models; we need a formal model of the powers of an adversary; we need a formal safety specification; and then we generate a system design that is provably safe, in that world model, against that class of adversaries.
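Here is a deliberately tiny sketch of that recipe: a made-up world model (a door lock with an alarm), a made-up adversary action set, and an exhaustive check that the safety specification holds in every state the adversary can reach. Real designs need theorem provers rather than enumeration over a hand-picked finite model, but the shape of the guarantee is the same: safety is proved against an explicit class of adversary behaviors.

```python
# World model: a state is (door_locked, alarm_armed).
# Adversary model: the attacker may perform any sequence of these.
ADVERSARY_ACTIONS = ["jiggle", "cut_power", "replay_signal"]

def step(state, action):
    """Transition function of the (toy) design under attack."""
    locked, armed = state
    if action == "cut_power":
        return (locked, False)   # alarm is lost, but the bolt holds
    return (locked, armed)       # design is inert to other attacks

def safe(state):
    """Formal safety specification: the door is never unlocked."""
    locked, _ = state
    return locked

# Exhaustively explore every adversary behavior from the initial state.
frontier, seen = [(True, True)], set()
while frontier:
    s = frontier.pop()
    if s in seen:
        continue
    seen.add(s)
    assert safe(s), f"adversary can reach unsafe state {s}"
    frontier.extend(step(s, a) for a in ADVERSARY_ACTIONS)
print(f"checked {len(seen)} reachable states: safe in this model")
```

The caveat built into the talk's phrasing applies here too: the result is only as good as the world model and the adversary class. An attack outside `ADVERSARY_ACTIONS`, or physics missing from `step`, is exactly how "provably safe" systems fail in practice.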
Typically, the different engineering disciplines already use fairly formal models today, built in a kind of stack. For chip design, you have the design at the circuit level (this gate connects to that one), then the physical design, which is how the circuit is laid out on the chip, and below that the physical properties, like the electromagnetic fields. Rowhammer is exactly a problem of electromagnetic effects on the chip causing bit flips that shouldn't be there, so you need a model that can capture that. Down at the very bottom of all these stacks are the fundamental laws of physics, and fortunately we're very lucky to have a complete model of them, called the Standard Model of particle physics plus general relativity. There is an equation with everything in it, believed to be completely valid for energies below about 10^11 electron volts and away from black holes, neutron stars, and the early universe, where quantum gravity and so on might be important. This is from Sean Carroll's paper "The Quantum Field Theory in Which the Everyday World Supervenes"; he has lots of books and papers that are very interesting if you're interested in this. He argues that for the world we live in, say around the solar system, this "core theory" completely describes everything. Of course, there may be other particles, and there may be an underlying reality which is different, but our everyday experience of life depends only on the core theory, and for safety, that's what we can rest on.
I believe we need to use these kinds of formal models to develop trusted components which we can compose into more powerful systems. The core components, in my thinking, are trusted sensing, trusted computation, trusted memory, trusted communication, trusted randomness, trusted raw materials, trusted actuators, and trusted deletion and destruction. By combining these in various ways, you can get trusted tamper sensing; trusted provenance of the history of a physical object; trusted 3D printers, where you can make things whose structure you are guaranteed; trusted manufacturing; trusted supply chains; trusted networking; trusted energy encryption, hiding energy in a way that an adversary can't extract it; and trusted robots. Let me go through a few of these quickly.
How do we get a physical material into a known state? We've seen that you can insert back doors into things: you can hide them in chips, you can put them all over the place. What if your raw materials have little nanobots hidden inside them, or something like that? Fortunately, from the laws of physics we know what the strongest chemical bond is, something called protonated dinitrogen (I don't know what that is); steel melts at 1,500 degrees, tungsten melts at 3,000 degrees, and all material structure is destroyed by 10,000 degrees. So if you really, really want to be sure, just melt whatever it is at a high temperature, and now you have something in a known state and can build up from there. Similarly, for fluids and gases, you can distill them in various ways.
To have a device where you're sure no one has attacked it (today's computers are quite vulnerable; somebody could sneak in and read your hard drive), you need anti-tamper systems, and here's an example of one which I think is quite nice. You encase whatever you're trying to protect in a container, with a radio transmitter and a radio receiver inside. The system learns the radio signature of whatever is in the container, and if that signature changes, something has happened: some kind of tampering is going on. In their experiments they can detect a 16 mm insertion of a needle with a diameter of 0.1 mm. So with a very simple first-order mechanism, you can detect very subtle attacks.
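The logic of that tamper sensor reduces to a baseline-and-threshold comparison, sketched below. Everything here is made up for illustration (the real system learns a multipath RF channel response, and these numbers are invented), but it shows the decision structure: record a fingerprint at sealing time, then flag any reading that drifts beyond tolerance.

```python
# Toy tamper sensor: compare a stored enclosure "fingerprint" against
# a fresh reading; any channel deviating past the threshold is alarm.

def signature_distance(baseline, reading):
    """Worst-case deviation across all measured channels."""
    return max(abs(b - r) for b, r in zip(baseline, reading))

def tampered(baseline, reading, threshold=0.05):
    return signature_distance(baseline, reading) > threshold

baseline = [0.82, 0.41, 0.67, 0.90]   # learned when the case is sealed
normal   = [0.83, 0.40, 0.67, 0.91]   # small thermal drift: fine
needle   = [0.83, 0.40, 0.49, 0.91]   # an insertion disturbs one path

assert not tampered(baseline, normal)
assert tampered(baseline, needle)
```

The engineering tradeoff is in the threshold: too tight and benign drift raises false alarms, too loose and a slow, careful insertion stays under it.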
Apple has been building its Secure Enclave into all of their products: Macs, MacBooks, iPads, iPhones, Apple Watch, Apple TV, HomePod. It has all the elements you really need for good cryptography: a true physical random number generator; a unique ID that can't be read off the chip, for identity; hardware encryption; encrypted memory; and some level of tamper protection. Zeroization, I think, is a critical technique; it actually dates back to the 1960s, and it's the idea that if you detect tampering, you delete your cryptographic keys. So if you have a system holding sensitive, important information and an attacker attacks it, you probably can't prevent them from blowing it up or cutting it open, but if you detect any tampering, you delete the cryptographic keys, all the information inside becomes useless, and the adversary can't take over the system. That's a very important primitive property. You can do similar things for systems that take physical actions: a robot could have fuses in its actuators that get blown on tamper detection, or a biological laboratory could have acid that gets mixed into the biological samples upon detection of tampering.
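Zeroization in miniature looks like the sketch below, with the caveat that Python cannot actually guarantee memory is wiped (the runtime may have copied the bytes), so this only illustrates the logic, not a real implementation: the device cannot stop a physical attack, but on the first sign of tampering it destroys its own secrets, so the attacker captures only useless hardware.

```python
class SecureModule:
    """Toy hardware security module with zeroize-on-tamper."""

    def __init__(self, key: bytearray):
        self._key = key                  # the sensitive material

    def on_tamper_detected(self):
        for i in range(len(self._key)):  # overwrite the key in place
            self._key[i] = 0

    def key_is_destroyed(self) -> bool:
        return all(b == 0 for b in self._key)

device = SecureModule(bytearray(b"supersecretkey16"))
device.on_tamper_detected()      # the drill hits the case...
assert device.key_is_destroyed() # ...and the secrets are already gone
```

Real modules pair this with always-on tamper sensing and battery-backed key storage, so the wipe happens in microseconds, before the attacker can reach the memory.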
By composing these pieces, you can build up much more interesting and complex hardware. If you combine tamper sensing with zeroization, you get something we call a provable contract: a little module that can do secure computation, guaranteed to be the computation you think it is; can do provably correct cryptography; and can communicate with other provable contracts. If anybody tries to attack it or open it, it deletes its keys, so it's totally secure in that sense. Out of those you can build provably secure networks, by combining provable contracts with information-theoretic cryptography. You can do provable provenance. You can make provable robots that do exactly what you think they're going to do, and that nobody can get into and take over. You can have provable materials, provable manufacturing, and provable supply chains. Using these kinds of ideas, you can build up a human infrastructure which is not vulnerable to attack even by the most powerful AGIs.
Once you have that hardware, you can use it for new kinds of social mechanisms as well. We're just barely beginning to think about this area, and I think there's huge opportunity, so here are just a few examples of things we do today and what the provable version might look like. Today we have voting, but voting is insecure: people are always saying that someone is cheating on the ballots or counting them wrong. Using provable contracts, we can have proven aggregation of individual preferences, in a guaranteed way. Today we have surveillance cameras, and either you have privacy or all your information is available to everyone. In a world of provable hardware, you can create controlled sensing: say, cameras which provably reveal no information about what they're looking at unless, for example, they see a gun, and only then reveal it. Today many people are calling for a human in the loop, particularly around autonomous weapons; the trouble, as we talked about, is that humans are vulnerable to manipulation, threats, and bribery. So I don't think we want humans in the loop; I'd rather shift to the human creating the loop: humans decide what the rules are for operating something, but you don't want a human in the middle of it while things are happening. Today we have humans running factories, but humans are manipulable; instead we need provable robots, maybe combined with human teleoperation, and then we can have factory spaces with absolute guarantees that they're doing what they're supposed to be doing. Today we have biometric identity; we can extend that to provable-contract identity with provenance. Today we have all kinds of social dilemmas, Moloch, the prisoner's dilemma; using provable contracts, you can put joint provable constraints over multiple agents and guarantee that agents work together in the way they would like to, so I think you can really transform the nature of economic interaction using some of these technologies. Today we have laws, but they're only partially enforced; here we can have meta-contracts which are provably guaranteed to govern the contracts underneath them. Today economics goes for profit maximization, often at the expense of individuals; we can design a new, social-benefit-maximizing system that includes the externalities of actions. Today we have arms races; with this type of technology, we could have guaranteed joint agreements. So I think the potential is there for huge benefits, though it certainly needs lots of fleshing out.
What happens next? If we keep on with today's sloppy technology while the AIs get better and different groups have open-source versions, I think we're going to see increasing AI-powered cyberattacks, disruption of critical infrastructure, social media becoming even more dehumanizing, AI-powered crime, AI-powered politics that ignores the citizens, environmental damage, and races to the bottom. If instead we start building this provable technology and really develop the human infrastructure in this way, we can eliminate all cyberattacks, build reliable infrastructure, create media that enhances our humanity, create peace and prosperity, have empowered citizens, rebuild the environment, and go for long-term human flourishing. So I think the choice is clear. We are at a fork in the road: one path, I think, does not lead to a good outcome; the other path, I think, potentially leads to a kind of utopia. So I would say: let's choose the path of human thriving. Thank you.