Nicholas Carlini - Black-hat LLMs | [un]prompted 2026
Summary
TLDR
In this talk, Nicholas Carlini of Anthropic discusses the rapid advancement of language models and their growing impact on cybersecurity. He shows how these models can now autonomously find and exploit vulnerabilities in software, including critical systems like the Linux kernel. While the models hold great promise for defenders, they also pose significant risks if misused. Carlini urges the audience to confront the dual-use nature of these tools and to collaborate on ensuring their responsible use. He stresses the need for immediate action, as we are entering a transitional period in which the balance between attackers and defenders is shifting rapidly.
Takeaways
- 😀 Language models are rapidly improving and can now autonomously find and exploit zero-day vulnerabilities in software, a capability that was not possible just a few months ago.
- 😀 Language models are becoming highly efficient and can potentially disrupt the balance between attackers and defenders in cybersecurity, possibly leading to an era of increased security threats.
- 😀 Models like Claude are being used to find bugs in important software, including the Linux kernel and web applications like Ghost, showcasing their ability to autonomously detect vulnerabilities.
- 😀 The increasing sophistication of language models in security research could make them more effective than human vulnerability researchers in finding certain types of bugs, like SQL injection and buffer overflows.
- 😀 The pace at which language models are improving is exponential, meaning that the capabilities of the best models today will be available to average users in a very short amount of time.
- 😀 Autonomous language models have already demonstrated the ability to discover complex vulnerabilities in production systems, such as blind SQL injection and heap buffer overflows in critical software.
- 😀 While language models are very effective at finding bugs, they are also being considered for malicious use, creating a dilemma for ensuring their responsible deployment without stifling useful applications.
- 😀 There's an urgent need for cybersecurity professionals and researchers to collaborate in creating safeguards to prevent malicious use of these powerful language models, balancing their defensive and offensive capabilities.
- 😀 Despite the increasing risks, the long-term outlook for security may improve as developers adopt safer programming languages and practices (e.g., Rust) and continue formal verification of software.
- 😀 The exponential progress in language models means that security professionals need to act quickly to address vulnerabilities and improve defenses, as waiting a year could result in widespread exploitation by malicious actors.
- 😀 The transition period, as language models evolve from useful tools to potential threats, is critical, and cybersecurity experts must work together to mitigate risks during this phase to prevent long-term damage.
Q & A
What is the main concern raised by the speaker regarding language models in security?
-The main concern raised by the speaker is that language models have become so advanced that they can autonomously identify and exploit vulnerabilities in software, which could significantly impact cybersecurity. This presents both an opportunity and a danger, as malicious actors can also exploit these models for harm.
How has the role of language models in security research changed in recent months?
-Recently, language models have significantly improved their ability to autonomously identify critical security vulnerabilities in major software systems, such as the Linux kernel and web applications. This has shifted the landscape, as these models can now find vulnerabilities faster and more effectively than many human experts.
What examples did the speaker provide to illustrate the capabilities of language models in finding vulnerabilities?
-The speaker mentioned a SQL injection vulnerability in the popular Ghost content management system, which was discovered by the model, as well as a series of heap buffer overflow vulnerabilities in the Linux kernel, which were previously difficult to find manually.
Why is the speed of language models in finding vulnerabilities a cause for concern?
-The speed at which language models can find vulnerabilities is concerning because, as they continue to improve, attackers can exploit these models to discover and exploit zero-day vulnerabilities at an unprecedented rate. This could lead to a situation where defenders are overwhelmed by the sheer volume of vulnerabilities that need to be addressed.
What does the speaker mean by the 'exponential' growth of language models?
-The 'exponential' growth refers to the rapid and consistent improvement in the capabilities of language models. The speaker highlights that models released in the past few months have been able to find vulnerabilities that older models could not, and this improvement is expected to continue at an accelerating pace.
What does the speaker suggest is necessary to address the risks posed by language models in security?
-The speaker urges the community to act quickly and contribute to securing these technologies, as the potential for misuse is high. They call for more people to help find and mitigate vulnerabilities before malicious actors can exploit them, stressing that waiting too long could have disastrous consequences.
How do language models find vulnerabilities, according to the speaker?
-Language models, such as Claude from Anthropic, are able to autonomously analyze software, identify potential vulnerabilities, and report them. The process involves running the model in a controlled environment, instructing it to find vulnerabilities, and then reviewing the reports generated by the model, which often highlight severe issues.
What is the dual-use nature of language models in security, as explained in the talk?
-The dual-use nature of language models refers to the fact that they can be used for both defensive and offensive purposes. While they can help security researchers find and fix vulnerabilities, they can also be misused by malicious actors to exploit these same vulnerabilities. This presents a challenge in how to regulate and safeguard their use.
What specific vulnerability did the speaker demonstrate in the Ghost content management system?
-The speaker demonstrated a blind SQL injection vulnerability in the Ghost CMS, where an attacker could exploit the system to gain access to sensitive information like credentials and API keys, even without authentication.
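To make "blind" SQL injection concrete: the attacker cannot read query results directly, but can ask yes/no questions and observe whether the application behaves differently, recovering secrets one character at a time. The sketch below simulates that oracle in pure Python — `SECRET` and `vulnerable_endpoint` are invented for illustration and do not reflect the actual Ghost vulnerability, whose details are not given in the talk.

```python
import re
import string

SECRET = "api_key_123"  # stands in for a credential stored in the database

def vulnerable_endpoint(injected_condition: str) -> bool:
    """Simulates a page whose response differs depending on whether an
    attacker-supplied SQL condition is true (the 'blind' oracle).
    Only conditions of the form SUBSTR(secret,i,1)='c' are modeled."""
    m = re.match(r"SUBSTR\(secret,(\d+),1\)='(.)'", injected_condition)
    if not m:
        return False
    i, c = int(m.group(1)), m.group(2)
    return i <= len(SECRET) and SECRET[i - 1] == c

def extract_secret(max_len: int = 64) -> str:
    """Recover the secret one character at a time by posing
    yes/no questions through the boolean oracle."""
    recovered = ""
    alphabet = string.ascii_lowercase + string.digits + "_"
    for i in range(1, max_len + 1):
        for c in alphabet:
            if vulnerable_endpoint(f"SUBSTR(secret,{i},1)='{c}'"):
                recovered += c
                break
        else:
            break  # no character matched: end of the secret
    return recovered
```

Each recovered character costs at most one request per candidate character, which is tedious for a human but trivial for an automated tool — one reason the talk treats model-driven exploitation as a step change.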
What does the speaker mean by 'the transitional period' in cybersecurity?
-The 'transitional period' refers to the current phase in which language models are improving rapidly while the defensive mechanisms to counteract them are not yet fully developed. The speaker is particularly concerned about this period because it is when attackers may be able to find and exploit vulnerabilities faster than defenders can respond.