How One Line of Code Almost Blew Up the Internet

Kevin Fang
20 Feb 2023 13:47

Summary

TL;DR: On February 18, 2017, Cloudflare learned of a critical data leak caused by a bug that surfaced during its migration to a new HTML parser. The bug exposed sensitive customer data, including passwords and cookies, and could have been exploited by attackers. Cloudflare’s engineers responded quickly, disabling features such as email obfuscation, and traced the root cause to a buffer overrun triggered by a parsing error. Although the bug required narrow conditions, the incident highlighted the risks of migrating legacy systems and the importance of robust testing and monitoring in large-scale systems.

Takeaways

  • 😀 A severe data leak at Cloudflare, caused by a bug in its system, came to light on February 18, 2017.
  • 🛠️ The issue was traced back to Cloudflare’s migration to a new HTML parser, which caused memory overruns and data exposure.
  • 🔐 Sensitive data, such as cookies, passwords, and keys, leaked as a result of the bug.
  • 🚨 The bug was discovered by a researcher from Google's Project Zero team, who contacted Cloudflare about the issue.
  • ⏳ Cloudflare engineers worked swiftly to disable affected features globally, but data cached by search engines was still at risk.
  • 🌍 The bug affected a small percentage of Cloudflare’s websites (0.6%), but could have had much larger consequences.
  • 💡 The bug was triggered by a specific interaction between the old and new parsers, involving unfinished HTML attributes.
  • ⚠️ The migration from an older, stable parser to a new one without a backward compatibility mechanism led to the issue.
  • 🔄 Cloudflare engineers worked through the night to identify and disable the problematic features, deploying a global fix within hours.
  • 🔎 The root cause of the issue was an overrun caused by the improper handling of unfinished HTML attributes at the end of pages.
  • 🔧 The incident underscores the importance of maintaining backwards compatibility, thorough testing, and fuzzing in software development.
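
The overrun described in the takeaways can be illustrated with a toy model. The sketch below is not Cloudflare's code: it simulates memory as a Python byte string, and the names `PAGE`, `NEIGHBOUR`, and `bytes_read`, as well as the rule that the parser steps two positions after an `=`, are assumptions made purely for illustration. It relies on one detail from Cloudflare's public incident report: the generated parser tested for the end of its buffer with an equality check, so a pointer that jumped over the end was never caught, whereas a greater-than-or-equal check would have stopped it.

```python
PAGE = b"<script type="               # page that ends in an unfinished attribute
NEIGHBOUR = b"session-token:SECRET"   # unrelated data that happens to sit next in memory
MEMORY = PAGE + NEIGHBOUR


def bytes_read(memory: bytes, end: int, strict_equality: bool) -> bytes:
    """Return every byte a toy parser touches when its page ends at index `end`."""
    p = 0
    seen = bytearray()
    while p < len(memory):  # hard stop so the simulation itself stays in bounds
        reached_end = (p == end) if strict_equality else (p >= end)
        if reached_end:
            break
        seen.append(memory[p])
        # Toy rule (an assumption): after '=' the parser steps two positions.
        # On an unfinished attribute this moves the pointer past `end`
        # without it ever being equal to `end`.
        p += 2 if memory[p] == ord("=") else 1
    return bytes(seen)


leaky = bytes_read(MEMORY, len(PAGE), strict_equality=True)
fixed = bytes_read(MEMORY, len(PAGE), strict_equality=False)

print("equality check leaks:", leaky[len(PAGE):])
print(">= check leaks:", fixed[len(PAGE):])
```

With the equality check the toy parser keeps reading into the neighbouring data; with the `>=` check it stops at the end of the page. Real memory has no convenient hard stop, which is why an overrun of this kind can expose whatever happens to be stored next to the buffer.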

Q & A

  • What issue came to light at Cloudflare on February 18, 2017?

    -Cloudflare engineers learned of a severe data leak in their system. Certain features of the service were inadvertently exposing sensitive customer data, including cookies, keys, passwords, and full HTTPS requests.

  • How did the issue with Cloudflare's system come to light?

    -The issue was discovered by an engineer on Project Zero, Google's security research team. The engineer identified the problem and contacted Cloudflare to report the data leak.

  • What is Cloudflare's primary service, and how does it work?

    -Cloudflare's primary service is its Content Delivery Network (CDN), which caches content across various edge servers worldwide to speed up the delivery of internet content. It reduces load times by serving content from the nearest server to the user.

  • What specific features of Cloudflare’s system caused the data leak?

    -The data leak was tied to three features: Email Obfuscation, Automatic HTTPS Rewrites, and Server-Side Excludes. These features relied on Cloudflare’s new HTML parser, and their interaction with it led to the data exposure.

  • What was the initial response from Cloudflare engineers after discovering the data leak?

    -Upon learning about the leak, Cloudflare engineers quickly disabled the email obfuscation feature globally by flipping the global kill switch at 5:22 PM PST, less than an hour after the issue was reported.

  • How did Cloudflare engineers identify the root cause of the bug?

    -The engineers found that the bug surfaced during the migration from an old HTML parser (generated with Ragel) to a new one. While both parsers were in use, an unfinished tag at the end of a web page could cause a memory overrun, exposing sensitive data.

  • Why was the issue not detected by Cloudflare's own monitoring systems?

    -Cloudflare's monitoring systems did not self-detect the issue, and it took an external party (the Google engineer) to identify the problem. This highlights a potential gap in their internal detection capabilities.

  • What steps did Cloudflare take to address the bug after the initial discovery?

    -Cloudflare engineers worked through the night, disabling the problematic features and deploying global kill switches for the affected functionality. They also worked to purge cached data from search engines and continued to investigate the root cause.

  • What is Ragel, and how did it contribute to the bug?

    -Ragel is a parser generator (a state machine compiler) that Cloudflare used to build its old HTML parser. The bug surfaced because, once the new HTML parser (cf-html) was introduced, the empty last buffer that the old parser had always received was no longer passed along, which caused problems when parsing unfinished tags at the end of a page.

  • What are some of the lessons learned from this data leak incident?

    -The incident highlights the importance of maintaining backwards compatibility when making changes to software, especially when old and new systems interact. It also underscores the need for rigorous testing, including fuzzing, static code analysis, and memory management to prevent similar issues.

  • What was the overall impact of the bug, and how did Cloudflare manage it?

    -The impact was relatively small, as the bug required very specific conditions to manifest. Cloudflare worked with search engines to purge cached data and conducted extensive testing to ensure the issue was resolved. There was no evidence of the bug being exploited for attacks.
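
The testing lesson above can be sketched as a small harness in the spirit of fuzzing. This is an illustrative assumption, not Cloudflare's test suite: the toy parser, `guarded_reader`, `find_overruns`, and `SEED_PAGE` are all hypothetical names. The harness truncates a seed page at every offset and flags any input that makes the parser read at or past the end of its buffer. A real fuzzer would also mutate bytes and run far more inputs.

```python
from typing import Callable, List


class Overrun(Exception):
    """Raised when the parser under test reads at or past the end of its page."""


def guarded_reader(page: bytes) -> Callable[[int], int]:
    """Return a read function that refuses any index outside the page."""
    def read(i: int) -> int:
        if i >= len(page):
            raise Overrun(i)
        return page[i]
    return read


def make_parser(strict_equality: bool):
    """Build a toy parser; the two-step rule after '=' is an assumption."""
    def parse(page: bytes, read: Callable[[int], int]) -> None:
        p, end = 0, len(page)
        while not ((p == end) if strict_equality else (p >= end)):
            p += 2 if read(p) == ord("=") else 1
    return parse


def find_overruns(parse, seed_page: bytes) -> List[bytes]:
    """Feed every truncation of the seed page to the parser and collect overruns."""
    failures = []
    for cut in range(len(seed_page) + 1):
        page = seed_page[:cut]
        try:
            parse(page, guarded_reader(page))
        except Overrun:
            failures.append(page)
    return failures


SEED_PAGE = b'<img src="a.png" alt="x"><script type="text/javascript">'
buggy_failures = find_overruns(make_parser(strict_equality=True), SEED_PAGE)
fixed_failures = find_overruns(make_parser(strict_equality=False), SEED_PAGE)

for page in buggy_failures:
    print("overrun on:", page)
print("failures with >= check:", len(fixed_failures))
```

Every failing input ends in an unfinished attribute, which matches the narrow trigger condition described in the answers above. The parser that uses the `>=` check passes all truncations.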
