Real men test in production… The truth about the CrowdStrike disaster

Fireship
22 Jul 2024 · 05:56

Summary

TLDR: A faulty update from cybersecurity firm CrowdStrike crashed 8.5 million Windows machines, in an incident the video likens to the long-feared Y2K bug. The video explores possible causes, from a simple coding error to a covered-up cyber attack or a conspiracy. It walks through the technical details, including the logic error in a channel file update that crashed the system, and argues that software running in kernel mode needs far stricter quality control. It also jokes about other theories, including a plot by multi-dimensional lizard overlords and a rogue employee promoting the Rust programming language for driver development.

Takeaways

  • 💻 On July 19, 2024, a faulty update from cybersecurity firm CrowdStrike crashed 8.5 million Windows machines.
  • 🔄 The incident is eerily similar to a 2010 McAfee antivirus update that caused a widespread outage. George Kurtz, McAfee's CTO at the time, is now the CEO of CrowdStrike.
  • 👷‍♂️ CrowdStrike's software, Falcon Sensor, operates in the privileged 'ring zero' space, typically reserved for Microsoft, and requires a special certification from Microsoft.
  • 🛑 The crash was caused by a logic error in an update to a configuration file (channel file 291). Because the driver runs in kernel mode, the error took down the whole system rather than a single application.
  • 👨‍💻 A professional C++ programmer hypothesized that the issue was due to a null pointer dereference, a common coding mistake that should have been caught.
  • 🔍 That hypothesis later received a Community Note, and a security researcher explained that the driver code may have been flawed for a long time, with the problematic configuration file update being the final straw that exposed the issue.
  • 🚫 The incident highlights the importance of robust quality assurance processes in software development, especially for critical systems.
  • 💡 The video suggests that the root cause of the disaster was likely a lack of quality control within CrowdStrike, rather than a single developer's error.
  • 🕵️‍♂️ Conspiracy theories suggest that the crash was either a foreign spy's infiltration, a rogue employee's message, or a pre-planned test for a future cyber attack.
  • 🌐 The video also touches on the theory that the World Economic Forum has predicted a worldwide cyber attack and that CrowdStrike's incident might be connected.
  • 🎓 The sponsor, Brilliant, is highlighted as a platform for learning problem-solving skills essential for programming and overcoming complex challenges in software development.

Q & A

  • What caused millions of Windows machines to go down recently?

    -A bad update from cybersecurity firm CrowdStrike caused millions of Windows machines to go down.

  • How many devices were affected by the CrowdStrike update issue?

    -8.5 million devices were affected by the CrowdStrike update issue.

  • Who was the CTO of McAfee during the 2010 incident, and what is his current position?

    -The CTO of McAfee during the 2010 incident was George Kurtz, who is now the CEO of CrowdStrike.

  • What specific mistake did the CrowdStrike update make that caused the system crashes?

    -The CrowdStrike update contained a logic error in Channel file 291, which caused the system crashes.

  • What is the role of the CrowdStrike Falcon sensor?

    -The CrowdStrike Falcon sensor is software that runs in the background on machines, looking for potential security anomalies and executing code via a driver.

  • What mode does the CrowdStrike software run in, and why is this significant?

    -The CrowdStrike software runs in ring zero, or kernel mode, which is the most privileged zone around the CPU usually reserved for process scheduling and direct hardware access.

  • What certification must third-party code have to run in kernel mode on Windows, and did CrowdStrike have this certification?

    -Third-party code must have WHQL certification from Microsoft to run in kernel mode on Windows, and the CrowdStrike driver was WHQL certified.

  • What was the hypothesis of a professional C++ programmer about the cause of the CrowdStrike issue?

    -The hypothesis was that an engineer dereferenced a null pointer, trying to access a memory address that doesn't exist: a rookie coding mistake that could have been fixed with an if statement.

  • What deeper conspiracy theories have emerged regarding the CrowdStrike incident?

    -Some conspiracy theories suggest it was the work of a foreign spy, a rogue employee, or a pre-planned event by the World Economic Forum to test for a real cyber attack in 2026.

  • What lesson about quality control and organizational failures can be learned from the CrowdStrike incident?

    -The incident highlights the importance of multiple layers of protection, quality assurance, continuous integration, and staggered rollouts to prevent such failures from reaching production.
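The staggered rollout mentioned in the last answer can be sketched in a few lines of C++. This is only an illustration of the general idea, not CrowdStrike's actual deployment process: the `StageResult` struct, the `healthy_stages_before_halt` function, and the crash-rate threshold are all hypothetical names and values made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical outcome of pushing an update to one rollout stage
// (for example: internal canaries, then 1% of customers, then everyone).
struct StageResult {
    std::size_t devices;  // devices that received the update in this stage
    std::size_t crashes;  // devices that crashed after receiving it
};

// Walks the stages in order and returns how many passed the health check
// before the rollout was halted. A stage fails when it has no data or when
// its crash rate exceeds `max_crash_rate`; later stages are never reached.
std::size_t healthy_stages_before_halt(const std::vector<StageResult>& stages,
                                       double max_crash_rate) {
    std::size_t healthy = 0;
    for (const StageResult& stage : stages) {
        if (stage.devices == 0) {
            break;  // no telemetry: do not widen the rollout blindly
        }
        const double rate = static_cast<double>(stage.crashes) /
                            static_cast<double>(stage.devices);
        if (rate > max_crash_rate) {
            break;  // too many crashes: stop before the next, larger stage
        }
        ++healthy;
    }
    return healthy;
}
```

With a gate like this, a bad update that crashes the first small group of machines would halt there instead of reaching every device at once.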

Outlines

00:00

🔍 Y2K Revisited: CrowdStrike's Major Update Failure

Last Friday, millions of Windows machines were affected by a faulty update from cybersecurity firm CrowdStrike, reminiscent of the Y2K experience. The update impacted 8.5 million devices, leading to theories about whether it was a simple mistake, a cyber attack cover-up, or a long-planned false flag operation. The incident echoed a similar event in 2010 when a McAfee update caused a major disruption, with George Kurtz, the CEO of CrowdStrike, having been McAfee's CTO at that time. This coincidence raises questions about systemic failures and the accountability of such high-profile executives.

05:02

🛠️ Understanding the Technical Failures of CrowdStrike

CrowdStrike's recent failure was linked to an update in their Falcon sensor software, specifically a logic error in Channel file 291. Unlike typical application crashes, this incident caused system-wide failures because the software operates in the CPU's privileged ring zero. The fault lay in the driver code, certified by Microsoft, which encountered issues with the updated config file, leading to a widespread crash. An analysis by a C++ programmer suggested a simple coding error might have been the root cause, highlighting deficiencies in the company's quality control and the need for better safeguards in critical software systems.

💡 Root Causes and Conspiracy Theories Surrounding the Incident

The CrowdStrike incident revealed broader organizational failures, with inadequate quality control and potentially deeper issues. While some blamed individual developers, the problem seemed to stem from systemic lapses in testing and deployment processes. Speculation ranged from a foreign spy infiltrating the company to theories about pre-planned cyber attacks linked to the World Economic Forum. The incident underscored the importance of rigorous quality assurance in software development, especially for products operating in critical system areas.

📅 Future Cybersecurity Threats and the Importance of Robust Driver Development

Looking ahead, some conspiracy theories suggest this incident was a prelude to a larger cyber attack predicted for August 12, 2026. Despite the outlandish nature of these claims, they emphasize the need for robust development practices. The video promotes Brilliant, an educational platform that helps users develop problem-solving skills crucial for programming, offering a way to enhance one's ability to tackle complex software challenges effectively.

Keywords

💡Y2K experience

The term 'Y2K experience' refers to the widespread fear and anticipation of computer failures at the turn of the year 2000 due to the 'Year 2000 problem' or 'Y2K bug'. In the video, it is used metaphorically to describe the recent widespread failure of Windows machines, drawing a parallel to the Y2K fears.

💡CrowdStrike

CrowdStrike is a cybersecurity technology company. In the video, it is mentioned as the source of a bad update that caused millions of Windows machines to go down, highlighting the company's significant role in the recent incident discussed.

💡McAfee Antivirus

McAfee Antivirus is a well-known computer security software. The video script references an incident in 2010 where a McAfee update caused a similar widespread computer failure, drawing a connection between past and present events and noting George Kurtz's role in both companies.

💡Windows service host

The Windows service host is a critical system process in Windows operating systems. In the video, it is mentioned as being accidentally removed by a McAfee update in 2010, leading to widespread system failures, illustrating the importance of this component in system stability.

💡Blue screen of death

The 'blue screen of death' (BSOD) is an error screen displayed when Windows encounters a critical system error. The video uses this term to describe the outcome of the McAfee and CrowdStrike incidents, indicating severe system crashes.

💡George Kurtz

George Kurtz is the CEO of CrowdStrike and formerly the CTO of McAfee. The video script highlights his connection to both companies during major system failures, suggesting a pattern of issues related to his leadership.

💡Ring zero

In the context of operating systems, 'ring zero' refers to the most privileged mode of processor operation, typically reserved for the kernel. The video explains that CrowdStrike's software operates in ring zero, which is unusual for third-party software and contributed to the severity of the system crash.

💡WHQL certification

WHQL (Windows Hardware Quality Labs) certification is a Microsoft program that verifies the reliability and stability of hardware and drivers. The video mentions that CrowdStrike's driver had this certification, which raises questions about the certification's effectiveness in preventing such failures.

💡Null pointer

A 'null pointer' in programming is a pointer that does not point to any valid memory location. The video suggests that a null pointer access might have been the cause of the CrowdStrike update failure, indicating a common but critical coding error.
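To make the "could have been fixed with an if statement" remark concrete, here is a minimal C++ sketch of a null check. The `Rule` struct and the `try_get_severity` function are hypothetical names invented for this example; nothing here is taken from CrowdStrike's driver.

```cpp
// Hypothetical rule record, invented for this sketch.
struct Rule {
    int severity;
};

// Copies the rule's severity into `out` and returns true, or returns false
// (leaving `out` untouched) when `rule` is null. Without the check,
// `rule->severity` on a null pointer is undefined behavior.
bool try_get_severity(const Rule* rule, int& out) {
    if (rule == nullptr) {
        return false;
    }
    out = rule->severity;
    return true;
}
```

In user mode a missed check like this typically crashes one process. In kernel mode the same invalid access can bring down the whole machine, which is why the video treats it as so serious.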

💡Configuration file

A configuration file contains settings and parameters for software. In the video, it is suggested that an error in parsing a configuration file may have led to invalid pointers, contributing to the system crash.
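The scenario described here, a loop reading pointers from a table where some entries were never filled in, can be sketched as follows. This is a simplified illustration under stated assumptions: the `Entry` struct and both functions are hypothetical, and the sketch assumes the parser leaves unparsed entries as null (a truly uninitialized pointer holds garbage and cannot be detected with a null check).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical parsed config entry. The default initializer guarantees that
// an entry the parser never filled in holds nullptr rather than garbage.
struct Entry {
    const std::string* pattern = nullptr;
};

// Counts the entries that are safe to use, skipping any whose pointer was
// never set or whose pattern is empty.
std::size_t count_valid_entries(const std::vector<Entry>& table) {
    std::size_t valid = 0;
    for (const Entry& entry : table) {
        if (entry.pattern == nullptr) {
            continue;  // parser left this slot unset: do not dereference
        }
        if (entry.pattern->empty()) {
            continue;  // nothing to match against
        }
        ++valid;
    }
    return valid;
}

// Stricter policy: reject the whole file unless every entry is usable.
bool table_is_loadable(const std::vector<Entry>& table) {
    return !table.empty() && count_valid_entries(table) == table.size();
}
```

Rejecting the whole file on the first bad entry is one possible design choice; it trades availability of the new rules for the guarantee that a malformed file never reaches the code that dereferences its entries.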

💡Organizational failure

The term 'organizational failure' refers to a breakdown in processes or systems within an organization that leads to negative outcomes. The video concludes that the CrowdStrike incident was likely due to poor quality control and organizational practices, rather than the actions of a single individual.

💡World Economic Forum

The World Economic Forum is an international organization known for its meetings and initiatives on global issues. The video mentions it in the context of conspiracy theories suggesting that the CrowdStrike incident was a test run for a larger, pre-planned cyber attack, linking it to the organization's partnerships and predictions.

Highlights

On July 19th, 2024, millions of Windows machines went down due to a bad update from cybersecurity firm CrowdStrike.

8.5 million Windows machines were affected by the update.

The incident is reminiscent of a 2010 McAfee Antivirus update that caused a similar issue.

George Kurtz, the CTO of McAfee in 2010, is now the CEO of CrowdStrike.

CrowdStrike released an official statement explaining the technical details of the incident.

The CrowdStrike Falcon sensor is software that runs in the background, looking for security anomalies.

A logic error in an update to channel file 291 caused the system to crash.

CrowdStrike operates in ring zero, the most privileged zone around the CPU.

The CrowdStrike driver was WHQL certified, allowing it to run in ring zero.

A professional C++ programmer hypothesized that the issue was due to a null pointer access.

The driver code has potentially been broken for a long time, with the config file being the final straw.

The incident might not have been caught due to a lack of quality control and organizational failure.

The video jokes that "Colonel Kurtz" knows real men test in production and is willing to die on that hill.

CrowdStrike sells a very expensive product that few understand, prioritizing sales over software engineering.

There are theories suggesting the incident was not accidental but the work of a foreign spy or rogue employee.

Some believe the failure was pre-planned as a test run for a real cyber attack scheduled for 2026.

The video sponsor, Brilliant, offers a platform to develop problem-solving skills in programming.

Transcripts

00:00

Last Friday, the world finally got the Y2K experience it deserved when millions of Windows machines went down thanks to a bad update from cybersecurity firm CrowdStrike: 8.5 million, to be exact. But now the plot has thickened, and multiple theories for why this actually happened have emerged. A: was it just a silly mistake? B: was it actually a cyber attack being covered up? Or C: was it a false flag planned centuries ago by our multi-dimensional lizard overlords? In today's video, we'll try to find out what really happened by taking a deep dive into the technical details.

00:28

But first, here's a crazy detail you need to know. On April 21st, 2010, at approximately 1400 hours, a McAfee Antivirus update accidentally removed the Windows service host file and knocked millions of computers running Windows XP off the internet, causing many of them to go into an endless reboot loop. The blue screen of death shut down critical services around the world. That was 15 years ago, when Justin Bieber was only 16 years old, but it's nearly identical to the CrowdStrike disaster going on right now.

00:55

Here's the crazy part, though: the CTO of McAfee in 2010 was none other than George Kurtz, the CEO of CrowdStrike today. That's quite the example of failing upwards. Now, he did just lose $300 million in paper wealth, but most importantly, we now know the embarrassing truth about how the CrowdStrike disaster actually happened. Almost. It is July 22nd, 2024, and you're watching The Code Report.

01:16

The creator of C++, Bjarne Stroustrup, once said C++ makes it harder to shoot yourself in the foot, but when you do, you blow your entire leg off. And we should have listened to him. CrowdStrike released an official statement explaining what happened. Come on, you guys, there it is, right there in front of you the whole time. You're dereferencing a null pointer. Open your eyes.

01:38

The CrowdStrike Falcon sensor is software that sits in the background on your machine, looking for potential security anomalies. It contains a driver, which is the thing that actually executes code, along with a bunch of channel files, which are basically just config files that contain rules about new potential attacks that the sensor can look for. These files are not kernel drivers and can be updated on the fly, and when CrowdStrike pushed an update to channel file 291, a logic error caused the entire system to crash.

02:04

Now, normally when an application crashes, it only breaks that application running in userland, or ring three in the CPU protection ring. No blue screen of death required. But CrowdStrike is a unique piece of software that runs within ring zero, or kernel mode, the most privileged zone around the CPU, usually reserved for process scheduling and direct hardware access. Ring zero is an area that normally only Microsofties are allowed to touch, and in order for any third party to run code here, they must receive a WHQL certification from Microsoft to verify that your code won't brick 8.5 million devices and shut down the global economy. The CrowdStrike driver was WHQL certified, so it sounds like it's Microsoft's fault. Well, not so fast.

02:44

What's unique about CrowdStrike is that they can make updates to those config, or channel, files dynamically. In this case, the driver had some kind of issue reading channel file 291, causing the entire system to fail. That's pretty much all the detail we have from official sources, but luckily there's a guy on the internet who's a professional C++ programmer and provided a breakdown that went viral. His hypothesis was that this was a skill issue, where some engineer coded up a null pointer trying to access a memory address that doesn't exist, a simple rookie coding mistake that could have been fixed with an if statement.

03:14

This tweet got a lot of traction, but since then it's been Community Noted, and another security researcher explains that this code is reading pointers from a table in a loop, and some are invalid. Perhaps an error parsing the configuration file left some entries uninitialized. What's kind of crazy here is that it looks like the driver code has actually been broken for a long time, and this one config file was the straw that broke the camel's back.

03:35

We may not know the full truth until there's a congressional hearing, but it looks like some developer there wrote some bad code, said "works on my machine," but then made the horrible mistake of deploying on a Friday. But we can't blame this one person. Programmers write bad code all the time, but a failure like this should never reach production. The Falcon sensor is not just some crappy to-do list app. When software operates in the critical path like this, there should be multiple layers of protection: quality assurance, continuous integration, staggered rollouts, and so on. It's absolutely insane that this wasn't caught by some automated process before it killed 8.5 million computers.

04:06

Heads need to roll for this, but it's not the person who wrote the code. It's an organizational failure, and it's not the first time Colonel Kurtz has been connected to a worldwide outage. He knows that real men test in production and is willing to die on that hill. The thing is, this company sells a very expensive product that very few people understand, and if you want to have an exotic car collection like this, your enterprise sales team is your highest priority, not your software engineering team. Those nerds. Therefore, the most likely root cause of this disaster is just a lack of quality control at the company CrowdStrike.

04:35

But another theory floating around is that this wasn't an accident, but actually the work of a foreign spy who infiltrated the company, or perhaps a rogue employee who wanted to send a message, a message that it's time to switch to the Rust programming language for Windows driver development. But the conspiracy theories go even deeper, and some think this failure is so egregious that it was actually pre-planned in advance. The World Economic Forum has made predictions about a worldwide cyber attack, and CrowdStrike is a World Economic Forum partner. This was all just a test run for the real cyber attack scheduled to happen on August 12th, 2026.

05:06

Most of us will already be dead by then, but if your goal is to write robust kernel drivers on Windows, you'll need to know how to problem solve like a programmer, and you can start doing that for free thanks to this video's sponsor, Brilliant. Problem solving is a skill that you keep forever. Brilliant's platform will introduce you to essential programming concepts, but most importantly, the hands-on exercises will develop your brain to recognize and solve complex problems that developers need to overcome on a daily basis. Best of all, every lesson is concise and rewarding. By investing just a few minutes each day, you'll develop habits that can level up your programming skills for the rest of your life, and you can do it anywhere, even from your phone. To try everything Brilliant has to offer for free for 30 days, visit brilliant.org/fireship or scan this QR code for 20% off their premium annual subscription.

05:51

This has been The Code Report. Thanks for watching, and I will see you in the next one.
