CrowdStrike IT Outage Explained by a Windows Developer

Dave's Garage
21 Jul 2024 13:40

Summary

TL;DR: Dave, a former Microsoft software engineer, explains the cause of the "CrowdStrike blue screens" that appeared worldwide. He covers why CrowdStrike software is on the machines at all, what happens when a kernel driver like CrowdStrike's fails, and why this particular update caused so much havoc. He also shows how to recover a machine stuck in the crash, and uses the difference between kernel mode and user mode to make the problem easier to understand.

Takeaways

  • 👋 Dave is a former Microsoft software engineer who gives an overview of the CrowdStrike problem in his video.
  • 🛠️ CrowdStrike is security software that is often installed on machines to detect and prevent malware attacks.
  • 💻 The problem was caused by a faulty update to the CrowdStrike software, which led to blue screens around the world.
  • 🔍 Dave explains the difference between kernel mode and user mode, and why a bug in kernel mode brings down the whole system.
  • 🛡️ Kernel mode is where code runs with direct access to system resources and hardware.
  • 🚫 Application code never runs in kernel mode, and kernel code never runs in user mode.
  • 🔧 CrowdStrike Falcon is a product that runs in kernel mode in order to analyze application behavior and detect attacks.
  • 📄 Microsoft offers WHQL certification for drivers as assurance that they are robust and trustworthy.
  • 🚨 Dave speculates that CrowdStrike's definition updates may contain unsigned p-code that the driver executes in kernel mode, which can lead to instability.
  • 🤔 A likely contributor to the crash is inadequate error checking and parameter validation in the CrowdStrike driver code.
  • 🔄 To fix an affected machine, delete the faulty CrowdStrike update file from the CrowdStrike folder under System32\drivers and reboot.

Q & A

  • What is the main topic of the video?

    -The main topic is the problem with the CrowdStrike software: why it is installed on the machines, what happens when a kernel driver like CrowdStrike's fails, and why the CrowdStrike code crashes the machines.

  • Who is Dave and what is his connection to the topic?

    -Dave is a retired Microsoft software engineer with experience in Windows development. In his video he explains what the CrowdStrike blue screen actually is and how to fix it.

  • What is the difference between kernel mode and user mode?

    -Kernel mode is used for operating system code and device drivers, which work directly with the hardware. User mode is used by applications, which never run in kernel mode. A crash in kernel mode brings down the whole system, while an application crash affects only that application.

  • What is CrowdStrike Falcon and why does it run in kernel mode?

    -CrowdStrike Falcon is a security product that goes beyond antivirus protection and also analyzes application behavior. To perform that analysis it runs in kernel mode, where it has unrestricted access to system data structures.

  • What does WHQL certification mean for drivers?

    -WHQL (Windows Hardware Quality Labs) certification states that Microsoft considers a driver compatible with the Windows operating system. Drivers with this certification have been tested thoroughly and can reasonably be regarded as robust and trustworthy.

  • Why is it risky to run p-code in kernel mode?

    -Running p-code (interpreted code delivered in definition files, as Dave speculates) in kernel mode is risky because that code has never been signed or put through certification testing. A single small bug in it can crash the entire system.

  • What happens when a kernel driver like CrowdStrike's fails?

    -When a kernel driver fails, the system crashes, because the driver runs in kernel mode alongside the code that controls critical system functions. A fault there can destabilize the entire system.

  • How can a machine that crashes because of the CrowdStrike problem be repaired?

    -Boot the machine into Safe Mode, go to the CrowdStrike folder under System32\drivers, and delete the faulty CrowdStrike file. After a reboot the system should work normally.

  • What is the difference between a boot driver and any other device driver?

    -A boot driver is a device driver that is required to start the Windows operating system. Most boot drivers ship with Windows and are installed automatically during the first boot. CrowdStrike marked its driver as a boot driver, which means the system cannot start without it.

  • What is the goal of Dave's video?

    -The goal is to explain the CrowdStrike problem and to help people understand why their machines are crashing and how they can fix them.

Outlines

00:00

😀 Introduction to the CrowdStrike problem

Dave Plummer, a retired Microsoft software engineer, introduces himself and explains the CrowdStrike blue screen problem. Drawing on his experience as a Windows developer, he explains that the blue screens were caused by a faulty update to the CrowdStrike software. He sets out to explain why the CrowdStrike software is on the machines, the difference between kernel mode and user mode, and the cause of the crash. He also describes his time at Microsoft, where handling crashes was routine, and how such problems were debugged with debuggers and crash reports.

05:00

🔧 The difference between kernel mode and user mode

Dave explains the basic differences between kernel mode and user mode. Kernel mode handles core functions such as hardware access, memory management, and thread scheduling, while user mode is where applications run. He stresses that kernel code never runs in user mode and that a crash in kernel mode takes the whole system down. He also discusses the role of drivers in kernel mode and how the CrowdStrike Falcon product, a security tool for servers, runs in kernel mode to analyze application behavior and detect attacks.

10:02

🛠 Troubleshooting the CrowdStrike failure

Dave turns to troubleshooting the CrowdStrike blue screen. He explains that the failure appears to have been caused by a dynamic definition file, delivered as a .sys file, that contained nothing but zeros instead of valid content. The CrowdStrike driver that processes these updates then crashed, apparently because it lacks adequate error checking and parameter validation. Dave recommends deleting the affected CrowdStrike update file in Safe Mode. He notes that this should not cause further problems, since the update will presumably never be needed again. Finally, Dave mentions his new book and thanks his channel subscribers.

Keywords

💡CrowdStrike

CrowdStrike is a security software vendor, named in the video as the source of the blue screen problems. In the context of the video, CrowdStrike develops the Falcon sensor, which runs in kernel mode to analyze application behavior and detect attacks. The problem arises when a CrowdStrike update is faulty and causes system crashes.

💡Blue Screen

A "blue screen" (also known as a BSOD, Blue Screen of Death) is the crash screen Windows shows when a fatal error occurs. The video explains that the blue screens were caused by a defective CrowdStrike update.

💡Kernel Mode

Kernel mode is a privileged mode in which the operating system's core functions and device drivers run. The video explains that the CrowdStrike Falcon sensor runs in kernel mode to collect data and provide protection at the system level, which becomes a problem as soon as its code contains a bug.

💡Device Driver

A device driver is a program that tells the operating system how to interact with a particular piece of hardware. In the video, CrowdStrike's code is implemented as a device driver so that it can run in kernel mode, which leads to system crashes when the driver is faulty.

💡Boot Driver

A boot driver is a special type of device driver that is required to start the operating system. The video notes that CrowdStrike marked its driver as a boot driver, which means the system may be unable to start when the driver is faulty.

💡WHQL Certification

WHQL (Windows Hardware Quality Labs) certification is offered by Microsoft to assure the quality and compatibility of device drivers. The video points out that CrowdStrike may not put its definition updates through WHQL certification, so that it can ship protection updates faster.
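The reasoning behind that point is that a certificate covers the driver binary, not the data the driver loads later. A toy sketch of the idea follows. It assumes a SHA-256 fingerprint as a stand-in for a real signature (actual driver signing is far more involved), and every name in it (`DRIVER_IMAGE`, `run_driver`, the `BLOCK` rule syntax) is hypothetical and not taken from CrowdStrike or Windows.

```python
import hashlib


def fingerprint(blob: bytes) -> str:
    """Stand-in for a signature check: a plain SHA-256 digest."""
    return hashlib.sha256(blob).hexdigest()


# Hypothetical bytes representing the driver that went through certification.
DRIVER_IMAGE = b"engine that loads and interprets definition files"
CERTIFIED_FINGERPRINT = fingerprint(DRIVER_IMAGE)


def driver_still_certified(driver_image: bytes) -> bool:
    """The check only ever looks at the driver image itself."""
    return fingerprint(driver_image) == CERTIFIED_FINGERPRINT


def run_driver(definition: bytes) -> str:
    """Toy engine: what it does depends entirely on the unsigned definition."""
    if definition.startswith(b"BLOCK "):
        return "blocking " + definition[6:].decode()
    return "no action"


if __name__ == "__main__":
    for definition in (b"BLOCK evil.exe", b"BLOCK other.exe"):
        # The driver bytes never change, so the check passes every time,
        # while the observable behavior follows the definition file.
        print(driver_still_certified(DRIVER_IMAGE), run_driver(definition))
```

The fingerprint stays valid across both runs because only the definition changed, which is the gap the video describes.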

💡Parameter Validation

Parameter validation is the process of checking that the data and arguments passed to a function are valid. The video criticizes the CrowdStrike driver for apparently not validating its inputs adequately: invalid input should make the function call fail, not crash the whole system.
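A minimal sketch of what validating input before use can look like, written in Python for readability rather than as driver code. The file layout (a 4-byte `DEFS` tag followed by a length field) and all names are invented for illustration. CrowdStrike's real file format is not described in the video.

```python
import struct
from typing import Optional

MAGIC = b"DEFS"                 # hypothetical file tag, invented for this sketch
HEADER = struct.Struct("<4sI")  # tag plus little-endian payload length


class InvalidDefinition(ValueError):
    """Raised when a definition file fails validation."""


def parse_definition(blob: bytes) -> bytes:
    """Return the payload, or raise InvalidDefinition without touching bad data."""
    if len(blob) < HEADER.size:
        raise InvalidDefinition("file too short for header")
    if not any(blob):
        raise InvalidDefinition("file contains only zero bytes")
    magic, length = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise InvalidDefinition("unexpected file tag")
    payload = blob[HEADER.size:]
    if length != len(payload):
        raise InvalidDefinition("declared length does not match payload")
    return payload


def load_or_skip(blob: bytes) -> Optional[bytes]:
    """Fail the call, not the system: a bad file is skipped."""
    try:
        return parse_definition(blob)
    except InvalidDefinition:
        return None


if __name__ == "__main__":
    good = HEADER.pack(MAGIC, 3) + b"abc"
    print(load_or_skip(good))         # the payload
    print(load_or_skip(bytes(1024)))  # an all-zero file is rejected
```

The design choice is that every check happens before any field of the file is trusted, so a file full of zeros is rejected at the door instead of being interpreted.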

💡Null Pointer

A null pointer is a pointer that does not refer to any valid object. Dereferencing one is a programming error. The video describes this as the likely cause of the CrowdStrike driver crash: the code appears to have started with a null pointer, added a small field offset (9C hex) to it, and then dereferenced the resulting invalid address.
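The arithmetic behind that diagnosis can be worked through with `ctypes`. This is a sketch under assumptions: the `Record` layout is hypothetical and chosen only so that the field of interest sits at offset 0x9C, as in the crash dump discussed in the video. The real data structure is unknown.

```python
import ctypes


class Record(ctypes.Structure):
    # Hypothetical layout: 0x9C bytes of other fields, then the field
    # the code actually wants to read.
    _fields_ = [
        ("other_fields", ctypes.c_ubyte * 0x9C),
        ("wanted", ctypes.c_uint32),
    ]


def field_address(base: int, field_offset: int) -> int:
    """Address a compiler would compute for base->field."""
    return base + field_offset


NULL = 0
offset = Record.wanted.offset
faulting_address = field_address(NULL, offset)

if __name__ == "__main__":
    # With a valid base pointer this would be a real address. With a null
    # base, the "address" is just the small offset itself.
    print(hex(offset), hex(faulting_address))  # prints: 0x9c 0x9c
```

This is why a tiny value such as 0x9C in a pointer register is a strong hint of a null base pointer plus a field offset.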

💡Safe Mode

Safe Mode is a Windows startup mode that loads only a limited set of drivers, used to diagnose or fix problems. The video describes how to boot a machine into Safe Mode, delete the defective CrowdStrike update file, and bring the machine back into service.
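The deletion step is normally done by hand from the console or file manager. Purely as an illustration, here is a sketch of the same step in Python, assuming an interpreter is available on the machine (which it often will not be in Safe Mode). The folder and the `C-00000291*.sys` pattern follow the fix described in the video. The function names are made up, and the sketch defaults to a dry run that deletes nothing.

```python
from pathlib import Path
from typing import List

# Folder named in the video; adjust if Windows is installed elsewhere.
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def find_bad_channel_files(directory: Path = DEFAULT_DIR) -> List[Path]:
    """List files matching the faulty update's name pattern."""
    if not directory.is_dir():
        return []
    return sorted(directory.glob(PATTERN))


def remove_bad_channel_files(directory: Path = DEFAULT_DIR,
                             dry_run: bool = True) -> List[Path]:
    """Delete matching files unless dry_run is set; return what matched."""
    matches = find_bad_channel_files(directory)
    for path in matches:
        if not dry_run:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Dry run by default: only report what would be removed.
    for path in remove_bad_channel_files(dry_run=True):
        print("would delete", path)
```

Calling `remove_bad_channel_files(dry_run=False)` performs the actual deletion. Files from other updates (for example names containing 290 instead of 291) do not match the pattern and are left alone.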

💡Postmortem Debugging

Postmortem debugging is the process of analyzing a crash report (crash dump) after the fact to determine why a program crashed. The video uses it to work out why the CrowdStrike driver crashes.

Highlights

Dave, a retired software engineer from Microsoft, explains the CrowdStrike blue screen issue.

CrowdStrike blue screens are a result of a bad update to CrowdStrike software.

Dave's experience with debugging blue screens at Microsoft in the 1990s.

Explanation of kernel mode and user mode, and their significance in system stability.

Kernel mode runs at a higher privilege level and a crash in this mode leads to a system crash.

The difference between kernel mode and user mode in terms of memory access and control.

The role of the CrowdStrike Falcon sensor and its necessity to run in kernel mode.

The process of WHQL certification for Windows drivers and its importance for stability.

CrowdStrike's approach to updating their driver without WHQL certification due to agility.

The risk of running unsigned code in kernel mode and its potential to cause system instability.

A postmortem debugging approach to understand the cause of the CrowdStrike issue.

The discovery that the CrowdStrike dynamic definition file was corrupted with zeros.

Lack of resilience and inadequate error checking in the CrowdStrike driver.

The designation of the CrowdStrike driver as a boot driver, causing system startup issues.

A practical guide to fixing a machine affected by the CrowdStrike issue.

Dave's new book on living a successful life on the autism spectrum.

Invitation to subscribe to Dave's channel and leave a like for the informative content.

Transcripts

00:00

Hey, I'm Dave. Welcome to my shop. I'm Dave Plummer, a retired software engineer from Microsoft going back to the MS-DOS and Windows 95 days, and thanks to my time as a Windows developer, today I'm going to explain what the CrowdStrike issue actually is, the key difference in kernel mode, and why these machines are blue screening, as well as how to fix it if you come across one.

00:18

Now, I've got a lot of experience waking up to blue screens and having them set the tempo of my day, but this Friday was a little different, however. First off, I'm retired now, so I don't debug a lot of blue screens, and second, I was traveling in New York City, which left me temporarily stranded as the airlines sorted out the digital carnage. But that downtime gave me plenty of time to pull out the old MacBook and figure out what was happening to all the Windows machines around the world.

00:42

As far as we know, the CrowdStrike blue screens that we've been seeing around the world for the last several days are the result of a bad update to the CrowdStrike software. But why? So today I want to help you understand three key things: first, why the CrowdStrike software is on the machines at all; second, what happens when a kernel driver like CrowdStrike fails; and finally, we'll look at precisely why the CrowdStrike code faults and brings the machines down, and how and why this update caused so much havoc.

01:12

As systems developers at Microsoft in the 1990s, handling crashes like this was part of our normal bread and butter.

01:19

Every dev at Microsoft, at least in my area, had two machines. For example, when I started in Windows NT, I had a Gateway 486 DX2-50 as my main dev machine and then some old 386 box as a debug machine. Normally you'd run your test or debug bits on the debug machine while connected to it as the debugger from your good machine.

01:37

On nights and weekends, however, we did something far more interesting: we ran a process called NT stress. Now, NT stress was a bundle of tests that would automatically download to the test machines and run under the debugger, and so every night, every test machine, along with all the machines in the various labs around campus, would run NT stress and put it through the gauntlet.

01:58

The stress tests were normally written by our test engineers, who were software developers specially employed back in those days to find and catch bugs in the system. So as an example, they might write a test to simply allocate and use as many GDI brush handles as possible. If doing so causes the drawing subsystem to become unstable or causes some other program to crash, then it would be caught and stopped in the debugger immediately.

02:20

The following day, all of the crashes and assertions would be tabulated and assigned to an individual developer based on the area of code in which the problem occurred. As the developer responsible, you would then use something like telnet to connect to the target machine, debug it, and sort out what went wrong.

02:38

All this debugging was done in assembly language, whether it was Alpha, MIPS, PowerPC, or x86, and with minimal symbol table information, so it's not like we had Visual Studio connected. Still, it was enough information to sort out most crashes, find the code responsible, and either fix it or at least enter a bug to track it in our database.

02:57

The hardest issues to sort out were the ones that took place deep inside the operating system kernel, which executes at ring zero on the CPU. You see, the operating system uses a ring system to bifurcate code into two distinct types: kernel mode, for the operating system itself, and user mode, where your applications run. Kernel mode does tasks such as talking to the hardware and the devices, managing memory, scheduling threads, and all of the really core functionality that the operating system provides. Application code never runs in kernel mode, and kernel code never runs in user mode.

03:28

Kernel mode is more privileged, meaning it can see the entire system memory map and what's in memory at any physical page at any instant. User mode only sees the memory map pages that the kernel wants you to see. So if you're getting the sense that the kernel is very much in control, that's an accurate picture.

03:44

Even if your application needs a service provided by the kernel, it won't be allowed to just run down inside the kernel and execute it. Instead, your user thread will reach the kernel boundary and then raise an exception and wait. A kernel thread on the kernel side then looks at the specified arguments, fully validates everything, and then runs the required kernel code. When it's done, the kernel thread returns the results to the user thread and lets it continue on its merry way.

04:08

There is one other substantive difference between kernel mode and user mode: when application code crashes, the application crashes. When kernel mode crashes, the system crashes. It crashes because it has to. Imagine a case where you had a really simple bug in the kernel that freed memory twice. When the kernel code detects that it's about to free already freed memory, it can just detect that this is a critical failure, and when it does, it blue screens the system, because the alternatives could be worse.

04:36

Consider a scenario where this double-freed code is allowed to continue, maybe with an error message, maybe even allowing you to save your work. The problem is that things are so corrupted at this point that saving your work could do more damage, erasing or corrupting the file beyond repair. Worse, since it's the kernel system that's experiencing the issue, application programs are not protected from one another in the same way. The last thing you want is Solitaire triggering a kernel bug that damages your Git enlistment, and that's why, when an unexpected condition occurs in the kernel, the system is just halted.

05:06

This is not a Windows thing by any stretch. It is true for all modern operating systems, like Linux and macOS as well. In fact, the biggest difference is the color of the screen when the system goes down: on Windows it's blue, but on Linux it's black, and on macOS it's usually pink. But as on all systems, a kernel issue is a reboot at a minimum.

05:25

Now that we know a bit about kernel mode versus user mode, let's talk about what specifically runs in kernel mode, and the answer is very, very little. The only things that go in kernel mode are things that have to, like the thread scheduler and the heap manager, and functionality that must access the hardware, such as the device driver that talks to a GPU across the PCIe bus. And so the totality of what you run in kernel mode really comes down to the operating system itself and device drivers.

05:50

And that's where CrowdStrike enters the picture with their Falcon sensor. Falcon is a security product, and while it's not just simply an antivirus, it's not that far off the mark to look at it as though it's really anti-malware for the server. But rather than just looking for file definitions, it analyzes a wide range of application behavior so that it can try to proactively detect new attacks before they're categorized and listed in a formal definition. And to be able to see that application behavior from a clear vantage point, that code needed to be down in the kernel. Without getting too far into the weeds of what CrowdStrike Falcon actually does, suffice it to say that it has to be in the kernel to do it.

06:29

And so CrowdStrike wrote a device driver, even though there's no hardware device that it's really talking to. But by writing their code as a device driver, it lives down with the kernel in ring zero and has complete and unfettered access to the system data structures and the services that they believe it needs to do its job.

06:44

Now, everybody at Microsoft, and probably at CrowdStrike, is aware of the stakes when you run code in kernel mode, and that's why Microsoft offers the WHQL certification, which stands for Windows Hardware Quality Labs. Drivers labeled as WHQL certified have been thoroughly tested by the vendor and then have passed the Windows Hardware Lab Kit testing on various platforms and configurations, and are signed digitally by Microsoft as being compatible with the Windows operating system. By the time a driver makes it through the WHQL lab tests and certifications, you can be reasonably assured that the driver is robust and trustworthy, and when it's determined to be so, Microsoft issues that digital certificate for that driver. As long as the driver itself never changes, the certificate remains valid.

07:29

But what if you're CrowdStrike and you're agile, ambitious, and aggressive, and you want to ensure that your customers get the latest protection as soon as new threats emerge? Every time something new pops up on the radar, you could make a new driver and put it through the Hardware Quality Labs, get it certified and signed, and release the updated driver, and for things like video cards, that's a fine process. I don't actually know what the WHQL turnaround time is like, whether that's measured in days or weeks, but it's not instant, and so you'd have a time window where a zero-day could propagate and spread simply because of the delay in getting an updated CrowdStrike driver built and signed.

08:05

What CrowdStrike opted to do instead was to include definition files that are processed by the driver but not actually included with it. So when the CrowdStrike driver wakes up, it enumerates a folder on the machine looking for these dynamic definition files, and it does whatever it is that it needs to do with them.

08:23

But you can already perhaps see the problem. Let's speculate for a moment that the CrowdStrike dynamic definition files are not merely malware definitions but complete programs in their own right, written in a p-code that the driver can then execute.

08:36

In a very real sense, then, the driver could take the update and actually execute the p-code within it in kernel mode, even though that update itself has never been signed. The driver becomes the engine that runs the code, and since the driver hasn't changed, the cert is still valid for the driver. But the update changes the way the driver operates by virtue of the p-code that's contained in the definitions, and what you've got then is unsigned code of unknown provenance running in full kernel mode. All it would take is a single little bug, like a null pointer reference, and the entire temple would be torn down around us.

09:06

Put more simply, while we don't yet know the precise cause of the bug, executing untrusted p-code in the kernel is risky business at best and could be asking for trouble.

09:16

We can get a better sense of what went wrong by doing a little postmortem debugging of our own. First, we need to access a crash dump report, the kind you used to get in the good old NT days but are now hidden behind the happy face blue screen.

09:29

Depending on how your system is configured, though, you can still get the crash dump info, and so there was no real shortage of dumps around to look at. Here's an example from Twitter, so let's take a look.

09:37

About a third of the way down, you can see the offending instruction that caused the crash. It's an attempt to move data to register 9 by loading it from a memory pointer in register 8. Couldn't be simpler. The only problem is that the pointer in register 8 is garbage. It's not a memory address at all but a small integer of 9C hex, which is likely the offset of the field they're actually interested in within the data structure. But they almost certainly started with a null pointer, then added 9C to it, and then just dereferenced it.

10:04

Now, debugging something like this is often an incremental process where you wind up establishing, okay, so this bad thing happened, but what happened upstream beforehand to cause the bad thing? And in this case it appears that the cause is the dynamic data file, downloaded as a .sys file. Instead of containing p-code or a malware definition or whatever was supposed to be in the file, it was all just zeros. We don't know yet how or why this happened, as CrowdStrike hasn't publicly released that information yet. What we do know to an almost certainty at this point, however, is that the CrowdStrike driver that processes and handles these updates is not very resilient and appears to have inadequate error checking and parameter validation.

10:42

Parameter validation means checking to ensure that the data and arguments being passed to a function, and in particular to a kernel function, are valid and good. If they're not, it should fail the function call, not cause the entire system to crash. But in the CrowdStrike case, they've got a bug they don't protect against, and because their code lives in ring zero with the kernel, a bug in CrowdStrike will necessarily bug check the entire machine and deposit you into the very dreaded recovery blue screen.

11:10

Now, even though this isn't a Windows issue or a fault with Windows itself, many people have asked me why Windows itself isn't just more resilient to this type of issue. For example, if a driver fails during boot, why not try to boot next time without it and see if that helps? And Windows, in fact, does offer a number of facilities like that, going back as far as booting NT with the last known good registry hive. But there's a catch, and that catch is that CrowdStrike marked their driver as what's known as a boot driver.

11:35

A boot driver is a device driver that must be installed to start the Windows operating system. Most boot drivers are included in driver packages that are in the box with Windows, and Windows automatically installs these boot-start drivers during the first boot of the system. My guess is that CrowdStrike decided they didn't want you booting at all without the protection provided by their system, but when it crashes, as it does now, your system is completely borked.

12:00

Fixing a machine with this issue is fortunately not a great deal of work, but it does require physical access to the machine.

12:09

To fix a machine that's crashed due to this issue, you need to boot it into safe mode, because safe mode only loads a limited set of drivers that mercifully can still contend without this boot driver. You'll still be able to get into at least a limited system. Then, to fix the machine, use the console or the file manager and go to the path Windows, then System32, drivers, CrowdStrike. In that folder, find the file matching the pattern C, then a bunch of zeros, 291, .sys [C-00000291*.sys], and delete that file or anything that's got the 291 in it with a bunch of zeros. When you reboot, your system should come up completely normal and operational. The absence of the update file fixes the issue and does not cause any additional ones. It's a fair bet that the update 291 won't ever be needed or used again, so you're fine to nuke it.

12:55

If you found today's episode to be any combination of informative or entertaining, remember, I'm mostly in this for the subs and likes, so I'd be honored if you'd consider subscribing to my channel and leaving a like on this video. And if you're already subscribed, thank you. Please consider sending this video to a friend if you think it covered the subject well.

13:12

And please do check out the free sample of my new book on Amazon, The Nonvisible Part of the Autism Spectrum. It's intended for folks that don't have ASD but who suspect they might have a few characteristics that put them somewhere on the autism spectrum. It's everything I know now about living a successful life on the spectrum that I wish I'd known long ago. Check it out at the link in the video description. In the meantime, and in between time, hope to see you next time, right here in Dave's Garage.


Related Tags
CrowdStrike, Blue Screen, Kernel Mode, Software Bugs, Microsoft, Windows Development, Security Software, Debugging, Kernel Drivers, System Stability