CrowdStrike IT Outage Explained by a Windows Developer
Summary
TLDR: Dave Plummer, a retired Microsoft software engineer, dives into the recent global Windows blue screen issues caused by a faulty CrowdStrike update. He explains the difference between kernel mode and user mode, why security software runs in kernel mode, and the risks of executing unsigned code there. Dave also offers a practical fix for affected machines: boot into safe mode and remove the problematic file. Along the way he gives insight into the resilience of modern operating systems.
Takeaways
- 👋 Dave introduces himself as a retired software engineer from Microsoft with experience dating back to MS-DOS and Windows 95.
- 💻 He explains the CrowdStrike issue, focusing on the differences between kernel mode and user mode, and why the machines are blue screening.
- 🔍 The blue screens are caused by a bad update to CrowdStrike's software: when a kernel-mode driver such as CrowdStrike's fails, the whole system goes down with it.
- 🚨 Kernel mode is critical because it controls hardware interaction, memory management, and core functionalities of the OS.
- 🛠️ Dave shares his experience debugging blue screens, explaining how bugs in kernel mode can crash the entire system, unlike user mode crashes which only affect the application.
- 🧪 At Microsoft, stress tests were run nightly to catch bugs early, with test engineers writing tests to expose weaknesses in the system.
- 🔄 CrowdStrike's Falcon sensor operates in kernel mode to monitor application behavior for security threats, requiring robust and thorough testing.
- ❌ The recent issue was caused by a dynamic definition file that was supposed to update the CrowdStrike driver but instead contained invalid data, leading to system crashes.
- 🔧 Fixing the issue involves booting into safe mode and deleting the problematic update file from the CrowdStrike folder under the Windows drivers directory.
- 📚 Dave concludes by promoting his book about living a successful life on the autism spectrum and encourages viewers to subscribe to his channel for more content.
Q & A
Who is Dave and what is his background?
-Dave Plummer is a retired software engineer from Microsoft whose experience dates back to the MS-DOS and Windows 95 days. Although retired, he still has a deep understanding of Windows development and debugging.
What issue is Dave discussing in the video?
-Dave is discussing the CrowdStrike issue, which has been causing blue screens on Windows machines worldwide due to a bad update to CrowdStrike's software.
What is the main difference between kernel mode and user mode in operating systems?
-Kernel mode is a more privileged mode where the operating system and device drivers run, having access to the entire system memory map and hardware. User mode is where applications run with limited access to system resources, ensuring that application crashes do not affect the entire system.
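The user-mode half of this distinction can be observed from any scripting language. The following is a minimal sketch, not something shown in the video: it starts a child Python process that deliberately aborts, standing in for a crashing application, and shows that the parent keeps running. The function name `run_crashing_child` is made up for this illustration.

```python
import subprocess
import sys


def run_crashing_child() -> int:
    """Start a child process that aborts itself and return its exit status."""
    # os.abort() terminates the child abnormally, standing in for an
    # application crash in user mode.
    result = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    status = run_crashing_child()
    # The operating system cleaned up the failed process; the parent
    # (and the rest of the system) carries on.
    print(f"child exited abnormally with status {status}; parent still running")
```

The exact non-zero status differs between operating systems. A comparable fault inside a kernel-mode driver has no such safety net, which is why it ends in a blue screen.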
Why is running code in kernel mode considered risky?
-Running code in kernel mode is risky because if there is a bug in the kernel code, it can cause the entire system to crash, as it has access to all system resources and data structures.
What is the role of the WHQL certification in ensuring the robustness of drivers?
-The WHQL (Windows Hardware Quality Labs) certification ensures that drivers have been thoroughly tested by the vendor, passed the Windows Hardware Lab Kit testing on various platforms, and are digitally signed by Microsoft as being compatible with the Windows operating system.
Why might CrowdStrike choose not to go through the WHQL certification for every update?
-CrowdStrike might choose not to go through the WHQL certification for every update to ensure that their customers get the latest protection as soon as new threats emerge, avoiding the delay that comes with the certification process.
What is the CrowdStrike Falcon sensor, and why does it need to run in kernel mode?
-The CrowdStrike Falcon sensor is a security product that analyzes a wide range of application behavior to proactively detect new attacks. It needs to run in kernel mode to have complete and unfettered access to system data structures and services to perform its job effectively.
What is the problem with executing unsigned code in kernel mode?
-Executing unsigned code in kernel mode is problematic because it can lead to system crashes if there is a bug. Unsigned code has not been verified for stability and security, increasing the risk of system instability or security vulnerabilities.
How can one access a crash dump report to analyze the cause of a system crash?
-A crash dump report can be accessed by configuring the system to generate crash dump files. These files can provide detailed information about the state of the system at the time of the crash, including the offending instruction and the values of system registers.
What steps can be taken to fix a machine that has crashed due to the CrowdStrike issue?
-To fix a machine that has crashed due to the CrowdStrike issue, boot it into safe mode, navigate to the CrowdStrike folder under the Windows System32\drivers directory, find and delete the problematic CrowdStrike file (its name starts with 'C-', followed by a series of zeros and '291', and ends in '.sys', so it matches 'C-00000291*.sys'), and then reboot the system.
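The clean-up step can be sketched as a small script. This is an illustration under stated assumptions, not CrowdStrike's or Dave's tooling: the default pattern `C-00000291*.sys` is the one given in public remediation guidance, the function name `remove_matching_files` is hypothetical, and in practice the deletion is usually done by hand from a safe mode or recovery command prompt. The function defaults to a dry run so nothing is deleted by accident.

```python
from pathlib import Path

# Assumption: file-name pattern taken from public remediation guidance;
# confirm it against the vendor's current instructions before relying on it.
PATTERN = "C-00000291*.sys"


def remove_matching_files(
    directory: Path, pattern: str = PATTERN, dry_run: bool = True
) -> list[Path]:
    """Return files in `directory` matching `pattern`; delete them unless dry_run."""
    matches = sorted(p for p in directory.glob(pattern) if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

On an affected machine the directory argument would be the CrowdStrike folder under System32\drivers. Running with `dry_run=True` first lists what would be removed, and a second call with `dry_run=False` performs the deletion.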
What is the significance of marking a driver as a 'boot driver' in Windows?
-Marking a driver as a 'boot driver' in Windows signifies that the driver is essential for the startup of the Windows operating system. If such a driver crashes, the system may not boot properly, as it is considered a critical component of the boot process.
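For context, Windows records how a driver is started in the `Start` value under the driver's key in `HKLM\SYSTEM\CurrentControlSet\Services`. The sketch below only models the documented meaning of those values; it does not read the registry, and the names `StartType` and `is_boot_driver` are made up for this illustration.

```python
from enum import IntEnum


class StartType(IntEnum):
    """Documented meanings of a Windows service or driver `Start` value."""
    BOOT = 0      # SERVICE_BOOT_START: loaded by the boot loader
    SYSTEM = 1    # SERVICE_SYSTEM_START: loaded during kernel initialization
    AUTO = 2      # SERVICE_AUTO_START: started automatically at system startup
    DEMAND = 3    # SERVICE_DEMAND_START: started on request
    DISABLED = 4  # SERVICE_DISABLED: never started


def is_boot_driver(start_value: int) -> bool:
    """Return True if the given `Start` value marks a boot-start driver."""
    return StartType(start_value) is StartType.BOOT
```

A driver with `Start` set to 0 is loaded before most of the system is up, which is why a crash in such a driver can stop the machine from booting at all.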