On July 19, 2024, parts of the world seemingly stood still as CrowdStrike, a leading cybersecurity company, pushed out a faulty update that crashed Windows machines worldwide. Microsoft estimated that about 8.5 million devices were affected, and airlines, banks, hospitals, and broadcasters were among the many industries whose machines were met with the blue screen of death. While the fault lies with CrowdStrike, it’s important to know what went wrong, why it happened, and how it could be prevented in the future.
CrowdStrike is a software company that plays an important role in helping companies find and prevent security breaches, claiming to have the fastest mean time to detect threats in the industry.
Since its inception in 2011, it has helped investigate major cyberattacks such as the 2014 Sony Pictures hack and the Russian intrusions into the Democratic National Committee in 2015 and 2016. It’s been so successful that, up until the night before the incident, CrowdStrike had a market value of more than $83 billion.
CrowdStrike has roughly 29,000 customers, including more than 500 companies on the Fortune 1000 list. That reach helps it win new clients, but it also raises the stakes when something goes wrong.
When your software helps keep leading industries around the world running, a single slip-up can be catastrophic.
On the morning of July 19, as part of routine operations, CrowdStrike released a content configuration update for the Windows sensor to gather data on possible new threat techniques.
CrowdStrike delivers security content configuration updates like this one in two ways:
- Sensor Content ships as part of every sensor release and is not updated dynamically from the cloud. It provides a wide range of capabilities to aid in adversary response, including on-sensor AI and machine learning models and code that gives CrowdStrike’s threat detection engineers longer-term, reusable capabilities.
- Rapid Response Content, the type that caused the worldwide outage, performs a variety of behavioral pattern-matching operations on the sensor using an optimized engine. It is a representation of fields and values, with associated filtering, stored in a binary file as configuration data rather than as code or a kernel driver. It is delivered as content updates to the Falcon sensor, the agent behind CrowdStrike’s flagship Falcon platform.
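To make "fields and values, with associated filtering" more concrete, here is a minimal Python sketch of content as pure data evaluated by one generic interpreter. Everything in it (`Criterion`, `TemplateInstance`, `matches`, the field layout, and the sample values) is hypothetical and only illustrates the general idea; it is not CrowdStrike’s actual format or code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Criterion:
    """One hypothetical filter: which input field to inspect and the value it must equal."""
    field_index: int
    expected: Optional[str]  # None acts as a wildcard: the field is never read


@dataclass(frozen=True)
class TemplateInstance:
    """Hypothetical content item: pure data, no executable code."""
    name: str
    criteria: Sequence[Criterion]


def matches(instance: TemplateInstance, inputs: Sequence[str]) -> bool:
    """Generic interpreter: the same code evaluates every instance it is given."""
    for c in instance.criteria:
        if c.expected is None:
            continue  # wildcard, skip this field entirely
        if inputs[c.field_index] != c.expected:
            return False
    return True


# Hypothetical rule: a named-pipe event from a specific process, any value in field 1.
rule = TemplateInstance(
    name="demo-pipe-rule",
    criteria=[
        Criterion(0, "demo_pipe"),
        Criterion(1, None),
        Criterion(2, "demo.exe"),
    ],
)
event = ["demo_pipe", "anything", "demo.exe"]
print(matches(rule, event))
```

In a design like this, shipping a new detection means shipping a new data file rather than new code, which is what makes such content fast to push. The flip side is that the interpreter has to cope safely with whatever data it receives.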
Component | Role |
---|---|
Content Configuration System | Part of the Falcon cloud platform; creates Template Instances and deploys them to the sensor through Channel Files. |
Channel Files | Deliver and update content on the sensor; stored on the host’s disk. |
Content Interpreter | Reads Channel Files on the sensor and interprets their content; designed to handle exceptions from problematic content gracefully. |
Sensor Detection Engine | Uses the interpreted content to observe, detect, or prevent malicious activity, depending on the customer’s policy configuration. |
Template Types | Stress-tested for event volume, resource use, and system performance impact before release. |
Template Instances | Specific content built on a Template Type; created and checked by the Content Configuration System and its Content Validator, and also used to stress-test Template Types. |
Content Validator | Checks content for errors before it is published. |
On July 19, 2024, two additional IPC (interprocess communication) Template Instances were deployed. Because of a bug in the Content Validator, one of the two passed validation despite containing problematic content. CrowdStrike trusted these checks, along with the testing performed before the IPC Template Type was first deployed earlier in 2024, and allowed the new instances into production.
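CrowdStrike’s later root cause analysis reported that the IPC Template Type defined 21 input fields while the sensor code that invoked it supplied only 20, and that the validator’s checks were based on the expectation of 21 inputs. The minimal Python sketch below contrasts a check against the template’s declared field count with a stricter check against the inputs the sensor actually supplies. The function names and constants are hypothetical illustrations, not CrowdStrike’s code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Criterion:
    field_index: int
    expected: Optional[str]  # None = wildcard, the field is never read


DECLARED_FIELDS = 21  # fields the hypothetical template type declares
SUPPLIED_INPUTS = 20  # inputs the hypothetical sensor code actually passes


def validate_against_template(criteria: Sequence[Criterion]) -> bool:
    """Weak check: only compares indices with the template's declared field count."""
    return all(0 <= c.field_index < DECLARED_FIELDS for c in criteria)


def validate_against_sensor(criteria: Sequence[Criterion]) -> bool:
    """Stricter check: every non-wildcard criterion must address an input the sensor supplies."""
    return all(
        c.expected is None or 0 <= c.field_index < SUPPLIED_INPUTS
        for c in criteria
    )


# 20 wildcard criteria plus one concrete criterion on the 21st field (index 20).
new_instance = [Criterion(i, None) for i in range(20)] + [Criterion(20, "demo_value")]

print(validate_against_template(new_instance))  # passes the weak check
print(validate_against_sensor(new_instance))    # rejected before publishing
```

Earlier instances would pass both checks because they used a wildcard for the 21st field, so the mismatch stayed hidden until content actually tried to match on it.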
When the sensor received the problematic content and loaded it into the Content Interpreter, it caused an out-of-bounds memory read that triggered an exception. Because the Falcon sensor runs as a driver in the Windows kernel, the exception could not be handled gracefully and crashed the entire operating system.
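The failure mode can be modeled in a few lines. This hypothetical Python sketch assumes, as CrowdStrike’s root cause analysis described, a template whose content refers to a 21st field while only 20 inputs are supplied. The unchecked interpreter reads `inputs[field_index]` for every non-wildcard criterion; Python raises `IndexError` at the bad index, whereas unchecked code in a kernel driver would read past the buffer and bring down the whole system. The checked version adds a runtime bounds check so bad content can be rejected instead. All names here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Criterion:
    field_index: int
    expected: Optional[str]  # None = wildcard, the field is never read


class ContentRejected(Exception):
    """Hypothetical error raised when content refers to an input that doesn't exist."""


def evaluate_unchecked(criteria: Sequence[Criterion], inputs: Sequence[str]) -> bool:
    # Trusts the content and indexes the inputs directly, like the failing code path.
    return all(c.expected is None or inputs[c.field_index] == c.expected for c in criteria)


def evaluate_checked(criteria: Sequence[Criterion], inputs: Sequence[str]) -> bool:
    # Verifies each index before reading it and rejects content that points past the inputs.
    for c in criteria:
        if c.expected is None:
            continue
        if not 0 <= c.field_index < len(inputs):
            raise ContentRejected(
                f"criterion reads field {c.field_index}, only {len(inputs)} inputs supplied"
            )
        if inputs[c.field_index] != c.expected:
            return False
    return True


inputs = ["x"] * 20  # the sensor supplies 20 values
# 20 wildcards plus one concrete criterion on the 21st field (index 20).
criteria = [Criterion(i, None) for i in range(20)] + [Criterion(20, "demo_value")]

try:
    evaluate_unchecked(criteria, inputs)
except IndexError:
    # Python stops here; unchecked kernel code would read past the buffer instead.
    print("unchecked: out-of-bounds read")

try:
    evaluate_checked(criteria, inputs)
except ContentRejected as err:
    # A sensor could log this and skip the instance while the host keeps running.
    print("checked:", err)
```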
Because the sensor loads early in the Windows boot process, affected machines crashed again on every restart and were stuck in a boot loop. Users saw a recovery screen saying Windows didn’t load correctly, with options to troubleshoot or restart the PC. CrowdStrike reverted the faulty content at 05:27 UTC, 78 minutes after releasing it, but a machine could only pick up the fix if it stayed running long enough to download it, so the issue wasn’t that easy to resolve.
For some businesses, recovery has taken days or weeks, because IT administrators may need physical access to each device to get it working again. For larger enterprises, this could take much longer, especially if their IT teams are already tied up with other issues. A few systems may even be beyond recovery and require complete replacement.
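Part of why recovery was so manual is that the fix had to be applied to each machine locally. The workaround CrowdStrike published was to boot an affected machine into Safe Mode or the Windows Recovery Environment, delete any file matching `C-00000291*.sys` from `C:\Windows\System32\drivers\CrowdStrike`, and restart; machines encrypted with BitLocker also needed their recovery key before that drive could be accessed. The Python sketch below illustrates only the file-matching and deletion step against a directory you pass in. It is a hypothetical helper, not a CrowdStrike or Microsoft tool, and on an affected machine this step was carried out from Safe Mode or the recovery environment rather than with a script like this.

```python
from pathlib import Path
from typing import List

# File-name pattern from CrowdStrike's published workaround for the faulty Channel File 291.
FAULTY_PATTERN = "C-00000291*.sys"


def remove_faulty_channel_files(driver_dir: Path, dry_run: bool = True) -> List[Path]:
    """List files in driver_dir that match FAULTY_PATTERN and delete them unless dry_run is True."""
    matches = sorted(driver_dir.glob(FAULTY_PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()  # remove the matching Channel File
    return matches
```

Calling `remove_faulty_channel_files(Path(r"C:\Windows\System32\drivers\CrowdStrike"))` would only list the matching files; passing `dry_run=False` deletes them. Defaulting to a dry run is a deliberate choice for anything that deletes files from a system directory.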
Some issues are entirely out of your control. You don’t control how your software vendors, whether Microsoft or a security provider like CrowdStrike, release updates or whether those updates are safe. At most, you can decide when your enterprise takes an update live, and in this case even that option was missing: Rapid Response Content was pushed to sensors regardless of customers’ sensor update policies. A problem like the CrowdStrike outage could be a critical blow to some companies, resulting in unexpected downtime that lasts far longer than you’d be comfortable with. US Cloud is a reliable solution at the best and worst of times.
Our expert engineers:
When disaster strikes, US Cloud is there to lend a helping hand. Don’t leave your enterprise vulnerable. Contact US Cloud today and ensure your data stays protected!