Be Kind to Your IT Teams: An Overview
Have you ever wondered how prepared your business is for a major IT crisis? Last week’s global IT outage was a stark reminder of the importance of robust and proactive IT management. While many companies were scrambling to fix issues caused by the latest CrowdStrike update, Total Group's clients remained unaffected. Here’s why.
Your IT engineers and teams are your saviours and were in no way responsible for last week's outage. When cyber-attacks occur, isolating devices within milliseconds and protecting company data is vital. Staff and user training is important, but it has proven to be a weak deterrent against cyber risks. With millions of new threats emerging daily, protecting data becomes increasingly complex. Data security software like CrowdStrike, installed on every compliant device, is a powerful tool, and its ability to insulate critical servers and company data from damage is essential.
Last week's damage was not malicious. It was reported as accidental, caused by a faulty update. As a result, devices became isolated from critical servers, and the update itself crashed the Windows operating system, rendering millions of devices unusable. IT teams worked two 20-hour shifts in an attempt to limit the damage to end users. Would you work two 20-hour shifts for them?
An Analysis of the CrowdStrike and Microsoft Cyber Failure
The CrowdStrike incident on 19th July 2024 disrupted the digital networks of airports, hospitals, and governments worldwide. A routine update by CrowdStrike, intended to enhance security, included faulty code that caused widespread crashes of systems running Microsoft's Windows operating system. These crashes, which appeared as the now infamous "blue screen of death," also caused significant disruption at global banks and airlines.
Slow Recovery
Once devices are isolated from the internet, no company can push out fixes centrally. In this incident, most devices therefore needed an engineer to work on each one in person, which takes a great deal of time. The slow recovery highlighted the critical need for better update management practices worldwide.
Vulnerability from Overreliance on Single-Point Solutions
The sheer complexity of interconnected services and servers makes points of failure hard to identify. This case highlighted how heavily numerous organisations rely on single-point IT solutions. All impacted organisations were running the same software, which exposed a vulnerability in their cyber-resilience strategies. The incident underscores the importance of a global conversation about how such IT solutions are maintained and updated.
Supply Chain Security Dilemma
With the constantly evolving threat landscape and the increasing complexity of digital systems, frequent software updates are necessary to maintain security. However, this case showed that updates can themselves be the root cause of security problems. The rapid evolution of the software supply chain introduces new vulnerabilities, especially with the trend towards automatic updates.
Failure Without Cyberattack
Cybersecurity is often associated with cyberattacks, but as this incident shows, our computer systems can fail without any malicious intent because of faulty processes. The underlying cause was a faulty content update processed by the kernel-level driver that CrowdStrike uses to protect Windows computers. The incident demonstrated that security features can inadvertently create security risks.
Focus on Critical Infrastructure
The incident puts trust in digital infrastructure at risk, leading to increased scrutiny and demand for more robust, resilient systems, especially in critical sectors. Countries and companies must focus on critical infrastructure and critical information infrastructure. That ranges from managing complex systems and supply chains to protecting submarine cables and other critical points of failure in the modern internet.
Need for International Response
A single update distributed by CrowdStrike affected systems worldwide and highlighted how many digital systems in critical sectors depend on a single provider. Owners and operators of critical infrastructure in both the public and private sectors may therefore need to diversify their third-party service providers where possible and enhance their cyber resilience. Achieving this requires international cooperation, which may be challenging in the current geopolitical environment.
Total Group Unaffected by Global IT Crisis: Here’s How We Did It
Rigorous Testing and Staggered Rollouts
- Thorough Testing: At Total Group, we test every update in a controlled environment before deployment to confirm that it functions as intended and does not disrupt our clients' systems. This approach helps keep our clients protected from vulnerabilities.
- Staggered Rollouts: For non-critical updates, we implement staggered rollouts, testing each update for seven days before it reaches all deployments. This allows us to catch and address any issues on a smaller scale first, minimising the risk of widespread impact.
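The staggered rollout described above can be sketched in a few lines of Python. This is a minimal illustration only, not Total Group's actual tooling: the ring names, device counts, start date, and the `Ring`, `plan_staggered_rollout`, and `next_ring` names are all hypothetical, and the seven-day gap between rings is an assumption based on the testing window mentioned above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Ring:
    """A hypothetical group of devices that receives an update together."""
    name: str
    device_count: int


def plan_staggered_rollout(rings, start, soak_days=7):
    """Schedule each ring soak_days after the previous one (assumed window)."""
    return [
        (ring.name, start + timedelta(days=i * soak_days))
        for i, ring in enumerate(rings)
    ]


def next_ring(rings, completed, issues_reported):
    """Return the next ring to deploy to, or None to halt or when finished."""
    if issues_reported or completed >= len(rings):
        return None
    return rings[completed]


# Hypothetical rings, smallest first, so problems surface on few devices.
rings = [Ring("test lab", 5), Ring("pilot clients", 50), Ring("all clients", 500)]
schedule = plan_staggered_rollout(rings, date(2024, 7, 1))

for name, day in schedule:
    print(f"{name}: {day.isoformat()}")
```

The design choice is that the widest ring is only reached after the smaller rings have run the update for the full window, and that any reported issue stops the rollout before it spreads further.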
Diversification of Providers
- Avoid Overreliance: Do not rely solely on a single provider for critical systems. Diversifying providers enhances resilience and ensures a failure in one system does not cripple the entire infrastructure.
Focus on Critical Infrastructure
- Robust Systems: Invest in robust and resilient systems, especially in critical sectors like healthcare, banking, and government. Regular audits can identify and address potential vulnerabilities proactively.
The CrowdStrike incident underscores the fragility of our interconnected digital infrastructure and highlights the necessity for stringent update management protocols. The fallout from this event serves as a compelling case for implementing robust testing procedures, staggered deployment strategies, and diversified security solutions. By taking these steps, businesses can better protect themselves from similar disruptions and ensure operational continuity.
At Total Group, we are dedicated to providing comprehensive protection for our clients. Our proactive approach to update management and cybersecurity ensures that our clients' operations remain secure and resilient, even amid widespread IT challenges.
For more expert insights and important updates in cybersecurity, subscribe to the LinkedIn newsletter IT Pro - Defend Your Digital Assets, published by Nathan Stewart, Total Group Managing Director & Data Protection Officer. Stay ahead with proactive IT strategies and comprehensive knowledge tailored to protecting your business and its digital infrastructure.