Most of us have heard about the “mass extinction events” that have occurred over various epochs of the earth’s history. One of them, the K-T event, took place about 65 million years ago and wiped out roughly three-quarters of the planet’s plant and animal species. We’ve seen near-extinction events over the history of the Internet, too, including the Morris worm, the SolarWinds attack, and the CrowdStrike outage. These events aren’t identical: SolarWinds was the direct result of a hacker-driven supply chain attack, while CrowdStrike stemmed from a software flaw the vendor itself accidentally introduced.
Yet they share similarities that can teach us critical lessons, if we take the time to really think about them. Those lessons involve taking a more nuanced approach to applying updates, modernizing practices, and analyzing any code entering your organization, no matter how trustworthy that code might seem. So, over the past four years, what have we learned?
A historical case in point
It might be helpful to get a bit more context by telling a story about a similar, but much smaller, near-extinction event from 24 years ago. It happened when the IT department of a small, 200-employee company decided to update all four of its firewalls at once. The firewalls were identical: They were purchased from the same vendor and ran at the same patch level. They connected four company offices in California, Texas, Arizona, and Virginia, and they needed updating to help thwart a serious security issue.
The IT department felt that updating all four firewalls at once, at night, would minimize downtime and keep employee productivity high. But, unbeknownst to the folks applying the update, the vendor-provided update software contained a hidden technical flaw that “bricked” all four firewalls as soon as it was applied. The result? The entire company lost all Internet and intranet access for over four days. That alone made it a technical near-extinction event. But it was a near-extinction event for the business, as well.
You see, this event occurred during the fall, when the company did the vast majority of its business. Sure, the technical problem was resolved in less than a week. But the business fallout took far longer to resolve. The outage led to the following problems:
1. Loss of revenue: Customers who couldn’t make purchases simply went to another vendor.
2. Reduced client trust: It took years for the clients who stayed to trust the company’s uptime promises again.
3. Impacted IT and cybersecurity maturity: Because the company made less money that year, the IT department received a smaller budget, which forced it to postpone essential updates to customer-facing systems and cybersecurity controls.
In the case of the above little debacle, who is to blame? Let’s do a quick root-cause analysis: Some might point the finger at the IT department for applying an untested piece of software. Others might point out that either the IT or the cybersecurity team should have tested the software on a non-production system. Or you can blame the CIO and the CISO, as they both have responsibility for their teams. But the finger-pointing shouldn’t stop with the internal tech teams.
What about the firewall vendor? They’re the ones who introduced the bad software in the first place. That blame then cascades to the vendor’s software developers, then to the quality assurance (QA) teams, and then to the vendor’s business culture, all the way up to the CEO and the board. So, what do we have here? A failure to communicate? A software quality problem? An IT problem?
Similarities between all three events
Notice the similarities between the relatively minor event of 24 years ago and the massive problems we’ve seen with the SolarWinds hack and the CrowdStrike outage over the past four years. After all three incidents, there was plenty of talk about problematic IT practices: CrowdStrike, for example, blamed Delta Air Lines for using older systems and tolerating technical debt when trying to explain why the airline was particularly hard-hit. Government probes identified flaws in SolarWinds software, too. Over time, we’ve seen root-cause code analyses, class action lawsuits, and government hearings.
Yes, events such as CrowdStrike and SolarWinds have shown that folks have become much more adept at the blame game: Class-action lawsuits against IT vendors have become the norm. Governments across the world are holding endless hearings to investigate major events and hacks. There’s now talk of “professionalizing” the workforce through licensure: Ghana’s Cyber Security Authority (CSA), for example, now requires cybersecurity professionals to be accredited. All of these efforts are designed to address fundamental flaws in how organizations use tech.
But it is far from certain that all of the legal wrangling and government directives in the world can solve these problems. The better answer is for organizations to focus on modern IT and cybersecurity practices.
Focusing on best practices
We’ve learned from our subject matter expert research at CompTIA that it’s wise to focus on root causes rather than play the blame game. Going forward, organizations should:
Manage, and even re-imagine, our “instant update” culture. You can do this by:
Segmenting and isolating update rollouts: This step alone can help reduce the “blowout zone” whenever an organization fails to discover flaws in the code streaming into its systems (see the rollout sketch after this list).
Validating updates: This is not a trivial task in our continual-update culture, but the most mature organizations do just this. Doing so requires a highly trained workforce and extensive automation, and it’s a necessary step to protect today’s attack surface (see the validation sketch after this list).
Training everyone who works on your attack surface: Focus on best practices for using automation, cloud-based tech, and security analytics processes more effectively.
Carefully manage tech monocultures: Most events, including SolarWinds and CrowdStrike, involve large numbers of systems with the same architectures, configurations, and patch levels sitting on un-segmented, un-isolated networks. Find ways to change things up and reduce the “blowout zone” during an event.
Reduce technical debt: Using older systems and processes remains a serious problem. Focus on eliminating out-of-production systems and resources with low patch levels and configuration issues. Most importantly, focus on improving communication and modifying traditional IT practices so that they keep pace with modern technology.
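To make the update-validation idea a little more concrete, here is a minimal sketch in Python. It assumes a hypothetical workflow in which the vendor publishes a SHA-256 checksum alongside each update package; the file path and the published checksum value are illustrative placeholders, not any particular vendor’s process.

```python
# Minimal sketch: verify a vendor update package against its published
# SHA-256 checksum before it is allowed anywhere near production.
# The package path and published checksum are hypothetical placeholders.
import hashlib
import sys
from pathlib import Path

PACKAGE = Path("downloads/firewall-update-4.2.1.bin")  # hypothetical package file
PUBLISHED_SHA256 = "0" * 64                            # placeholder value from the vendor's advisory

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def main() -> int:
    actual = sha256_of(PACKAGE)
    if actual != PUBLISHED_SHA256:
        print(f"REJECT: checksum mismatch ({actual})")
        return 1
    print("OK: checksum matches; package may proceed to the test environment")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Of course, a matching checksum only proves the package you downloaded is the package the vendor shipped; it says nothing about whether the code itself is sound. That is why segmented rollouts still matter.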
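And here is an equally simple sketch of a segmented, wave-based rollout. The device group names, the apply_update and health_check functions, and the soak time are all hypothetical stand-ins for whatever tooling your environment actually uses; the point is the pattern: update one small group, verify health, and stop the moment anything looks wrong.

```python
# Minimal sketch of a wave-based rollout: update one segment at a time,
# run health checks, and halt the rollout on the first failure.
# Group names, apply_update, and health_check are hypothetical stand-ins.
import time

WAVES = [
    ["fw-lab-01"],                     # wave 1: a non-production test unit
    ["fw-branch-az"],                  # wave 2: one low-risk branch office
    ["fw-branch-tx", "fw-branch-va"],  # wave 3: remaining branches
    ["fw-hq-ca"],                      # wave 4: headquarters, last
]
SOAK_SECONDS = 5  # kept short for the sketch; in real life this is hours or days

def apply_update(device: str) -> None:
    """Placeholder: push the already-validated package to one device."""
    print(f"applying update to {device}")

def health_check(device: str) -> bool:
    """Placeholder: confirm the device still passes traffic and responds."""
    print(f"health check on {device}")
    return True

def rollout() -> bool:
    for number, wave in enumerate(WAVES, start=1):
        for device in wave:
            apply_update(device)
        if not all(health_check(device) for device in wave):
            print(f"halting rollout: wave {number} failed health checks")
            return False
        print(f"wave {number} healthy; soaking before the next wave")
        time.sleep(SOAK_SECONDS)  # in practice, monitor closely during this window
    return True

if __name__ == "__main__":
    rollout()
```

Had the firewall story above followed even this crude pattern, the flawed update would have bricked one lab unit or one branch office instead of all four sites at once.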
CompTIA’s infrastructure and cybersecurity pathways teach these elements, so any well-trained IT and cybersecurity team knows them. Yet it’s fairly common for organizations to short-circuit and “unhook” good IT and cybersecurity processes to save time. Nothing transforms a culture more than effective education. If we focus on maturing operations through education and process improvement, we’ll all be able to learn more from recent near-extinction events and avoid, or at least blunt, the impact of future events as they occur.