Fully Open Edge Cloud

Cloud outages are on the rise. Here’s why

Why cloud infrastructure is less reliable these days.
  • Last Update:2023-06-08

Posted on June 7 2023, Fortune, by Jean-Paul Smets.

“The cloud infrastructure is down today” is a phrase that could send shivers down any CEO’s spine–and yet, it’s been a common occurrence recently. On Apr. 26, an outage caused by fire and exacerbated by water damage disrupted Google’s Cloud service, impacting Western Europe, Japan, India, Indonesia, and South Carolina in the U.S. This was the second major incident in 2023 after the Microsoft Azure outage on Jan. 25, which prevented millions of users from accessing Outlook and Teams.

One in four I.T. leaders believe infrastructure outages are the most likely source of disruption for their organization, according to a recent report from Splunk. Such outages have an estimated cost of $365,000 per hour of downtime for businesses. Perhaps more worryingly, half of the respondents’ organizations don’t have a digitally resilient infrastructure designed to mitigate or outright prevent the impact of a cloud service outage.

In the increasingly complex digital age, cloud outages aren’t merely possible, they’re likely.

The centralized data center problem

All cloud services are dependent on centralized management and data centers. If these centers experience problems, cloud users across the infrastructure can be affected. This is actually one of the most common causes of cloud outages. A power failure in these data centers, for example, could impact millions of people.

Another reason for cloud outages is software updates that leave gaps for bugs or malicious players to attack the technology. Cybersecurity is becoming more and more sophisticated, but so too are hackers. If cloud providers’ management systems are attacked, users’ access is not only restricted, but their cloud data is also possibly compromised.

Natural disasters can’t be ruled out. Unpredictable and extreme weather has wreaked havoc on data centers, causing millions of dollars of damage and contributing to cloud outages. Heatwaves in Europe last summer recorded temperatures of 104.4 degrees Fahrenheit, causing Google and Oracle data centers to go offline. The physical hardware for cloud infrastructure has to be built to withstand more intense conditions while also meeting sustainability goals, a difficult balance to strike.

While many cloud providers are developing systems to mitigate outages, the fundamental challenge is that a centralized system creates a weak point in cloud services as a whole. Unsurprisingly, resiliency concerns are growing among business leaders and politicians. 

The cloud as a political pawn

Censorship laws can fuel cloud outages. Governments are cracking down on tech activity and, in some cases, have blocked cloud platforms that don’t align with the local political ideology. For example, China’s “Great Firewall” censorship means Google Drive is banned in the country. VPNs that could circumvent the block are banned as well.

Export restriction laws are similarly problematic for the cloud. GitHub has faced setbacks in Crimea, Syria, and Iran due to U.S. trade restrictions that reduce the platform’s cloud capabilities in these places. While GitHub operations have since been restored in Iran, developers in Crimea and Syria can only use portions of the service.

In Europe and further afield, businesses have experienced cloud downtime from attacks on internet submarine cables. In France, suspected sabotage of underwater cables by a Russian submarine impacted the U.S., Europe, and Asia, as cloud providers reported increased latency and intermittent service downtime. In Egypt, damaged cables resulted in temporary outages for businesses in South East Asia. Though no cause was established, the event shows the value of underwater cables in geopolitics.

Meanwhile, according to NATO, the war in Ukraine has led Russia to begin mapping out critical undersea infrastructure for a potential attack on undersea cables. With 200 of the 400 cables around the Atlantic and North Sea area deemed critical infrastructure, and with responsibility for the Nord Stream gas pipeline attack still ambiguous, we are clearly moving into uncertain times for the safety of our critical infrastructure.

Though damage to sea cables in the North Sea is much closer to home, the Google Cloud outage from April caused downtime in Japan, India, and Indonesia, demonstrating that the interconnectivity of the cloud providers has already led to ramifications in other regions.

A scenario that security experts now consider very plausible is the cutting of submarine cables around Taiwan in the event of a conflict, effectively cutting the island off from the internet. Establishing cloud data centers in Taiwan may not be sufficient to achieve resilience in the event of partial destruction of critical cloud infrastructure.

As cloud services become even more necessary for all functions of a business, businesses need to ensure their digital infrastructure is resilient and able to navigate expected cloud downtime.

Jean-Paul Smets is the CEO of Rapid.Space, a fully open cloud provider offering resilient edge infrastructure and private 5G.