3 AWS Cloud Outages in a Month Underscore the Importance of a Hybrid Strategy
AWS experienced three major outages in December 2021, nearly one per week. The reasons for these outages varied, revealing the existence of multiple single points of failure in AWS infrastructure and issues with common procedures. The third outage took around 12 hours to resolve, leaving some businesses struggling to serve their customers for an entire business day.
The first outage was related to an “automated activity” that involved scaling service capacity. According to Data Center Dynamics, this activity “resulted in a large surge of connection activity that overwhelmed the networking devices between the internal network and the main AWS network….” The second outage involved internet connectivity issues on the West Coast. The third involved a power outage at an Amazon data center in Northern Virginia, affecting businesses across the entire eastern half of the U.S.
These outages affected major business and consumer service providers across the U.S., including Asana, Twitter, Slack, Netflix, Hulu, QuickBooks, and Canvas, an online learning management system used by thousands of universities and K-12 schools. During the third outage, most educational institutions could not access Canvas for most of the school day, impacting final exams. Citi Bike, a privately owned bike-sharing company serving New York and New Jersey, was taken offline during each of the three December outages. The third outage left riders stranded during the morning rush hour.
All-Cloud Strategies Lead to Complexity and Dependency
When businesses rely on a single cloud provider, they are bound to experience outages and pass those costs on to their customers. According to Data Center Frontier, a 2021 survey from The Uptime Institute found that data center outages cost companies an average of $100,000 per incident, with about a third of respondents citing costs of $1 million or more. Many customers use cloud services with an SLA of 99%, but this still allows for over seven hours of downtime per month.
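To make the SLA arithmetic concrete, here is a minimal Python sketch that converts an availability percentage into the downtime it still permits. The function name `allowed_downtime_hours` and the 30-day month are assumptions chosen for illustration. Real SLAs define their own measurement periods and exclusions.

```python
def allowed_downtime_hours(sla_percent: float, period_hours: float = 30 * 24) -> float:
    """Return the hours of downtime an availability target permits over a period.

    Assumes a 30-day month (720 hours) by default; actual SLA terms may differ.
    """
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return (100 - sla_percent) / 100 * period_hours


if __name__ == "__main__":
    # Compare a few common availability targets over a 30-day month.
    for sla in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(sla)
        print(f"{sla}% availability allows about {hours:.2f} hours of downtime per month")
```

Under these assumptions, a 99% SLA works out to roughly 7.2 hours per month, which matches the "over seven hours" figure above.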
Cloud outages have led some infrastructure technologists to begin questioning whether they should continue to rely only on cloud services. While using cloud services can seem easier at times, abstracting away the work of spinning up, load balancing, and managing servers, it is also, in a way, more complex. When relying entirely on cloud services, companies can reach a point where their services are operating inside a black box, making it extremely difficult for IT teams to understand any single point of failure.
Hybrid cloud strategies that include on-prem or colocation data centers give companies a greater line of sight and control over their deployments, uptime, and ultimately, business results.
Colocation Data Centers Can Offer 99.99999% Uptime
This is why some companies choose to contract with a reliable colocation provider to ensure they have a backup plan if their cloud provider experiences an outage. Others use a combination of both, with the option to fail over from one to the other. Some reputable colocation providers can offer companies 99.99999% uptime and have outstanding track records.
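As a rough illustration of why a second site helps, the sketch below estimates the combined availability of redundant deployments. It rests on strong simplifying assumptions that real environments rarely meet in full: failures at each site are statistically independent, and failover is instant and always succeeds. The function name `combined_availability` is hypothetical.

```python
def combined_availability(*availabilities: float) -> float:
    """Estimate availability when any one of several redundant sites can serve traffic.

    Assumes independent failures and perfect, instant failover (an idealization).
    Each availability is a fraction between 0 and 1, e.g. 0.99 for 99%.
    """
    if not availabilities:
        raise ValueError("at least one availability is required")
    all_down = 1.0
    for a in availabilities:
        if not 0 <= a <= 1:
            raise ValueError("each availability must be between 0 and 1")
        all_down *= 1 - a  # probability that this site is also down
    return 1 - all_down


if __name__ == "__main__":
    # Two independent sites, each assumed to be available 99% of the time.
    print(f"{combined_availability(0.99, 0.99):.4%}")
```

In this idealized model, two sites at 99% each yield about 99.99% combined. In practice, shared dependencies such as DNS, networking, or deployment tooling reduce the benefit, so the result is best read as an upper bound.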
For example, Element Critical has a record of 100% uptime at several of our facilities, thanks to our redundant systems and our engineers who run our data centers with military precision. Onsite managers and technicians maintain uptime with rigorous daily inspections of all power, network, and cooling equipment and continual preparation for extreme weather events. Our Chicago data centers have experienced many extreme winter weather events yet have not experienced a single downtime event in 30 years. Our Houston data center has not experienced downtime since its inception. Our Austin data center has been online for 15 years and successfully weathered Winter Storm Uri, which left many Texans without power for several days.
Adding a data center footprint with a reliable colocation provider to a company’s IT infrastructure plan can keep a business online 99.99999% of the time or more, significantly increasing the reliability of the services it provides.