Your Business Hinges On Data Center Uptime – What You Need To Know

by Element Critical

Nov 10, 2022

Vulnerability management is critical for security, infrastructure, and IT risk management leaders. IT leaders continue to face challenges in ensuring continuous uptime for business-critical systems. Ultimately, an effective IT strategy for your data center infrastructure must prioritize risk management since downtime can disastrously affect revenue and reputation.

At a time of increasing reliance on the internet and cloud computing, companies find solace in not owning or managing IT equipment. Yet the downtime risks are higher with cloud services, since outages at major cloud providers are not uncommon. Parametrix reports that in 2021, one of the three major public cloud providers (Amazon, Google, and Microsoft) experienced an outage lasting at least 30 minutes every three weeks.

Organizations that manage their own on-premises environment must absorb the unexpected costs and supply the expertise required to operate their own infrastructure. Delivering uptime resilience is a tremendous responsibility when operating a data center environment is not the business’s only priority. Most modern enterprises are distributing their IT stack and compute applications across a larger geographic area and across multiple cloud and application providers, which makes it even harder to decide where and how to prioritize and maximize the resiliency of their data operations. Data center downtime is brutal to a business’s bottom line. Currently, data center downtime costs $5,600 per minute, or $336,000 per hour.
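The per-hour figure follows directly from the per-minute one. As a rough sketch, the helper below scales the $5,600 figure to any outage length. The function name and the flat per-minute rate are illustrative assumptions; real outage costs vary widely from one business to another.

```python
# Downtime cost per minute quoted in the article (USD).
COST_PER_MINUTE_USD = 5_600


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Estimate outage cost, assuming a flat per-minute rate (a simplification)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


# Print the estimated cost of a 1-minute, 30-minute, and 60-minute outage.
for outage_minutes in (1, 30, 60):
    print(f"{outage_minutes:>2} min outage: ${downtime_cost(outage_minutes):,.0f}")
```

Under this flat-rate assumption, a 60-minute outage comes to the $336,000 quoted above, and a 30-minute outage to half of that.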

Even though the frequency of downtime events shifts from year to year, their impact shows no sign of easing. Outages are becoming more costly: 62% of outages in 2021 cost businesses more than $100,000, compared with 56% in 2020. Of greater concern, cloud users have no control over an outage or how long it will last. Even thirty minutes of downtime can feel endless when your business revenue depends on the hosted applications.

Leading Cause of Downtime Often Boils Down To Power Reliability

Although businesses are investing enormous funds into digital infrastructure, the considerable growth in new operating platforms doesn’t always translate into investments in improved reliability, especially in a complex architecture where IT workloads are spreading far and wide. Backup power issues continue to top the list, with 43% of IT managers reporting them as the leading cause of data center downtime in 2021. Poor power reliability is a frustrating cause of downtime, and all the more concerning because the fundamental capability of any data operation is to keep the lights on and avoid costly repercussions year after year.

In an age of continual digital transformation, businesses that choose resilient data partners and data center operators that meet or exceed rigorous standards for staff training and operational procedures, and that deliver power reliability, will reap the rewards of consistent services.

Data Center Redundancy Is A Critical Measure Of A Data Center Partner

Colocation data center operators are singular in focus. The entire business function is devoted to delivering optimal design, reliable operations, and consistent data center management to secure, power, and connect customer assets. Colocation providers deliver services around the clock, which requires that redundancy be built into all systems and protocols. A data center is considered redundant if it has the infrastructure to replace failed systems quickly or immediately with working alternatives following an outage.

Backups For The Backups

To achieve this, data centers possess built-in duplication in power, cooling, and network equipment. For example, diverse fiber pathways in the building ensure that disruptions are mitigated if one fiber line is impacted. Multiple providers in-house also support redundant network service. Backup battery UPS systems, backup generators, and refueling contracts are all examples of redundancy for power. The cooling systems are also protected by power redundancy and auxiliary equipment. Operators focus on each component in the line of service to customers’ equipment because of its impact on a data center’s operations: a sudden loss of power or a temperature spike may disrupt service to end users and result in lost data, corrupt files, or damaged equipment.

Though redundancy is a high priority for data center operators, not all providers or the facilities they manage are designed or managed equally. IT leaders can first investigate data center grading and power redundancy based on the industry tier rating. Then, once they have determined the tier rating and degree of redundancy, they can articulate their service requirements with robust SLA terms.

Data Center Tiers Provide Reliability Rating

The amount of system availability a data center can offer is directly correlated to the amount of redundant infrastructure it possesses. To make it easier to discern how redundant a data center’s infrastructure is, the Uptime Institute devised a rating scheme that assigns data centers to four tiers, each corresponding to the possession of increasingly redundant infrastructure: Tier 1, Tier 2, Tier 3, and Tier 4.

For a data center to achieve the next-tier ranking, it must deliver the minimum level of service required for that tier. Besides equipment, the Uptime Institute also takes staff expertise, maintenance protocols, and other factors into account when determining which tier a data center belongs to.

Tier 1: Best for small operations seeking cost-effective hosting
A Tier 1 data center possesses a single path for power and cooling and no backup components for critical systems. In this tier, an organization can technically manage during power spikes or outages but would have to shut down to perform regular maintenance or make emergency repairs. As a result, a Tier 1 data center can only guarantee an uptime of 99.671% per year, or about 28.8 hours of downtime per year.
Tier 2: Best for small- to midsize organizations seeking basic redundancy
A Tier 2 data center possesses all of the capabilities of a Tier 1 facility with partial redundancy in the form of backup generators, UPS systems, and cooling equipment. Backup systems allow Tier 2 facilities to withstand power spikes or outages and perform select routine maintenance. Unfortunately, Tier 2 facilities still rely on a single distribution path for power and cooling and are more vulnerable to unexpected disruptions. Tier 2 facilities can guarantee slightly more uptime than Tier 1 facilities: 99.741%, or less than 22.7 hours of downtime per year.
Tier 3: The standard for mid-sized and large organizations prioritizing resiliency
Tier 3 data centers utilize additional redundant distribution paths for power and cooling to offer N+1 redundancy. To achieve this level of redundancy, dual power supplies are attached to different UPS units, and redundant cooling systems are put in place. This ensures that both a UPS unit and a cooling unit can be taken offline without impacting IT operations. The facility is therefore concurrently maintainable, allowing staff to perform routine maintenance without interrupting end users’ experience with the service. Tier 3 facilities can guarantee 99.982% uptime per year, which translates to a total annual downtime of about 1.6 hours.
Tier 4: The must-have for enterprises that require uninterrupted availability
A Tier 4 data center possesses all of the capabilities of the previous tiers while adding fault tolerance mechanisms which are themselves redundant. In a Tier 4 facility, there is an entirely identical system on standby for every component, with this system being physically isolated and independent from the primary system. In turn, this physical separation prevents a local event from compromising both systems. Tier 4 data centers can guarantee an uptime of 99.995% per year, which amounts to less than 26.3 minutes of downtime each year.
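
The downtime figures quoted for each tier can be reproduced from the uptime percentages. The sketch below assumes a 365-day year (8,760 hours). The function and dictionary names are illustrative, not part of any Uptime Institute tooling.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming a non-leap year

# Uptime percentages quoted in the tier descriptions above.
TIER_UPTIME_PERCENT = {
    "Tier 1": 99.671,
    "Tier 2": 99.741,
    "Tier 3": 99.982,
    "Tier 4": 99.995,
}


def annual_downtime_hours(uptime_percent: float) -> float:
    """Return the hours per year a facility may be down at a given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100 - uptime_percent) / 100 * HOURS_PER_YEAR


# Convert each tier's uptime percentage into annual downtime.
for tier, uptime in TIER_UPTIME_PERCENT.items():
    hours = annual_downtime_hours(uptime)
    print(f"{tier}: {uptime}% uptime -> {hours:.1f} h ({hours * 60:.0f} min) per year")
```

The results line up with the figures above: roughly 28.8, 22.7, and 1.6 hours for Tiers 1 through 3, and about 26 minutes for Tier 4.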

However, it’s important to note that data center tier certification ratings are optional and not a regulatory requirement. Therefore, only some data centers will have an assigned tier. Beyond looking at a data center’s Tier ranking, organizations should look at other dimensions of their data center provider’s operations to get a more holistic read on their redundancy. Asking whether there are physically separate utility rooms and paths with independent generators, or asking for run times and performance under failure conditions, can reveal how much redundancy a data center can offer for future services.
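
One way to frame the failure-condition questions above is to ask whether a facility can still carry its critical load with any single unit offline, which is the N+1 property described for Tier 3. The sketch below is illustrative only: the function name, unit capacities, and load are hypothetical, and real capacity planning involves far more than a simple sum.

```python
def survives_single_failure(unit_capacities_kw: list[float], load_kw: float) -> bool:
    """Return True if the load is still covered after losing any one unit.

    Losing the largest unit is the worst case, so checking that case is sufficient.
    """
    if not unit_capacities_kw:
        return False
    remaining_kw = sum(unit_capacities_kw) - max(unit_capacities_kw)
    return remaining_kw >= load_kw


# Hypothetical example: 900 kW of critical load served by 500 kW UPS units.
print(survives_single_failure([500, 500, 500], load_kw=900))  # three units (N+1)
print(survives_single_failure([500, 500], load_kw=900))       # two units (N only)
```

With three units, any one can be taken offline and the other two still cover the load. With only two, a single failure leaves the load uncovered.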

That being said, the surest way for organizations to ensure their data centers meet performance requirements is to write robust redundancy and support terms into their SLAs or seek out providers that stand behind their agreement with 100% uptime guarantees.

Colocation Providers Close The Gap On Customer Expectations

Cloud Uptime Standards Versus Colocation Uptime SLAs
Most cloud providers offer an industry-standard “five nines” uptime, or 99.999%. Though this suggests nearly continuous uptime, a data center that supplies 99.999% uptime can still experience a little over five minutes (about 5.26 minutes) of downtime annually. Going back to our cost figure from earlier, that’s roughly $29,400 per year lost to downtime, or $33,600 if the allowance is rounded up to six minutes.
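
The five-nines arithmetic can be checked with a short calculation. This is a minimal sketch, assuming a 365-day year and the flat $5,600-per-minute cost cited earlier; both are simplifications, and the names are illustrative. The exact allowance works out to a little over five minutes, and rounding it up to six minutes yields the $33,600 figure.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year
COST_PER_MINUTE_USD = 5_600       # downtime cost per minute cited earlier in the article


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Return the minutes per year a service may be down at a given uptime percentage."""
    return (100 - uptime_percent) / 100 * MINUTES_PER_YEAR


minutes = allowed_downtime_minutes(99.999)
print(f"Allowed downtime: {minutes:.2f} minutes per year")
print(f"Cost at the exact allowance: ${minutes * COST_PER_MINUTE_USD:,.0f}")
print(f"Cost if rounded up to 6 minutes: ${6 * COST_PER_MINUTE_USD:,.0f}")
```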

Leading colocation providers like Element Critical, on the other hand, confidently offer 100% uptime SLAs thanks to their commitment to at least N+1 redundancy anchored by concurrently maintainable power resources and carrier-grade or better infrastructure. Additionally, since colocation environments house data from many different industries, colocation providers are incentivized to constantly upgrade their equipment and redundancy protocols to keep their facilities compliant.

The technical acumen necessary to maintain and operate a data center with mission-critical systems is substantial. Colocation providers have on-site engineers and technicians with extensive data center management experience who monitor their customers’ data environments around the clock, troubleshoot issues, and follow strict protocols to prevent system failures. Leading colocation providers also offer 24/7 support through ticketing and reporting portals, giving their customers’ IT staff visibility into how incidents affect their services’ performance and availability.