When Does Recovery Time Objective (RTO) Begin
In some organizations the debate still rages about when the clock begins “ticking” on the Recovery Time Objective (RTO) after a technology outage. Does it begin when the outage occurs? Or does the recovery clock start when the recovery team begins to resolve the outage? It is time to settle that debate and agree on two points: RTO must begin at the point the production environment is impacted, and organizations must confront the challenges posed by this approach.
To establish a common perspective from which to approach this question, it is helpful to quote the definition of RTO from the Disaster Recovery Journal’s glossary of terms: “The period of time within which systems, applications, or functions must be recovered after an outage. RTO includes the time required for: assessment, execution and verification. RTO may be enumerated in business time (e.g. one business day) or elapsed time (e.g. 24 elapsed hours).”
The glossary’s notes provide important amplification regarding the three phases of RTO. “Assessment includes the activities which occur before or after an initiating event, and lead to confirmation of the execution priorities, time line and responsibilities, and a decision regarding when to execute. Execution includes the activities related to accomplishing the pre-planned steps required within the phase to deliver a function, system or application in a new location to its owner. Verification includes steps taken by a function, system or application owner to ensure everything is in readiness to proceed to live operations.”
With these notes in mind, one may restate the core question in this way: does the recovery clock begin during assessment (‘application down’) or execution (‘team working’)?
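To make the difference between the two starting points concrete, the following minimal sketch measures the same incident both ways. The class name, field names, timestamps, and the four-hour RTO are all hypothetical values chosen for illustration; they do not come from the glossary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTimeline:
    """Hypothetical record of the key timestamps in one outage."""
    outage_at: datetime             # production impacted ('application down')
    execution_started_at: datetime  # recovery team begins work ('team working')
    verified_at: datetime           # owner confirms readiness for live operations

    def assessment_duration(self) -> timedelta:
        # Time spent deciding, assembling the team, and alerting the owner.
        return self.execution_started_at - self.outage_at

    def time_from_outage(self) -> timedelta:
        # 'Application down' view: the clock starts when users lose access.
        return self.verified_at - self.outage_at

    def time_from_execution(self) -> timedelta:
        # 'Team working' view: the clock starts when recovery work begins.
        return self.verified_at - self.execution_started_at


def meets_rto(measured: timedelta, rto: timedelta) -> bool:
    """Return True when the measured recovery time is within the objective."""
    return measured <= rto


# Illustrative incident: outage at 3:00 a.m., team working by 4:30 a.m.,
# application verified at 7:45 a.m. on the same day.
incident = RecoveryTimeline(
    outage_at=datetime(2024, 1, 7, 3, 0),
    execution_started_at=datetime(2024, 1, 7, 4, 30),
    verified_at=datetime(2024, 1, 7, 7, 45),
)
rto = timedelta(hours=4)  # assumed four-hour elapsed RTO

print("Assessment:", incident.assessment_duration())
print("From outage:", incident.time_from_outage(),
      "meets RTO:", meets_rto(incident.time_from_outage(), rto))
print("From execution:", incident.time_from_execution(),
      "meets RTO:", meets_rto(incident.time_from_execution(), rto))
```

In this illustrative incident the same recovery misses a four-hour RTO when measured from the outage (4 hours 45 minutes) yet appears to meet it when measured from the start of execution (3 hours 15 minutes). The gap between the two figures is the assessment duration.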
Regardless of which approach an organization takes, the assessment phase must be compressed. Assessment includes the time required to: reach a decision to shift from the production to the disaster recovery (DR) environment, assemble the recovery team, and alert the application owner to stand by for verification. The duration of the assessment phase can vary widely, particularly if an outage occurs outside the business day.
In an exercise it is difficult to measure assessment duration, as everyone has plenty of time to schedule and prepare for the test. Organizations should utilize unannounced exercises to validate the assessment period. These no-notice tests can occur without disrupting the production environment, as no technology recovery takes place during this phase.
The ‘application down’ outlook holds significant advantages over the alternative. This view is the only one which aligns with the user’s loss of access to the application. Regardless of when the recovery effort starts, the application user (or customer) can no longer access the desired functionality. In today’s high-expectation marketplace, customers typically tolerate an outage only briefly before frustration sets in. Minutes matter in an outage; hours of downtime can have long-term relationship and financial implications.
Adopting the ‘application down’ perspective makes it essential to have timely awareness of application failures, and to maintain the capability to rapidly convene teams to troubleshoot and recover the impacted system(s). The status monitoring process must quickly determine the nature and severity of an application outage, and thereby facilitate the rapid and focused allocation of resources to address the issue. If the application can be restored in the production environment, that course of action may be quicker than recovering it in the DR environment.
Rapid team assembly may be required at any time of day or night, so employment policies must ensure that responding team members are available whenever needed. But the organization should avoid overreacting to an outage when possible.
Application RTOs must specify whether the objective is in elapsed hours or business hours. This delineation answers the question, “if that application goes down at 3:00 a.m. on a Sunday, do we awaken the recovery team?” Consider a four-hour RTO example. An elapsed time RTO answers that question “yes.” However, an RTO of “four business hours” could give the team their night’s sleep before addressing the recovery requirement.
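The four-hour example can be worked through in a short sketch that computes the recovery deadline under each interpretation. The business calendar used here (09:00 to 17:00, Monday through Friday, no holidays, a single time zone) is an assumption for illustration only, as are the function names and the sample date; an organization would substitute its own definition of business time.

```python
from datetime import datetime, time, timedelta

# Assumed business day for illustration: 09:00-17:00, Monday to Friday.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def elapsed_deadline(outage_at: datetime, rto_hours: float) -> datetime:
    """Deadline when the RTO is stated in elapsed (clock) hours."""
    return outage_at + timedelta(hours=rto_hours)


def business_deadline(outage_at: datetime, rto_hours: float) -> datetime:
    """Deadline when the RTO is stated in business hours.

    Only time inside the assumed business window counts against the RTO.
    """
    remaining = timedelta(hours=rto_hours)
    cursor = outage_at
    while True:
        day_start = datetime.combine(cursor.date(), BUSINESS_START)
        day_end = datetime.combine(cursor.date(), BUSINESS_END)
        # weekday() returns 0-4 for Monday-Friday, 5-6 for the weekend.
        if cursor.weekday() < 5 and cursor < day_end:
            window_start = max(cursor, day_start)
            available = day_end - window_start
            if remaining <= available:
                return window_start + remaining
            remaining -= available
        # Nothing (more) counts today; move to midnight of the next day.
        cursor = datetime.combine(cursor.date() + timedelta(days=1), time(0, 0))


# The article's scenario: outage at 3:00 a.m. on a Sunday, four-hour RTO.
outage = datetime(2024, 1, 7, 3, 0)  # 7 January 2024 was a Sunday
print("Elapsed deadline: ", elapsed_deadline(outage, 4))
print("Business deadline:", business_deadline(outage, 4))
```

Under these assumptions the elapsed-time deadline falls at 7:00 a.m. that same Sunday, so the team must be awakened, while the business-hours deadline falls at 1:00 p.m. on Monday.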
Business hour recovery requires clarification when the organization or its application users occupy multiple time zones. As the span of time zones increases, so does the need for organizations to designate RTOs in elapsed hours. RTOs are assumed to be in elapsed time if they do not state otherwise.