Disaster Recovery for Government Organizations: Planning, Practice, and People

Jim Smith, CIO, State of Maine

When the State of Maine computer operator drove to work on that cold, crisp morning in January of 1998, he probably did not realize that he would soon be living in the State’s data center, full time. In addition, he probably did not know he would bring his family with him. In Maine, in January, you need heat to survive and thousands of families did not have heat that January.

The Great Ice Storm of 1998 hit Maine and Eastern Canada hard. It extended over two weeks. Over half the state’s population was without power for over two weeks, schools and government offices were closed, and radio communication systems were out; every state car, building, and road was covered in a four-inch layer of ice. It made travel impossible and it challenged basic survival.

Disasters happen: it could be a cyberattack that hits your network, a fire at a data center, or a weather event such as flooding or ice. Disasters happen.

The challenge for all of us is the preparation, execution, and reaction to unforeseen events. And in the government sector, the challenges include securing the funding, determining which services are the most critical, and planning with agency partners.

Getting started -- analysis

Disaster recovery/business continuity consists, like many technology endeavors, of several elements: people, process, and technology. The other critical components are analysis and planning. You can’t recover all the hundreds of services and systems at once; you must know, before a crisis, what you are going to recover, and in what order. For us at the State of Maine, that means learning the recovery priorities from the governor’s office and learning from our agency partners which services are the most critical to recover. Interruptions in services could impact police response, or health services, or child protection.

1. Analysis -- Prioritizing the recovery – understanding the State’s mission

Working with the governor’s office, we created the DR priority foundation–what are the most important services to recover:

1. Essential Communications – we cannot function without email and telecom. These elements come first.

2. Citizen Health and Safety - Whether it is a regional event disrupting life for citizens, or a local event impacting a state data center, it is imperative that public safety officials be able to function. They must be able to communicate, to retrieve critical data, and have information regarding events.

3. Direct Citizen services - Citizens depend on the state for a myriad of services, from supporting daily business through business licensing and information, to daily critical services like food and medical support.

4. State Revenue - For longer outages, we need to ensure that the state can function financially.

5. Economic Development - Businesses depend on state services and regulations.
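
The priority list above can be sketched as a simple ordering rule. This is a minimal illustration, not the State's actual tooling: the tier keys mirror the five priorities, while the `Service` class, the `recovery_order` function, and the sample inventory are hypothetical names invented for the example.

```python
from dataclasses import dataclass

# Priority tiers taken from the list above; lower number = recover earlier.
PRIORITY_TIERS = {
    "essential_communications": 1,
    "citizen_health_and_safety": 2,
    "direct_citizen_services": 3,
    "state_revenue": 4,
    "economic_development": 5,
}


@dataclass(frozen=True)
class Service:
    name: str  # hypothetical service name
    tier: str  # must be a key in PRIORITY_TIERS


def recovery_order(services):
    """Return services sorted by tier; ties keep their input order."""
    return sorted(services, key=lambda s: PRIORITY_TIERS[s.tier])


# Hypothetical inventory, for illustration only.
inventory = [
    Service("business licensing portal", "direct_citizen_services"),
    Service("tax filing system", "state_revenue"),
    Service("email", "essential_communications"),
    Service("dispatch radio gateway", "citizen_health_and_safety"),
]

ordered_names = [s.name for s in recovery_order(inventory)]
```

Agreeing on the tiers before a crisis is the point: once every service carries a tier, the recovery sequence is no longer up for debate during the event.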

2. Analysis with the agencies – the detailed work

The first DR/BC step in working with the agencies, like Health and Human Services, Public Safety, the Department of Corrections, Agriculture, or the Department of Labor, is to determine which of their hundreds of services should be restored first. For us, that means working with them to complete their BIAs (Business Impact Analysis documents).

The BIA includes:

• Defining agency business functions
• Determining business function criticality, recovery prioritization, and recovery objectives
• Identifying the resources (both technical and business) needed to recover the business functions
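
One way to picture a BIA entry is as a small record per business function. The sketch below is an assumption-laden illustration: the field names, the 1-is-most-critical scale, the use of a recovery time objective in hours, and all sample entries are hypothetical, not taken from the State's BIA template.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessFunction:
    agency: str
    name: str
    criticality: int   # assumed scale: 1 = most critical
    rto_hours: float   # assumed recovery objective, in hours
    technical_resources: list = field(default_factory=list)
    business_resources: list = field(default_factory=list)


def prioritize(functions):
    """Order by criticality, then by the tightest recovery objective."""
    return sorted(functions, key=lambda f: (f.criticality, f.rto_hours))


def resources_needed(functions):
    """Collect distinct resources, preserving first-seen order."""
    seen = {}
    for f in functions:
        for resource in f.technical_resources + f.business_resources:
            seen.setdefault(resource, None)
    return list(seen)


# Hypothetical entries, for illustration only.
bia = [
    BusinessFunction("Labor", "unemployment claims", 2, 24.0,
                     ["state network", "claims application"], ["claims staff"]),
    BusinessFunction("Public Safety", "emergency dispatch", 1, 1.0,
                     ["state network", "radio network"], ["dispatch staff"]),
    BusinessFunction("Health and Human Services", "food assistance payments", 1, 8.0,
                     ["state network", "payments application"], ["eligibility staff"]),
]

plan = prioritize(bia)
plan_names = [f.name for f in plan]
shared_resources = resources_needed(plan)
```

Listing resources alongside each function makes shared dependencies visible; in this made-up data, every function depends on the same network, so the network has to be recovered first.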

In addition, the agencies create emergency response plans (fire, building evacuation, active shooter, etc.) and communication and incident response plans.

3. After analysis – Planning and Execution

Build those business partnerships and together determine what business functions must be there. Business Continuity is an operational activity—finding new space (if needed), setting priorities, getting the work done in a more limited fashion, communication with citizens—these are business activities.

Next steps after business priority and department BIAs:

o Obtain business continuity software, if desired. It can be a real efficiency gain for communications and planning.
o Review redundancy for IT components: network, telecom, storage, applications, etc.

• How will you recover enterprise IT? Duplicate data center? DRaaS (disaster recovery as a service) through a third party?
• How long will it take to rebuild IT infrastructure, if you are doing it internally?

Vendor and SaaS review: what are your vendors’ DR plans? How often are they tested? How are they tested: with a desktop exercise, or by actually pulling the plug? Review the disaster recovery clause in all your contracts.

4. Test and Practice, Practice, Practice

How good is a desktop DR plan? In truth, a desktop exercise is only a first step in testing. Disaster recovery really demands rigorous, full-scale testing: disabling a system, rebuilding it, and restarting it. The actual problems you will encounter in a secondary data center or with a vendor service won’t appear in a desktop exercise. It is imperative that you conduct a full-scale DR exercise.

If you can’t find the funding, the SMEs, or the time to do a full-scale test, then your first step should be testing an actual, up-to-date, comprehensive DR call tree. Can you reach employees, management, and vendors at 6 am? Can you reach them at 11 pm? Make sure you can get to your critical resources.
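
Before anyone picks up a phone, a call tree can be checked on paper. The sketch below is one possible sanity check, under assumptions: the call tree is modeled as a mapping from each caller to the people they call, and every role, field name, and phone number is hypothetical. It flags people no chain of calls reaches and people with no after-hours number on file. It does not replace actually dialing at 6 am and 11 pm.

```python
from collections import deque


def unreachable_people(call_tree, root, roster):
    """Return roster members (sorted) that no chain of calls from root reaches."""
    reached = {root}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in call_tree.get(caller, []):
            if callee not in reached:
                reached.add(callee)
                queue.append(callee)
    return sorted(set(roster) - reached)


def missing_after_hours(contacts):
    """Return names (sorted) with no after-hours phone number on file."""
    return sorted(
        name for name, info in contacts.items()
        if not info.get("after_hours_phone")
    )


# Hypothetical roster and call tree, for illustration only.
contacts = {
    "duty manager": {"after_hours_phone": "555-0100"},
    "network lead": {"after_hours_phone": "555-0101"},
    "storage lead": {"after_hours_phone": ""},
    "vendor account rep": {},
}
call_tree = {
    "duty manager": ["network lead", "storage lead"],
    "network lead": [],
}

not_in_tree = unreachable_people(call_tree, "duty manager", contacts)
no_night_number = missing_after_hours(contacts)
```

In this made-up data the vendor contact is on the roster but nobody is assigned to call them, which is exactly the kind of gap a call tree test is meant to surface.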

I am not sure when the State of Maine will be covered in another four-inch blanket of ice, but I am sure that there will be future disruptive events. We need to plan and practice recovery to be ready.
