Disaster Recovery Plans (DRPs) are all too often overlooked in the IT industry: drawn up once because the business needed to show that one existed, but never updated, upgraded or audited. Having treated them as something of a tick-box exercise for many years, organizations that now rely on IT systems for their operations must revisit such protocols and test them thoroughly if they are to survive a catastrophic or disruptive IT incident.
Historically, DRPs focused on a malfunction in a server room or, depending on geographical location, a weather event causing system disruption. Today, with the sheer level of dependence on IT systems, there are considerably more potential incidents to be considered and planned for, including, but by no means limited to:
- Electricity blackout
- Internet failure
- Theft of equipment
- Failure in ambient-temperature equipment
- Data breach or security risk.
There are numerous events that could occur in a data center, and businesses need to be well prepared and well equipped to quickly resolve any such situation.
What should a Disaster Recovery Plan look like?
A DRP should document the processes a business has in place to cope with any major incident or disruption to its IT systems. This may include the loss of power, data, or connectivity, and often includes plans and processes for activating a duplicate IT site in a separate physical location to reinstate service.
DRPs form part of a business’ Business Continuity (BC) plan. Business Continuity planning refers to the overall approach to keeping an entire organization functioning throughout an incident. The DRP is the tactical plan for restoring IT infrastructure to its pre-disaster state of operations.
The Goals and Objectives of a Disaster Recovery Plan
While no organization can guarantee that a disruptive event will never happen to it, it can choose how to respond if one does.
A DRP takes a proactive approach to threat management and can ensure that all involved parties know how to react and what to do should the worst occur. Its goals include:
- Minimizing the risk of any negative event
- Maximizing uptime of systems and services to end users
- Demonstrating a commitment to user safety
- Maintaining compliance with legal and accreditation responsibilities
- Reinstating and maintaining a consistently high level of service to customers.
What Not to Forget in Your Disaster Recovery Plan
While there may be no one-size-fits-all DRP for businesses, there are several elements that can be easily overlooked or even forgotten entirely when the plan is created and collated. Through Procurri’s years of experience, we’ve seen many of them misjudged, mismanaged or missed entirely, and so our expert staff recommend that the following points are revisited and checked rigorously.
Alignment with elements of Business Continuity Planning
While a DRP is certainly a critical document, it must not be forgotten that it forms part of a wider plan: the Business Continuity Plan. Parties involved in BC must understand how the DRP works and how and when it should be enacted. They must also have good oversight of their own responsibilities within it, as well as any preventative responsibilities that apply before the DRP kicks in, to minimize the chance of it ever needing to be used.
Regular assessment of Downtime Tolerance
It’s essential that a business understands how much downtime it could sustain during a critical event. This will vary from organization to organization. Those reliant on real-time IT systems may suffer severe consequences from just a few seconds of downtime, and so will require heavy investment and preparation in their recovery protocols. Smaller businesses that are less reliant on IT may be able to sustain a slightly longer outage, and so less investment and preparation may be suitable.
Most businesses fluctuate in their demands, either seasonally or as they grow and develop, and so it is key that an organization regularly assesses its downtime tolerance to understand the changing needs that a DRP may need to cater for.
Inventory existing functions
To understand how best to react to a disaster event, a business must first understand what its business as usual (BAU) looks like and what is most likely to go wrong. The relevant information must be collated:
- Are there any existing back-up systems in place within the current IT configuration that would kick in should an unexpected event occur?
- What is most likely to happen if a system goes down? What would the impact be on the end user?
- What happens if an element of the business’ data center experiences an outage or fault?
- What does BAU look like for all systems at present, and what variance is acceptable for sufficient operations to continue?
Identify weak points
In order to best understand which areas of an IT system may need the most support in the event of a critical incident, those working in and on it must know the likely ‘pain points’ and vulnerabilities in the existing system. Common areas of weakness include:
- Oversights made in the initial design of the data center and its configuration
- Strains on energy resources
- Failures in power supply
- Failures in temperature controls.
Define objectives and goals
A DRP cannot be judged a success unless it has specific objectives to meet and goals to fulfil. These include:
- RTO (Recovery Time Objective) – the maximum time the business can take to restore all applications to their ‘BAU’ operating level after an incident
- RPO (Recovery Point Objective) – the maximum age of the files that must be recovered from backup in order to resume ‘BAU’ service after an incident; in effect, how much recent data the business can afford to lose
- What does success look like for the business in disaster recovery?
- Are there any system areas that could take longer to recover without affecting service for end users?
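As a rough illustration, RTO and RPO can be treated as time limits and checked against the timestamps recorded during a test or a real incident. The sketch below is a minimal example only: the target values, timestamps and function names are hypothetical and are not taken from any particular plan or tool.

```python
from datetime import datetime, timedelta

# Hypothetical targets for illustration; real values come from the business.
RTO = timedelta(hours=4)      # maximum acceptable time to restore service
RPO = timedelta(minutes=30)   # maximum acceptable age of recovered data


def meets_rpo(last_backup: datetime, incident: datetime, rpo: timedelta = RPO) -> bool:
    """True if the newest backup taken before the incident is no older than the RPO."""
    return incident - last_backup <= rpo


def meets_rto(incident: datetime, restored: datetime, rto: timedelta = RTO) -> bool:
    """True if service was restored within the RTO."""
    return restored - incident <= rto


# Made-up timestamps from an imagined recovery drill.
incident = datetime(2024, 1, 10, 9, 0)
last_backup = datetime(2024, 1, 10, 8, 45)   # 15 minutes of data at risk
restored = datetime(2024, 1, 10, 14, 30)     # 5.5 hours of downtime

print(meets_rpo(last_backup, incident))  # True: 15 minutes is within 30 minutes
print(meets_rto(incident, restored))     # False: 5.5 hours exceeds 4 hours
```

In this imagined drill the backup schedule satisfies the RPO but the recovery itself overruns the RTO, which is the kind of gap that regular testing is meant to expose.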
Prioritize risks
If the company’s existing system setup and configuration is well understood internally, the risks to it may already be known. A risk assessment (either carried out entirely from scratch or by interviewing staff who have the relevant knowledge) can uncover:
- What are the most likely threats faced by the business?
- How likely is each risk to actually happen?
- Which risks would have the largest impact if they were to occur?
From here, risks can be prioritized and the DRP created to strike the critical balance between what is most likely to happen and what would have the most drastic impact. This helps ensure that investment and planning are directed as wisely as possible.
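One common way to balance likelihood against impact is a simple scoring matrix, where each risk is rated on both scales and ranked by the product. The sketch below assumes a 1 to 5 scale for each; the scale, the ratings and the names `Risk` and `prioritize` are illustrative assumptions, not figures or methods from this article.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # Simple likelihood x impact score.
        return self.likelihood * self.impact


def prioritize(risks: list[Risk]) -> list[Risk]:
    """Sort risks from highest to lowest score; ties go to the higher impact."""
    return sorted(risks, key=lambda r: (r.score, r.impact), reverse=True)


# Illustrative ratings only, not measured data.
risks = [
    Risk("Electricity blackout", likelihood=2, impact=5),
    Risk("Internet failure", likelihood=4, impact=3),
    Risk("Theft of equipment", likelihood=1, impact=4),
    Risk("Temperature control failure", likelihood=2, impact=4),
]

for risk in prioritize(risks):
    print(f"{risk.score:>2}  {risk.name}")
```

A multiplied score is only one possible weighting. A business that cannot tolerate certain outcomes at all may prefer to rank by impact first and use likelihood only to break ties.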
Assign roles to relevant parties
A DRP must have a set of clearly defined roles and responsibilities for all involved parties. This should include a leader for each area, clear reporting lines, and back-ups in case critical staff are away. What’s more, all parties involved should know how to communicate with one another throughout an incident.
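If role assignments are kept in a structured form, gaps in cover can be checked automatically. The sketch below is a minimal example under that assumption: the role names, people and the `roles_without_cover` helper are all hypothetical placeholders.

```python
# Hypothetical role assignments; names and roles are placeholders.
roles = {
    "Incident lead": {"primary": "A. Patel", "backup": "J. Smith"},
    "Network recovery": {"primary": "J. Smith", "backup": "L. Chen"},
    "Customer communications": {"primary": "M. Garcia", "backup": None},
}


def roles_without_cover(assignments: dict) -> list[str]:
    """Return roles with no backup, or whose backup is the same person as the primary."""
    gaps = []
    for role, people in assignments.items():
        backup = people.get("backup")
        if backup is None or backup == people["primary"]:
            gaps.append(role)
    return gaps


print(roles_without_cover(roles))  # ['Customer communications']
```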
Define any external Business Continuity sites
In the event of a disaster, many businesses switch their systems to another physical location that is unaffected by the disruption. There is a wide variety of such sites worldwide, and they can be operated both remotely and in person. Businesses must decide whether to use such sites and where they will be, and reach contractual agreements with the operators of such facilities to use them as and when needed.
Practice and test
The enactment of the DRP should be tested and practiced periodically, both on-site and off. Regular testing of these protocols allows new or developing vulnerabilities to be identified and existing strengths to be built on. If a business does opt for a third-party off-site Business Continuity facility for DR, a team should visit it at least annually to ensure all parties know how everything operates and how their role contributes.
Need help from the experts?
Procurri are on-hand to provide a designated point of contact project manager, specialist technicians, 24/7/365 support and the world’s largest stockholding of hardware. Our experts have helped restore service to businesses even after the most devastating of unexpected events, and we can help your organization do just that should the worst happen. Get in touch for a call today and let’s protect your operations, together.