Businesses face a number of threats from their internal and external environments. These forces not only threaten the normal operations of a business, but they could also lead to the loss of or damage to resources such as data and affect the future operations of the company. As a result, Stallings and Brown (2012) insist that businesses need to improve their resilience by putting in place measures that mitigate such threats while at the same time providing a mechanism that facilitates a swift response. This can be achieved with the help of a disaster recovery plan (DRP). According to Luckey (2009), a DRP is a document that analyzes the prevailing business status, potential threats, and their probable impact, and then provides a mechanism, plan, or procedure with which to protect the business and help it recover its vital information technology processes in the event of a disaster. The following sections discuss the elements of a DRP, the risks that businesses face, recovery strategies, and appropriate testing methods for a DRP designed for ABC, Inc., a hypothetical audit firm based in the US with a global presence.
Elements of a DRP
The first element of a DRP is the communication plan. Sanders, Randall and Smith (2013) insist that companies need to set a clear communication plan so as to ensure that threats and potential disasters are reported to the relevant authorities and that there is proper coordination in responding to these disasters. In addition, a DRP is not complete without goals. As reported by Glen (2010), it is essential that a DRP has a well-defined set of goals that define its scope and state its purpose. This makes it possible for a business to develop an actionable plan that meets these goals and that focuses on disaster recovery and the sustainability of the business. A further element is the personnel. In this case, all the team members, their chain of command, and their responsibilities need to be clearly defined so as to provide a clear line of reporting and to ensure proper coordination of the recovery process. Another element is the application profile. As reported by Croy (2014), all the applications should be listed alongside comments on their associated threats and the frequency with which these resources are utilized. This is complemented by the inventory profile, which records each item together with its manufacturer, its cost, and the frequency at which it is audited. This ensures that threats are monitored at regular intervals and that the risks associated with any inventory are identified on time.
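The application and inventory profiles described above can be pictured as simple records. The following is a minimal sketch in Python, not a format prescribed by Croy (2014): the class names, field names, and sample entries are all assumptions made for illustration. It shows how recording an audit frequency allows overdue audits to be identified automatically.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class ApplicationProfile:
    """One entry in the application profile (field names are illustrative)."""
    name: str
    threats: list[str] = field(default_factory=list)  # comments on associated threats
    uses_per_week: int = 0  # how frequently the application is used


@dataclass
class InventoryItem:
    """One entry in the inventory profile (field names are illustrative)."""
    name: str
    manufacturer: str
    cost_usd: float
    audit_interval_days: int  # frequency of the audit
    last_audit: date

    def audit_due(self, today: date) -> bool:
        """Return True when the audit interval has elapsed since the last audit."""
        return today >= self.last_audit + timedelta(days=self.audit_interval_days)


def overdue_items(items: list[InventoryItem], today: date) -> list[str]:
    """Names of inventory items whose audit is due on or before `today`."""
    return [item.name for item in items if item.audit_due(today)]


# Hypothetical entries for ABC, Inc.; none of these values come from the sources.
applications = [
    ApplicationProfile("audit workpapers", ["data leak", "system crash"], uses_per_week=40),
]
inventory = [
    InventoryItem("file server", "ExampleCorp", 12000.0, 90, date(2024, 1, 1)),
    InventoryItem("backup drive", "ExampleCorp", 800.0, 30, date(2024, 3, 20)),
]

print(overdue_items(inventory, date(2024, 4, 5)))
```

Keeping the audit interval next to each item means the check for overdue audits needs no information beyond the profile itself.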
A good DRP should also have an information services procedure. According to Croy (2014), this is particularly important as it serves as a preventive measure, which helps in ensuring that disruption of processes and loss of data are minimized as much as possible. A DRP should also have a disaster recovery procedure. This is a set of processes that have to be followed in the event that a disaster happens, which then provides a coordinated process of disaster response. The DRP should also have a clear process for restoring the entire system. This is critical as it helps in restoring normal business operations once a disaster has been addressed. In fact, Wallace and Webber (2012) add that a DRP should always have a clear rebuilding process that should detail the assessment process and the reconstruction of resources such as data centers that may be damaged following a disaster.
Risks
There are a number of risks that businesses face. The first category is internal risks, which could arise from accidental or intentional events. As described by Stallings and Brown (2012), some common internal threats include sabotage, structural failure, data leaks, and system crashes. On the other hand, external risks include cyber attacks and robbery, among others. While it is possible to mitigate most internal and external risks, Stallings and Brown (2012) insist that environmental forces are hard to manage, which then creates a need for a disaster preparedness, response, and recovery plan. Some common environmental risks include earthquakes, fires, tornadoes, and hurricanes, among others.
Backup/recovery strategies
There are a number of recovery or backup strategies that ABC, Inc. could adopt in order to ensure that it remains operational even after the occurrence of a disaster. One of these is the use of a hot site, which is an operational space that houses a duplicate of the original site of an organization. According to Alhazmi and Malaiya (2012), this site runs parallel to the original site and synchronizes data and processes in real time, which means that an organization’s operations can be up and running almost immediately. On the other hand, a cold site is an empty operations space to which an organization can transfer some of its vital processes and recreate its operations. No live backup is kept at a cold site, so recovery relies on restoring the most recent backup, which means that the downtime is likely to be long and that operations will be significantly scaled down. Finally, a warm site is a compromise between a hot and a cold site in that it is a scaled-down version of the original that has a ready backup that is updated frequently. The recovery process will take a short or moderate duration depending on the components and backups that have to be brought onsite.
For ABC, Inc., the most appropriate recovery strategy is the warm site. According to Disaster Recovery Journal (2018), a warm site balances cost and efficiency, which means that a business can recover its vital processes within a short period at a sustainable cost. This is vital for a company like ABC as it helps the business maintain its operations while avoiding the unsustainable costs that would arise if the company ran parallel sites.
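The trade-off behind this recommendation can be sketched as a small selection rule: pick the cheapest site that still restores operations within the downtime the business can tolerate. The Python sketch below is only an illustration. The recovery times, annual costs, and the names RecoverySite and choose_site are assumptions and are not taken from the cited sources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecoverySite:
    name: str
    recovery_hours: float   # assumed time needed to resume operations
    annual_cost_usd: float  # assumed yearly cost of maintaining the site


# Illustrative figures only; they are not taken from the cited sources.
SITES = [
    RecoverySite("hot", recovery_hours=1, annual_cost_usd=500_000),
    RecoverySite("warm", recovery_hours=24, annual_cost_usd=150_000),
    RecoverySite("cold", recovery_hours=168, annual_cost_usd=40_000),
]


def choose_site(max_downtime_hours: float, budget_usd: float) -> Optional[RecoverySite]:
    """Return the cheapest site that meets both limits, or None if none does."""
    candidates = [
        site for site in SITES
        if site.recovery_hours <= max_downtime_hours
        and site.annual_cost_usd <= budget_usd
    ]
    return min(candidates, key=lambda site: site.annual_cost_usd, default=None)


# Hypothetical limits for ABC, Inc.: two days of tolerable downtime, $200,000 per year.
choice = choose_site(max_downtime_hours=48, budget_usd=200_000)
print(choice.name if choice else "no site fits")
```

Under these assumed figures, the hot site exceeds the budget and the cold site exceeds the tolerable downtime, which leaves the warm site as the only candidate. Different limits would lead to a different choice.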
Testing methods
There are a number of testing methods that ABC, Inc. could use for its DRP. One of these is the paper test, which is a simplified process in which the team reads and annotates the recovery plan. A different strategy is a walkthrough, which is a process that allows the team to comprehensively analyze the plan, identify issues, and come up with the necessary changes. A team could also choose to simulate a disaster in order to assess the adequacy of the DRP. Further, a parallel test could be carried out by building up the entire recovery system and running actual tests on it. Finally, a cutover test can be carried out in a similar way to a parallel test but with the primary system disconnected.
For ABC, Inc., the recommended testing method is a simulation. According to Ali and Ali (2009), a walkthrough and a paper test are not sufficient to acquaint the team with the threats or to enable it to carry out sufficient testing of their impact. On the other hand, parallel and cutover tests could be too expensive to carry out (Luckey, 2009). A simulation is a cost-effective method that allows an organization to create a virtual system and test a DRP on it with an effectiveness close to that of the cutover and parallel methods. This should help the company achieve its goals without incurring unsustainable costs.