You need to have a plan in place if a risk assessment shows that the likelihood of an emergency event (e.g., severe storms or earthquakes) occurring is high. A business continuity plan allows an organization to minimize the consequences of a disaster and continue normal business functions afterward.
Key Concept
Business impact analysis looks at how individual business units would be affected after a significant interruption of the computing and communication services that the organization needs for continuous compliance with Sarbanes-Oxley.
Business continuity is essential to continuing business functions in the event of an emergency so that you can comply with Sarbanes-Oxley and other regulations. You can configure your SAP system running on SAP NetWeaver Application Server to support a business continuity plan for critical operations. This applies to SAP ERP, SAP BusinessObjects GRC solutions, and third-party products you have purchased for integration with your SAP system.
Before you develop and test the plan, you must ensure that you assess the risks adequately, reduce the frequency of risk events to an acceptable level, and minimize the severity of the consequences if risk events occur.
I’ll show you what you should include in your business continuity plan, including:
- Business impact analysis
- Business continuity plan objectives
- Recovery time and point objectives
- Performance criteria
I’ll finish with a discussion of control methods you can use.
Note
Having a business continuity plan does not guarantee that you can always resume operations quickly from the standby system at a remote site. A plan alone does not address primary and backup sites located in the same geographical area, and therefore subject to the same types of disaster (e.g., hurricanes), or problems such as worn-out backup tapes and disks that cannot be used in recovery efforts.
Business Impact Analysis
You need to assess your organization’s business operations and information system support services as inputs to a project plan for developing a business impact analysis. A business impact analysis assesses the effect that a significant interruption would have on individual business units. For instance, financial effects may include the costs of non-compliance and the dollar losses from business interruptions, while operational effects may include the inability to deliver and monitor quality data retention service during the interruptions.
You need to determine how you would approach the business impact analysis process. The approach you use depends on the current status of the process as shown in Table 1.
Approach | Description
Initiation | Initiate a business impact analysis for a business continuity plan
Update | Initiate the process for carrying out the update of a business impact analysis
Justification | Justify current activities with a business impact analysis (e.g., a recovery alternative such as a new backup site)
Environment change | Update a business impact analysis to identify changes in the environment
Table 1: Business impact analysis approaches
In any approach, you identify which business continuity resources are critical to your organization’s continued existence, identify the risks to those resources, and assess the likelihood of those risks occurring and their impact on the organization. Then you prioritize the allocation of resources to the risks you identified and assessed. Next, create a list of all the risks you analyzed during the business impact analysis process and sort them in descending order of the annualized loss exposure (or expectancy) that you compute during the impact assessment phase. For a simple example, see Table 2.
Risk | Asset | Annualized rate of occurrence
Flood | Facility | .01
Fire | Server room | .1
Denial of service | Sarbanes-Oxley-relevant SAP systems | 1.0
Table 2: Risks in descending order
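To make the sorting step concrete, here is a minimal Python sketch. It assumes the common formulation in which annualized loss exposure equals single loss expectancy (the dollar loss per event) times the annualized rate of occurrence, treats the numeric column of Table 2 as that rate (an assumption), and uses hypothetical single loss expectancy figures. The `Risk` class and `rank_by_ale` function are illustrative names, not part of any SAP product.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    asset: str
    sle: float  # single loss expectancy in dollars (hypothetical figures below)
    aro: float  # annualized rate of occurrence (assumed reading of Table 2)

    @property
    def ale(self) -> float:
        # Annualized loss exposure = loss per event x expected events per year
        return self.sle * self.aro


def rank_by_ale(risks: list[Risk]) -> list[Risk]:
    """Return the risks sorted in descending order of annualized loss exposure."""
    return sorted(risks, key=lambda r: r.ale, reverse=True)


risks = [
    Risk("Denial of service", "Sarbanes-Oxley-relevant SAP systems", 10_000, 1.0),
    Risk("Fire", "Server room", 150_000, 0.1),
    Risk("Flood", "Facility", 2_000_000, 0.01),
]

for r in rank_by_ale(risks):
    print(f"{r.name:<20} {r.asset:<40} ALE=${r.ale:,.0f}")
```

With these hypothetical loss figures, the rare but costly flood ranks first, which shows why a low frequency alone does not put a risk at the bottom of the list.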
You should also include a list of current and future statuses of residual risks that remain after you apply security controls to mitigate them.
The annualized loss exposure is the monetary loss that you can expect for an asset due to a risk over a one-year period, based on the asset’s value and how often the risk is expected to occur. For example, if severe storms occurring twice a year give a risk an annualized loss exposure of $5,000, it may not be worth spending $10,000 per year on a security measure that eliminates it. In this case, you may choose to accept it as a residual risk, which I talked about in a previous article, "5 Steps to Accept or Reject Residual Risks."
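The cost-benefit reasoning in the storm example can be sketched as follows. This minimal illustration assumes that the $5,000 annualized loss exposure comes from two storms a year at $2,500 each (a hypothetical split) and that the $10,000 control eliminates the risk entirely. The function names are hypothetical.

```python
def annualized_loss_exposure(single_loss: float, events_per_year: float) -> float:
    """ALE = expected loss per event x expected number of events per year."""
    return single_loss * events_per_year


def control_net_benefit(ale_before: float, ale_after: float, annual_cost: float) -> float:
    """Yearly value of a security control: the loss it avoids minus what it costs."""
    return (ale_before - ale_after) - annual_cost


# Severe storms twice a year at a hypothetical $2,500 loss per storm
ale = annualized_loss_exposure(single_loss=2_500, events_per_year=2)
benefit = control_net_benefit(ale_before=ale, ale_after=0, annual_cost=10_000)

print(f"ALE: ${ale:,.0f}")
print(f"Net benefit of the control: ${benefit:,.0f}")
if benefit < 0:
    print("The control costs more than it saves; consider accepting the residual risk.")
```

A negative net benefit is the signal that the risk is a candidate for acceptance rather than mitigation.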
Business Continuity Plan Objectives
In your business continuity plan, you should include objectives, in addition to policies, plans, and procedures, to help your organization return to normal business operations after a disastrous event. The objectives help reduce the likelihood of an unanticipated disruption through risk management. They also contribute to maintaining good relationships with your customers, mitigating negative publicity about disruptions, and preventing or reducing damage to your reputation.
Unanticipated disruptions include denial of service (e.g., client negligence, willful misconduct, or network attacks), telecommunication failure (e.g., a provider accidentally cuts a fiber line), software and system flaws, and monitoring and measuring system failure. By contrast, anticipated disruptions include scheduled maintenance such as hardware upgrades, software upgrades, and backups.
An unanticipated disruption to the primary system degrades service for users and auditors who rely on it to retain Sarbanes-Oxley data. To mitigate negative publicity about the disruption, you must inform your external Sarbanes-Oxley auditors, compliance officers, and other stakeholders about its cause and about your recovery objectives: when, where, and how you will start recovering from the disruption and complete the recovery process.
Objectives differ by company and industry. For example, for a public authority, the business continuity planning objective may be to protect mainframe, server, and desktop computer systems. For a water company, the objective may be to maintain supply to customers via automated systems. For a U.S. public company that uses computer technologies to comply with Sarbanes-Oxley, there should be at least four objectives:
- Protect the Sarbanes-Oxley-relevant computer systems and equipment from denial of service and telecommunication failure
- Protect automated business functions at head offices from disruptions
- Make Sarbanes-Oxley-relevant data available quickly to auditors
- Mitigate negative publicity by informing your external Sarbanes-Oxley auditors, compliance officers, and other stakeholders about the cause of the disruption and how to start recovering from the disruption and complete the recovery process
Recovery Time and Point Objectives
As part of the risk assessment, you must determine how long a system can be unavailable and how much data you can afford to lose. These translate into the recovery time objective and the recovery point objective.
The recovery time objective refers to the maximum acceptable length of time that can elapse before the absence of critical business functions severely affects your organization. It is measured from the time the disastrous event occurs until the time the business process is recovered. It does not, however, set the absolute upper boundary on time; that boundary is set by the maximum tolerable period of disruption. A disaster can range from someone with proper credentials accidentally changing data to total site destruction.
The recovery point objective refers to the point in time to which you must recover the data after a disaster has occurred. For instance, when you run backups overnight, you need to determine the recovery point, such as the end of the previous day’s activity at a specific time (e.g., 5:30 pm). Then you determine the critical data point at which data must be recovered (e.g., the last record update transaction or all record activities).
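As a rough illustration of the recovery point, the following sketch finds the most recent backup taken before a disaster and checks whether the data lost since then fits within a given recovery point objective. The backup times, the disaster time, and the function names are hypothetical; the example models overnight backups that capture each day’s activity as of 5:30 pm.

```python
from datetime import datetime, timedelta


def recovery_point(backups: list[datetime], disaster: datetime) -> datetime:
    """Latest backup taken at or before the disaster; changes after it are lost."""
    usable = [b for b in backups if b <= disaster]
    if not usable:
        raise ValueError("no backup precedes the disaster")
    return max(usable)


def meets_rpo(backups: list[datetime], disaster: datetime, rpo: timedelta) -> bool:
    """True if the data lost between the recovery point and the disaster fits within the RPO."""
    return disaster - recovery_point(backups, disaster) <= rpo


# Hypothetical overnight backups capturing each day's activity as of 5:30 pm
backups = [datetime(2024, 3, 4, 17, 30), datetime(2024, 3, 5, 17, 30)]
disaster = datetime(2024, 3, 6, 14, 0)

point = recovery_point(backups, disaster)
print("Recover to:", point)
print("Data at risk:", disaster - point)
print("Meets 24-hour RPO:", meets_rpo(backups, disaster, timedelta(hours=24)))
print("Meets 4-hour RPO:", meets_rpo(backups, disaster, timedelta(hours=4)))
```

Here a disaster at 2:00 pm puts 20.5 hours of activity at risk, which satisfies a 24-hour objective but not a 4-hour one.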
You can retrieve the background information you need to compute the recovery time objective and the recovery point objective from the reports that SAP BusinessObjects Risk Management generates. You can get those reports from the Reporting and Analytics work center. This work center provides access to various risk management reports based on different risk criteria and auditing reports.
You can compute the recovery time objective and recovery point objective independently and compare them based on infrastructure, technology, and regulatory issues. The longer the recovery time objective for a system, the lower the cost of recovering it. At the same time, the potential losses from the unavailable system rise dramatically. Ideally, the point at which the potential loss equals the cost of recovery is your acceptable downtime, and it tells you how much you should spend on your plan.
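The trade-off between recovery cost and downtime loss can be sketched numerically. The two cost curves below are purely hypothetical stand-ins for figures you would derive from your own analysis; the sketch simply scans whole hours for the first point at which the potential loss reaches the cost of recovery.

```python
from typing import Callable, Optional


def acceptable_downtime(
    recovery_cost: Callable[[float], float],
    downtime_loss: Callable[[float], float],
    max_hours: int = 168,
) -> Optional[int]:
    """Return the first whole hour at which the potential loss from downtime reaches
    the cost of recovering within that time, or None if the curves do not meet."""
    for hour in range(max_hours + 1):
        if downtime_loss(hour) >= recovery_cost(hour):
            return hour
    return None


# Hypothetical cost curves, in dollars, as functions of downtime in hours
def recovery_cost(hours: float) -> float:
    return 500_000 / (1 + hours)  # cheaper recovery options as the RTO lengthens


def downtime_loss(hours: float) -> float:
    return 2_000 * hours ** 2  # losses accelerate the longer the system is down


point = acceptable_downtime(recovery_cost, downtime_loss)
print(f"Acceptable downtime: about {point} hours")
print(f"Indicative plan budget: ${recovery_cost(point):,.0f}")
```

An hourly scan is deliberately simple; with real cost data you would plug in your own curves or tabulated values.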
Always include in the recovery time objective the time you need to decide to deploy the plan. Determine how fast you can make that critical decision: the longer you wait to make it, the less it is worth.
However, the ideal acceptable downtime may not be a suitable performance criterion for restoring a Sarbanes-Oxley-relevant system after a disaster, because the time allowed to resume Sarbanes-Oxley compliance functions is shorter than for non-regulatory business functions. Section 802 sets record-retention requirements, so you must protect all retained data against loss (e.g., through high-availability server failover mechanisms). Section 404 requires management, including CEOs and CFOs, to assess, and auditors to attest to, the effectiveness of internal control over financial reporting, which depends on that retained data.
If both Sarbanes-Oxley-relevant and non-relevant systems run at the same facility, you should include a minimum business continuity objective in your business continuity plan. This objective is the minimum level of service that is acceptable to the organization immediately after a disaster. You need to recover the business processes for Sarbanes-Oxley compliance immediately after a disaster, and you can defer restoring normal service levels for other systems (e.g., personnel) until later in the recovery period.
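One way to express the minimum business continuity objective is as a recovery order. This sketch, with hypothetical system names and recovery time objectives, puts Sarbanes-Oxley-relevant systems in an immediate wave and defers everything else, ordering each wave by shortest recovery time objective first. The two-wave split is an illustrative assumption, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    sox_relevant: bool
    rto_hours: float


def recovery_waves(systems: list[System]) -> tuple[list[str], list[str]]:
    """Split systems into an immediate wave (Sarbanes-Oxley-relevant) and a deferred
    wave, each ordered by shortest recovery time objective first."""
    by_rto = sorted(systems, key=lambda s: s.rto_hours)
    immediate = [s.name for s in by_rto if s.sox_relevant]
    deferred = [s.name for s in by_rto if not s.sox_relevant]
    return immediate, deferred


# Hypothetical systems sharing one facility
systems = [
    System("Personnel", False, 72),
    System("GRC reporting", True, 8),
    System("Intranet", False, 24),
    System("SAP ERP financials", True, 4),
]
immediate, deferred = recovery_waves(systems)
print("Recover immediately:", immediate)
print("Defer:", deferred)
```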
Performance Criteria
You must specify performance criteria in your business impact analysis to minimize serious consequences such as prolonged downtimes, outages, and disruptions of any technology that supports critical functions of complying with Sarbanes-Oxley. Performance criteria help the organization understand the impacts associated with possible threats and risks.
In addition to acceptable downtime, you need to compute the maximum tolerable downtime, maximum acceptable outage, and maximum tolerable period of disruption for reliance on time-critical support services and resources, such as personnel, facilities, hardware, software, data and voice networks, and data records.
- Maximum tolerable downtime: This looks at how long the organization can discontinue a business function without putting its survival at risk. Business functions associated with Sarbanes-Oxley data retention often require the shortest maximum tolerable downtime.
- Maximum acceptable outage: This provides the time frame during which a recovery of the system must be effective before its loss compromises the organization’s objectives or survival. Examples include a total service failure or the loss of a data center facility.
- Maximum tolerable period of disruption: This specifies the maximum time an activity or resource can be unavailable before irreparable harm is caused to the organization. It sets the upper boundary on the time by which an organization intends to recover a business function or resource as specified in the recovery time objective. You should estimate a maximum tolerable period of disruption for each key activity within the organization.
Then compute how much these periods can deviate from the ideal acceptable downtime. Because the recovery time objective for Sarbanes-Oxley compliance is short, the deviations may be large, and the maximum tolerable downtime, maximum acceptable outage, and maximum tolerable period of disruption must all be short. You need to set a standard for how large the deviations may be.
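The deviation check can be sketched as follows. All hour values and the 12-hour deviation standard are hypothetical; the sketch also checks that the recovery time objective stays within the maximum tolerable period of disruption, which the text describes as its upper boundary.

```python
def deviations_from_ideal(ideal_hours: float, criteria: dict[str, float]) -> dict[str, float]:
    """How far each performance criterion lies from the ideal acceptable downtime."""
    return {name: hours - ideal_hours for name, hours in criteria.items()}


def exceeds_standard(deviations: dict[str, float], max_deviation_hours: float) -> list[str]:
    """Criteria whose deviation is larger than the standard you have set."""
    return [name for name, d in deviations.items() if abs(d) > max_deviation_hours]


def rto_within_mtpd(rto_hours: float, mtpd_hours: float) -> bool:
    """The maximum tolerable period of disruption is the upper bound for the RTO."""
    return rto_hours <= mtpd_hours


# Hypothetical values, in hours, for one Sarbanes-Oxley-relevant business function
ideal_acceptable_downtime = 6
criteria = {
    "Maximum tolerable downtime": 8,
    "Maximum acceptable outage": 12,
    "Maximum tolerable period of disruption": 24,
}
devs = deviations_from_ideal(ideal_acceptable_downtime, criteria)
print(devs)
print("Outside standard:", exceeds_standard(devs, max_deviation_hours=12))
print("RTO within MTPD:", rto_within_mtpd(4, criteria["Maximum tolerable period of disruption"]))
```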
Control Methods
The most traditional disaster recovery medium is magnetic tape. The shorter the recovery point objective, the shorter the backup interval should be.
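The relationship between the recovery point objective and the backup interval reduces to simple arithmetic. This sketch assumes evenly spaced backups and computes the minimum number per day so that the gap between consecutive backups never exceeds the recovery point objective; the function name is hypothetical.

```python
import math
from datetime import timedelta


def backups_per_day(rpo: timedelta) -> int:
    """Minimum number of evenly spaced daily backups so that no more than `rpo`
    worth of data can be lost between consecutive backups."""
    if rpo <= timedelta(0):
        raise ValueError("RPO must be positive")
    return math.ceil(timedelta(days=1) / rpo)


for rpo in (timedelta(hours=24), timedelta(hours=4), timedelta(hours=5), timedelta(minutes=15)):
    n = backups_per_day(rpo)
    print(f"RPO {rpo}: {n} backups/day, one every {timedelta(days=1) / n}")
```

A 15-minute objective already implies 96 backups a day, which is one reason very short recovery point objectives push organizations away from tape toward replication.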
When the recovery time objective is short, you must have a standby system at a hot site or a backup system offsite. A hot site duplicates the original site of the organization, including the computer systems, office space, and furniture needed to continue business operations in the event of a disaster. Real-time synchronization between the original and hot sites is important so that the hot site mirrors the environment of the original site. When a disaster strikes the original site, the organization can relocate temporarily with minimal losses to normal operations. Companies that offer hot sites often offer backup services as well.
Just make sure the primary site, hot site, and standby or backup system are not located close together in an area subject to the same type of disaster (e.g., severe storms). In evaluating tapes, consider what type of backup hardware you will use, where the tapes are stored until they go offsite, how quickly they go offsite, what routes the duplicate tape copies take, and how often backups are performed.
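A first-pass proximity check can be automated with great-circle distances. In this sketch the site coordinates and the 250 km minimum separation are hypothetical, and distance is only a proxy: two distant sites can still share a hurricane zone or power grid, so treat the result as a screening step, not a hazard assessment.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in kilometers between two (latitude, longitude) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def sites_too_close(sites: dict[str, tuple[float, float]], min_km: float) -> list[tuple[str, str]]:
    """Every pair of sites closer together than the minimum separation."""
    return [
        (n1, n2)
        for (n1, p1), (n2, p2) in combinations(sites.items(), 2)
        if haversine_km(p1, p2) < min_km
    ]


# Hypothetical site locations
sites = {
    "Primary site": (40.7128, -74.0060),
    "Hot site": (40.7357, -74.1724),
    "Backup tape vault": (41.8781, -87.6298),
}
print(sites_too_close(sites, min_km=250))
```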
Incremental Backup
Consider incremental backup as a preferred backup method in a business impact analysis to achieve the objectives of a business continuity plan. Incremental backups store the files that have changed since the last backup of any type (incremental, differential, or full). Differential backups store all the files that have changed since the last full backup. Restoring from differential backups is faster than restoring from incremental backups, because you need only the last full backup and the most recent differential backup. Conversely, performing incremental backups is faster than performing differential backups.
When you run multiple differential backups after your full backup, you’re most likely including some files in each differential backup that were already included in earlier differential backups, but have not been recently modified. For this reason, backing up differentially is slower than backing up incrementally.
Incremental backups are the quickest to perform because they store the fewest files, and differential backups require more storage space. For full restoration, you restore the last full backup, followed by each incremental backup in sequence since that full backup.
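The trade-off can be illustrated with a small simulation. This sketch assumes one full backup followed by one backup per day, counts files rather than bytes, and uses hypothetical file names; it shows what each incremental and differential backup would store and which backup sets each approach needs for a full restore.

```python
def backup_contents(changes_per_day: list[set[str]]) -> tuple[list[set[str]], list[set[str]]]:
    """Given the files changed on each day after a full backup, return what each
    daily incremental and each daily differential backup would store."""
    incrementals, differentials = [], []
    changed_since_full: set[str] = set()
    for changed in changes_per_day:
        incrementals.append(set(changed))  # only changes since the previous backup
        changed_since_full |= changed
        differentials.append(set(changed_since_full))  # all changes since the full backup
    return incrementals, differentials


def restore_chain(kind: str, day: int) -> list[str]:
    """Backup sets needed to restore to the end of a given day (1-based)."""
    if kind == "incremental":
        return ["full"] + [f"inc{d}" for d in range(1, day + 1)]
    if kind == "differential":
        return ["full", f"diff{day}"]
    raise ValueError(f"unknown backup kind: {kind}")


# Hypothetical changes over three days after the full backup
changes = [{"a.dat"}, {"b.dat"}, {"a.dat", "c.dat"}]
incs, diffs = backup_contents(changes)
print("Files stored, incremental: ", sum(len(s) for s in incs))
print("Files stored, differential:", sum(len(s) for s in diffs))
print("Restore chain, incremental: ", restore_chain("incremental", 3))
print("Restore chain, differential:", restore_chain("differential", 3))
```

Incremental backups store fewer files in total, but a full restore needs every set in the chain; the differential restore needs only two.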
You need to check the condition of backup tapes periodically. A worn-out tape can cause a media failure when you restore an incremental backup, which in turn prevents you from restoring the subsequent incremental backups. If an incremental tape fails, you may not be able to achieve a full restoration, particularly if you do not have a copy of that incremental tape in good working condition to replace the failed one.
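The effect of a failed tape on an incremental chain can be sketched as follows. The tape labels (`full`, `inc1`, `inc2`, and so on) and the function name are hypothetical; the logic reflects the ordering constraint described above, where each incremental must be applied after the one before it.

```python
from typing import AbstractSet, Optional


def latest_restorable_day(
    num_incrementals: int,
    failed: AbstractSet[str],
    good_duplicates: AbstractSet[str] = frozenset(),
) -> Optional[int]:
    """Incremental tapes must be applied in order after the full backup, so the first
    unreadable tape without a good duplicate copy ends the restore."""

    def readable(tape: str) -> bool:
        return tape not in failed or tape in good_duplicates

    if not readable("full"):
        return None  # nothing can be restored without the full backup
    for day in range(1, num_incrementals + 1):
        if not readable(f"inc{day}"):
            return day - 1  # later incrementals cannot be applied
    return num_incrementals


print(latest_restorable_day(5, failed={"inc2"}))
print(latest_restorable_day(5, failed={"inc2"}, good_duplicates={"inc2"}))
```

A single worn-out tape early in the week strands every later incremental, which is why keeping duplicate copies in good condition matters.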
Reciprocal and Mutual Assistance Agreements
Consider reciprocal or mutual assistance agreements along with other backup agreements you may have in a business impact analysis to achieve the objectives of a business continuity plan. A reciprocal agreement is an agreement made by two companies to use each other’s resources during a disaster. A mutual assistance agreement is an arrangement among two or more companies to share computing resources to assist one another during a disaster.
Just make sure the businesses concerned are unlikely to be affected by the same disaster. For example, businesses in close proximity may all be affected by the same evacuation order, area power outage, telecommunications loss, or floods.
When businesses are far from one another and are unlikely to be affected by the same disaster, there are other issues to consider. For example, are the mainframes, minicomputers, and peripherals compatible? Is the systems software at both sites identical? Are tape identification numbers unique across sites, so that the numbers at the primary site do not duplicate those at the reciprocal site?
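Two of these questions lend themselves to a quick automated check before an agreement is signed. This sketch, using hypothetical tape IDs and software inventories, reports tape identification numbers that exist at both sites and software components whose versions differ or that exist at only one site.

```python
def duplicate_tape_ids(primary_ids: list[str], reciprocal_ids: list[str]) -> list[str]:
    """Tape identification numbers that appear at both sites."""
    return sorted(set(primary_ids) & set(reciprocal_ids))


def software_mismatches(primary: dict[str, str], reciprocal: dict[str, str]) -> list[str]:
    """Components whose versions differ between sites, or that exist at only one site."""
    names = primary.keys() | reciprocal.keys()
    return sorted(n for n in names if primary.get(n) != reciprocal.get(n))


# Hypothetical inventories
primary_tapes = ["T0001", "T0002", "T0003"]
reciprocal_tapes = ["T0003", "R0001"]
primary_sw = {"os": "7.4", "database": "12.1", "tape_driver": "2.0"}
reciprocal_sw = {"os": "7.4", "database": "11.2"}

print("Duplicate tape IDs:", duplicate_tape_ids(primary_tapes, reciprocal_tapes))
print("Software mismatches:", software_mismatches(primary_sw, reciprocal_sw))
```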
Make sure the providing site has sufficient excess capacity to restore complex transactions. Any agreement should be subject to proper risk assessment and formal approval by the originating business and the reciprocal or mutually assisting businesses. The assessment should establish that the arrangement would provide an acceptable level of backup and obligate one or more businesses to make available sufficient processing capacity and time.
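Checking for sufficient excess capacity is simple arithmetic once both sites have inventoried their resources. In this sketch the resource names and quantities are hypothetical; it subtracts the providing site’s own load from its total capacity and reports any resource where the guest workload would not fit.

```python
def capacity_shortfalls(
    host_total: dict[str, float],
    host_in_use: dict[str, float],
    guest_required: dict[str, float],
) -> dict[str, float]:
    """Per resource, how much the guest workload exceeds the host's spare capacity."""
    shortfalls = {}
    for resource, needed in guest_required.items():
        spare = host_total.get(resource, 0) - host_in_use.get(resource, 0)
        if needed > spare:
            shortfalls[resource] = needed - spare
    return shortfalls


# Hypothetical inventories for the providing (host) site and the guest workload
host_total = {"cpu_cores": 64, "storage_tb": 40}
host_in_use = {"cpu_cores": 48, "storage_tb": 22}
guest_required = {"cpu_cores": 20, "storage_tb": 15}

print(capacity_shortfalls(host_total, host_in_use, guest_required))
```

An empty result means the providing site can absorb the workload on these measures; any entry is a gap to resolve in the agreement.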
Judith M. Myerson
Judith M. Myerson is a systems architect and engineer and an SAP consultant. She is the author of the handbook Enterprise System Integration, Second Edition; RFID in the Supply Chain: A Guide to Selection and Implementation; and several articles on enterprise-wide systems, database technologies, application development, SAP, RFID technologies, project management, risk management, and GRC.
You may contact the author at jmyerson@verizon.net.