Find out why multiple, in-depth layers of recovery parameters are more effective than a single layer when backing up and recovering SAP CRM data, business processes, and event logs.
Key Concept
Backup and recovery in depth refers to the strategy of creating multiple layers of recovery parameters (rather than a single layer) to better back up and recover the SAP CRM infrastructure in case of a major system or data issue. Parameters such as a recovery point objective, recovery time objective, and maximum tolerable downtime are all part of an in-depth backup and recovery strategy for SAP CRM. The benefit of this strategy is that if one layer of the backup fails, another can take over, or the process returns to a previous layer and updates it before continuing to the remaining layers.
An organization often needs more than the standard practices for backing up and restoring SAP CRM databases to comply with the data retention policies of industries such as retail, health care, energy services, manufacturing, and media. A standard backup and recovery strategy usually covers backup sources, destinations, frequency, scheduling, and security policies. It does not include multiple layers of recovery parameters for measuring how effectively data is backed up and recovered in each layer. Benefits of a multi-layer backup and recovery strategy include cost savings, risk mitigation, process improvement, off-site storage, and the security of continuous backup.
Note
Backup sources include back-end mainframes, data-intensive computing servers, and mobile devices (such as smartphones, handhelds, netbooks, and ultralight laptops). Destinations for layered backups include network drives, tapes, and external USB drives kept at off-site, fireproof facilities.
To achieve this level of security, the backup and recovery in depth model needs to include layered recovery parameters such as the length of recovery time and the maximum downtime after a disaster. This lets you restore SAP CRM data, business processes, email messages, and event logs more quickly. Layered backup and recovery also helps maintain consistency between the SAP CRM data being backed up and the linked SAP ERP and SAP NetWeaver BW systems while complying with data retention requirements. The backed-up SAP CRM data may include data from the smartphones of on-the-go SAP CRM users, such as mobile developers, sales representatives, field service employees, managers, and hospital physicians. (For an overview of the terms I use, refer to the “Glossary of Terms” sidebar at the end of the article.)
An Example of Data Retention Standards
The Payment Card Industry (PCI) security standards, which any company that handles credit card data must follow, contain various data retention requirements. They are an example of the type of requirements your company must consider when determining which backup and recovery method to use with your SAP CRM system. Non-compliance is costly: a corporation that does not follow PCI standards can face penalties of up to $500,000, and gross violations can even cost a company the ability to accept credit cards.
In addition to application data and email messages, the data that PCI standards require you to retain includes event logs, transaction logs, and search engine data. This information could become legal records that you must produce when requested by the appropriate authorities. If litigation occurs, legal authorities could extend the data retention periods while it is in process.
Following PCI standards is an example of a single layer of backup. The standards require merchants and service providers that store, process, or transmit credit card data to follow their regulations. For example, they must submit annual statements showing how they apply PCI to any network component, server, or application. They must also keep an audit history that covers at least one year, with at least three months of it available online. Merchants that do not comply can face penalties.
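To make the retention rule concrete, here is a minimal Python sketch that classifies an audit log entry by age. It assumes that three months can be approximated as 90 days, and it models the litigation extension mentioned earlier as a hypothetical `legal_hold` flag. The function and tier names are illustrative only; they are not part of any SAP or PCI interface.

```python
from datetime import date

# Assumed thresholds taken from the PCI requirement described above.
ONLINE_DAYS = 90    # "available online for at least three months", approximated as 90 days
RETAIN_DAYS = 365   # "cover at least one year"


def audit_log_tier(created: date, today: date, legal_hold: bool = False) -> str:
    """Classify an audit log entry by where it must still be kept on `today`."""
    age_days = (today - created).days
    if age_days < ONLINE_DAYS:
        return "online"          # must be immediately available online
    if age_days < RETAIN_DAYS or legal_hold:
        return "archive"         # must still be retained, e.g., on backup media
    return "retention-met"       # minimum period satisfied; your own policy decides what happens next


today = date(2024, 6, 30)
print(audit_log_tier(date(2024, 5, 1), today))                   # online
print(audit_log_tier(date(2023, 12, 1), today))                  # archive
print(audit_log_tier(date(2023, 1, 1), today))                   # retention-met
print(audit_log_tier(date(2023, 1, 1), today, legal_hold=True))  # archive
```

A legal hold simply keeps an entry in the "archive" tier for as long as litigation is in process, regardless of its age.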
Planning Backup and Recovery Strategies
Backup media such as tapes (holding full, incremental, and differential backups) and disks do not last forever. They wear out and need to be replaced before a natural or man-made disaster occurs or business functions are interrupted. Tapes and disks that have been in use for longer than six months should be discarded and replaced with new ones. In addition, store tapes and disks at an off-site facility, because it's unwise to keep them in the same place as the systems they protect or in a geographical area subject to the same weather conditions (e.g., tornadoes).
In addition, your data recovery time needs to be short to avoid the impacts of non-compliance. This means that access to backup tapes should be very fast, and the backup must be able to expand to accommodate a growing repository of data. When you need additional storage, plan an automated data migration from one form of media to another of larger capacity.
Note
How often you should back up the active transaction logs for databases depends on how quickly the databases on the local or remote disk approach the disk's maximum capacity. You may need to review the logs on the backup tape to determine what must be kept or discarded to comply with data retention requirements. You may also need to transfer the databases to media with a larger capacity.
Your strategy should ensure that business contingency planning includes policies, plans, and procedures for testing backups on a set schedule. Backup tapes that are defective because of improper labeling or testing could cause a recovery attempt to fail after a disaster.
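The dependence between log growth and backup frequency can be expressed as a simple calculation. The Python sketch below assumes you know the free space on the log disk and an average log growth rate; the `safety_factor` and the four-hour `min_practical_hours` threshold for switching to larger media are hypothetical values you would replace with your own.

```python
def log_backup_interval_hours(free_gb: float, log_growth_gb_per_hour: float,
                              safety_factor: float = 0.5) -> float:
    """Hours between transaction log backups so the log disk never fills up.

    safety_factor < 1 schedules the backup well before the disk is full
    (0.5 means back up once half of the remaining free space is used).
    """
    if log_growth_gb_per_hour <= 0:
        raise ValueError("log growth rate must be positive")
    hours_until_full = free_gb / log_growth_gb_per_hour
    return hours_until_full * safety_factor


def needs_larger_media(interval_hours: float, min_practical_hours: float = 4.0) -> bool:
    """Flag a move to larger-capacity media when backups would have to run too often."""
    return interval_hours < min_practical_hours


interval = log_backup_interval_hours(free_gb=200, log_growth_gb_per_hour=5)
print(interval)                                              # 20.0
print(needs_larger_media(interval))                          # False
print(needs_larger_media(log_backup_interval_hours(20, 5)))  # True (interval is 2.0 hours)
```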
Several types of backup methods are available, depending on your time and storage needs. These methods include full backup, differential backup, and incremental backup:
- A full backup contains all data and files. A restore is the fastest with this backup style, while the actual backup is the slowest (compared to incremental and differential backups). Storage space requirements for this method are the highest.
- Differential backup contains all files that have changed since the last full backup. Restoring from it is faster than from incremental backups, but slower than from a full backup. When more than one version is kept, it requires less storage space than keeping the same number of full backups.
- Incremental backup stores all files that have changed since the last backup of any type (full, differential, or incremental). The backup itself is the fastest of the three methods, but the restore is the slowest because every incremental backup tape taken since the last full or differential backup must be processed. Storage space requirements are the lowest, although they increase if a differential backup is also performed daily.
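The difference between the three methods comes down to which changed files each one copies. The following Python sketch models that selection rule using day numbers instead of real timestamps or archive bits; the file names and the `select_files` function are hypothetical.

```python
def select_files(modified: dict[str, int], kind: str, last_full: int, last_backup: int) -> list[str]:
    """Return the files a backup of the given kind would copy.

    modified maps each file name to the day it was last changed; last_full is the day of the
    most recent full backup and last_backup the day of the most recent backup of any type.
    """
    if kind == "full":
        return sorted(modified)
    if kind == "differential":
        return sorted(name for name, day in modified.items() if day > last_full)
    if kind == "incremental":
        return sorted(name for name, day in modified.items() if day > last_backup)
    raise ValueError(f"unknown backup kind: {kind}")


# Hypothetical files: full backup on day 0, most recent backup of any type on day 4.
files = {"accounts.dat": 0, "orders.dat": 3, "activities.log": 5}
print(select_files(files, "full", last_full=0, last_backup=4))          # all three files
print(select_files(files, "differential", last_full=0, last_backup=4))  # ['activities.log', 'orders.dat']
print(select_files(files, "incremental", last_full=0, last_backup=4))   # ['activities.log']
```

The differential set grows every day until the next full backup, while each incremental set stays small; that is the trade-off between restore speed and backup speed described in the list.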
Test backup and recovery periodically to ensure they work properly. To measure how effective the recovery is during these tests, include multiple layers of recovery parameters in the backup and recovery in depth strategy.
Note
Some backup sources support only full backups, not differential or incremental backups. For instance, BlackBerry devices and other kinds of smartphones permit only a full backup of handheld databases (e.g., messages) to a PC or an enterprise server.
Single Layer: Recovery Parameters
The length of time for the system to recover depends on the recovery parameters you set for SAP CRM applications and systems. I’ll look at recovery parameters first as a single layer and then as multiple layers in the backup and recovery in depth model.
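Examples of parameters include:
- Maximum vulnerability reduction (MVR)
- Recovery point objective (RPO)
- Maximum tolerable downtime (MTD)
- Recovery time objective (RTO)
- Recovery deployment time (RDT)
- Maximum tolerable period of disruption (MTPD)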
Vulnerability Reduction
The MVR parameter begins the recovery process, and you should apply it while you get the recovery facilities ready. Because it's not always possible to completely eliminate a vulnerability in SAP CRM applications and systems, the goal is to reduce it as far as possible. Determine the MVR before you proceed to the RPO parameter. The RPO is the point in time to which you plan to recover data, in this case from the last full backup followed by the incremental backups. You need to allow additional time to apply the incremental backups after the last full backup is restored.
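A quick way to see what the RPO means in practice is to compute the recovery point and the resulting data loss for a given backup schedule. This Python sketch assumes the RPO target is stated as the maximum acceptable hours of data loss, and the backup times below are a hypothetical schedule, not figures from the article.

```python
def recovery_point(backup_hours: list[int], failure_hour: int) -> int:
    """Return the completion time of the latest backup taken at or before the failure."""
    usable = [h for h in backup_hours if h <= failure_hour]
    if not usable:
        raise ValueError("no backup completed before the failure")
    return max(usable)


def meets_rpo(backup_hours: list[int], failure_hour: int, rpo_hours: int) -> bool:
    """True if the data lost since the recovery point fits within the RPO target."""
    return failure_hour - recovery_point(backup_hours, failure_hour) <= rpo_hours


# Hours counted from Friday 00:00 (hypothetical schedule): full backup Friday 22:00,
# incremental backups Monday, Tuesday, and Wednesday at 22:00; failure Thursday 09:00.
backups = [22, 94, 118, 142]
failure = 153
print(recovery_point(backups, failure))            # 142 (Wednesday's incremental)
print(failure - recovery_point(backups, failure))  # 11 hours of lost work
print(meets_rpo(backups, failure, rpo_hours=24))   # True
```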
Restore Business Functions
Next you must consider the MTD parameter. This parameter represents the maximum length of time a business can tolerate the absence or unavailability of a business function without causing irreparable harm to the business. The more critical a business function, the shorter the MTD.
The MTD consists of the RTO and the RDT. The RTO is the maximum acceptable length of time before business functions can be restored. For instance, it may take a couple of weeks for one business function to recover, but it may only take a couple of days for another business function to recover. The RTO, in turn, consists of the MTPD and the slack time (the difference between the earliest time an activity can start and its latest allowable start time).
The MTPD is the maximum length of time that a business can tolerate an absence or unavailability of an activity or resource before the business is harmed. The more critical an activity or resource is, the shorter the MTPD and slack time are. Ideally, the slack time should be zero. The MTPD sets the upper limit for the RTO.
The RDT is the maximum acceptable length of time to recover work after recovery facilities become available. It includes recovering lost data and work logs from the backup tapes and then testing and verifying that data or systems are in place.
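The relationships among these parameters can be checked mechanically. The Python sketch below tests two rules stated above: the RTO plus the RDT must fit within the MTD, and the MTPD sets the upper limit for the RTO. It also computes slack time as defined in the text. The `RecoveryPlan` class and the hour values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    function: str
    rto_hours: float   # time until the business function is restored
    rdt_hours: float   # time to recover lost work once recovery facilities are available
    mtd_hours: float   # maximum tolerable downtime for the function
    mtpd_hours: float  # maximum tolerable period of disruption (upper limit for the RTO)

    def problems(self) -> list[str]:
        """List the ways this plan breaks the relationships described above."""
        issues = []
        if self.rto_hours > self.mtpd_hours:
            issues.append("RTO exceeds MTPD")
        if self.rto_hours + self.rdt_hours > self.mtd_hours:
            issues.append("RTO + RDT exceeds MTD")
        return issues


def slack_hours(earliest_start: float, latest_allowable_start: float) -> float:
    """Slack time: latest allowable start minus earliest possible start (ideally zero)."""
    return latest_allowable_start - earliest_start


order_entry = RecoveryPlan("order entry", rto_hours=48, rdt_hours=24, mtd_hours=72, mtpd_hours=48)
billing = RecoveryPlan("billing", rto_hours=72, rdt_hours=48, mtd_hours=96, mtpd_hours=48)
print(order_entry.problems())  # []
print(billing.problems())      # ['RTO exceeds MTPD', 'RTO + RDT exceeds MTD']
print(slack_hours(2, 6))       # 4
```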
Consider this scenario. You performed a full backup on Friday and incremental backups on Monday, Tuesday, and Wednesday. The system crashes on Thursday morning. To recover, you need Friday's weekly full backup plus the incremental backups from Monday, Tuesday, and Wednesday. If you had performed differential backups on Monday, Tuesday, and Wednesday instead, you would need only Friday's weekly full backup and Wednesday's differential backup, because Wednesday's differential backup already contains the changes captured in Monday's and Tuesday's differential backups.
However, suppose the full backup tape is found to be defective during the recovery attempt. The vulnerability was never checked or mitigated. What was missing was a safeguard that alerted the system administrator that the tape was nearing the end of its physical life. The tape was not checked for wear from frequent use, and its label might even have carried the wrong expiration date, perhaps one much later than the actual date. In short, the MVR parameter was never tested.
To prevent this, apply the MVR parameter by testing the physical condition of the backup tapes and tape drives at scheduled times, and keep the documentation on the status of each tape up to date and accurate. This includes full backup tapes scheduled for weekly, monthly, and annual runs and rotated for specified periods of time (e.g., four weeks, one year, and five years, respectively). It also includes differential and incremental backups, other types of backups (e.g., database transaction logs), and search engine data. The rotation period for the annual full backup runs (e.g., five or 10 years) depends on your data retention requirements.
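The scenario follows a general rule: restore the most recent full backup, then the latest differential taken after it (if any), then every incremental taken after that. This Python sketch encodes the rule for a chronological list of backups; the `restore_chain` function and the labels are illustrative.

```python
def restore_chain(backups: list[tuple[str, str]]) -> list[str]:
    """Return the backups to restore, in order, from a chronological list of (label, kind)."""
    fulls = [i for i, (_, kind) in enumerate(backups) if kind == "full"]
    if not fulls:
        raise ValueError("no full backup to restore from")
    chain = [fulls[-1]]  # always start from the most recent full backup
    diffs = [i for i in range(chain[0] + 1, len(backups)) if backups[i][1] == "differential"]
    if diffs:
        chain.append(diffs[-1])  # the latest differential covers all earlier ones
    # then every incremental taken after the last full or differential backup in the chain
    chain += [i for i in range(chain[-1] + 1, len(backups)) if backups[i][1] == "incremental"]
    return [backups[i][0] for i in chain]


incremental_week = [("Fri", "full"), ("Mon", "incremental"), ("Tue", "incremental"), ("Wed", "incremental")]
differential_week = [("Fri", "full"), ("Mon", "differential"), ("Tue", "differential"), ("Wed", "differential")]
print(restore_chain(incremental_week))   # ['Fri', 'Mon', 'Tue', 'Wed']
print(restore_chain(differential_week))  # ['Fri', 'Wed']
```

Note that every tape in the chain is a single point of failure: if Friday's full backup is defective, neither chain can be restored, which is exactly the situation described next.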
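A scheduled MVR check can flag the problems from the failed-recovery example automatically. In this Python sketch, only the six-month service rule and the rotation periods come from the article; the `Tape` fields, the day-based approximation of the rotation periods, and the `MAX_PASSES` wear limit are hypothetical assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Rotation periods from the example above, approximated in days (assumption).
ROTATION = {
    "weekly": timedelta(weeks=4),
    "monthly": timedelta(days=365),
    "annual": timedelta(days=5 * 365),
}
MAX_SERVICE = timedelta(days=183)  # "used longer than six months", approximated
MAX_PASSES = 100                   # hypothetical wear limit; use the media vendor's figure


@dataclass
class Tape:
    label: str
    schedule: str        # "weekly", "monthly", or "annual"
    first_used: date     # when the tape entered service
    written: date        # when the backup on the tape was written
    label_expires: date  # expiration date printed on the label
    passes: int          # number of times the tape has been written


def tape_issues(tape: Tape) -> list[str]:
    """Return the MVR findings for one tape (an empty list means no issues found)."""
    issues = []
    if tape.written - tape.first_used > MAX_SERVICE:
        issues.append("written after more than six months in service")
    if tape.passes >= 0.9 * MAX_PASSES:
        issues.append("nearing end of physical life")
    expected = tape.written + ROTATION[tape.schedule]
    if tape.label_expires != expected:
        issues.append(f"label expiration {tape.label_expires} does not match expected {expected}")
    return issues


weekly = Tape("W-017", "weekly", date(2024, 1, 5), date(2024, 8, 2), date(2024, 8, 30), passes=30)
annual = Tape("A-2023", "annual", date(2023, 12, 1), date(2023, 12, 29), date(2033, 12, 29), passes=2)
print(tape_issues(weekly))  # ['written after more than six months in service']
print(tape_issues(annual))  # flags the label expiration mismatch
```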
Multiple Layers: the Backup and Recovery in Depth Model
Now I’ll transform the linear approach of the single-layer model into layered recovery parameters that are easier to change in response to varying conditions, such as unexpected adverse weather events and litigation.
First Layer: MVR Parameter
Start with the MVR as the first layer for the SAP CRM system. The recovery vulnerability (e.g., the vulnerability of incorrect tape labeling) must be reduced to the lowest acceptable level, preferably as close to zero as possible, although it is not always possible to eliminate a vulnerability completely. To help lessen the vulnerability, you can set up a primary firewall that controls which of the bundled backup tapes pass to the other layers in the model.
Second Layer: RPO Parameter
Next, proceed to the second layer: the RPO parameter for the same SAP CRM system. Suppose a full backup tape for a primary file server, taken from the bundled group of tapes scheduled to run together, is found to be defective (e.g., a malicious change to the tape label gives the wrong expiration date). In that case, go back to the first layer, find the full backup tape with the next highest acceptable level, and move it to the second layer. Another reason to return to the first layer is a full backup tape from a department's group of workstations that is in good condition but is missing some data from a BlackBerry device's backup to an office workstation.
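The interplay between the first two layers can be sketched as a filter followed by a fallback loop. In this Python sketch, `mvr_gate` plays the role of the primary firewall and `choose_tape_for_rpo` returns to it for the next-best tape whenever the RPO layer rejects one. The numeric `level` score, the tape labels, and both function names are hypothetical; the article does not prescribe how acceptability levels are assigned.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class BackupTape:
    label: str
    level: int           # hypothetical acceptability score from the MVR layer (higher is better)
    passed_checks: bool  # result of the physical and label checks in the MVR layer


def mvr_gate(bundle: list[BackupTape], minimum_level: int) -> Iterator[BackupTape]:
    """The 'primary firewall': pass only checked tapes at or above the minimum level, best first."""
    accepted = [t for t in bundle if t.passed_checks and t.level >= minimum_level]
    yield from sorted(accepted, key=lambda t: t.level, reverse=True)


def choose_tape_for_rpo(bundle: list[BackupTape], minimum_level: int,
                        is_usable: Callable[[BackupTape], bool]) -> Optional[BackupTape]:
    """Try tapes in order of acceptability; a rejected tape sends us back for the next one."""
    for tape in mvr_gate(bundle, minimum_level):
        if is_usable(tape):
            return tape
    return None  # no acceptable tape left in the bundle


bundle = [
    BackupTape("FS-1", 9, True),
    BackupTape("FS-2", 7, True),
    BackupTape("FS-3", 8, False),  # failed the MVR checks
    BackupTape("FS-4", 3, True),   # below the minimum level
]
# Suppose FS-1 turns out to have a tampered label when the RPO layer inspects it.
chosen = choose_tape_for_rpo(bundle, minimum_level=5, is_usable=lambda t: t.label != "FS-1")
print(chosen.label)  # FS-2
```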
Third Layer: the MTD
This layer is split into two parts: the RTO and the RDT parameters. The most critical task is keeping the recovery time and the slack time low.
If for some reason the recovery activity cannot start at all, a person using this model returns to the second layer to change the RPO and takes another full backup tape from the bundle, the one with the third highest acceptable level. The person then repeats the process to get back to the RTO part of this third layer, which then proceeds to the RDT (the other part of the third layer).
Because SAP CRM data is highly critical, the time to recover work must be short, and the backup tapes should be held at an off-site, fire-safe facility. Ideally, this facility should not be in the same geographical area as the company’s primary locations (and therefore subject to the same type of severe weather conditions).
If you attempt to start the tape and discover that it is defective, return the model to the MVR layer and then come back to the second part of the third layer, the RDT. If the tape is in good condition but needs to be rescheduled because production priorities changed, the model goes back to the RPO layer and then returns to the RDT parameter, as long as the RPO changes do not result in non-compliance, or the company has received authorization to extend its data retention periods.
After the lost data and work logs are recovered from the bundled backup tapes and the systems verify that the recovery succeeded, you can leave the model and resume normal operations for the SAP CRM system.
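The transitions described in this section can be summarized as a small state machine. The Python sketch below is one reading of the model, not a tool the article provides. It assumes that after a defective tape is found in the RDT part, the MVR layer hands control straight back to the RDT rather than restarting the sequence, and that a reschedule is issued only when the RPO change stays compliant. The event names and the `run_model` function are hypothetical.

```python
NEXT = {"MVR": "RPO", "RPO": "RTO", "RTO": "RDT", "RDT": "DONE"}

# (current layer, failure event) -> (layer to return to, layer to resume at afterward)
FALLBACK = {
    ("RPO", "defective_tape"): ("MVR", None),
    ("RPO", "missing_data"): ("MVR", None),
    ("RTO", "cannot_start"): ("RPO", None),
    ("RDT", "defective_tape"): ("MVR", "RDT"),
    ("RDT", "reschedule"): ("RPO", "RDT"),  # assumes the RPO change stays compliant
}


def run_model(events: list[str]) -> list[str]:
    """Walk the layers for a sequence of outcomes ("ok" or a failure event) and return the path."""
    layer, resume, path = "MVR", None, ["MVR"]
    for event in events:
        if layer == "DONE":
            break
        if event == "ok":
            # Continue to the pending resume point if there is one, otherwise to the next layer.
            layer, resume = (resume or NEXT[layer]), None
        elif (layer, event) in FALLBACK:
            layer, resume = FALLBACK[(layer, event)]
        else:
            raise ValueError(f"no transition for {event!r} in layer {layer}")
        path.append(layer)
    return path


events = ["ok", "defective_tape", "ok", "ok", "cannot_start", "ok", "ok", "reschedule", "ok", "ok"]
print(" -> ".join(run_model(events)))
# MVR -> RPO -> MVR -> RPO -> RTO -> RPO -> RTO -> RDT -> RPO -> RDT -> DONE
```

Keeping the transitions in a table makes it easy to adjust the model when conditions change, for example by adding a transition for a litigation hold.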
Glossary of Terms
Maximum tolerable downtime: The maximum length of time a business can tolerate the absence or unavailability of a business function without causing irreparable harm to the business. The more critical a business function is, the shorter the maximum tolerable downtime is.
Maximum tolerable period of disruption: The maximum length of time a business can tolerate the absence or unavailability of an activity or resource before the business is harmed.
Maximum vulnerability reduction: The greatest achievable reduction in the exploitation of a vulnerability. It's not always possible to completely eliminate the vulnerability.
Recovery deployment time: The maximum acceptable length of time to recover work after recovery facilities become available.
Recovery point objective: The time to which you plan to recover data.
Recovery time objective: The maximum acceptable length of time before business functions can be restored. For instance, it may take a couple of weeks for one business function to recover while it may take a couple of days for another business function to recover.
Residual risk: The remaining risks after security controls have been applied.
Risk: The likelihood of a threat agent taking advantage of one or more vulnerabilities, and the resulting business impact.
Vulnerability: The absence or weakness of a safeguard that could be exploited.
Judith M. Myerson
Judith M. Myerson is a systems architect and engineer and an SAP consultant. She is the author of the Enterprise System Integration, Second Edition, handbook, RFID in the Supply Chain: A Guide to Selection and Implementation, and several articles on enterprise-wide systems, database technologies, application development, SAP, RFID technologies, project management, risk management, and GRC.
You may contact the author at jmyerson@verizon.net.