Siddhartha Rabindran MS, MBA, PMP, CSM, Principal SAP Technical Specialist, Microsoft Corporation
As more customers adopt the cloud, migrating existing SAP systems and deploying new ones can be challenging, even when every piece of the architecture works as intended. It helps to understand three things: where the connection points between architecture components are, where each component sits from an infrastructure perspective, and what information each component needs to exchange with the others. That understanding makes migrating existing systems to the cloud much easier.
In this article, I explain the architecture of SAP systems running on an infrastructure-as-a-service (IaaS) platform. This is not a how-to guide; rather, it dissects the components of an SAP architecture. The scenarios covered are cloud automation, high availability (HA), and disaster recovery (DR).
Automation in the Cloud
One of the core capabilities of the cloud is automation. Although agility is not synonymous with automation, automation helps accelerate the delivery of value to the business. Automation can be broadly categorized into three groups:
- Cloud provisioning automation
- SAP software installation automation
- Operations automation
This article does not cover abapGit or other SAP application-level features.
Automating Cloud Provisioning
Any hyperscale cloud is software-defined infrastructure. The physical hardware is abstracted from the customer, who interacts with it through software, specifically through application programming interfaces (APIs). Networks, storage, and virtual machines (VMs) are all deployed through software. Customers can use the cloud portal to deploy infrastructure and VMs, but deploying tens or hundreds of VMs this way quickly becomes tedious. Hyperscalers therefore provide infrastructure automation for deploying VMs, such as Azure Resource Manager (ARM) templates in Azure and AWS CloudFormation templates in Amazon Web Services (AWS).
For an SAP system running on IaaS, there are typically three distinct components to automate: the SAP database (SAP on HANA, SAP on MSSQL, SAP on Oracle, etc.); SAP ASCS/SCS (ABAP SAP Central Services / SAP Central Services), sometimes in tandem with the Enqueue Replication Server (ERS); and the SAP application servers (dialog instances).
Each of these three components should be automated with its own template. The database template typically includes a larger compute size, separate disks for the database data and log files, two VMs for high availability, and so on.
When planning for HA, ASCS/SCS and ERS require two VMs, typically sitting behind a load balancer. (In Azure HA scenarios, the floating IP address is presented through an internal load balancer [ILB], which routes traffic to the VM where the services are currently active.)
Finally, customers use application server templates to deploy the VMs that host SAP application servers, also known as SAP dialog instances. These templates are straightforward, with one or two disks in most scenarios, and the beauty of automation is that a single template can deploy multiple VMs at the same time.
These templates are typically written in JSON or YAML. The basic capability every customer should leverage in the cloud is deploying VMs through code. The cloud portal and other interactive interfaces are fine for getting started, but they require manual effort and are error-prone when deploying at scale.
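As a minimal sketch of deploying several VMs from a single definition, the Python below emits a skeleton ARM template that uses ARM's `copy` loop to create a configurable number of application server VMs. The parameter names (`vmPrefix`, `vmCount`), the VM size, and the `apiVersion` are illustrative assumptions, and the `osProfile`, `storageProfile`, and `networkProfile` sections that a real deployment needs are deliberately omitted.

```python
import json


def app_server_template(vm_size: str = "Standard_E8s_v3") -> dict:
    """Skeleton ARM template: one VM resource definition, deployed N times via a copy loop."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            # Hypothetical parameter names chosen for this example
            "vmPrefix": {"type": "string", "defaultValue": "sapapp"},
            "vmCount": {"type": "int", "defaultValue": 2, "minValue": 1},
        },
        "resources": [
            {
                "type": "Microsoft.Compute/virtualMachines",
                "apiVersion": "2021-03-01",
                # copyIndex(1) starts numbering at 1: sapapp1, sapapp2, ...
                "name": "[concat(parameters('vmPrefix'), copyIndex(1))]",
                "location": "[resourceGroup().location]",
                "copy": {"name": "appServerLoop", "count": "[parameters('vmCount')]"},
                "properties": {
                    "hardwareProfile": {"vmSize": vm_size},
                    # osProfile, storageProfile and networkProfile omitted for brevity
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(app_server_template(), indent=2))
```

Deploying a completed version of this template with `vmCount` set to 4 would create four identically configured application servers; the database and ASCS/SCS templates would remain separate files, in line with the separation described above.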
Automating SAP Software Installation
Once the infrastructure has been deployed, the operating system (OS) and system need to be configured before SAP software can be installed. These tasks include joining the VM to a domain, configuring disks, and applying the appropriate OS settings (each OS has its own settings, depending on the OS version and the software to be installed).
The foundation of automating SAP software installation is SAP's batch mode, or unattended, installation.
SAP software deployment may require the database to be installed separately. For example, SAP HANA can be installed before the SAP application. SAP HANA can be installed in batch mode by running hdblcm with a parameter that points to a configuration file; refer to the SAP HANA installation guide for details. The entire installation process can be orchestrated in multiple ways, for example with a shell script or a Python script, or with tools such as Ansible, Chef, Puppet, or Salt.
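Disk configuration is one of the easier steps to script. The sketch below only builds (it does not run) the Linux LVM commands that stripe several data disks into one XFS volume, for example for /hana/data. The device names, volume names, and the 256 KiB stripe size are assumptions to adapt to your own sizing and to your OS vendor's and cloud provider's guidance.

```python
def striped_volume_commands(devices, vg, lv, mount_point, stripe_kib=256):
    """Return shell commands that combine `devices` into one LVM volume, format it, and mount it."""
    if not devices:
        raise ValueError("at least one device is required")
    disks = " ".join(devices)
    # Stripe only when there is more than one disk (-i = number of stripes, -I = stripe size in KiB)
    stripe_opts = f"-i {len(devices)} -I {stripe_kib} " if len(devices) > 1 else ""
    return [
        f"pvcreate {disks}",
        f"vgcreate {vg} {disks}",
        f"lvcreate {stripe_opts}-l 100%FREE -n {lv} {vg}",
        f"mkfs.xfs /dev/{vg}/{lv}",
        f"mkdir -p {mount_point}",
        f"mount /dev/{vg}/{lv} {mount_point}",
    ]


if __name__ == "__main__":
    # Hypothetical device and volume names
    for cmd in striped_volume_commands(["/dev/sdc", "/dev/sdd"], "vg_hana_data", "lv_hana_data", "/hana/data"):
        print(cmd)
```

In practice you would also persist the mount in /etc/fstab and prefer persistent device paths over /dev/sdX names, which can change between reboots.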
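The batch mode HANA step can be wrapped in a small script. This sketch assumes the HANA installation media has been extracted to a directory containing `hdblcm` and that a configuration file has already been prepared (a template can be generated with `hdblcm --action=install --dump_configfile_template=<file>`); both paths in the example are hypothetical. By default it only returns the command (dry run).

```python
import subprocess
from typing import List


def hdblcm_batch_command(lcm_dir: str, config_file: str) -> List[str]:
    """Build the hdblcm command line for an unattended (batch mode) SAP HANA installation."""
    return [
        f"{lcm_dir}/hdblcm",
        "--action=install",
        "--batch",                      # no interactive prompts
        f"--configfile={config_file}",  # parameters such as SID and instance number
    ]


def install_hana(lcm_dir: str, config_file: str, dry_run: bool = True) -> List[str]:
    cmd = hdblcm_batch_command(lcm_dir, config_file)
    if not dry_run:
        # check=True raises CalledProcessError if hdblcm exits with a non-zero code
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(" ".join(install_hana("/hana/media/HDB_LCM_LINUX_X86_64", "/tmp/hana_install.cfg")))
```

The same function can be called from a larger shell, Python, or Ansible workflow, which is where the orchestration described above comes in.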
Similarly, the SAP application can be installed in unattended mode with Software Provisioning Manager (SWPM); SAP Note 2230669 is a good starting point.
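SWPM's unattended mode can be driven the same way. The sketch below assumes a parameter file (`inifile.params`) recorded from an earlier interactive run, as described in the SAP Note, and an extracted SWPM directory. The paths and the product ID are placeholders, because the real product ID depends on the product, release, and installation option being installed.

```python
import subprocess
from typing import List


def sapinst_unattended_command(swpm_dir: str, params_file: str, product_id: str) -> List[str]:
    """Build a sapinst command line for an unattended SWPM run."""
    return [
        f"{swpm_dir}/sapinst",
        f"SAPINST_INPUT_PARAMETERS_URL={params_file}",  # recorded parameter file
        f"SAPINST_EXECUTE_PRODUCT_ID={product_id}",     # installation option to execute
        "SAPINST_SKIP_DIALOGS=true",                    # use recorded values, no prompts
        "SAPINST_START_GUISERVER=false",                # no GUI server needed
    ]


def run_sapinst(cmd: List[str], dry_run: bool = True) -> None:
    if not dry_run:
        # check=True raises CalledProcessError if sapinst exits with a non-zero code
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = sapinst_unattended_command("/sapmedia/SWPM", "/sapmedia/inifile.params", "<PRODUCT_ID>")
    print(" ".join(cmd))
    run_sapinst(cmd)  # dry run by default
```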
The building block for SAP software installation automation is the unattended / batch mode installation process. Once you understand it, the entire installation can be orchestrated through scripting and other DevOps tools. For sample end-to-end code, take a look at my GitHub repo (https://github.com/sidrabindran/Singlenode_S-4HANA).
Automating SAP Operations
To take full advantage of the cloud, it is beneficial to automate some of your SAP operations, such as snoozing, dynamic scaling, and system refreshes.
One of the capabilities of the cloud is pay-as-you-go pricing. To take advantage of it, customers can shut down SAP instances and infrastructure over weekends and/or outside business hours, a practice also called snoozing. Snoozing SAP infrastructure is not just shutting down the SAP-related VMs; the process needs to be orchestrated, because the SAP instances and databases must be stopped before the VMs are shut down. One way of achieving this is to leverage Microsoft Power Apps and the Power Platform.
Dynamic scaling involves provisioning additional SAP instances during peak usage and de-provisioning them afterward, for example during month-end, quarter-end, or year-end close. Other peak business periods include the holiday season, product launches, etc.
Parts of the system refresh process can also be automated. A system refresh involves pre-processing, a backup of production, a restore to the target system, and post-processing, and most of these steps can be scripted and orchestrated.
When it comes to automation, you can either build it yourself or buy it. From my experience, the most interesting aspects of the cloud are the ability to provision on demand and to automate, and automation tremendously improves speed to value. Automation helps systems integrators and service providers in particular to deploy and manage SAP systems at scale. While automation is a continuous process, it is a very exciting one.
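Whatever tool drives it, the snooze logic is essentially an ordered plan. The sketch below builds that plan: it stops the application servers, then ASCS/SCS, then the database with `sapcontrol -nr <nr> -function StopWait`, and only then deallocates the VMs with the Azure CLI (in Azure, deallocating rather than just stopping the OS is what stops compute charges). Host names, instance numbers, the resource group, the 600/10-second timeout and polling values, and the assumption that VM names equal host names are all hypothetical; how each command is executed on its host (SSH, a runbook, a Power Platform flow, etc.) is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Stop order: application servers first, then central services, then the database
STOP_ORDER = {"app": 0, "ascs": 1, "db": 2}


@dataclass
class SapInstance:
    host: str         # VM that hosts the instance (assumed to equal the Azure VM name)
    instance_nr: str  # two-digit SAP instance number, e.g. "00"
    role: str         # "app", "ascs" or "db"


def snooze_plan(instances: List[SapInstance], resource_group: str) -> List[Tuple[str, str]]:
    """Return (where_to_run, command) pairs in the order they must be executed."""
    ordered = sorted(instances, key=lambda inst: STOP_ORDER[inst.role])
    plan = []
    for inst in ordered:
        # StopWait <timeout_s> <delay_s>: stop the instance and wait until it is down
        plan.append((inst.host, f"sapcontrol -nr {inst.instance_nr} -function StopWait 600 10"))
    # Only after every SAP instance is down, deallocate each VM once
    for host in dict.fromkeys(inst.host for inst in ordered):
        plan.append(("control-node", f"az vm deallocate --resource-group {resource_group} --name {host}"))
    return plan


if __name__ == "__main__":
    landscape = [
        SapInstance("sapdb1", "00", "db"),
        SapInstance("sapascs1", "01", "ascs"),
        SapInstance("sapapp1", "02", "app"),
    ]
    for where, cmd in snooze_plan(landscape, "rg-sap-prd"):
        print(f"[{where}] {cmd}")
```

Waking the system up is the same plan reversed: start the VMs, then the database, then ASCS/SCS, then the application servers.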
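A scaling schedule can start as a simple policy function. The sketch below assumes a calendar-month close, treats the last three days of each month as the close window, and uses hypothetical server counts; a real policy would follow the customer's fiscal calendar and measured load, and the provisioning itself would use templates like the ones described earlier.

```python
import calendar
from datetime import date


def in_close_window(day: date, window_days: int = 3) -> bool:
    """True if `day` falls in the last `window_days` days of its month."""
    last_day = calendar.monthrange(day.year, day.month)[1]
    return day.day > last_day - window_days


def target_app_servers(day: date, baseline: int = 2, month_end: int = 4,
                       quarter_end: int = 6, window_days: int = 3) -> int:
    """Number of application server VMs to run on `day` (hypothetical policy)."""
    if not in_close_window(day, window_days):
        return baseline
    # March, June, September and December closes are quarter-ends (December also covers year-end)
    if day.month in (3, 6, 9, 12):
        return quarter_end
    return month_end


if __name__ == "__main__":
    running = 2  # hypothetical number of application servers currently running
    delta = target_app_servers(date.today()) - running
    if delta > 0:
        print(f"provision {delta} application server VM(s)")
    elif delta < 0:
        print(f"de-provision {-delta} application server VM(s)")
    else:
        print("no change")
```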
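The refresh steps lend themselves to a simple sequential orchestrator that stops at the first failure, so that a failed restore never proceeds to post-processing. In this sketch the step functions are hypothetical placeholders that only print what they stand for; in practice each would call the relevant database backup/restore tooling or SAP post-processing scripts.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], None]]


def run_refresh(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run refresh steps in order; return (completed step names, failed step name or None)."""
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:  # stop the whole refresh at the first failing step
            print(f"step '{name}' failed: {exc}")
            return completed, name
        completed.append(name)
    return completed, None


# Hypothetical placeholders; real steps would call backup/restore tooling and SAP scripts
def pre_processing() -> None:
    print("export target-specific settings (RFC destinations, users, ...)")


def backup_production() -> None:
    print("take or locate a production backup")


def restore_to_target() -> None:
    print("restore the backup into the target system")


def post_processing() -> None:
    print("re-import the exported settings and run post-copy activities")


if __name__ == "__main__":
    done, failed = run_refresh([
        ("pre-processing", pre_processing),
        ("backup", backup_production),
        ("restore", restore_to_target),
        ("post-processing", post_processing),
    ])
    print("completed:", done, "failed:", failed)
```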
HA and DR Scenarios
HA and DR are critical topics to address when migrating SAP systems to the cloud. Rather than summarizing every possible HA and DR scenario, I will describe the broader architecture.
HA
Platform resiliency should not be confused with HA. Every cloud provider offers platform resiliency backed by service level agreements (SLAs). Some cloud providers also offer live migration of VMs, which I classify as platform resiliency.
Application HA builds on platform resiliency and provides failover capabilities for the SAP application in both planned and unplanned scenarios. The single points of failure (SPOFs) of an SAP system are the SAP database, ASCS/SCS, and the SAP application servers. We will focus on the database and ASCS/SCS, because SAP application servers are typically made highly available by running several of them and using logon load balancing.
Irrespective of the cloud provider, databases offer native replication features that can be combined with automated failover clustering to provide database HA. Table 1 lists the commonly used databases I have worked with and their replication methods.
Table 1 – Commonly used databases and replication methods
For HA scenarios, it is recommended to use database replication in synchronous mode.
The cloud requires one additional decision: whether or not to deploy across availability zones. Availability zones are physically separate datacenter facilities within the same cloud region. Before configuring HA with synchronous replication across availability zones, it is important to test the latency between them. Synchronous replication can also be used in non-zonal deployments, such as availability sets in Azure or deployments within a single availability zone.
By design, database replication on its own does not always provide automated failover. Operating system (OS) clustering is used in tandem with database replication to provide automated failover.
Red Hat and SUSE Linux provide OS clustering through the Pacemaker cluster stack. Pacemaker includes built-in resource agents for databases that work in tandem with the database replication mechanism. For example, SUSE provides SAPHanaSR for SAP HANA HA on the SUSE operating system. Some configuration guides for SAP HANA HA on Azure are listed below.
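For a first impression of inter-zone latency, a rough sketch like the one below times TCP connection setup (roughly one network round trip) from a VM in one zone to a listening port on a VM in the other. It is only an approximation: SAP's NIPING tool is the usual choice for SAP-specific latency tests, and the target host and port are placeholders you would supply.

```python
import socket
import statistics
import time
from typing import Dict, List


def tcp_connect_rtts(host: str, port: int, samples: int = 20, timeout: float = 2.0) -> List[float]:
    """Time `samples` TCP connection setups to host:port, in milliseconds."""
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it immediately
        rtts.append((time.perf_counter() - start) * 1000.0)
    return rtts


def summarize(rtts_ms: List[float]) -> Dict[str, float]:
    """Min, median, nearest-rank 95th percentile, and max of the samples."""
    ordered = sorted(rtts_ms)
    p95_rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integer arithmetic
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p95": ordered[p95_rank - 1],
        "max": ordered[-1],
    }
```

For example, `summarize(tcp_connect_rtts("10.0.2.4", 22))` run from a VM in zone 1 against a hypothetical VM in zone 2 gives a quick distribution; look at the p95 and max values, not just the median, because synchronous replication pays the latency on every commit.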
https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/sap/sap-hana-high-availability-rhel
https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/sap/sap-hana-high-availability
SQL Server (MSSQL) can be configured for HA using SQL Server Always On with Microsoft Cluster Services (MSCS). IBM Db2 HADR can also be integrated with Red Hat and SUSE Pacemaker.
Another HANA-specific consideration is scale-out. On-premises, a HANA standby node is typically used for resiliency. In the cloud, support for standby nodes is still evolving; check the SAP HANA certified hardware directory to see which cloud providers offer this capability.
https://www.sap.com/dmc/exp/2014-09-02-hana-hardware/enEN/iaas.html#
ASCS/SCS and ERS are the next components to protect with HA. In hyperscale clouds there are essentially two families of operating systems: Windows and Linux. (In my experience, Unix is deployed only in non-cloud-native configurations.) On Linux, the most commonly used package for ASCS/SCS and ERS HA is Pacemaker; on Windows, the most widely used mechanism is Microsoft Cluster Services (MSCS). Third-party mechanisms for ASCS/SCS and ERS HA also exist, but they are beyond the scope of this article. Also keep in mind the decision discussed above on whether to deploy across availability zones.
Before wrapping up HA, I would like to mention one aspect of application servers (dialog instances). Running application servers in one availability zone and ASCS/SCS and/or the database in another may cause performance issues if the latency between the availability zones is high, because application servers exchange many round trips with the database during processing.
DR
DR in the cloud is necessary to protect against regional outages. I do not recommend configuring DR across availability zones within the same region; instead, always configure DR across regions within the same geopolitical area. Cross-continent DR is possible but uncommon, even for global companies.
The first major DR decision is picking an appropriate region. This involves multiple factors, such as the recovery time objective (RTO), the recovery point objective (RPO), connectivity to the DR region, and the ability of interconnected systems to fail over. Once the region is chosen, we can move on to the technology decisions.
As with HA, DR requires protecting the database instance, the ASCS/SCS instance, and the SAP application servers (dialog instances).
The recommended way to protect the database against disasters is database replication combined with replication of database backups and log backups to the DR region. Database replication will help recover from most scenarios, but certain exceptional scenarios may require a full database backup plus logs for point-in-time recovery. The retention and replication of logs to the DR region should be aligned with the RPO and RTO requirements, and the database design should incorporate this. For example, suppose a customer runs SAP on SAP HANA with database replication enabled and an RPO of 5 minutes. If log backups occur only every 15 minutes, a recovery from backup could lose up to 15 minutes of data, exceeding the RPO. In our experience, most customers (with some exceptions) have an RPO of 5 to 15 minutes or higher; anything lower is an exception, and designing a solution with an RPO below 5 minutes costs an order of magnitude more.
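The log backup example above is simple arithmetic, sketched below under a deliberately pessimistic assumption: the disaster strikes just before the next log backup, and recovery relies only on backups copied to the DR region (replication, which normally loses far less, is ignored). The copy-delay parameter is a hypothetical addition representing the time it takes a log backup to reach the DR region.

```python
def worst_case_loss_min(log_backup_interval_min: float, copy_to_dr_min: float = 0.0) -> float:
    """Worst-case data loss (minutes) when recovering from log backups in the DR region."""
    # Everything written since the last log backup that reached the DR region is lost
    return log_backup_interval_min + copy_to_dr_min


def meets_rpo(rpo_min: float, log_backup_interval_min: float, copy_to_dr_min: float = 0.0) -> bool:
    return worst_case_loss_min(log_backup_interval_min, copy_to_dr_min) <= rpo_min


if __name__ == "__main__":
    print(meets_rpo(5, 15))     # False: 15-minute log backups cannot meet a 5-minute RPO
    print(meets_rpo(15, 5, 5))  # True: 5-minute backups + 5 minutes to copy = 10 <= 15
```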
Database replication to a cross-region DR site should run in asynchronous mode. In synchronous mode, every commit on the primary would wait for confirmation from the distant DR region, which would severely degrade database performance in the primary region.
SAP HANA system replication, SQL Server Always On, IBM Db2 HADR, and Oracle Data Guard all support asynchronous replication to a DR site. During planned DR tests or failovers, once the primary region is closed for transactions, you can ensure that all transactions have been replicated to the DR site before opening the DR database. For an unplanned DR failover, the database recovers to the most recent consistent state when it is started, so there is a potential for data loss. It is critical to monitor database replication to the DR site; without monitoring, you may miss the RPO SLA. As a fail-safe, database backups and log backups must also be available in the DR site. Although restoring the database increases the RTO, the business can at least continue operating after recovery until the primary site becomes available again.
Another mechanism for database DR is storage snapshot replication. Most databases support snapshots, and combined with the capabilities of the underlying storage, replicated storage snapshots provide another fail-safe mechanism. Recovering a database from a storage snapshot can reduce RTO compared with a full database restore.
While the database is the most important component for SAP DR, ASCS/SCS and the application servers also need to be protected and available.
Typically, ASCS/SCS and application servers do not hold any transactional data, although in a few cases they hold interface or batch-related files. One way to provide DR for them is to keep the primary and DR sites in sync so that the SAP instances can be brought up in the DR site. There are multiple ways to achieve this: in Azure, the most common method is Azure Site Recovery (ASR), and in AWS it is CloudEndure. Another option is rapid redeployment through automation. If the SAP landscape design includes running pre-production/non-production systems in the DR region, these systems can also be repurposed for production DR.
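The monitoring check itself can be small. This sketch assumes you can already obtain the time of the last commit on the primary and the time of the last transaction applied on the DR site; how to read these values is database-specific (for SAP HANA, for example, from the system replication monitoring views or the systemReplicationStatus.py script). The 50% warning threshold is an arbitrary assumption.

```python
from datetime import datetime, timedelta
from typing import Tuple


def replication_status(primary_last_commit: datetime, dr_last_applied: datetime,
                       rpo: timedelta, warn_fraction: float = 0.5) -> Tuple[str, timedelta]:
    """Classify DR replication lag against the RPO."""
    lag = max(primary_last_commit - dr_last_applied, timedelta(0))
    if lag > rpo:
        return "CRITICAL", lag  # RPO would be breached if the primary region failed now
    if lag > rpo * warn_fraction:
        return "WARNING", lag   # replication is falling behind
    return "OK", lag


if __name__ == "__main__":
    now = datetime(2024, 1, 31, 23, 0)
    print(replication_status(now, now - timedelta(minutes=3), rpo=timedelta(minutes=5)))
```

Wiring the result into your existing alerting turns "monitor replication" into an actionable signal before the RPO is actually missed.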
Before we wrap up DR, I would like to discuss two more topics: the Domain Name System (DNS) and the bubble DR test.
DNS is one of the aspects outside of SAP that must be taken care of during a DR test. The time it takes for DNS changes to replicate, and for cached records to expire, influences how long it takes users to reach the SAP systems in the DR site. A controlled group of users can access the systems through hosts file changes, but manipulating hosts files does not scale.
Although it is complex, some customers have explored whether IP addresses can be reused in a DR site. Reusing IP addresses in a cloud DR site is not recommended for multiple reasons, the major one being routing: it may require major reconfiguration of the network. While it is not impossible, this requirement should be specified during or before the network design in the cloud, because enabling IP address reuse after the network has been deployed can cause significant rework.
Another aspect is what I call a bubble DR test. The cloud makes it quick to perform a DR test without shutting down the primary site. While this can also be achieved on-premises, the cloud provides the ability to automate the test through scripts. In Azure, the network for the DR test can be segregated by defining Network Security Groups (NSGs); AWS provides the same ability through security groups.
With this, we wrap up the advanced architecture topics of automation, HA, and DR. I plan to cover backup, security, migrations, and the SAP Basis cultural shift in subsequent articles. If there are any topics you would like me to cover, please do not hesitate to email me.
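During a DR test it helps to verify, from the users' network, that the SAP host names already resolve to the DR site. The sketch below checks a mapping of host name to expected DR IP address using the local resolver. The host names and addresses are hypothetical, and the resolver is passed in as a parameter so the check can be tested without real DNS.

```python
import socket
from typing import Callable, Dict, List


def check_dns_cutover(expected: Dict[str, str],
                      resolve: Callable[[str], str] = socket.gethostbyname) -> List[str]:
    """Return a list of problems; an empty list means every name points to its DR address."""
    problems = []
    for hostname, dr_ip in expected.items():
        try:
            actual = resolve(hostname)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append(f"{hostname}: lookup failed ({exc})")
            continue
        if actual != dr_ip:
            problems.append(f"{hostname}: resolves to {actual}, expected {dr_ip}")
    return problems
```

For example, `check_dns_cutover({"sapprd.contoso.local": "10.20.0.10"})` returns an empty list once the record has switched; a name that still resolves to the primary address usually means the change has not propagated yet or a cached record has not expired.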
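As an illustration of scripting the network bubble, the sketch below generates Azure CLI commands that add deny rules to the NSG of the DR test subnet, blocking all traffic to and from the production address ranges. The resource group, NSG name, address prefixes, rule names, and starting priority are assumptions; the deny rules must have a lower priority number (that is, higher precedence) than any allow rule that would otherwise let the traffic through.

```python
from typing import List


def bubble_deny_rules(resource_group: str, nsg_name: str,
                      production_prefixes: List[str], first_priority: int = 100) -> List[List[str]]:
    """Build `az network nsg rule create` commands isolating the DR test subnet from production."""
    commands = []
    priority = first_priority
    for i, prefix in enumerate(production_prefixes):
        # Block traffic from the bubble toward production and traffic from production into the bubble
        for direction, prefix_flag in (("Outbound", "--destination-address-prefixes"),
                                       ("Inbound", "--source-address-prefixes")):
            commands.append([
                "az", "network", "nsg", "rule", "create",
                "--resource-group", resource_group,
                "--nsg-name", nsg_name,
                "--name", f"deny-prod-{direction.lower()}-{i}",
                "--priority", str(priority),
                "--direction", direction,
                "--access", "Deny",
                "--protocol", "*",
                "--destination-port-ranges", "*",  # all ports, not the CLI's default
                prefix_flag, prefix,
            ])
            priority += 1  # every rule gets its own priority
    return commands


if __name__ == "__main__":
    # Argument lists are meant for subprocess.run without a shell, so "*" is not glob-expanded
    for cmd in bubble_deny_rules("rg-sap-dr", "nsg-dr-bubble", ["10.10.0.0/16"]):
        print(cmd)
```

After the test, deleting these rules (or the whole bubble environment) returns the DR region to its normal state, which is also easy to script.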