SAP Data Services: Integrating with SAP BW on SAP HANA
SAP Data Services 4.2 is certified by SAP for use with SAP BW up to version 7.5 and SAP HANA 1.0 SP12; see the SAP Data Services Product Availability Matrix (PAM) for details. To use an SAP data source with SAP Data Services, you need to configure an SAP Data Services DataStore. This DataStore represents the logical link that enables metadata to be imported. (It is worth noting that SAP Data Services connectivity to SAP BW standalone or SAP BW on SAP HANA involves the same options and steps; thus, the terms SAP BW and SAP BW on SAP HANA are used interchangeably throughout this article.) SAP Data Services can be implemented both as an inbound provider (with DataSources) and as an outbound provider (with InfoProviders and open hubs).
To load data extracted from an SAP BW system into SAP HANA using SAP Data Services, you need to establish a Remote Function Call (RFC) connection. To extract data from an SAP BW InfoCube, you need to create an SAP BW open hub destination and an SAP BW process chain. To achieve this, create an RFC destination (connection type TCP/IP) that points to a registered server program in SAP BW.
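Once the RFC destination and registered server program are in place, the link can be verified outside of SAP Data Services. The following is a minimal sketch using the open-source pyrfc library (not a component of SAP Data Services); the host, system number, client, and credentials are placeholders for your landscape.

```python
# Minimal sketch: verify RFC connectivity to the SAP BW system with pyrfc.
# All connection parameters below are placeholders.
from pyrfc import Connection

conn = Connection(
    ashost="bwhost.example.com",  # SAP BW application server (assumed)
    sysnr="00",                   # instance number
    client="100",                 # logon client
    user="PRODUSER",              # RFC-enabled user
    passwd="secret",              # use secure credential storage in practice
)

# RFC_PING is a standard SAP function module; a successful call means the
# RFC destination and network path are working.
conn.call("RFC_PING")
print("RFC connection to SAP BW is working")
conn.close()
```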
The next step is to create a connection to the RFC server using the Data Services Management Console, which links to the SAP BW RFC destination that was just established. Finally, an SAP Data Services job is built that triggers the SAP BW open hub, which extracts the cube data to an SAP BW database table. SAP Data Services then queries the data through the RFC connection via the SAP Data Services server, which loads it into SAP HANA. SAP HANA studio is compatible with SAP Data Services for accessing external metadata. Figure 1 illustrates the links between all the components involved in this process.
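To make the trigger step concrete, the sketch below calls the standard SAP BW API function module RSPC_API_CHAIN_START via pyrfc to start the process chain that runs the open hub extraction. SAP Data Services performs a comparable call internally; the chain name and connection details here are assumptions.

```python
# Hedged sketch: start the SAP BW process chain behind the open hub.
from pyrfc import Connection

conn = Connection(ashost="bwhost.example.com", sysnr="00",
                  client="100", user="PRODUSER", passwd="secret")

# RSPC_API_CHAIN_START returns the log ID of the new chain run.
result = conn.call("RSPC_API_CHAIN_START", I_CHAIN="ZOH_CUBE_EXTRACT")
print("Process chain started, log ID:", result["E_LOGID"])
conn.close()
```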
As of SAP Data Services 4.1, an Eclipse-based user interface (UI) is also available. It provides an intuitive step-by-step wizard that replicates data and metadata, such as table definitions, from most third-party databases directly into SAP HANA. In only three steps, it is possible to replicate massive numbers of records and hundreds of tables from any supported data source to the target SAP HANA system.
The three steps are:
- Set up the connection to the source and target systems (database or application)
- Select the requisite tables from the source for replication (target tables are automatically generated)
- Develop, deploy, and execute the job to move all data with a single click
After completing the wizard, you can also filter the data, add complex data mappings (including expressions and functions), apply load options, and deploy other delta-loading approaches.
SAP BW as an Outbound Provider or Source
When using an SAP data source in SAP Data Services, you need to first define the DataStore as a connection to the SAP system. The next step is to import the metadata into the DataStore. SAP Data Services offers the following options in an SAP BW environment.
Data reads from an InfoProvider, such as a DataStore object, InfoCube, InfoObject, or InfoSet:
- Using an SAP BW open hub destination service – Executes an SAP BW process chain that is defined in the workbench, whose Data Transfer Process (DTP) reads data from the InfoProvider and loads an SAP BW open hub destination table (Figure 2), which SAP Data Services then reads as a source table.
- Using an automatically generated process – Automatically generated ABAP programs or RFC functions read data from the underlying tables of the DataStore object, InfoCube, or InfoObject.
Data loads into an SAP BW DataSource or InfoSource using an SAP Data Services batch job that can be initiated by:
- Executing the SAP Data Services batch job, which automatically triggers an SAP BW InfoPackage to load an InfoProvider
- Scheduling an InfoPackage within SAP BW to execute the SAP Data Services load job that has been exported to SAP BW
In addition, it is possible to browse SAP BW metadata using the SAP Data Services Designer. In this case, specific SAP BW open hub destination tables and their related process chains can be selected from the SAP BW workbench. Open hub destinations are managed from SAP Data Services using their process chains for the loading and removal of data.
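Because the open hub table is only consistent once its process chain has finished, a caller typically starts the chain and then polls its status before reading. The sketch below illustrates this with the standard BW API function module RSPC_API_CHAIN_GET_STATUS via pyrfc; the chain name, polling interval, and connection details are assumptions.

```python
# Hedged sketch: start the chain that fills the open hub destination table,
# then poll until it finishes before the table is read as a source.
import time
from pyrfc import Connection

conn = Connection(ashost="bwhost.example.com", sysnr="00",
                  client="100", user="PRODUSER", passwd="secret")

log_id = conn.call("RSPC_API_CHAIN_START",
                   I_CHAIN="ZOH_CUBE_EXTRACT")["E_LOGID"]

# E_STATUS becomes 'G' (green/success) or 'R' (red/failed) when the run ends.
while True:
    status = conn.call("RSPC_API_CHAIN_GET_STATUS",
                       I_CHAIN="ZOH_CUBE_EXTRACT",
                       I_LOGID=log_id)["E_STATUS"]
    if status in ("G", "R"):
        break
    time.sleep(30)  # polling interval is an arbitrary choice

print("Process chain finished with status:", status)
conn.close()
```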
SAP BW as an Inbound Provider or Target
Use of SAP Data Services to populate an SAP BW system was originally certified by SAP in 2002. This certification guarantees compatibility and support for this type of integration. This approach enables access to SAP BW metadata via the SAP Data Services Designer, which offers InfoSources or DataSources as targets. As shown in Figure 3, SAP Data Services can load data into a DataSource in the Persistent Staging Area (PSA) through the data acquisition layer or the staging Business Application Programming Interface (BAPI), supported by the RFC server. SAP Data Services Designer can also initiate load jobs from the SAP BW workbench by means of the RFC server, which is integrated with the Management Console to facilitate SAP BW load requests that originate in an InfoPackage.
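The load request that originates in an InfoPackage can also be started programmatically. The sketch below uses the standard BAPI_IPAK_START via pyrfc, which is the kind of call the RFC server makes on behalf of SAP Data Services; the InfoPackage technical name and connection details are placeholders.

```python
# Hedged sketch: trigger an SAP BW InfoPackage load via BAPI_IPAK_START.
from pyrfc import Connection

conn = Connection(ashost="bwhost.example.com", sysnr="00",
                  client="100", user="PRODUSER", passwd="secret")

result = conn.call("BAPI_IPAK_START", INFOPACKAGE="ZPAK_DS_LOAD_01")

ret = result["RETURN"]  # standard BAPIRET2 message structure
if ret.get("TYPE") == "E":
    print("InfoPackage start failed:", ret.get("MESSAGE"))
else:
    print("Load request ID:", result["REQUESTID"])
conn.close()
```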
When SAP Data Services is used to load data into an SAP BW production system, it is possible to create an SAP application profile (e.g., SAPDS_BW_PROD) that defines the additional authorizations for PRODUSER. The required authorizations for this profile are:
- S_RFC
- S_RS_ADMWB
To browse and import metadata into an SAP BW target, you need to import the DataSource metadata into the object library, either by entering a name or by browsing. In summary, with SAP BW version 7.3 and above, ad hoc SAP Data Services jobs can be created to load data from any source. The integration capabilities offered by the SAP BW workbench enable SAP Data Services extraction jobs to be generated automatically in SAP BW.
Data Load Processes for SAP BW on SAP HANA
Today, SAP BW on SAP HANA supports all existing SAP BW 7.4 ETL processes: the customary SAP Business Suite extractors designed for the Service API (SAPI), SAP Data Services 4.x, flat files, DB Connect, UD Connect, and Web Services.
Depending on which version of SAP HANA is being used, you can implement ETL tools to load directly into the SAP HANA database. These include the SAP Landscape Transformation (SLT) Replication Server and flat files (in CSV, XLS, XLSX, and other formats). Using these mechanisms, you can automatically create tables in SAP HANA.
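As an illustration of the flat-file path, the sketch below loads a CSV file directly into an SAP HANA table using the hdbcli Python client. SAP Data Services and SLT automate the equivalent steps; the host, credentials, schema, table layout, and file name here are assumptions.

```python
# Hedged sketch: load a CSV flat file into SAP HANA with the hdbcli client.
import csv
from hdbcli import dbapi

conn = dbapi.connect(address="hanahost.example.com", port=30015,
                     user="DSUSER", password="secret")
cur = conn.cursor()

# Create the target column table (layout assumed for this example).
cur.execute("""CREATE COLUMN TABLE STAGING.SALES_FLAT (
                   ORDER_ID INTEGER,
                   MATERIAL NVARCHAR(18),
                   AMOUNT   DECIMAL(15,2))""")

# Read the CSV and insert all rows in one batch.
with open("sales.csv", newline="") as f:
    rows = [(int(r[0]), r[1], r[2]) for r in csv.reader(f)]

cur.executemany("INSERT INTO STAGING.SALES_FLAT VALUES (?, ?, ?)", rows)
conn.commit()
conn.close()
```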
Data that is replicated directly to SAP HANA in real time via SLT can be accessed within SAP BW on SAP HANA. In this scenario, SLT real-time custom data marts in the SAP HANA schema can be combined with SAP BW data models via Transient or Virtual InfoProviders. Additionally, SLT can be used to precisely load data into the SAP BW managed area of an SAP BW on SAP HANA system. SAP BW extractors apply business logic and semantics to the source data, whereas SLT only replicates tables on a 1:1 basis.
You can use SAP Data Services to leverage SAP BW standard extractors within the Business Suite to push data into SAP BW or a custom SAP HANA data mart. It is important to note that only certain DataSources are enabled for use with SAP Data Services; for more information, refer to SAP Note 1558737. It is also possible to use the SAP HANA Direct Extractor Connection (DXC) to pull data from the Business Suite directly into SAP HANA by using the SAP BW standard extractors. The DXC uses the DataSource extractors available in SAP Business Suite systems to load data directly from the SAP BW InfoPackage into SAP HANA. DXC is compatible with generic DataSources and with any SAP DataSource containing custom extensions. DXC is available as of SAP HANA Service Pack 4 and can be implemented in a sidecar approach (a separate system landscape) using the SAP BW application embedded in an SAP ERP Central Component (ECC) system. DXC can use the established foundational data models of SAP BW business content for deployment in an SAP HANA data mart. SLT, by contrast, provides real-time data acquisition from individual tables in the SAP Business Suite.
It is vital to understand that DXC is not meant to replace the SAP BW scenario. Rather, DXC leverages SAP-delivered SAP BW business content DataSources as a way of minimizing the intricacy of data-modeling tasks within SAP HANA data marts and of accelerating implementation project timelines. More technical information about DXC can be found in SAP Note 1665602.
As of SAP Data Services 4.0, integration with SAP DataSources, including SAP BW, has evolved significantly, and this area was further refined with the release of version 4.1. SAP Data Services can now read directly from SAP's business content extractors in the enterprise resource planning (ERP), customer relationship management (CRM), and supplier relationship management (SRM) applications. The table structures that underlie these applications can be tremendously complex. Extractors serve as semantic views that overlay the tables and offer further capabilities such as delta-change capture. This functionality was introduced in version 4.0, and parallel processing was enhanced in 4.1.
SAP Data Services 4.x also supports native loading of SAP BW DataSource objects and is better integrated with SAP BW as a source system. This integration makes it easier to pull data from SAP Data Services sources directly into SAP BW DataSources, and it can all be done from the SAP BW interface. Note that these capabilities are only available in SAP BW version 7.3 or later. SAP Data Services 4.1 introduced improved parallel loading when SAP BW open hub tables are used as a source, improving the efficiency of moving data out of an SAP BW system and into other systems for analysis and reporting.
SAP business content extractors, which were originally developed for use with SAP BW, are the semantic depiction of ERP data for analytic uses, and they are now available for non-BW use via SAP Data Services. These extractors should work without interruption when concurrently used for SAP BW extraction. Furthermore, they can be used to populate other data marts or an operational data store, instead of, or in parallel with, SAP BW loads.
SAP Data Services is a popular addition to SAP BW implementations because it handles certain types of transformations more effectively than the SAP BW interface. One example is pivot transformations, which are more challenging within SAP BW and often require custom coding. In certain cases, developing intricate transformations can be simpler in SAP Data Services' custom development language (Python) than in ABAP. Custom transformation development thus becomes easier to reuse, and the complexity of the SAP BW environment is lessened.
SAP Data Services also offers a distinct software life-cycle management approach. SAP BW modifications are frequently limited to pre-planned transport windows on a weekly or monthly cycle. Although this limitation ensures system stability, it can be problematic for transformation logic that is meant to track changes in source systems. Isolating source-system-oriented transformations in SAP Data Services permits a software life-cycle management policy with more rapid assessment of source-system changes and errors.
Various source-system types for SAP Data Services behave differently in an SAP BW environment. SAP Data Services source systems enable users to create data sources that exactly emulate the layout of an SAP Data Services source. External system sources allow users to create data sources with arbitrary structures that are populated using any SAP Data Services job.
On SAP servers, SAP Data Services provides a function known as /SAPDS/RFC_ABAP_INSTALL_AND_RUN in its transport file. This function is used when the generate_and_execute mode is defined in the SAP DataStore: ABAP code generated within SAP Data Services is installed and then executed on the SAP server. Dynamic program creation is therefore possible, which is valuable in a development environment. It is important to note that the function grants write access to the SAP server, which could violate security constraints, so it should be removed from the SAP server if it is considered a risk.
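The sketch below mimics that mechanism with a trivial "generated" program, assuming the /SAPDS/ function mirrors the interface of the classic RFC_ABAP_INSTALL_AND_RUN (a PROGRAM table in, a WRITES table out); the connection details are placeholders. It also shows why the function is a security concern: arbitrary ABAP sent over RFC is executed on the server.

```python
# Hedged sketch: install and run a generated ABAP program over RFC, the way
# the generate_and_execute mode works. The interface is assumed to match the
# classic RFC_ABAP_INSTALL_AND_RUN function module.
from pyrfc import Connection

conn = Connection(ashost="erphost.example.com", sysnr="00",
                  client="100", user="DSUSER", passwd="secret")

abap_lines = [
    {"LINE": "REPORT zds_generated."},
    {"LINE": "WRITE: / 'generated by SAP Data Services'."},
]
result = conn.call("/SAPDS/RFC_ABAP_INSTALL_AND_RUN", PROGRAM=abap_lines)

# The list output of the generated program is returned in the WRITES table.
for row in result["WRITES"]:
    print(row["ZEILE"])
conn.close()
```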
General Guidelines for Extractor Selection
There are cases in which an extractor has been superseded by a newer iteration, so ensure that you are implementing the latest version. Ideally, select extractors that offer granular data rather than ones that have already aggregated the data. Be careful with primary key information in extractors, as it is not used by SAP BW and consequently may be absent or erroneous. Do not assume that all extractors with the same delta process have congruent field values, and carefully analyze the requirements for each extractor.
Every extractor has certain capabilities, especially with respect to delta processes. Numerous extractors return only small amounts of data, and delta tracking (tracking of changes) does not make sense in these cases. Other extractors deliver the records that have been modified but do not capture deletions, so no deletions can be allowed in the source. There are also extractors that provide comprehensive change information: inserts, updates (with before- and after-image values), and deletions. The different types of SAP BW extractors are shown in Figure 4. It is the responsibility of the ETL developer to build the appropriate dataflow.
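One practical way to analyze an extractor's delta capability is to inspect the DataSource control table ROOSOURCE in the source system, whose DELTA field names the delta process (for example, ABR for after-, before-, and reverse-image records, or ADD for additive deltas). The sketch below reads it with RFC_READ_TABLE via pyrfc; the DataSource name and connection details are assumptions.

```python
# Hedged sketch: look up the delta process of a DataSource in ROOSOURCE.
from pyrfc import Connection

conn = Connection(ashost="erphost.example.com", sysnr="00",
                  client="100", user="DSUSER", passwd="secret")

result = conn.call(
    "RFC_READ_TABLE",
    QUERY_TABLE="ROOSOURCE",
    DELIMITER="|",
    FIELDS=[{"FIELDNAME": "OLTPSOURCE"}, {"FIELDNAME": "DELTA"}],
    OPTIONS=[{"TEXT": "OLTPSOURCE = '2LIS_11_VAITM' AND OBJVERS = 'A'"}],
)

for row in result["DATA"]:
    datasource, delta = row["WA"].split("|")
    # An empty DELTA value indicates an extractor that supports full loads only.
    print(datasource.strip(), "delta process:", delta.strip() or "(full only)")
conn.close()
```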
Building Dataflows with Extractors
Using extractors is similar to using tables, since both are queried and return data upon execution. The principal distinction is that, unlike a table read, it is common practice to add an extractor to a standard dataflow. When an SAP table is added to a standard dataflow, it is processed via the RFC_READ_TABLE function module, which retrieves all the data in a single function call. This is acceptable for a few rows of data, but not for larger data volumes. Extractors differ in that their application programming interface (API) resembles a SQL cursor over a result set: the extractor is prepared and initiated, arrays of records are retrieved consecutively, and the data is promptly passed to the downstream transform.
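The contrast between the two read patterns can be sketched as follows. RFC_READ_TABLE is the real function module named above; the streaming generator is an illustrative stand-in for the cursor-like extractor API, not an actual SAP interface.

```python
# Conceptual sketch: bulk table read versus cursor-like extractor read.

def read_table_single_call(conn, table):
    """Table pattern: one RFC_READ_TABLE call returns the entire result,
    so memory use grows with table size -- fine for small tables only."""
    result = conn.call("RFC_READ_TABLE", QUERY_TABLE=table, DELIMITER="|")
    return [row["WA"].split("|") for row in result["DATA"]]

def read_extractor_streaming(packages):
    """Extractor pattern (stand-in): prepare and initiate once, then yield
    arrays of records consecutively so each package flows straight to the
    downstream transform instead of being buffered in full."""
    for package in packages:
        yield package

# Each fetched package is handed downstream immediately.
for package in read_extractor_streaming([["row1", "row2"], ["row3"]]):
    print("downstream transform receives:", package)
```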
The initialization flag signifies that the extractor should read all the data, not just the delta. This is the same concept as the truncate table flag found on the target table for an initial load. During the first run, the target table is empty, so truncation is unnecessary; if the initial load needs to be re-executed, the dataflow ensures that the table is truncated first. The same applies to the extractor: if no read has occurred yet, setting the initialization flag has no consequence, but if the initial load is re-run with the initialization flag set to No, only the delta is retrieved.
A global variable can be used to choose between two dataflows, one with the initialization flag set to Yes and the other set to No. In most cases, there are two dataflows because the initial load moves the data as quickly as possible, while the delta load uses its own logic to account for changes and deletions.
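A minimal sketch of this pattern is shown below: a single global variable selects the initial or delta dataflow. All names and functions here are illustrative stand-ins, not SAP Data Services APIs.

```python
# Conceptual sketch: one global variable picks the initial or delta dataflow.

G_INITIAL_LOAD = True  # global variable set at job level (stand-in)
target = []            # stand-in for the target table

def read_extractor(initial_load):
    """Stand-in for the extractor read: the full data set on an initial
    load, only the changed records on a delta load."""
    full = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
    delta = [{"id": 2, "amount": 25}]
    return full if initial_load else delta

if G_INITIAL_LOAD:
    # Initialization flag = Yes: truncate the target, then load everything.
    target.clear()
    target.extend(read_extractor(initial_load=True))
else:
    # Initialization flag = No: apply only the delta to the existing target.
    for rec in read_extractor(initial_load=False):
        target[:] = [r for r in target if r["id"] != rec["id"]] + [rec]

print(target)
```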
(Note: To learn the steps to establish an RFC connection between SAP Data Services and SAP BW, click this link: https://scn.sap.com/docs/DOC-29394.)