Review these best practices for developing a data flow and understand the importance of reusability in SAP BusinessObjects Data Services job design. When constructing a good data flow, it is important to minimize effort by thinking in terms of reusability up front in the design process. Treating the code as a reusable object is the best way to approach extraction, transformation, and loading (ETL) development with SAP BusinessObjects Data Services.
Key Concept
SAP BusinessObjects Data Services is SAP’s data integration and data quality solution. SAP BusinessObjects Data Services supports both batch and real-time jobs to fulfill a wide variety of enterprise information management (EIM) needs. The SAP BusinessObjects Data Services data quality engine is embedded natively in SAP NetWeaver Master Data Management (SAP NetWeaver MDM). SAP has established SAP BusinessObjects Data Services as the go-to extraction, transformation, and loading (ETL) and data quality solution for SAP applications. ETL is an integral task in every IT organization and is used in a variety of scenarios, from porting an operational application from an aging platform to a modern one, to converting ERP data from one version to another, to loading a BI construct such as a data warehouse or data mart.
SAP BusinessObjects Data Services is a tool for ETL tasks within your organization. Connectors are available for most common database platforms, and application-specific adapters that consume operational processing logic are available for systems such as SAP, Siebel, and Oracle. The examples I use demonstrate merging customer and customer_address data to form a customer dimension in a data mart. SAP BusinessObjects Data Services is ideal for operations like these, using its graphical transforms to fulfill dimensional needs right out of the box.
I frame my discussion around loading data into a data warehouse and examine best practices for constructing data flows within an SAP BusinessObjects Data Services job. Through examples, I demonstrate the characteristics of a properly constructed data flow that lends itself to efficient processing on the SAP BusinessObjects Data Services job server, as well as proper job design and construction. SAP BusinessObjects Data Services is an object-oriented tool, which allows solutions to be delivered faster and at lower cost, with less code to maintain in the future.
Always Design the Job with Reusable Code
At the end of an ETL job, data has only moved from point to point. The developer must always think about reusability in the context of the job. You should always design the job with the idea of reusing one piece of code to perform multiple operations. To do this, you manipulate the code with variables and conditionals that change its meaning and operation at run time (Figure 1).

Figure 1
Use variables and conditionals to drive logic
In Figure 1, note how the workflow WF_EGW_POS_LOAD_POS is used three times. Inside the workflow are more than 15 scripts and data flows. By designing the code with reusability in mind and using the conditional and variables to drive the logic, I reused the same workflow of 15 objects rather than building three separate workflows totaling 45 objects.
The advantage of this approach is that it not only reduces development time, but also makes later modifications to the code much easier. A change to a data flow happens only once instead of three times, because the data flow exists once inside a workflow that is used three times.
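To make this pattern concrete, the following is a minimal sketch in SAP BusinessObjects Data Services script of how a script object can set global variables before each call of the shared workflow, and how a conditional inside the workflow can test those variables to choose a branch. The variable and object names here are hypothetical and are not taken from the job in Figure 1.

   # Script object placed before one call of the shared workflow;
   # the global variables would be declared at the job level.
   $G_SOURCE_SYSTEM = 'POS_STORE_A';
   $G_LOAD_TYPE     = 'DELTA';

   # Conditional object inside the shared workflow.
   # If expression:   $G_LOAD_TYPE = 'DELTA'
   # Then branch:     run the delta-load data flows
   # Else branch:     run the full-load data flows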
Proper Data Flow Construction
Now let’s focus on the proper construction of a data flow, the base unit of data movement in an SAP BusinessObjects Data Services ETL job. Properly designed data flows enable peak efficiency and maximum performance on the job server.
SAP BusinessObjects Data Services performs at its best when operations within a data flow are compartmentalized into meaningful work sections, or stages. Four stages usually suffice for most ETL operations:
1. Driver stage
2. Parsing stage
3. Lookup stage
4. Business rules enforcement stage
The Driver Stage
The driver stage limits the data set that the SAP BusinessObjects Data Services job server processes. The driver stage’s purpose is to push down as much work to the source database server as possible. SAP BusinessObjects Data Services works best when logic is distributed across the database server and the SAP BusinessObjects Data Services job server.
Having a good driver stage is essential for this to occur. Consider the example in Figure 2. This data flow has the following stages: qry_Driver is the driver stage, qry_Parse is the parsing stage, qry_lookup is the lookup stage, and case_Cust_Type and Validation make up the business rule enforcement stage.

Figure 2
Driver and parsing stages
This data flow denormalizes customer and address data into a customer_address dimension in a data warehouse. The customer and address tables both exceed 20 million records in the source staging database, so only changed data should be processed. Changed data is captured by including the date modified field in the where clause of the qry_Driver transform, so the recordset is limited to data that has changed.
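As an illustration of the kind of predicate a driver stage pushes down, here is a minimal sketch of a delta-driven select; the table, column, and variable names are hypothetical rather than the actual objects behind Figure 2.

   -- Driver-stage query that Data Services can push down to the source
   -- database; $G_LAST_LOAD_DT would be a global variable set by the job.
   SELECT CUST.CSTMR_ID,
          CUST.LAST_NAME,
          ADDR.ADDRESS_LINE_1,
          ADDR.ZIP
   FROM   CUSTOMER CUST
          INNER JOIN CUSTOMER_ADDRESS ADDR
                  ON ADDR.CSTMR_ID = CUST.CSTMR_ID
   WHERE  CUST.DATE_MODIFIED >= $G_LAST_LOAD_DT
      OR  ADDR.DATE_MODIFIED >= $G_LAST_LOAD_DT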
The Parsing Stage
The parsing stage occurs after the recordset is limited by the driver stage. Operations move from the database engine to the job server engine, where most of the data scrubbing occurs. Parsing can consist of operations such as converting mixed-case values to upper case, evaluating five- or nine-digit ZIP codes, or converting NULL values to default values to preserve referential integrity.
Note
Any operations or Data Services functions that have an equivalent function in the source database are pushed down to the database server and, in a sense, combined with the driver stage.
Consider the example of a combined push-down operation to the database. The data flow has two query transforms. The qry_Driver transform is the driver stage; it contains the where clause that covers the delta range and returns only the changed records from the source (Figure 3). The qry_Parse query transform contains the Data Services upper function to convert the last name field to upper case (Figure 4). In this example, the source database is SQL Server, which has an equivalent UPPER function to convert the case of the field. SAP BusinessObjects Data Services combines the operations of both transforms into the SQL statement that it pushes down to the source database. Figure 5 shows the SQL statement that is pushed down to the SQL Server database from the data flow.

Figure 3
The where clause in the driver query transform in the data flow

Figure 4
Field mapping for the Last_Name field with the Data Services upper function in the parse query transform in the data flow

Figure 5
SQL statement pushed down to the SQL Server database
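Although the generated statement itself appears in Figure 5, the general shape of such a combined push-down is roughly as follows; the table and column names are hypothetical, and the delta date appears as a literal, as it would after Data Services substitutes the variable value.

   -- Driver predicate and parsing-stage UPPER() combined into one
   -- statement executed by SQL Server instead of the job server.
   SELECT CUSTOMER.CSTMR_ID,
          UPPER(CUSTOMER.LAST_NAME) AS LAST_NAME,
          CUSTOMER.DATE_MODIFIED
   FROM   CUSTOMER
   WHERE  CUSTOMER.DATE_MODIFIED >= '2010-06-01 00:00:00'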
The Lookup Stage
The lookup stage follows when the driver and parsing stages are complete. As stated earlier, it is sometimes possible to push down the first two stages, but the lookup stage always occurs in the job server engine. Lookups are powerful and very useful, but they are expensive operations: they iterate record by record, and this type of operation occurs only in the job server. Because they are expensive and happen only in the job server, it makes sense to have the first two stages limit the data set and perform the parsing before the lookup begins.
If lookups are so expensive, then why use them? Lookups allow you to identify attributes relating to a customer, but only if certain conditions are met. You can write them by using the syntax in Figure 6.

Figure 6
Syntax for lookups
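For readers without the figure at hand, the general shape of a lookup_ext call is sketched below. The datastore, table, and column names are hypothetical, and the exact parameter list varies by Data Services version.

   # Approximate lookup_ext form: return SHIP_DESC from a lookup table
   # when its SHIP_CODE matches the incoming record's SHIP_CODE.
   #   [lookup table, cache specification, return policy]
   #   [columns to return]   [default values]   [lookup conditions]
   lookup_ext( [DS_STAGE.DBO.SHIPPING_CODES, 'PRE_LOAD_CACHE', 'MAX'],
               [SHIP_DESC],
               [NULL],
               [SHIP_CODE, '=', qry_Parse.SHIP_CODE] )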
You can also create lookups by using the SAP BusinessObjects Data Services graphical user interface (GUI), as shown in Figure 7. One great enhancement to the lookup_ext function in SAP BusinessObjects Data Services 3.0 is that after you have written a lookup in the syntax, you can right-click the function statement in the field mapping and SAP BusinessObjects Data Services launches the GUI shown in Figure 7. This makes the function easily readable even to novice Data Services users, allowing complex development efforts to be shared and understood across the team.

Figure 7
Lookup GUI in the LkuShippingCode query transform
Lookups fill a variety of data warehousing roles as well as operational roles. Think of lookups as tools whenever you need to perform an operational outer join inline in a data stream. SAP BusinessObjects Data Services actually creates the stored procedure at runtime in the job server engine, taking considerable coding out of the hands of the developer.
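Conceptually, a lookup of this kind behaves like a left outer join performed inline in the data stream; a rough SQL equivalent, using the same hypothetical names as the sketch above, would be:

   -- What the lookup accomplishes, expressed as a left outer join:
   -- unmatched rows keep flowing, with SHIP_DESC defaulting to NULL.
   SELECT SRC.*,
          CODES.SHIP_DESC
   FROM   INCOMING_RECORDS SRC
          LEFT OUTER JOIN SHIPPING_CODES CODES
                 ON CODES.SHIP_CODE = SRC.SHIP_CODE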
The Business Rules Enforcement Stage
The business rule enforcement stage is the final stage of the data flow in Figure 2. Case logic first splits the data path and can then do anything from remapping fields to changing values or where clauses, based on rules set up in the case transform. Figure 8 shows how the data is split into three paths based on values in the field CSTMR_SALES_OUTLET_TYPE.

Figure 8
Inside the case transform
After the case transform issues the three paths, the validation transform sends records that do not meet certain rules to the error log. In this example, a CSTMR_ID value less than zero signifies an error record for the error log.
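As a rough illustration (the expressions below are hypothetical, not the exact rules behind Figure 8 or the validation transform), a case transform holds one boolean expression per output path, and a validation transform holds a pass/fail rule on a column:

   # Case transform: one expression per output path (labels hypothetical)
   #   Retail:     CSTMR_SALES_OUTLET_TYPE = 'RETAIL'
   #   Online:     CSTMR_SALES_OUTLET_TYPE = 'ONLINE'
   #   Wholesale:  CSTMR_SALES_OUTLET_TYPE = 'WHOLESALE'

   # Validation transform: rule on the CSTMR_ID column; records that
   # fail the rule are routed to the error log target.
   #   CSTMR_ID >= 0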
Writing out to an error log is a great solution, but many other actions are possible. For instance, the flow could set a variable as an output that causes SAP BusinessObjects Data Services to send an email to the group responsible for the error, or even run an SAP BusinessObjects Web Intelligence report sourced from the error log table. I used case and validation transforms in this example data flow, but many other tools are available within SAP BusinessObjects Data Services. The options are limited only by the creativity of the developer.

Don Loden
Don Loden is an information management and information governance professional with experience in multiple verticals. He is an SAP-certified application associate on SAP EIM products. He has more than 15 years of information technology experience in the following areas: ETL architecture, development, and tuning; logical and physical data modeling; and mentoring on data warehouse, data quality, information governance, and ETL concepts. Don speaks globally and mentors on information management, governance, and quality. He authored the book SAP Information Steward: Monitoring Data in Real Time and is the co-author of two books: Implementing SAP HANA, as well as Creating SAP HANA Information Views. Don has also authored numerous articles for publications such as SAPinsider magazine, Tech Target, and Information Management magazine.
You may contact the author at don.loden@protiviti.com.