Apply these tips to master data-based aggregates to improve your system’s performance.
Most BI professionals know the basics of aggregates. Many would say, “Aggregates are baby InfoCubes that include fewer characteristics than the basic parent InfoCube and therefore summarize the data even more than the InfoCube does.” When queries can use aggregates, the result is faster reporting.
That definition is fairly accurate. However, in addition to characteristics from the InfoCube, you can use navigation attributes and hierarchies to build aggregates. This yields even greater speed improvements for what would otherwise be slow hierarchy and attribute queries. In this case, however, issues around aggregates can mushroom, and the nuances of aggregate administration in these situations are often unknown. This is especially true of performance issues related to navigation attributes. For ease of discussion I focus on navigation attributes, but the concepts apply almost universally to hierarchies as well.
My example in Figure 1 has both a time-dependent navigation attribute (Mat_Color) and a time-independent one (Cust_State). I included InfoCube characteristics (in the dimension tables) of state and color. State refers to where the customer lives, and color is the color of the item he or she buys. Both color and state are in the InfoCube and in the master data.
Note
If you have SAP NetWeaver BW Accelerator and do not use aggregates at all, this article is not applicable to you.

Figure 1
Simplified tables for InfoCube master data and a flat aggregate
The duplicated characteristics show how navigation attributes can present different views of changing data. What could Sherlock Holmes discern from the data in Figure 1? My guess is that he would observe that pink trucks don’t sell well, so the company is now trying green ones. He might also observe that many customers prefer warm climates such as Florida over Alaska, but once in the warm climate, these customers stop buying toys.
However, with an extended star schema like this you can answer many versions of history questions, such as customer sales by state now or at the time of the order. You could also query sales by color now, when the sales transaction was loaded, or at any point in time. These options exist because the color and state InfoObjects are both in the cube’s dimension table and in the master data.
When you use the navigation attribute color from the master data in the aggregate, you must also specify a date, known as the key date, to select which master record you want. This is required because the aggregate uses time-dependent master data. The query needs a key date as well, and it must match the aggregate’s key date for the query to use the aggregate.
Note
In this example, I restate the data according to the state field of the customer and the color of the material as of today, as opposed to using the InfoObjects from the InfoCube’s dimensions.
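To make the two views of history concrete, here is a minimal SQL sketch against the simplified tables of Figure 1. All table and column names (FACT_CUBE, DIM_CUST, MAT_MASTER, and so on) are illustrative stand-ins, not the actual generated BW table names, and SALES stands in for any key figure:

    -- View 1: sales by the customer's state at the time of the order,
    -- read from the state characteristic in the InfoCube's own dimension
    SELECT dc.STATE, SUM(f.SALES)
    FROM   FACT_CUBE f
    JOIN   DIM_CUST dc ON dc.DIM_ID = f.DIM_CUST_ID
    GROUP  BY dc.STATE;

    -- View 2: sales by the material's color as of a key date (here, today),
    -- read from the time-dependent master data instead
    SELECT m.MAT_COLOR, SUM(f.SALES)
    FROM   FACT_CUBE f
    JOIN   DIM_MAT dm   ON dm.DIM_ID  = f.DIM_MAT_ID
    JOIN   MAT_MASTER m ON m.MATERIAL = dm.MATERIAL
    WHERE  CURRENT_DATE BETWEEN m.DATEFROM AND m.DATETO
    GROUP  BY m.MAT_COLOR;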
Now let’s get into the performance aspects. When aggregates are built with navigation attributes, the performance improvement for the targeted queries is massive. This is because the technical linking of the aggregate tables to the master data (and hierarchy) tables is much more direct than it is with just an InfoCube. You do not need to know the technical details of this linkage, and it is not shown in Figure 1. However, for this navigation attribute example, the query SQL against the basic InfoCube alone reads four master data surrogate ID (SID) tables, the two dimension tables, and the fact table, a total of seven tables in this example. When the query uses the aggregate in my example, only three tables are read: two SID tables and the fact table.
Look at the execution path (transaction RSRT > Execute and Debug) both before and after you have an aggregate similar to the one shown in Figure 1. For this example, a query on the Cust_State navigation attribute would access the X SID table of Cust and the fact table of the aggregate. A query using Mat_Color would read the Y SID table of Material and the fact table of the aggregate. Note that Figure 1 shows only a simplified version of the InfoCube (i.e., the X and Y tables and the dimension details are not depicted).
The point is that instead of seven tables, you need only three if both Mat_Color and Cust_State are used. This assumes that the aggregate consists of 14 or fewer objects (navigation attributes/hierarchy levels or characteristics) and that the original InfoCube had no line item dimensions. When these criteria are met, the aggregate is automatically flattened using the concept of line item dimensions. (Refer to the Line Item and High Cardinality document for more information about line item dimensions.) In addition, the aggregate fact table points to different SID tables than the InfoCube used (S tables rather than X and Y tables).
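To see where the savings come from, here is a hedged sketch of the two access paths. Again, the names are simplified stand-ins for the generated BW tables:

    -- Without the aggregate: the fact table, two dimension tables, and
    -- four master data SID tables (X, Y, and two S tables): seven in all
    SELECT st.STATE, co.COLOR, SUM(f.SALES)
    FROM   FACT_CUBE f
    JOIN   DIM_CUST dc ON dc.DIM_ID   = f.DIM_CUST_ID
    JOIN   DIM_MAT  dm ON dm.DIM_ID   = f.DIM_MAT_ID
    JOIN   X_CUST   xc ON xc.SID_CUST = dc.SID_CUST  -- time-independent attributes
    JOIN   Y_MAT    ym ON ym.SID_MAT  = dm.SID_MAT   -- time-dependent attributes
                      AND CURRENT_DATE BETWEEN ym.DATEFROM AND ym.DATETO
    JOIN   S_STATE  st ON st.SID = xc.SID_STATE
    JOIN   S_COLOR  co ON co.SID = ym.SID_COLOR
    GROUP  BY st.STATE, co.COLOR;

    -- With the flat aggregate: the attribute SIDs already sit in the
    -- aggregate fact table, so only three tables are read
    SELECT st.STATE, co.COLOR, SUM(a.SALES)
    FROM   FACT_AGGR a
    JOIN   S_STATE st ON st.SID = a.SID_STATE
    JOIN   S_COLOR co ON co.SID = a.SID_COLOR
    GROUP  BY st.STATE, co.COLOR;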
This indeed improves performance for the end user. Yet I am sure most readers have heard the expression that there is “no such thing as a free lunch.” This is especially true with master data and aggregates; it is sometimes a very expensive lunch. In addition to the disk space and rollup costs incurred with regular aggregates, master data aggregates cost you data latency (the delay between the load of changed master data and when users get to see it) and significant processing time whenever master data changes. After changes to the master data or to its time validity, not only do you need to update the master data as you would if you had no aggregates (normally a very short process), you also have to adjust or rebuild every aggregate that was constructed based on that master data.
Most BI professionals are aware of the job called the attribute/hierarchy change run. This job runs after master data is loaded to look for changes to existing records (e.g., Ned moves to CA), fix or rebuild the associated aggregates, and then activate the updated master data (i.e., make it readable by queries).
A job that fewer experts know about is the time-dependent aggregate adjustment run. This is an additional job run after master data is loaded. Here the system fixes or corrects aggregates built with time-dependent master data when the time validity changes. In my example data, part number truck1 was changed from green to purple starting tomorrow until 12/31/9999. The aggregate needs to be fixed because it is based on a variable called today (current day). The time adjustment run would therefore decrease the green entry and add some purple. Starting tomorrow, all truck1 sales should show as purple in an aggregate based on the current date.
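Conceptually, the time adjustment run posts offsetting delta records rather than reloading anything. A minimal sketch using my truck1 example, with a hypothetical quantity of 500 and illustrative names:

    -- Before the change, the aggregate carries truck1 sales under green.
    -- Once the key date falls into the new purple validity interval,
    -- the adjustment run posts two offsetting deltas:
    INSERT INTO FACT_AGGR (SID_COLOR, SALES) VALUES (:sid_green,  -500);
    INSERT INTO FACT_AGGR (SID_COLOR, SALES) VALUES (:sid_purple, +500);
    -- After compression, the offsetting rows collapse into one purple row.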
Strategies for a Better SAP NetWeaver BW Implementation
Here are some strategies you can use to reduce this effort and thus run a smoother, more responsive SAP NetWeaver BW installation:
Flat Aggregates
Flat aggregates (aggregates with 14 or fewer objects) are great for reporting performance because they allow a two-table read for very simple queries. They are also good for rollup (adding new data to the aggregates) performance. Therefore, there is almost no reason I can think of not to make all your aggregates flat.
Configuration Parameters for Aggregates
Following are the configuration parameters for aggregates, accessed via transaction RSCUSTV8, and what each one does:
Configuration of the adjustment percentage: If an aggregate is “wrong” in that it is based on an old value of the master data, it is fixed during the attribute/hierarchy change run in one of two ways: either by a complete delete and rebuild, or by adding offsetting records. In my example, if Ned has now moved back to the cold, you could add -10 FL and +10 MN to the existing aggregate. The downside of the offset technique is a slightly slower-running aggregate until it is compressed.
Compare the cost of a complete rebuild when only one record has changed with the cost of simply adding offsetting records. Your SAP system gives you a choice: when the affected records reach a certain percentage of the total (the default is 20%), the system rebuilds the aggregate instead. You can experiment with this percentage and evaluate the overall effect on change run time and query performance, though there should not be a huge difference in most cases.
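Here is a hedged sketch of the two repair paths using the Ned example (illustrative names; the real change run generates its own SQL):

    -- Offset repair (affected share below the threshold, default 20%):
    -- two delta rows move Ned's 10 units from FL to MN
    INSERT INTO FACT_AGGR (SID_STATE, SALES) VALUES (:sid_fl, -10);
    INSERT INTO FACT_AGGR (SID_STATE, SALES) VALUES (:sid_mn, +10);

    -- Rebuild repair (affected share at or above the threshold):
    -- drop the aggregate content and refill it from the InfoCube
    DELETE FROM FACT_AGGR;
    INSERT INTO FACT_AGGR (SID_STATE, SALES)
    SELECT xc.SID_STATE, SUM(f.SALES)
    FROM   FACT_CUBE f
    JOIN   DIM_CUST dc ON dc.DIM_ID   = f.DIM_CUST_ID
    JOIN   X_CUST   xc ON xc.SID_CUST = dc.SID_CUST
    GROUP  BY xc.SID_STATE;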
Block size for building aggregates: If large InfoCubes are involved, consider changing the default block size parameter. If your system has a lot of unused memory, you can increase it. See the F1 help on the block size field in transaction RSCUSTV8 for more information.
Partitioning: For aggregates in general (although it is not a performance factor for master data aggregates specifically), partitioning the InfoCube and its aggregates is normally wise. If the source InfoCube is partitioned, rollup is generally faster, and reporting is faster because the aggregates are partitioned as well.
Basis Aggregates
Although it sounds strange, you might have an aggregate that no query ever uses. In Figure 2, the hypothetical record counts are designed to make a point: The InfoCube has 10,000,000 records, yet there are only 200,000 customers and all of them currently live in Florida.

Figure 2
Basis aggregates
Assume the data represents the bagel sales in the Miami InfoCube. If you want to have an aggregate “Customers who live in FL now,” you would just build it and it would be filled via an initial lookup of 10 million rows to find the customers’ states in the master table. This is shown by step 1 in the figure.
Now let’s assume many of your customers move to New York for the summer (a real possibility). When you run the change run, you have to rebuild the aggregate by looking up 10 million rows again. Instead, however, you can add an aggregate on customer only, resulting in 200,000 records (one per customer). Filling it takes some time every night, but not much. Now when the change run executes, instead of reading 10 million rows to find each customer’s state, it reads just 200,000. This is a huge savings in change run time.
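In SQL terms, the Basis aggregate changes the source of the change run’s rebuild. A minimal sketch, again with illustrative names:

    -- Without a Basis aggregate: the rebuild scans all 10,000,000 fact rows
    INSERT INTO AGGR_CUST_STATE (SID_STATE, SALES)
    SELECT xc.SID_STATE, SUM(f.SALES)
    FROM   FACT_CUBE f
    JOIN   DIM_CUST dc ON dc.DIM_ID   = f.DIM_CUST_ID
    JOIN   X_CUST   xc ON xc.SID_CUST = dc.SID_CUST
    GROUP  BY xc.SID_STATE;

    -- With a Basis aggregate on customer only: the rebuild reads the
    -- 200,000 pre-summarized rows instead
    INSERT INTO AGGR_CUST_STATE (SID_STATE, SALES)
    SELECT xc.SID_STATE, SUM(b.SALES)
    FROM   AGGR_CUST b
    JOIN   X_CUST xc ON xc.SID_CUST = b.SID_CUST
    GROUP  BY xc.SID_STATE;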
You can define a Basis aggregate as an aggregate that may or may not be used by reporting but contains only InfoCube characteristics. It is created to improve the change run time for aggregates that have navigation attributes or hierarchies. Another requirement is that the navigation aggregate is a child of the Basis aggregate.
Note
Basis aggregates are called “Basis” only because in the old SAP R/3 days, the Basis guys were 100% responsible for performance. They are not really created by the Basis team.
Parallel Processing
As you can imagine, the change run can be a time-consuming task, and so can rollup. If you use common process chain techniques, both processes can use parallel work processes. This speeds up the needed tasks, but at a cost: fewer resources for other tasks. Parallel processing is set in the variants of the process chain process types.
Key Date Variables
As shown in Figure 1, because the aggregate is built on a time-dependent navigation attribute, you must specify which master data records (not which InfoCube records) you want via a single validity date, the key date. In my example, this date is derived dynamically as today, using the SAP-delivered current-day variable in the aggregate definition’s key date. This means that every night I need to run a job to fix my aggregate, because the key date changes each night and I must use the color the material has on the new date.
If my materials change color frequently, this is painful. Instead, consider a weekly or monthly key date for most reporting requirements, for example the first day of the month (FOD). I would still run my time adjustment job, but most nights it would take only seconds because the derived date for the aggregate stays the same for an entire week or month. Significant run times would occur only when the derived date changes, such as on the first of each month. The only negative (or positive) is that only queries using this same FOD variable can use the aggregate, so users would need to agree to run these special variable key date-based queries and to see master data changes only weekly (or, in my case, monthly, because the date changes only on the first day of the month).
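The effect of the key date choice is easy to see in the validity selection the adjustment run must re-evaluate. A minimal sketch against the illustrative MAT_MASTER table:

    -- Which master record is valid on the key date? With a current-day
    -- key date, this can return a different row every night; with a
    -- first-of-month key date, it returns the same row all month.
    SELECT m.MAT_COLOR
    FROM   MAT_MASTER m
    WHERE  m.MATERIAL = 'truck1'
    AND    :key_date BETWEEN m.DATEFROM AND m.DATETO;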
SAP NetWeaver BW Accelerator
I saved this for last. Yes, all the above strategies work and will make your life easier when it comes to master data aggregates, but the whole process becomes nearly moot if you use SAP NetWeaver BW Accelerator. Instead of long-running rollups and change runs, you just need simple index updates. Building aggregates to anticipate the end users’ analysis needs goes the way of eight-track tapes. The users see 10- to 100-fold increases in speed, and you see similar decreases in support time and effort. However, there still is no free lunch because you do need to take into account the cost of installing SAP NetWeaver BW Accelerator.
Ned Falk
Ned Falk is a senior education consultant at SAP. In prior positions, he implemented many ERP solutions, including SAP R/3. While at SAP, he initially focused on logistics. Now he focuses on SAP HANA, SAP BW (formerly SAP NetWeaver BW), SAP CRM, and the integration of SAP BW and SAP BusinessObjects tools. You can meet him in person when he teaches SAP HANA, SAP BW, or SAP CRM classes from the Atlanta SAP office, or in a virtual training class over the web. If you need an SAP education plan for SAP HANA, SAP BW, BusinessObjects, or SAP CRM, you may contact Ned via email.
You may contact the author at ned.falk@sap.com.
If you have comments about this article or publication, or would like to submit an article idea, please contact the editor.