Live from SAPinsider Studio: Anurag Barua on Data Quality for the Digital Enterprise
Independent consultant and longtime SAP technologist Anurag Barua joins SAPinsider Studio at the BI-HANA 2016 event to discuss data quality for the digital enterprise, including the role of SAP Data Services and SAP Information Steward. Topics of this discussion include the material impact of poor data quality, how a transition to a digital core and real-time data quality management changes the dynamic, organizational buy-in, and how SAP Information Steward and SAP Data Services serve a forward-looking digital enterprise.
This is an edited version of the transcript:
Ken Murphy, SAPinsider: Hi, this is Ken Murphy with SAPinsider. I’m at the SAPinsider BI-HANA-IoT event in Las Vegas. This afternoon I’m pleased to be joined by Anurag Barua, a longtime SAP professional and technologist, who is here today to talk to us about data quality issues. Anurag, thanks for being with us.
Anurag Barua: Thanks, Ken, and thanks for giving me the opportunity. Thanks for the introduction. I’ve been in the SAP ecosystem for about 18 years, and I’ve been involved with a variety of SAP implementations over the years. Now one of the areas that I’m very passionate about is data, and data quality in particular. I’d also like to highlight my ongoing relationship with Wellesley Information Services (WIS). I’ve been speaking for Wellesley for the last 10 years, I’ve written a lot of articles and white papers, and I’m glad to be here.
Ken: In that time, obviously there have been some changes in how companies look at data quality. With data volume and complexity increasing, how should organizations begin to think about data quality? What are some of the problems they run into by not looking at data quality differently in the digital enterprise?
Anurag: I’ll start off with a statistic. According to a recent study, the cost incurred by a typical Fortune 200 company because of data quality issues is in the vicinity of $5 million, and it’s probably a lot more if you look at the downstream impacts of that data. The numbers speak for themselves, but to get to some of the specific problems caused by bad data quality, No. 1, I would say, is the impact on your financial statements. You could be overstating revenues and understating expenses, or vice versa. So it has a direct material impact on your financial numbers. Secondly, and very importantly, the quality of decision making in any organization is directly correlated to the quality of its data. If the quality of data is suspect and unreliable, decision making tends to be based on best guesses and hunches, which is to say that you’re not making the right decisions. Poor data quality has a tremendous impact on the quality of decision making. And thirdly, data degrades over time. If it’s not nurtured and kept in good health, it keeps getting worse and worse, and the cost to fix that data gets higher and higher. Those are some of the challenges I see companies of all kinds going through globally.
Ken: In a real-time environment, specifically with SAP HANA and the move to real time, how are traditional data quality practices coming up short?
Anurag: Traditional data quality practices have tended to be after the fact. It’s basically what they call extraction, transformation, and loading (ETL). You first extract the data, which could be from your SAP ERP system, then you try to identify the data issues, and once you’ve identified them you go about fixing them. But it’s already too late. Your transactions are in the system and a lot of decisions have been made based on them, so going back to fix them is not only time-consuming, it’s costly. In this day and age, where information is expected to be available within nanoseconds on all sorts of mobile devices, no one has the luxury of waiting that long. Especially with the advent of technologies like SAP HANA, it has become imperative to do real-time data quality management.
Ken: What does that entail? What does an organization have to do to arrive at that state of real-time data management?
Anurag: Thankfully there are tools and applications available today just within the SAP suite, let’s just stick to that, and some of them are basically enhancements to existing technologies such as Data Services. Data Services is the traditional data cleansing and data quality management tool, and you can also use it for some real-time data quality activities and management, but that’s not typically Data Services’ strength. In HANA today, however, there is functionality that has been built that allows the quality of data to be checked. There’s pre-built functionality that lets you do duplicate checks, geo-coding, address matching, and a lot more standard data cleansing activities, so that the data that gets into the system is clean and you don’t have to wait until after the transaction to go back and fix your data issues.
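The idea of checking data at the point of entry, rather than cleaning it up afterward, can be illustrated with a minimal Python sketch. This is not SAP HANA or Data Services code; the `CustomerRegistry` class, the `normalize` helper, and the sample records are hypothetical names and data chosen only for illustration, and the matching logic is far simpler than a real address-matching engine.

```python
import re
from dataclasses import dataclass, field


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so that
    near-identical entries compare as equal."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(cleaned.split())


@dataclass
class CustomerRegistry:
    """Hypothetical in-memory store that validates records before accepting them."""
    records: dict = field(default_factory=dict)

    def add(self, name: str, address: str) -> bool:
        """Return True if the record was stored, False if it was rejected."""
        if not name.strip() or not address.strip():
            return False  # incomplete record: rejected at entry
        key = (normalize(name), normalize(address))
        if key in self.records:
            return False  # duplicate of an existing record: rejected at entry
        self.records[key] = {"name": name.strip(), "address": address.strip()}
        return True


registry = CustomerRegistry()
print(registry.add("Acme Corp.", "12 Main St, Las Vegas"))   # True
print(registry.add("ACME Corp", "12  Main St., Las Vegas"))  # False (duplicate)
print(registry.add("Acme Corp.", ""))                        # False (incomplete)
```

Because the check runs before the record is stored, the bad entry never reaches downstream transactions, which is the contrast with the extract-then-fix approach described earlier.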
Ken: And what role does Information Steward play? Is that the tool you’re referring to?
Anurag: Yes, Data Services and Information Steward are very closely related. They’re complementary to each other, and in fact there are a lot of customers that seem to think they’re synonymous. They’re different products. Information Steward is more of a data profiling tool for businesses, for data stewards, and for data analysts. You can track the lineage of your data, you can monitor its quality, you can set up validation rules, and then you can set up scorecards to see what kind of issues your data has. But it is typically after the fact. You first extract the data from some kind of system, then you have Information Steward do all the profiling, and then based on the results you pass some of that information to Data Services to fix the data. That’s how the two are related; Information Steward is still almost like an after-the-fact tool that you use.
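As a rough illustration of what validation rules and a scorecard amount to, here is a small Python sketch. It does not use or represent Information Steward itself; the rule names, the `scorecard` function, and the sample records are all assumptions made up for this example.

```python
from typing import Callable

Record = dict[str, str]
Rule = Callable[[Record], bool]

# Hypothetical validation rules: each returns True when a record passes.
RULES: dict[str, Rule] = {
    "name_present": lambda r: bool(r.get("name", "").strip()),
    "country_is_two_letters": lambda r: (
        len(r.get("country", "")) == 2 and r.get("country", "").isalpha()
    ),
    "postal_code_is_digits": lambda r: r.get("postal_code", "").isdigit(),
}


def scorecard(records: list[Record]) -> dict[str, float]:
    """Return, for each rule, the percentage of records that pass it.
    An empty data set is reported as 0.0 for every rule."""
    if not records:
        return {name: 0.0 for name in RULES}
    return {
        name: round(100.0 * sum(rule(r) for r in records) / len(records), 1)
        for name, rule in RULES.items()
    }


sample = [
    {"name": "Acme Corp", "country": "US", "postal_code": "89109"},
    {"name": "", "country": "USA", "postal_code": ""},
    {"name": "Globex", "country": "DE", "postal_code": "10115"},
    {"name": "Initech", "country": "US", "postal_code": "73301"},
]

for rule_name, pct in scorecard(sample).items():
    print(f"{rule_name}: {pct}% passing")
```

A per-rule pass rate like this is the kind of figure a scorecard would surface; the failing records are what would then be handed to a cleansing step.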
Ken: But they’re fully complementary.
Anurag: Yes, they are fully complementary.
Ken: So if you’re looking to fix the data it goes back to Data Services. And so what happens in that scenario?
Anurag: Yes, so Data Services and Information Steward interact at several different points. Once the data goes back to Data Services, the appropriate fixes that have been identified can be applied, and then the data can be sent to the downstream system. Typically, for a lot of customers, the downstream system is a data warehouse such as BW. So now the data going into BW has been cleansed for things like addresses, names, and geographical locations, which provides some assurance that the analysis and decision making people are going to be doing on your BW system are based on clean data.
Ken: So for an organization that currently doesn’t have a real-time data management platform, how does that company go about getting organizational buy-in to reach that state, or develop a business case for saying this is something they need moving forward?
Anurag: That’s something I’ve been helping organizations deal with. One of the first things I’d like to say is that, whether it’s Data Services or Information Steward, they’re not inexpensive tools, so there is a price tag attached. The first challenge you face is not so much what Information Steward or Data Services can do for you, but really the price tag. The best way to help organizations understand the value proposition is to work very closely with the business leaders, find out where their challenges are, take a subset of one of their problematic areas, which could be materials management or sales and distribution, and then do a small proof of concept. That’s something I’ve helped a couple of companies do successfully. A small proof of concept does not entail any major investment of time, money, or resources, so within a period of maybe a couple of weeks you can put together something that not only shows you the problems in your data but also gives you an idea of what they mean in terms of impact. One of the things that Information Steward does very well is show you in a visual way what the impacts of bad data are. That in turn can be shared with a diverse audience, and based on that you can build a compelling case for why having a tool such as Information Steward will put you on a path to data governance.
Ken: So seeing how your bad data will actually affect the company is a good way to get that buy-in.
Anurag: Absolutely.
Ken: Anurag, thank you for joining us today.
Anurag: You’re very welcome. Thank you for having me here.