Yesterday I read a white paper by IBM on Dynamic Data Warehousing (see "New White Paper" at http://www-306.ibm.com/software/data/db2bi/data-warehousing/) that was sent to us. It’s basically a marketecture document disguised as IBM’s view of the future of DW.
But it highlights that the industry is still struggling with the basic problems we’ve had in DW since the mid-’90s, when Harvey and I first started looking at it.
The fact that we’ve come a long way but still haven’t cracked some of the most fundamental issues of DW (namely confidence in data consistency, data quality, the infamous single version of the truth, silos, and now the need for real-time or near-real-time access to operational data for BI purposes) indicates that most organizations either don’t fully comprehend the magnitude of the issues or are not ready to make hard decisions about how to solve them.
The hard decisions include, but are not limited to: the shift to shortening the time between data capture and BI information delivery; the realization that multiple versions of the truth will be with us for some time; the (and I’ve waited this long to say it) sorry state of data definitions, data models, and the associated business rules applied to generate BI results – aka metadata; and my personal mantra: performance is always an issue.
Jim, you are correct. Let’s take the first one that comes to mind for me but isn’t in the list in the original post: FUNDING.
We spend a lot of time thinking about how the data is modeled. We spend a lot of time ensuring that the data we need is in the database. But we don’t always spend as much time as we should publishing the data and documenting the rules used to find it in the database and how it interacts with other data.
For example, today we were looking for some data in the warehouse. I’m supposed to be the expert at finding this data. It took me a little while to put all the pieces together and find an initial set of data that met the requirements. But even I wasn’t happy with the way the results were returned, and I am sure that the query I ran could have been refined to give better results.
Two things prevented me from making those refinements: 1) the results were close enough for what we were looking for, and 2) determining how to improve the query results would have taken a significant amount of time and effort, because the rules for the refinements are not well documented.
Now consider how the “non-experts” are expected to find data to their satisfaction in the data warehouse. Basically, they can’t; they give up trying and instead build a simple data mart that they can manage in Excel on their workstations. There are literally armies of people doing this today in companies all over the world.
The real problem is finding a solution to the metadata problem. The business rules might be buried in Word documents, in repositories, or in data definitions in the database. The devil is in the details of putting all this information together into a package that ordinary human beings (i.e., not the geek squad) can use to find what they need in a reasonable amount of time and without in-depth knowledge of the internals of the warehouse.
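To make the idea concrete, here is a minimal sketch of what such a package might look like at its core: a searchable business glossary that maps business terms to their warehouse locations and plain-language rules. All names here (tables, columns, the rule text) are hypothetical, and this assumes someone has already done the hard work of extracting the rules out of the Word documents and data definitions.

```python
# Hypothetical sketch of a searchable business glossary for warehouse users.
from dataclasses import dataclass, field

@dataclass
class GlossaryEntry:
    term: str                                   # business name, e.g. "net revenue"
    location: str                               # table.column in the warehouse
    rule: str                                   # plain-language business rule
    synonyms: list = field(default_factory=list)

class Glossary:
    def __init__(self):
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)

    def search(self, text):
        """Case-insensitive match on term, synonyms, or rule text."""
        text = text.lower()
        return [e for e in self.entries
                if text in e.term.lower()
                or any(text in s.lower() for s in e.synonyms)
                or text in e.rule.lower()]

glossary = Glossary()
glossary.add(GlossaryEntry(
    term="net revenue",
    location="sales_fact.net_rev_amt",          # hypothetical location
    rule="Gross revenue minus returns and discounts, booked at ship date.",
    synonyms=["net sales"]))

# A non-expert searches by the name they actually use:
hits = glossary.search("net sales")
```

The point isn’t the code, which is trivial; it’s that the entries have to exist, be trusted, and be kept current, which circles back to funding.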
All of that requires time, effort, and funding that isn’t easily justified from an ROI perspective, and therefore isn’t done.
Great post. I think each of those issues deserves some elaboration. I wonder how much of the limitation is related to the fact that the information has different priority for different parts of the organization. For example, payroll and personnel use the data in totally different ways. Accounting and management use data in totally different ways. Although it is often the same number, the meanings are quite different. There is often not an accepted set of rules between groups. Balancing at month end is certainly not a priority for management but is sacrosanct for accounting. Just different parts of the elephant.
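The “same number, different meanings” point can be shown with a toy example. This is purely illustrative, not any real department’s rules: the same transaction list yields different “monthly totals” depending on whose definition you apply, which is exactly why the groups can’t balance against each other.

```python
# Illustrative only: two departments' rules applied to the same raw data.
transactions = [
    {"amount": 100, "posted": True,  "day": 28},
    {"amount": 50,  "posted": False, "day": 30},   # not yet posted
    {"amount": 75,  "posted": True,  "day": 31},
]

def accounting_total(txns, close_day=31):
    # Accounting counts only posted transactions through the month-end close.
    return sum(t["amount"] for t in txns if t["posted"] and t["day"] <= close_day)

def management_total(txns):
    # Management wants the full run-rate, posted or not.
    return sum(t["amount"] for t in txns)

# accounting_total(transactions) -> 175
# management_total(transactions) -> 225
```

Both numbers are “the month’s total,” both are correct under their own rules, and neither group is wrong. Without a shared, documented rule set, the warehouse has to carry both meanings.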
Any suggestions? Those are not performance issues.