I think a little history of data warehousing (DW) would be helpful to understand how it has evolved. In the late 80’s and early 90’s the idea of an executive information system (EIS) started to be the fashion. Programs were developed that would present data in a very user friendly way on executive’s desk. This data might be key performance indicators and lots of other external and internal information. Tools were required to extract data from all kinds of different sources and put them in a form that executive could use to explore the data. I remember one group develop a tool called the Executive Dashboard. That was the first time I heard about drilling down into the data.
People discovered many challenges around the extraction of data and how to put it into a form that made exploration quick and easy. Ideally the data would be from live data so that information was not replicated. However this idea proved difficult because performance of operational data bases was a big enough problem without adding a whole new set of complexity. Thus the idea was to extract data from the operation data bases and transform them in data suitable for the EIS tools.
Several data base organizations came out with tools to meet these needs. These new executive data bases became repositories of all kinds of really important data about what was happening in the business and in the business environment. For example a mining company executive could track the price of a metal on the open market with the price they were getting for the same product and compare both of these to their production costs. They could do this monthly and watch trends. Some of these executives became very sophisticated in using this data to make important decision. One organization developed a very sophisticated tool to measure the profitability of every sale versus the price they could get on the London Metal Exchange. This data was very close to real time and I recall was tracked daily.
Eventually people realized this data was a important asset and the data warehouse concept was born. Some companies realized huge competitive advantage by using the internal and external data. The accent is legendary in the data warehousing lore. The problem developed that the size and complexity of the DW grew exponentially but processing capability was lagging. Some companies developed specialized machines that could handle these demands. Thus response issues were address. However these machines and the data were still not being used by the people who needed the data for running the operations. The data was still extracted and transformed into a format for the DW.
We are now getting closer to being able to use the same data for operational and reporting requirements. Clearly this would be better because every time data is duplicated the cost rises and the chances of error increase. This design goal presents many challenges for the system architects and designers. However the dream of data base design is to have a data entity only exist in one place and be related to other data by pointers. These type of data bases are called relational data bases. Often this ideal is sacrificed for performance. More on this subject later.
My goodness you are cynic. I wonder if others would put different spin on the scenario. I will have to think on this.
The synics twist.
The evolution of Relational Database technology drove applications to be built using RDBMS (relational database management systems) along with transaction processing systems, the dreaded fourth generation languages and where necessary 3GLs like C and COBOL.
As Jim stated, at the time (late 80s, early 90s) the performance of the RDBMS systems could not support both OLTP (on line transaction processing – like Point-Of-Sale (POS), Order Entry, and other application that required sub-second response time) and Reporting
The problem was, at that time, the RDMBS were being sold to do both. Most application were custom made because the SAPs, AMDOCSs, Seibels and JD Edwards didn’t exist of weren’t advanced enough to meet most companies needs. Companies went out and bought, usually from their hardware vendor, a RDBMS and some development tools to built there applications in house.
Most of these projects cost millions (usually 2 to 3 times the original budget) but they were going to revolutionize the business. We could do all our business processing and reporting at the same time from the same system. OOOOWWWW Nirvana!
Recall this was also the era of Business Process Re-Engineering (BRP) and the rise of Open Systems (aka UNIX) and the proposed death of the Mainframe and other proprietary systems. Well I don’t know about you but I still have a mainframe userid almost 20 years later.
Imagine you’re a CIO in 1991. You’ve just spent 2 to 3 times your initial budget on a system that is late and the business is bleeding money trying to keep operational. What do you do when your people come to you and say the performance on the system is horrible. Your IT group will have to tune the B’Jeezuz out of the system to keep the company afloat and that will mean there will be no reporting available because it’s just too slow and takes too many resources away from OLTP.
Well, what happened was in 1993 your replacement (i.e. as CIO you got canned) and so your replacement CIO has to suggest to the Board of Directors that the solution to your reporting wows is a new concept called Data Warehousing. That’s where we copy the data out of the operational systems and do OLAP (OnLine Analytical Processing) on it. And it will only cost another couple of million dollars to do it.
Then by 1995 people are telling the beer and diapers story and how retailer X realized a 5 bijillion percent return on investment because they built a DW.
And that all leads to the stuff we haven’t touched Enterprise Data Models, Enterprise Data Warehousing and ROI … that’s another story