Databases: back to basics

The main reason we build relational databases is to minimize redundancy so we can

  • maintain control over complexity through structure,
  • manage operating & infrastructure costs and
  • reduce complexity.

If you’re not trying to reduce the number of databases in your organization, then you’re using relational technologies in the wrong way.

Data marts, spreadmarts (Excel spreadsheets used as data marts) and federated data warehouses all multiply the number of databases, and so take organizations in the wrong direction: away from reducing the number of databases and their respective costs.

"The two areas of computer technology that will make the new applications possible — indeed, in most cases they are absolutely fundamental — are telecommunications and the integrated database"  C.J. Date, An Introduction to Database Systems, November 1974

I first read this in Date’s "Introduction to Database Systems" back at university. I dug it out again in the late ’90s. The prediction at the time was that the next big challenge after Y2K was going to be managing the data behind the internet. So C.J. Date was correct all the way back in 1974. Here we are, witnessing the crest of the next big wave in our business. We’ve solved our network problems for the most part. Our budgets have now recovered and we’ve depreciated the assets from our Year 2000 overspending. That leaves us with figuring out how to manage, distribute and disseminate data in our information age.

Back in the ’80s, when relational databases were new, the oracles of the day (people who could see the future, not the database vendor) were predicting that networking, client-server computing and relational databases would sweep the marketplace. Back then Relational Database Management Systems (RDBMS) had one big problem: their performance was abysmal. Companies spent millions converting legacy file systems to relational databases, and then millions more on hardware upgrades and software enhancements to compensate for poor database performance.

One result of these workarounds for poor performance was data warehousing as we know it today. Before the performance problems hit, the database was sold as being capable of handling transactions AND reporting. One data source could do both jobs. Relational databases would minimize redundancy, give us one source for our data and make life easier… software nirvana!

By the late ’80s companies like Britton Lee and Teradata were introducing dedicated database machines to churn through large volumes of data for decision support and reporting, while Oracle, Sybase, IBM and others set about trying to increase their transaction-level performance.

For those companies who didn’t want to spend big money on hardware and databases dedicated to reporting, solutions like RedBrick, Cognos Cubes etc. appeared on the market with multidimensional schemas, stars, snowflakes and data marts to generate the performance required for analytics.

All this activity delivered on some of the business’ reporting and analytics requirements. But it caused us to lose sight of the concept of integrating data: one database, one version of the truth in our company data.

Whether you’re building data marts, working on one large database, or building transaction-based OLTP systems, always keep in the back of your mind the big goal and promise of RDBMSs. We’re trying to build a single large database that can support all our needs in an integrated way, to reduce cost, complexity, redundancy and confusion.

Update: RDBMS defined
