Data Warehousing Fundamentals # 1

I think that we often need to go back to fundamentals in our area of specialty because we sometimes cannot see the forest for the trees.  We forget we are here to drain the swamp when we are up to our ears in alligators.  In diving into a problem we sometimes need to get back to fundamentals.  I admire the people who can look at an old problem with fresh eyes and go back to the basics and come up with a new approach.

In Data warehousing we are here to help the business make better decision based on good data.  Every business has lots of data but access to it is the challenge.  At the most basic level, we want to help the business executive make more knowledgeable decision. 

I have been reading some of the articles by the authorities in the field and would like to share what I have learned.

In this business we want to build something that gives our client insight into their key business question in a way that is simple and fast.  You often hear us talk about Rapid Results.  The reason for this focus is to limit the total cost of ownership and the overhead in the design and production phases. 

However the design fundamentals are quite challenging.  The soul of the data warehouse according to Ralph Kimball are as follows:

  • Drilling down
  • Drilling across
  • Handling time

Drilling down is the most basic end-user maneuver and the data warehouse must be flexible enough to support  drilling down in as general and flexible manner as possible.   We cannot predict the user’s drill down path so we must support the same approach for all drilling down.  The user must be able to use the same approach for all drill down activity.  Ideally a standard tool is best for drilling down because then the tool is not dependent on the structure of the data.  Specialized drill down tool result in much more custom design and inflexibility.

Drilling Across

Drilling across different fact tables means that the tables must use dimensions which conform.  The design of the data must use common definitions of terms.  The conformed dimensions do not have to be centralized to drill across.   All the data does not need to conform but those dimension that you wish to be able to drill across must conform.

Handling Time

  • The pledge of every data warehouse provider is that the data warehouse will preserve history.
  • Every piece of data must have a clearly understood time validity.   When it is valid and when it ceases to be valid.
  • If a description of an entity changes over time, the version of that entity must correctly associate with other contemporary data in the warehouse.  For example the description of the purchaser attached to that purchase must be correct for that time.
  • The warehouse must support natural ways of viewing data over time.  Data can be instantaneous, periodic, or latest.  The time associated with this data must be consistent with the time description.

These three fundamentals certainly represent a challenge to the designer.  However we are fortunate that tools have been developed to help us implement these principles.  Each one of these fundamentals must be address in all designs.  I think it is a good idea to remind ourselves occasionally of these ideas.

  1. Graham Boundy Reply

    I think the hardest of all these is the struggle we have with time in the data warehouse. Time ranges, time stamps, time durations, how long we need to keep data until it becomes less relevant. All of this weighs heavy upon the data warehouse builder.
    “Time and tide melts the snowman”, Dr Who.

Leave a Reply to Graham Boundy Cancel reply

captcha *