What is in your EDW?

I have been discussing with Graham the problem of finding out what information is in the data warehouse.  The general solution is to store some information about what is in the warehouse somewhere.  This data about the data is often called Metadata.   Often no rules exists about the structure of the metadata and people construct metadata using spreadsheets which help them find the data in the warehouse and provides definitions of the data.

The difficulty with this approach is that it is often designed by the person constructing the metadata.  I am a firm believer that we need to have some rules and structure around the metadata.  Also we need some simple ways of searching the metadata.

The first step is to propose a standard way of describing the metadata.  The format of the metadata needs to be very flexible to allow for different descriptions for different types of data.  A simple design is really key and one that the business analyst and business user will understand and be able to use.  Another key factor is ease of maintenance.  The data stewards should be able to maintain this metadata easily.  The next factor is to find a simple search tool, like Google, to search the metadata and return back the information quickly in an accessible form.

The big effort is to populate the metadata repository.  I suggest that initially new data being added will be the first in the repository.  As the metadata repository evolves and is being used by more data stewards, the older data will gradually be added.   One other route would be to extract data from metadata spreadsheets and populate the repository.

The major objection to starting to construct the metadata repository is the size of the effort in describing everything in the data warehouse.   I strongly suggest that you start with one project and, instead of a spreadsheet, populate a metadata repository.  Once we have this repository, test the concept and the design and only move on when the repository is working well.

Leave a Reply

captcha *