Guiding Principles

In the past we have talked about the role of governance, and in Monday’s post we talked about SOA – Top-Down vs Bottom-Up.  It had me thinking about how to manage things like:

  • Enterprise Platforms (BEA, WebSphere, and the like)
  • Data Warehouses (Federated, Enterprise and Data Mart)
  • Integration (SOA, Web Services, EAI and EII …)

And one of the key roles of governance and stewardship when architecting and owning these types of platforms is to set guiding principles.  The purpose of these is to ensure that we are getting the business and technology benefit of choosing these platforms.

So taking Enterprise Data Warehousing as an example … one trap that always seems to come up is allowing orphan operational data stores or data marts to spring up, instead of focusing on evolving and growing the EDW on the chosen platform.  We all know this creates data duplication, which leads to data inaccuracy and complex reporting, among many other issues.  With the advent of Enterprise Information Integration layers (Ipedo or AquaLogic) this can be handled, but all of a sudden … oops, you are now federated.  So where in the EDW alone you may have the information stored once (Teradata) or two to three times (IBM or Oracle), you now have it once in the EDW = $X + other data store = $Y + …, and this can get out of hand, making the cost of your so-called EDW seem really expensive.
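
To make that arithmetic concrete, here is a minimal sketch of how the copies add up. The function name, the flat per-terabyte cost, and the copy counts are all illustrative assumptions, not figures from any vendor or from our own environment.

```python
def total_storage_cost(data_tb, cost_per_tb, copies_per_store):
    """Return the total cost of holding the same data in several stores.

    copies_per_store maps a store name to how many physical copies of the
    data that store keeps (hypothetical numbers, for illustration only).
    """
    total_copies = sum(copies_per_store.values())
    return data_tb * cost_per_tb * total_copies


# Hypothetical scenario: 10 TB of data at a flat 1,000 per TB per copy.
edw_only = total_storage_cost(10, 1000, {"edw": 1})
federated = total_storage_cost(10, 1000, {"edw": 1, "orphan_mart": 1, "ods": 1})

print(edw_only)   # 10000
print(federated)  # 30000
```

Even with one copy per store, every extra data mart or ODS multiplies the bill, which is how the "EDW" ends up looking expensive.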

Another example in EDW where guiding principles are necessary is design:

  • How do you handle extract – one per source system (don’t overhandle the data)
  • Centralized ETL Development in ETL staging using one tool and one methodology or the same thoughts on ELT
  • Integrated Data Quality Monitoring (how, when and where to assess, monitor and maintain data quality)
  • Data Marts or Operational Data Stores
  • Views or direct to the EDW to get the data
  • Can Business Intelligence go directly to the operational source system, or only through the EDW
  • How do we enforce these rules, as Architecture only sets direction and does not have a compliance slush fund (these are guiding principles and we are not the fun police)
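
Some of these principles can be written down as simple checks that report, rather than block, anything that drifts from them. The sketch below is only an illustration: the function name, the inventory format, and the choice to treat "BI reads only through the EDW" as the rule are assumptions, not a description of any particular tool.

```python
from collections import Counter


def check_principles(extracts, bi_connections, operational_sources):
    """Return a list of human-readable violations of two example principles.

    extracts: list of (source_system, target) pairs, one per extract job.
    bi_connections: list of (bi_tool, data_source) pairs.
    operational_sources: set of names treated as operational systems.
    """
    violations = []

    # Principle: one extract per source system (don't overhandle the data).
    per_source = Counter(source for source, _ in extracts)
    for source, count in sorted(per_source.items()):
        if count > 1:
            violations.append(f"{source}: {count} extracts (expected 1)")

    # Assumed principle: BI reads through the EDW, not the operational source.
    for tool, source in bi_connections:
        if source in operational_sources:
            violations.append(f"{tool}: reads directly from {source}")

    return violations


# Hypothetical inventory, for illustration only.
extracts = [("billing", "edw"), ("billing", "finance_mart"), ("crm", "edw")]
bi_connections = [("dashboard", "edw"), ("adhoc_report", "crm")]

for problem in check_principles(extracts, bi_connections, {"billing", "crm"}):
    print(problem)
```

A report like this gives governance something to discuss with project teams without turning Architecture into a gatekeeper.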

When guiding principles are not set, how are the business and IT going to know when to go from the top or from the bottom?  Going from the bottom when you should design from the top can lead to a spaghetti of loosely integrated, multi-copied data elements that should have been a 3rd normal form data store.

  1. Graham Boundy

    I’ll be chiming in with my mantra here…
    Performance is always an issue! Boundy’s Rule
    I could be proven wrong here, but I am not holding out a great amount of hope for the success of the federated DW model or for the use of middleware to do joins across disparate databases where the volume of data in the join will be large. In such situations the join data needs to be copied from one source to another or to an intermediate server for the purposes of executing the join. All this copying of data from one place to another takes time and will incur a performance hit.
    The point of the guiding principle to handle the data as little as possible is to avoid the cost, risk, and delay of moving data from one place to another. The logical conclusion is that a large enterprise-wide central DW allows the data to be handled once (from source to DW), where it can be queried by the enterprise.

  2. Stephen

    Great insight. I love your last line. Power to the People.

  3. Tim Matthews

    In re performance, the problem is that it will vary by query, so having a generally useful benchmark is hard. But, we have many clients who are getting subsecond response with good concurrency for large user populations. Between CBO, throttling and judicious use of caching, performance can become a non-issue. Plus, we often find that even simple data combinations delivered in “Web time” (read simple queries with 3-4 second latency) make users ecstatic. We say – Give the people what they want.

  4. Stephen

    Thanks for your comment – Great point. We commonly use views in the EDW (Teradata), but when you get federated it is very hard to do the same across multiple data sources, hence our looking at EII as leading the way on addressing this across the federated DW. This is not necessarily helping us govern access to the data, though that can be done down low.
    The business value is clear in both exposing and governing at a layer like this.
    The hard issue we find is deciding where governance should exist so that it does not become cumbersome in the process or hurt performance. The SOA folks are talking about it at multiple layers as well as in the registry, and then we add it here. It can be a confusing situation.
    The issue of performance of an EII layer seems to pop up in architecture conversations as well. We have been trying to benchmark this without a lot of success. I would love any thoughts you might have based on your experience.

  5. Tim Matthews

    I was wondering if you’ve ever thought about the potential impact of EII in helping data governance. Seems like one could use a middle-tier EII server to grant/restrict access based on policy. In some ways, Views could replace Marts (and not the EDW(s)). But better than marts, there is the ability to have centralized control. Thoughts?
