Material Handling – the cost of moving data

In data handling, the more we move and transform (pick up and put down) data, the more we aggravate these issues:

  • Data Latency
  • Data Quality Degradation
  • Cost of ETL development and support
  • Number of copies of the same data element
  • Risk of not getting the Business Intelligence that was intended

This may seem obvious, but I ask you to quantify that cost.  That very question came up the other day with a client, so we are now working on quantifying the direct cost.  That figure will not include the cost of asking three people the same question and getting five different answers back.

In the past we have used distribution as a metaphor for explaining Data Warehousing. The same metaphor holds for moving data: the more we handle an item, the greater the spoilage or shrinkage.
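As a rough starting point for quantifying the direct cost, here is a minimal sketch in Python. It assumes that every hop (one pick-up-and-put-down) creates one more copy of the data, that each hop needs its own set of ETL rules, and that each copy needs its own metadata reference and documentation. The function name, the field names and the unit costs are all hypothetical placeholders, not figures from a real engagement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlingCost:
    copies: int              # source table plus one copy per hop
    etl_rule_sets: int       # one set of transform rules per hop
    metadata_entries: int    # one metadata reference per copy
    documentation_sets: int  # one set of documentation per copy
    annual_cost: float       # total upkeep under the assumed unit costs


def handling_cost(hops: int, etl_cost: float, metadata_cost: float,
                  doc_cost: float) -> HandlingCost:
    """Count the artifacts to maintain after `hops` moves of one table.

    The unit costs are assumptions supplied by the caller; the model
    only captures direct upkeep, not latency or conflicting answers.
    """
    if hops < 0:
        raise ValueError("hops must be zero or more")
    copies = hops + 1
    total = hops * etl_cost + copies * metadata_cost + copies * doc_cost
    return HandlingCost(copies, hops, copies, copies, total)


if __name__ == "__main__":
    # Placeholder unit costs, purely for illustration.
    for hops in range(4):
        print(hops, handling_cost(hops, etl_cost=5000.0,
                                  metadata_cost=500.0, doc_cost=1000.0))
```

Under these assumptions the cost grows with every hop even when the copies are byte-for-byte identical, so the model says nothing about data quality; it only counts what has to be built and supported.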

  1. Jim

    I would like to wade in here. I was trying to think about how one copies material in the materials handling metaphor. I guess you really cannot. Maybe the metaphor tells us you cannot really duplicate data: once it is copied it is a different piece, because data is not just the information but also where it is located, so the copy is not identical. Not sure.

  2. Stephen

    The other area associated with spoilage is the transformations that are done during data movement.
    This to me leaves open the possibility of changes in the data as it moves from place to place.

  3. Graham Boundy

    My thinking is that degradation or spoilage is not the issue: the data can sit on disk forever and keep its shelf life, and the small space usage cost is negligible. The real concern for me is “cost of goods”: as inventory is moved from one place to another, the cost of material handling goes up and therefore the cost of goods increases. This is even more true for data, because along with the data there is metadata and documentation that also have to be built and supported. Simply picking up and copying a table or a file of data is not that costly, but maintaining the infrastructure around it is. For each identical copy of a piece of data there is a separate piece of metadata. That is, the data in Table_A may be identical to the data in Table_B, but I have to maintain two metadata references and two sets of documentation, one for each table. I also have to maintain the ETL transform rules between Table_A and Table_B. Replicate Table_B to Table_C and the ETL, metadata and documentation counts go up by one each, adding to the cost of goods.
    Then come questions like: What is this data? Where did it come from? How has it been transformed? Answering them takes time and effort, which adds to the cost of goods of the data.
    What’s more, if the data is just moved and the metadata and documentation are not maintained, the cost of evaluating the data later increases even more, because it takes longer to investigate and determine what the data really represents. Time is money, so this drives up the cost of goods of the data too.
    My mantra is: Every time we pick up the data and put it down it costs money, adding to the cost of goods. Handle it as little as possible and with care.

  4. Jim

    How do I find the trackback entry?

  5. Vincent McBurney

    A good topic for debate. I argue that the metaphor does not hold true: physical objects cannot be perfectly copied, so any handling results in some type of degradation, while digital media can be copied without any degradation. See my trackback entry for a full response.
    Keep up the great blogging!

  6. Does moving data degrade data?

    Moving data can be expensive and slow and risky. Moving data can also be fast and save a lot of money and be robust. You cannot generalise either way.

  7. Jim

    I think the metaphor really helps demonstrate the cost of moving data. However, it loses its value when you push it too far. I was wondering how to apply the metaphor to data latency, data quality, and multiple copies. Any ideas?
