At the Teradata Users Group meeting this week we also had a great presentation by Peter Capobianco of Teradata on Data Quality.
In the abstract's words: tackling a data quality project can itself be a complex task – unless you take steps to quickly identify areas for improvement and realize a quick return on investment.
Peter started by drawing a parallel between Data Quality Assurance (DQA) and traditional Quality Assurance, noting that most DQA best practices derive from QA best practices – here he referenced Larry English's work as well as Philip Crosby (author of the book "Quality is Free").
Anecdotally, he mentioned that you can often get more yield out of your analytics by focusing on DQA than by improving the analytics themselves. This is an area where garbage in means garbage out.
Here are some of my take-aways:
- Data Quality (DQ) is the suitability of data for its intended use
- DQ Assessment measures the suitability of data for its intended use – e.g.:
  - Currency / accuracy of the data
  - Availability of credit scores
- Data Profiling is a set of methods and tools used to process data for assessment purposes
- DQ Assessment plans should address all of the above topics
- Cleanse at the source, but this is not always practical
- If using ELT (Sunopsis), leverage the Teradata Profiler tool to assist with DQ
- Teradata Data Profiler can do a great job of supporting Data Mapping
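To make the data-profiling idea above concrete, here is a minimal sketch in plain Python of the kind of per-column statistics a profiling tool computes (null counts, distinct values, min/max). This is purely illustrative – it is not the Teradata Data Profiler, and the column names and sample rows are hypothetical:

```python
# Minimal data-profiling sketch (illustrative only; not the Teradata
# Data Profiler). Column names and sample data are hypothetical.
from collections import defaultdict

def profile(rows):
    """Return per-column stats: null count, distinct count, min, max."""
    stats = defaultdict(lambda: {"nulls": 0, "distinct": set(),
                                 "min": None, "max": None})
    for row in rows:
        for col, val in row.items():
            s = stats[col]
            if val is None:
                s["nulls"] += 1          # track missing values per column
                continue
            s["distinct"].add(val)       # track cardinality
            s["min"] = val if s["min"] is None else min(s["min"], val)
            s["max"] = val if s["max"] is None else max(s["max"], val)
    return {col: {"nulls": s["nulls"], "distinct": len(s["distinct"]),
                  "min": s["min"], "max": s["max"]}
            for col, s in stats.items()}

rows = [
    {"credit_score": 710, "updated": "2009-03-01"},
    {"credit_score": None, "updated": "2009-02-15"},
    {"credit_score": 655, "updated": None},
]
print(profile(rows))
```

Stats like these feed directly into a DQ assessment: a high null rate on `credit_score`, for example, quantifies the "availability of credit scores" concern mentioned above.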