In the data warehousing and business intelligence world we are constantly challenged to produce results faster and at lower cost. The rapidly changing business environment means that today's business question may change within six months. The pressure to deliver results more quickly exists across the whole IT community, but it is felt most acutely in business intelligence work.
We at Project X Ltd. continually strive to find innovative ways to reduce the time and cost of delivering results to clients on business intelligence projects. The challenge is also to produce results that fit within the context of an overall architectural strategy. In addition to answering the specific business question, the approach builds a capability to answer future questions while conforming to longer-term architectural goals. The tension between short-term expedience and long-term value is important to acknowledge and to take into account in the design.
The waterfall approach to development is the most common way to avoid mistakes: each phase in the development process follows from the previous phase. But what is wrong with making mistakes? In Rapid BI we admit that we are going to make mistakes and plan to recover from them. Michael McIntire of eBay called this "Failing Fast" in a presentation. The premise is that it makes no sense to spend a long time designing and building something that is not going to meet the need.
For Rapid BI, the waterfall approach does not usually meet the need. Other approaches are required, including allowing work to proceed in parallel. One approach we have used is to start development of the reports and other downstream work before the data is populated by the upstream phases, trusting that the data will be there by the time it is needed.
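One way to make that parallel work practical is to generate deterministic stub data that matches the agreed target schema, so report logic can be written and tested before upstream ETL delivers real rows. The sketch below is purely illustrative; the column names, value ranges, and the `make_stub_rows` helper are hypothetical, not part of any actual project.

```python
import random
import datetime

# Hypothetical target schema for a sales fact table; the column names
# and value ranges are illustrative assumptions, not real project data.
COLUMNS = ["sale_date", "region", "product_id", "amount"]

def make_stub_rows(n, seed=42):
    """Generate deterministic stub rows matching the agreed schema,
    so downstream report development can proceed before the upstream
    phases populate the real table."""
    rng = random.Random(seed)  # fixed seed: same stubs on every run
    rows = []
    for _ in range(n):
        rows.append({
            "sale_date": datetime.date(2024, 1, 1)
                         + datetime.timedelta(days=rng.randrange(90)),
            "region": rng.choice(["North", "South", "East", "West"]),
            "product_id": rng.randrange(100, 110),
            "amount": round(rng.uniform(10, 500), 2),
        })
    return rows

# Report logic developed against the stubs runs unchanged later,
# when it is pointed at the real, populated table.
stub = make_stub_rows(1000)
total_by_region = {}
for r in stub:
    total_by_region[r["region"]] = total_by_region.get(r["region"], 0.0) + r["amount"]
```

Because the stubs are seeded, report results are repeatable from build to build, which makes it easier to spot when a change in the report logic (rather than the data) altered the numbers.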
Another benefit of Rapid BI is improved data quality through exposing the data sooner. Problems with data quality often surface very late in a project, as the results are first viewed by the key users. When the results are seen for the first time, the business rules become real and visible. Our goal is to expose data quality issues early in the project and resolve them as development continues. The key factors in data quality are:
- Cleanse at the source
- Handle data as little as possible (Treat data like produce)
- Transform gently or not at all
- Get the keys right
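These factors can be illustrated with a check that flags problems without altering the rows, in the spirit of "handle data as little as possible" and "transform gently or not at all": issues are reported back so they can be cleansed at the source. This is a minimal sketch; the `check_quality` function and the `customer_id` key column are hypothetical examples, not from the actual project.

```python
def check_quality(rows, key="customer_id"):
    """Flag quality issues without modifying the data.
    Returns (clean_rows, issues): rows pass through untouched, and
    each issue is a (row_index, message) pair to feed back to the
    source system for cleansing there, not in the warehouse."""
    seen_keys = set()
    issues = []
    clean = []
    for i, row in enumerate(rows):
        k = row.get(key)
        if k is None:
            issues.append((i, f"missing key '{key}'"))   # get the keys right
        elif k in seen_keys:
            issues.append((i, f"duplicate key {k!r}"))
        else:
            seen_keys.add(k)
        clean.append(row)  # data is passed through unmodified
    return clean, issues

sample = [{"customer_id": 1}, {"customer_id": 1}, {}]
passed, problems = check_quality(sample)
```

The design choice is deliberate: rather than silently repairing or dropping bad rows in the transformation layer, the check surfaces them early so the fix lands at the source, and the same data flows through every stage untouched.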
The challenge for the project is to work within data governance guidelines to produce quality data. Often, when new objects are created and defined, those objects end up existing in many different forms in the data warehouse, each with a different definition. The rules for transformation must be understood and approved, so early agreement on definitions and business rules is very important. Yet often it is only when these definitions are applied and presented in the form of reports that the implications of the original agreement are finally exposed and "data quality" issues arise.
One way to avoid surprises at the end of the development phase, when the products do not meet the users' needs, is to produce reports much earlier in the project. The difficulty is populating these reports with real data. With real data, the business user sees the transformed data in the report format. This requires close coordination with, and involvement of, the users much earlier in development. Users who work closely with the development team can head off surprises later in the project.
The tension between business users, BI builders, data modelers, and ETL developers can run quite high if communication is not maintained. Everyone on the project must understand that a rapid solution will not be perfect but will meet short-term needs; the solution will then evolve to meet long-term goals.
This blog is the beginning of an on-going discussion of Rapid BI. I invite comments, observations and experiences.