This is a question that comes up all the time. I was recently at a conference where we discussed the topic in great detail for a couple of hours. By the end, I think we shared a view of the concept of big data, but still had no common definition.
One of the funniest analogies mentioned in our session, and one that really resonated with me, is that Big Data is like teenage sex. That sounds very strange, but the reasoning is explained in this image:
The Gartner definition is: Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
Wikipedia offers a much longer definition:
Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time. Big data “size” is a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data.
In a 2001 research report and related lectures, META Group (now Gartner) analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). Gartner, and now much of the industry, continue to use this “3Vs” model for describing big data. In 2012, Gartner updated its definition as follows: “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” Additionally, a new V “Veracity” is added by some organizations to describe it.
If Gartner’s definition (the 3Vs) is still widely used, the growing maturity of the concept fosters a more sound difference between big data and Business Intelligence, regarding data and their use:
- Business Intelligence uses descriptive statistics with data with high information density to measure things, detect trends etc.;
- Big data uses inductive statistics and concepts from nonlinear system identification to infer laws (regressions, nonlinear relationships, and causal effects) from large sets of data with low information density to reveal relationships, dependencies and perform predictions of outcomes and behaviors.
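To make the distinction in the two bullets above a little more concrete, here is a minimal Python sketch. It is a toy illustration only: the `months` and `sales` figures are invented, the helper names `describe`, `fit_line` and `predict` are hypothetical, and six data points are obviously nowhere near "big". The first function summarizes what was observed (the descriptive side), while the second fits a simple least-squares line and uses it to predict an unseen value (the inductive side).

```python
from statistics import mean

# Hypothetical monthly sales figures (invented for illustration only).
months = [1, 2, 3, 4, 5, 6]
sales = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]


def describe(values):
    """Descriptive statistics: summarize what has already been observed."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(xs, ys, new_x):
    """Inductive statistics: infer a relationship, then predict a new outcome."""
    slope, intercept = fit_line(xs, ys)
    return slope * new_x + intercept


summary = describe(sales)            # what happened in months 1 to 6
forecast = predict(months, sales, 7) # what the fitted line suggests for month 7
print(summary)
print(forecast)
```

The point is not the arithmetic but the question being asked: the summary reports on the past, while the fitted line makes a claim about data that has not been seen yet.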
After looking at these definitions, do you feel that you really understand what Big Data is and what it means to your organization? This is a hard concept to grasp and even harder to turn into reality within an organization. We know that Big Data is important and will only become more so, but are we still in the “hype cycle”? Is that why nobody really knows what to do?
What are your thoughts? How do we make Big Data a reality within organizations? Is it something best left to the big players, or should everyone be considering it?