What’s a Data Scientist?

Twice this week I've encountered the newish title "Data Scientist" and have been struggling to understand what that role might entail.  The first place I came across the title was in an article in the Teradata online magazine:  

http://www.teradatamagazine.com/v13n01/Features/Data-Science–Future-or-Fiction-/ 

The second occurrence was in a client's office, where the subject came up and we speculated on the nature of the role.

If "Data Scientist" is the new, in-demand thing, then as a provider of IT services I should understand what this talent actually is.  But after reading the above article I was still at a loss to come up with a job description, skills inventory, or experience level that would qualify someone (like me or someone I know) as a "Data Scientist".  After a day or two I was still struggling for a definition, so I made one up.

Intuitively, we can split the title into Data and Scientist and derive a literal meaning.  Data: has extensive experience with the research, study and analysis of data and information in its many forms (structured, unstructured, human or machine readable, cleansed or not, fully qualified through metadata or not, modeled or not, etc.).  Scientist (picture lab coats and clipboards): someone who has an understanding of and expertise with the Scientific Method:

http://en.wikipedia.org/wiki/Baconian_method

This leads me to believe there would be basic research* and uncertainty associated with the role and its deliverables. The outcomes may not be known at the outset and there would be some risk that no value is realized from the process.  These Data Scientists would work in environments where solutions would evolve over time and change would have to be anticipated.  It also implies we don't know what we don't know at the beginning and therefore we'd be advised to proceed with caution. 

Alternatively, Data Scientist is a new title for someone who combines the roles of Senior Business Consultant, Technical Data Analyst and Data Subject Matter Expert.

My contrarian side fears it may actually be the role of a "Data Pseudo-Scientist" if Baconian methods are not applied as rigorously as required and/or the results are not peer reviewed.  Which leads to the question: is Data Scientist a marketing-derived title being used to create a sense of awe around a role that is not well understood or defined?  Bear in mind, this expert would come at a high billable rate due to their highly specialized and esoteric skills.

Is the Data Scientist to be the new silver bullet resource, positioned to solve seemingly unsolvable Big Data challenges?

If we're still looking for silver bullet solutions, I think this indicates that the issues demanding rigour (managing large volumes of data, data quality, database and data-based management, data governance, and metadata management) have become the elephants in the room.  We are choosing to live with these elephants, but ignore them, while we deal with the new distraction called Big Data.  We can't use the old titles to deal with Big Data without also dealing with these other issues, so we've created this new title to obfuscate our failure to deal with our elephants.

I've derived three meanings or definitions for Data Scientist:

1) Data specialist applying the scientific method to data problems

2) A business consultant and technical data specialist and data subject matter expert

3) A Data Pseudo-Scientist: a silver bullet solution that distracts from our elephants

I really hope it turns out to be 1 or 2 and not 3. 

——–

*"Basic Research is what I am doing when I don't know what I am doing"  Wernher von Braun
