The Data Scientist Oath (Part 1)

I believe in the truth. Truth is in the eye of the beholder. When the beholder is a single person, or a very small part of the population with an idea, that group will try to convince the rest of the population that it is correct. Almost every conversation is a negotiation over what should be the dominant truth.

“There are three kinds of lies: lies, damned lies, and statistics [Data Scientist Output].” (attributed to Benjamin Disraeli)

Today, we employ data scientists to distill data and facts into information. I believe we should hold data scientists to a higher standard than most people. It is their sworn duty to understand the veracity of the data and to make both that veracity and their conclusions 100% transparent.

[Image: the 4 Vs of Big Data]

While Veracity is just one of the 4 Vs of Big Data (alongside Volume, Velocity, and Variety), it is the most critical element. Up until now, we assumed data was gathered through a scientific process, using a scientific instrument, guided by a scientific review, and driving to a scientific conclusion about a hypothesis. Unfortunately, that assumption no longer holds today.

We must understand that all data should be treated as big data. I realize that, technically, not all data is big data, but as we shift from using data only in science to using it in every aspect of our lives, we must now treat all of it with the same care about veracity that big data demands.

Part one of the Data Scientist Oath is about the input of data into the model. It is critical that data scientists always use the highest-quality data possible for their models, drawn from known sources, so that they can treat the veracity of the data correctly and be transparent with their clients in the final products.

In the next part, I’ll address the output side of the oath with some simple examples that clearly illustrate how easily facts can be distorted. I’ll conclude with a summary and some next steps towards a Data Scientist Oath.

 


Author: cloudubq

Shaving solutions with Occam's razor while seeking simple elegant synergies. Scientist working as an engineer by architecting systems to improve the world and support my family.

2 thoughts on “The Data Scientist Oath (Part 1)”

  1. This is a very important aspect of data analysis to bring up now, at this inflection point of big data. Personally, my comfort with statistical interpretation stems from the fact that there were human SMEs weighing in on the interpretation, using statistics as a tool. Now that the scale of the Volume, Velocity and Variety is, or will be, of mind-blowing proportions, the point you make about integrity is an important watchdog function.

    The profundity of the oath below is more pronounced now: “to tell the truth, the whole truth and nothing but the truth.”

    When WATSON ingests vast quantities of unstructured data, a lot of ‘contamination’ is to be expected, including myriad levels of “opinions”. Gleaning objectivity from this data will be a challenge. Philosophy classes may get more popular in college in the coming years as we grapple with these thoughts.

    Thanks for bringing this topic up.

    1. Sabbarao,
      Thank you for your kind words. As with any powerful tool, it has the ability to do good and the ability to do evil. Integrity is critical, and being transparent about sources and methods allows others to evaluate the correctness of the output.
