I believe in the truth. Truth is in the eye of the beholder. When the beholder is a single person, or a very small part of the population, with an idea, that person or group will try to convince the rest of the population that it is correct. Almost every conversation is a negotiation over what should be the dominant truth.
Today, we employ data scientists to distill data and facts into information. I believe we should hold data scientists to a higher standard than most people. It is their sworn duty to understand the veracity of the data and to make it, along with their conclusions, 100% transparent.
While Veracity is just one of the 4-V’s of Big Data, it is the most critical element. Up until now, we assumed data was gathered through a scientific process, using a scientific instrument, guided by a scientific review, and driving to a scientific conclusion about a hypothesis. Unfortunately, today that assumption is incorrect.
We must understand that all data must be considered big data. I realize that technically not all data is big data, but as we shift from using data only in science to using it in every aspect of our lives, we must now treat it all as big data.
Part one of the Data Scientist Oath is about the input of data into the model. It is critical that the Data Scientist always uses the highest quality data possible for his model, from a known source, so that he can treat the veracity of the data correctly and be transparent with his clients in the final product.
In the next part, I’ll address the output side of the oath with a few more simple examples that clearly illustrate how easily facts can be distorted. I’ll conclude with a summary and some next steps towards a Data Scientist Oath.