Today, we are bombarded by data and information presented as FACTS. We accept what is in a chart from any printed source as if it were a FACT and therefore true. Let me illustrate with a sublime example.
If the title of this chart is “Households with Gourmet Cooks”, you might be influenced to run out and buy stock in a company that makes gourmet cooking equipment, like Middleby Corporation, the makers of Viking kitchen equipment. If the title of this chart is “Households with Gourmet Cooks (sample size = 2)” or “Households with Gourmet Cooks (Std. Dev. = N/A)”, you probably would not. In fact, you’d probably wonder why I built the chart. Yet we frequently fall into this trap and don’t even ask for the transparency. Data scientists presenting their data should always tell you how they arrived at their conclusions.
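To make that kind of transparency concrete, here is a minimal sketch in Python of a helper that refuses to hand back a chart title without the sample size and standard deviation attached. The function name `chart_title` and the survey data are hypothetical, made up purely for illustration; this is one possible approach, not a standard.

```python
import statistics


def chart_title(base_title, values):
    """Append the sample size and sample standard deviation to a chart title.

    Hypothetical helper for illustration only.
    """
    n = len(values)
    # The sample standard deviation needs at least two data points;
    # with fewer, report it honestly as N/A.
    if n >= 2:
        spread = f"{statistics.stdev(values):.2f}"
    else:
        spread = "N/A"
    return f"{base_title} (sample size = {n}, Std. Dev. = {spread})"


# Made-up survey data: 1 = household has a gourmet cook, 0 = it does not.
responses = [1, 0]
print(chart_title("Households with Gourmet Cooks", responses))
```

A reader who sees “sample size = 2” right in the title is far less likely to go buy stock on the strength of the chart.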
Unfortunately, it happens almost every day in trusted sources ranging from news magazines and newspapers to documents at work and every corner of the Internet (social media, web sites, e-mail, etc.). The Internet is a huge force multiplier, which enables one zealot to look like an entire movement and makes their arguments look like widely accepted FACTS.
There are even more subtle ways of influencing you, especially with cultural and emotional cues. If you look at the charts below, what happens if the title for both charts, which are identical, is “Evil is winning the war over Good”?
In the first chart, you might initially draw the conclusion that evil is winning big time. We assume red represents evil, since red is the color of the devil in the Western world. We react to the color and not to the fact that evil is a 2% green slice. Plus, the text in the legend is small and hard to read. In the second chart, you’d look at it and think the author is an idiot, since the evil slice is clearly a tiny 2%. How information is represented is almost as important as the source data and the methods used to turn it into information. Even good information can be displayed poorly.
As the saying often attributed to P. T. Barnum goes, “you can’t fool all the people all the time, but you can fool all of the people some of the time,” and I agree that we all get fooled occasionally. It is each individual’s responsibility to consume data carefully, considering the source and how the information is being displayed, so that “some of the time” shrinks to almost never.
It is the responsibility of the data scientist to present the information in good faith, transparently, and with as little bias as possible. Stan Lee, speaking through the wise Uncle Ben Parker in the Spider-Man comics, said that “with great power comes great responsibility.” If you work with data and publish it, then understand the potential influence and power it has over people’s opinions, thoughts, feelings, and potential actions, and use it wisely.
Start with a simple Data Scientist oath or code: “As a Data Scientist, I will understand the veracity and validity of my data and its sources, and I will display the results clearly, transparently, and with minimal cultural bias so the end consumer can draw valid conclusions.”
It is that simple. How cool is it that you get to side with Stan Lee and all his comic book heroes and become a real hero by making every effort to represent the truth as plainly, as obviously, and with as much unflinching transparency as humanly possible?