Grammar saves lives & numbers can hurt

Pasta should be cooked in salty water even if the numbers say otherwise. Olive Garden, in order to get lifetime warranties on its pots, eliminated salt from the water, and it still doesn’t salt the water as of today (April 2016). As a decent cook, I know, and others agree, that you need salt in the water if you want the pasta to taste good. Only some junior analyst who thinks Olive Garden is the pinnacle of Italian food would think that saving about $80 a year, the price of a pot or the loss of 3 customer meals, would have no impact. Plus, to cover the lack of taste in the pasta, you have to add more salt and sugar, often corn syrup in commercial products, to the sauce.

When we use numbers, we need to make sure the numbers, analytics, or metrics correspond to a meaningful output. I’ve seen consulting companies cut consultants to increase savings, not recognizing that if you have fewer consultants, you’ll have less revenue since consultants are the product. In a classic case, the US government made selling ice cream and visits to beaches illegal in an effort to stop polio. The increased ice cream sales and visits to the beaches were due to the heat, the same heat that enables the polio virus to stay alive long enough to spread.
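The ice cream and polio story is a case of a hidden common cause, and a small simulation can make the trap concrete. The sketch below is only an illustration under made-up assumptions: the variable names, the coefficients, and the noise levels are all invented, and nothing here is real polio or sales data. Heat drives both series, so they correlate strongly with each other, but once the straight-line effect of heat is removed from each, almost nothing is left.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after subtracting a straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - my - slope * (x - mx) for x, y in zip(xs, ys)]

# Invented data: heat is the only thing the two series have in common.
rng = random.Random(42)
heat = [rng.uniform(0, 10) for _ in range(5000)]
ice_cream = [2.0 * h + rng.gauss(0, 1) for h in heat]
polio = [1.5 * h + rng.gauss(0, 1) for h in heat]

# Correlation as a naive analyst would see it.
raw = pearson(ice_cream, polio)

# Correlation after controlling for heat (a partial correlation).
controlled = pearson(residuals(ice_cream, heat), residuals(polio, heat))

print(f"ice cream vs. polio, raw:              {raw:.2f}")
print(f"ice cream vs. polio, heat controlled:  {controlled:.2f}")
```

The code cannot tell you that heat is the variable to control for. That takes someone who understands the real system.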

At SAP TechEd a few years ago, I watched with fascination as Nate Silver kept telling the interviewer that the thing we really needed was highly knowledgeable analysts who understand the complexity of real-life topics: actual experts. The interviewer kept wanting him to expound on SAP tools, but Mr. Silver held his ground. It is not the tool, the metrics, or anything technical. Some tools are better or easier than others, and SAP has some great tools, was all he’d say. Paraphrasing the interview, he said “you have to understand the subject, the relationships, and why a number might go up or go down”. Once you know the basics of the real system, you can use the numbers to look deeper, but not until.

If grammar can save lives (“let’s eat, grandma” vs. “let’s eat grandma”) then analytics can ruin them. Many decisions, such as layoffs or substituting equivalent products, are made on the basis of numeric analysis. As responsible analysts, we must make sure our metrics of success and failure, and our discoveries, make sense. Does the data really explain the results? Is it reasonable? What is the scientific, logical reason for the outcome?

As we make more data available online and accessible, we must make sure it is clear what the data represents. Finally, we have a duty to be ardent guardians of the proper use of analytics and aggressive prosecutors when others misuse data and analytics. Numbers, and the conclusions we derive from them, directly impact people’s quality of life. Numbers can hurt.

Our 2 fears of Artificial Intelligence (AI)

We have two (2) overarching fears of AI. AI domination is the most irrational fear, where AI becomes smarter than organic intelligence and wipes out or subjugates the organic life forms. This plays out in a number of science fiction works like “Transformers”, “Terminator” and “I, Robot”. In “I, Robot”, the AI unit claims to do it in service of humanity. I’d argue AI domination is the least likely scenario of doom, and maybe in dealing with our second fear, we can solve our AI domination fear, too.

The second fear is that of misuse of AI. I’d argue that the same argument has been used against every technological advancement. The train, automobile, nuclear fission, vaccines, DNA, and more have all been cited for ending the world. I suspect someone said the same thing against the lever, wheel, fire, and bow. Each has changed the world. Each has required a new level of responsibility. We’ve banded together as humans to moderate the evil and enhance the positive in the past. Ignoring it or banning it has never worked.

Amazon, DeepMind/Google, Facebook, IBM, and Microsoft are working together in the “Partnership on AI” to deal with this second fear, as described in the Harvard Business Review article “What will it take for us to trust AI” by Guru Banavar. It is a positive direction to see these forces coming together to create a baseline set of rules, values, and ethics upon which to base AI. I’m confident others will weigh in from all walks of life, but the discussion and actions need to begin now. I don’t expect this to be the final or only voice, but a start in the right direction.

I hope the rules are as simple and immovable as Isaac Asimov’s envisioning of the 3 laws of robotics, on which the imagined, futuristic positronic brains that power AI robots are built. Unfortunately, I doubt the rules will be that simple. Instead they will probably rival international tax law for complexity, but we can hope for simplicity.

The only other option is to stop AI. I don’t think it is going to work. The data is there and collecting at an almost unfathomable rate. EMC reports stored data growing from 4.4ZB in 2013 to 44ZB in 2020. A zettabyte is 10^21 (21 zeros) bytes of data. AI is simply necessary to process it. So unless we are going to back out of the computerized world we live in, we need to control AI rather than let it control us. We have the option to decide our fate. If we don’t, then others will move forward in the shadows. Openness, transparency, and belief in all of humankind have always produced the best results.
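For a feel of what that growth means, here is a quick back-of-the-envelope calculation. It uses only the two EMC figures quoted above and assumes steady compound growth between 2013 and 2020, which is my simplification and not something the report is quoted as saying. The variable names are mine.

```python
# One zettabyte (decimal definition) in bytes.
ZB = 10 ** 21

start_zb, start_year = 4.4, 2013   # EMC figure quoted above
end_zb, end_year = 44.0, 2020      # EMC figure quoted above

years = end_year - start_year
growth_factor = end_zb / start_zb

# Assumption: the same percentage growth every year (compound growth).
annual_rate = growth_factor ** (1 / years) - 1

end_bytes = end_zb * ZB

print(f"Total growth over {years} years: {growth_factor:.0f}x")
print(f"Implied compound annual growth: {annual_rate:.1%}")
print(f"44 ZB written out in bytes: {end_bytes:.1e}")
```

Tenfold growth in seven years works out to roughly 39% more data every single year, which is why I don’t see people keeping up without help from machines.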

In the process of building the foundation of AI, maybe we can leave out the worst of humankind: lust for power, greed, avarice, superiority. Maybe the pitfalls in humans can simply NOT be inserted in AI. It will reflect our best and not become the worst of humankind: a xenophobic dictator.

Putting the AI genie back in the bottle will not work. So I think the Partnership on AI is a good first step.


Healthcare technology dependent on new Healthcare Regulation

Recently, Andy Slavitt, CMS Acting Administrator, wrote “Pitching Medicaid IT in Silicon Valley”. His requests are sensible and well founded, but will fail. I’m not saying he won’t have modest gains, but his gains will be held back by the overly burdensome Code of Federal Regulations that governs all regulated industries, including health care providers, pharmaceuticals, utilities, etc.

I am currently working at a global pharma company, and the volume of labor and the amount of paper we consume in the name of quality is embarrassing. We are switching to a digital system, but the effort will remain the same. Every item has to be written up with expectations, dry-run tested to get it 100% correct, written, printed as a screen print, and then signed. Any exception, such as an extra temporary file in the directory, results in a handwritten explanation. I’m not convinced that any of this QA work will ever add anything to the quality of the pharmaceuticals the company produces.

I fully get the need to test, but this “test so we can’t be questioned” model of micro-testing and the lack of awareness of digital images will continue to hold back the industry. Cloud, at any level (IaaS, PaaS, SaaS), requires the ability to certify the template and assume all instantiations are also certified if they have no errors in the instantiation process. Again, I’m not against testing or quality, but the redundancy is a waste.

The other 2 issues that will need to be conquered for healthcare are: 1) privacy and 2) standardized data formats, at least for header records. Privacy is pretty obvious in that I don’t want anyone reading my information unless required and released by me. At the same time, it is important that my “data” goes into the pool for analytics to give researchers the ability to learn from the population. There is no perfect answer for anonymizing data, but good security and good tools are well known, understood, and can be implemented.

Data standards are not a new issue in our industry. While 80% of data is non-structured, the 20% that is structured never seems to line up, so we spend anywhere from 60-80% of our analytics money massaging data. To tie together all this data, we’ll need to have at least some standards for header records.

Again, I salute Andy Slavitt (@aslavitt), CMS Acting Administrator, for his efforts. I hope he gets his interns and can make a big impact. He’s got a big job to do.


Why I believe IBM will succeed

I believe IBM will succeed even in this next era of rapid innovation. There is no doubt IBM is founded on innovation. Whether you measure it by 23 years of leading in number of patents or by the sheer number of innovations found in its history (DRAM, hard drives, tabulating machines, System/360, major innovation around relational databases, etc.), IBM is innovative.

I think the question is not “can IBM innovate”, but can IBM innovate with enough speed and follow-through. It is tough for any large company to move fast with hierarchies, communities, and sheer mass. It can be done.

One key is having a clear vision. IBM’s vision is Cloud, Cognitive, and Industries. Cloud means cloud in all its forms, including IaaS, PaaS, and SaaS. Recent announcements, like putting IBM Box, IBM’s cloud for file sharing, on Amazon, show a willingness to follow the requirements of the market. Clients are saying no one cloud solution, even IBM’s cloud, is enough. Speed and diversity are as important as cost, or more.

Cognitive is the peak of IBM’s data strategy. Beneath it is everything from ETL to IoT to cloud-based integration. Getting to Watson is rarely a first step for most clients. Rather, we find they need to do a lot of data hygiene just to be ready for standard analytics. Eventually, they do get to Watson and Cognitive services. It is a journey.

I find Watson on Bluemix especially interesting. IBM is offering access to Watson in nibble-size chunks via standard APIs. It is an amazing shift to see IBM offering the power of its flagship product for pennies. It is a new model for IBM. IBM has always ruled in the realm of big projects with high margins. To take on the tiny, an API at a time and a penny at a time, is a huge change in business model for IBM. You can check out the services, via RESTful APIs, on the developer cloud, and for modest use it is even FREE.

Under the banner of Cognitive is IoT. The ability to interact with and understand our world via the digital world seems like a SciFi dream. The possibilities are endless. We see capabilities like controlling our environment just by thinking about it. I love the story about the IBMer who is using his mind to control a Sphero toy. I confess, I want one.

Industries runs through everything at IBM. IBM’s entire organization is organized by Sector (Industrial, Distribution, Financial, etc.) and below that into Industries. Every go-to-market effort is passed through an industry focus, and a lot of the investment in new ideas is based on the question of “what does this industry require?” You can even filter our Institute for Business Value by Industries to find unique value for your business. Watson even has its own Watson Healthcare division, another focus on an industry.

In the fast-moving world of IT innovation, being innovative last year is not going to save you; however, IBM has a long history of remaining an innovation leader. We are working to see how we can leverage all IBMers’ great minds. I’m optimistic, as we are now working on innovations for rapid innovation at cloud speed and beyond. Cloud, Cognitive, and Industries is a great springboard into our future.


The Data Scientist Oath (Part 2)

Today, we are bombarded by data and information as FACTS. We accept what is in a chart from any printed source as if it is a FACT and therefore true. Let me illustrate with a sublime example.

[Chart 1: A simple pie chart, split half and half]

If the title of this chart is “Households with Gourmet Cooks”, you might be influenced to run out and buy stock in a company that makes gourmet cooking equipment, like Middleby Corporation, the makers of Viking kitchen equipment. If the title of this chart is “Households with Gourmet Cooks (sample size = 2)” or “Households with Gourmet Cooks (Std. Dev. = N/A)”, you’d probably not. In fact, you’d probably wonder why I built the chart, and yet we frequently fall into this trap and don’t even ask for the transparency. Data scientists presenting their data should always tell you how they arrived at their conclusion.
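To put a number on how little a sample of 2 tells you, here is a rough sketch using the Wilson score interval, a standard textbook way to put a 95% confidence range around a proportion. The function name and the comparison sample of 2,000 households are my own choices for illustration, not part of the chart above.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width

# The half & half chart with a sample size of 2: one gourmet household out of two.
tiny_low, tiny_high = wilson_interval(1, 2)

# The same 50/50 split from a hypothetical survey of 2,000 households.
big_low, big_high = wilson_interval(1000, 2000)

print(f"n = 2:    true share is plausibly {tiny_low:.0%} to {tiny_high:.0%}")
print(f"n = 2000: true share is plausibly {big_low:.0%} to {big_high:.0%}")
```

Both pie charts would look identical on the page. Only the sample size tells you that one of them supports almost any conclusion you like.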

Unfortunately, it happens almost every day in trusted sources ranging from news magazines and newspapers to documents at work and every aspect of the Internet (social, web sites, e-mail, etc.). The Internet is a huge force multiplier, which enables one zealot to look like an entire movement and makes their arguments look like widely accepted FACTS.



There are even more subtle ways of influencing you, especially with cultural and emotional cues. If you look at the charts below, what happens if the title for both charts, which are identical, is “Evil is winning the war over Good”?

In the first chart, you might initially draw the conclusion that evil is winning big time. We assume red represents evil since red is the color of the devil in the Western world. We react to the color and not the fact that evil is a 2% green slice. Plus, the text in the legend is small and hard to read. In the second chart, you’d look at it and think the author is an idiot since the evil slice is clearly a tiny 2%. How information is represented is almost as important as the source data and methods used to turn it into information. Even good information can be displayed poorly.

PT Barnum said “you can’t fool all the people all the time, but you can fool all of the people some of the time” and I agree we all do get fooled occasionally. It is each individual’s responsibility to consume data carefully and consider the source and how the information is being displayed, to shrink the “some of the time” to almost never.

It is the responsibility of the data scientist to try to present the information in good faith, transparently, and with as little bias as possible. Stan Lee, through wise Uncle Ben Parker in the Spider-Man comics, said that “with great power comes great responsibility.” If you work with data and publish it, then understand the potential influence and power over people’s opinions, thoughts, feelings, and potential actions, and use it wisely.

Start with a simple Data Scientist oath or code. “As a Data Scientist, I will understand the veracity and validity of my data and its sources, and I will clearly, transparently and with minimal cultural bias display the results so the end consumer can make valid conclusions.”

It is that simple. How cool is it that you get to side with Stan Lee and all his comic book heroes and become a real hero by making every effort to represent the truth as plainly and obviously, with unflinching transparency, as humanly possible?


The Data Scientist Oath (Part 1)

I believe in the truth. Truth is in the eye of the beholder. When the beholder is a single person, or a very small part of the population, with an idea, that person or group will try to convince the rest of the population that it is correct. Almost every conversation is a negotiation of what should be the dominant truth.

“There are three kinds of lies: lies, damned lies, and statistics [Data Scientist Output]” (Benjamin Disraeli).

Today, we employ data scientists to distill data and facts into information. I believe we should hold data scientists to a higher standard than most people. It is their sworn duty to ensure that they understand the veracity of the data and make it, and their conclusions, 100% transparent.


While Veracity is just one of the 4 V’s of Big Data, it is the most critical element. Up until now, we assumed data was gathered through a scientific process, using a scientific instrument, guided by a scientific review, and driving to a scientific conclusion of a hypothesis. Unfortunately, today, that is an incorrect assumption.

We must understand that all data must be considered big data. I realize that technically all data is NOT big data, but as we shift from using data only in science to using it in every aspect of our lives, we must now treat it all as big data.

Part one of the Data Scientist Oath is about the input of data into the model. It is critical that Data Scientists always use the highest quality of data possible for their models, from a known source, so that they can treat the veracity of the data correctly and be transparent to their clients in the final products.

In the next part, I’ll address the output side of the oath with some more simple examples that clearly illustrate how easily facts can be distorted. I’ll conclude with a summary and some next steps towards a Data Scientist Oath.


Why do I blog (and write stupid, long winded emails)

The drive to communicate in writing. Why we still tolerate and love mail.

Because I think, I write blogs (and write stupid, long winded emails). Honestly, it is how I think, or more precisely, how I organize. The harder the activity, without being overbearing, the more it forces me to organize my thoughts. Talking is the least of the offenses, so if I’m droning on like a sewing machine, I’m working out a problem with you. If I’m in the shower, it’s my senses that are filtering my thoughts (water temp, standing, shaving my head (I’m bald, damn it!), wondering where the water goes, if that fly I flushed when I was 12 has landed in a nuclear waste facility and is now coming back via the drain to kill me, etc.). Maybe even while driving with me. I’ve heard Bill Clinton talks with his hands even when he drives. Strongly suggest Uber if you are with Bill.

On standing: standing is remarkably tough if you look at it as a biologist, so it counts as a non-demanding filter. Ever tried standing and thinking and suddenly noticed you’ve walked to the grocery store? Not walking; that is a reaction. Douglas Adams (author, not sane person, The Hitchhiker’s Guide to the Galaxy, et al.) talks about flying as the art of throwing yourself at the ground and missing. So the better word for walking is “a reaction for not falling while traversing a plane (level space) in a 3 dimensional world with Newtonian gravity”. Babies start on their toes and try to go fast to overcome forward inertia. My brother was very fast because he always walked on his toes. It also is genetic, as his daughter does, too.

I LOVE you. Oh don’t go all homophobic on me, please! And since it is in magenta you don’t believe me, but it really is my heterosexual brain messing with you, really, I think, I mean it is; maybe it’s only my brain that is heterosexual, Oh shit!, maybe; grunt “gun”; grunt “football”, say “lumberjack”, cry “man”, sing “show tunes”, OH shit!; never mind. I mean I really, really like you, like you had friends when you were in grade school and not like Facebook, which is one of the more insulting but viral trends of our generation and maybe the millennium, but I digress and it is Marc Andreessen’s fault, who begat Mosaic who begat the WWW (world wide web) who begat Mark Zuckerberg after about a sea of begetting (plus you learn stuff, more later, beget) and digress more and it is still Facebook’s fault and only because I really like the word begat and found out it came from the word beget (this is now later), which is totally cool, wonderful and a Sesame Street moment and I’m Ernie and he’s gay (SHIT!).

Yeah, I’m definitely starting to not like hate like and fine (but that was for extra points since my wife says FINE and THE F-word are the same words spelled differently and lots more on that later). Fabulous is a good F-word. And so what is it about sex? Freud, “shut up,” you are sick!


Tornado breath and lightning brain or vice versa.

Back to the other side of my brain: writing helps me slow the output so I can organize my thoughts. I used to think it slowed my thoughts down or sharpened them, but the truth is they come at me like a summer thunderstorm: fast, heavy, loud, dangerous, and then not at all. When I can contain them like lightning in a bottle, I’m an utter genius. When not, let’s just say I get offered pink slips, my friends hate me, and even my close friends buy me breakfast to start my day over, including strong adult beverages.

Back to email [in case you thought I forgot], if you are getting my email, it means we are friends and I respect you even if I don’t like you and I want your opinion and I’m alive, thinking, filtering, and just maybe I think you’d be interested. Hey, you are actually being helpful. And I may like you or really like you (Damn!). And from my view, I’d rather you respected me than liked me. It is less painful in every musical I’ve ever seen on Broadway. I have friends and they’re OK! (here we go again with Mark Zuckerberg and Facebook, damn). In this decade it is the equivalent of hanging out on the corner, the barber shop or the market or, dare my highly heterosexual brain think it, the beauty parlor. This is big, hairy, scary stuff, like Brigitte Bardot and bouffants (here we go again!).

Why I wrote this blog entry. First, I’ve always been fascinated with how people think. Second, I love humor and I thought if 5 miserable people in NYC could be funny in a show about nothing, then this is definitely funny. But finally, and everything after the but is the real truth, it is Friday and I don’t want to do my EOW bureaucratic administrative paperwork on the laptop. In short, I’m procrastinating and dragging you with me.

And for those of you who are my friends and even family and read my blog, you can stop reading now because you know enough about me. As for my apology for sexual, gender, race, ethnicity, discrimination, innuendo; and any harm that may have been done to dogs, cats, gerbils, hamsters, amoebas, pine cones, mushrooms, small furry creatures from Alpha Centauri, etc.; real or imagined, I beg your forgiveness in perpetuity since this is all in jest; (with my high squeaky voice) maybe