25 Nov. 2014
The rise of powerful and easy-to-use software (e.g., software as a service) and analytic programming languages (e.g., R) has made it possible for people across the entire organization, not just the database trolls, to ask questions about business performance. Suddenly everyone(-ish) can see just about anything about the business.
This transparency allows more people to engage on important business problems, making success more likely. But there is one notable downside: Just because you got an answer, even a “significant” one, doesn’t mean you got a real answer, or an important one.
Many people have called this the “button effect”: The ghost in the machine gives the correct answer, every time, and I don’t need to think about it. In the messy real world, though, there are some bits of knowledge you should have so that you can interpret the button’s offering.
In big data analysis, you need to know, among other things, about “data distributions.” Many statistical tests, and almost all that are taught in statistics classes, require that the data to which the test is being applied be “normally distributed.” There is a mathematical description of this distribution, but everyone knows it as the “bell curve,” where the number of observations is plotted on the y axis; the high point is the mean, the distribution is perfectly symmetrical on both sides of the mean, and the number of observations drops off fairly rapidly on both sides of the mean. As a result of this particular structure, about 68% of all observations lie within one standard deviation of the mean, and about 95% lie within two standard deviations. These facts allow us to use simple math to determine whether two groups differ on some set of measures to a “significant” degree. That’s a lot of words. The key element is that most of us want to know the answer to some question like “are sales up month on month?” or “do people spend more time on our new site than our old one?”
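The shares of observations that fall within one and two standard deviations of the mean can be checked directly. The following is a minimal sketch using Python's standard library; the helper name `share_within` is made up for this illustration.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    # Area under the bell curve between -k and +k standard deviations.
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_one = share_within(1)
within_two = share_within(2)
print(f"within 1 standard deviation:  {within_one:.1%}")
print(f"within 2 standard deviations: {within_two:.1%}")
```

These shares hold only when the data really follow the bell curve, which is exactly the assumption the rest of this post questions.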
Answering these questions often involves a simple statistical test called a “z-test.” The details don’t matter, except that it’s important to realize that the z-test depends on the normal distribution and, more specifically, on symmetric standard deviation values.
Let’s work through an example of how statistics can get messed up by non-normal distributions: Are men taller than women? In the US, men are, at about 5’10” median height, about 9% taller than women, so the answer should be yes. I set out to confirm this statistic by measuring the heights of about 20 women at my company and comparing them to those of an equivalent number of men. As expected, the statistics showed that men were taller. Then I pretended to hire another man: Dylan Postl, a professional wrestler measuring in at about 4.5 feet. With the addition of the immortal Hornswoggle, the statistics showed that men and women are the same height. Oops. Then I “hired” Sandy Allen, a woman about 7.5 feet tall. Now the statistics showed that women are significantly taller than men. What happened? Dylan and Sandy are “outliers,” values so far from the mean that a normal distribution says they should almost never occur. The simple tests I used are very sensitive to outliers, making incorrect results quite common.
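The sensitivity of a means-based test to outliers can be imitated with a few made-up numbers. The sketch below does not use the measurements described above and will not reproduce those results; the heights are invented, the helper `z_statistic` is a hypothetical name, and the calculation is a plain two-sample z statistic (difference in means divided by its standard error).

```python
from math import sqrt
from statistics import mean, median, variance


def z_statistic(a, b):
    """Two-sample z statistic: difference in means over its standard error."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Invented heights in inches, NOT the measurements from the example above.
men = [67, 68, 69, 69, 70, 70, 71, 71, 72, 73]
women = [61, 62, 63, 63, 64, 64, 65, 65, 66, 67]
z_clean = z_statistic(men, women)

# Add one extreme value to each group: 4.5 ft = 54 in, 7.5 ft = 90 in.
men_plus = men + [54]
women_plus = women + [90]
z_outliers = z_statistic(men_plus, women_plus)

print(f"z without outliers: {z_clean:.2f}")
print(f"z with outliers:    {z_outliers:.2f}")
print("medians before:", median(men), median(women))
print("medians after: ", median(men_plus), median(women_plus))
```

With these invented numbers, the z statistic starts well above the conventional 1.96 cutoff and falls below it once the two extreme values are added, while the medians of both groups do not move at all. One extra person per group is enough to erase a difference that is plainly there.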
It’s pretty easy to tell that Dylan and Sandy are outliers, because we all have been observing for decades how tall the people around us are. With lots of the other statistics we rely on, though, that isn’t the case. Large public policy issues are argued using statistics; if the stats can be so wrong about something trivial like height, they might be wrong in, say, health care reform or gun control. The same issue arises in business.
Doing an A/B test comparing whether people like a new product design more than an old one? The person who runs that test might come back providing average ratings for each design showing that B’s average rating is significantly better than A’s. All done, right? Well, “average” usually implies testing based on means. Ratings are ordinal numbers that don’t have means or symmetric standard deviations. How can they? On a scale of 1 to 5, 4 is not twice 2. Thus, that “significant difference” is … well… not. Yup, that number you are using to reform your entire product mix? It might be right. It also might be random.
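One way to compare ratings without treating them as measurements is to look only at their order. The sketch below uses invented 1-to-5 ratings and a hypothetical helper, `probability_b_beats_a`, which counts how often a randomly chosen rating of B is higher than a randomly chosen rating of A (ties count as half). This is the quantity behind rank-based tests such as Mann-Whitney U; it is an illustration, not a significance test.

```python
from statistics import mean, median


def probability_b_beats_a(a, b):
    """Share of (a, b) pairs in which b is rated higher, counting ties as half."""
    wins = sum(1.0 if y > x else 0.5 if y == x else 0.0 for x in a for y in b)
    return wins / (len(a) * len(b))


# Invented ratings on a 1-to-5 scale, for illustration only.
ratings_a = [2, 3, 3, 3, 3, 3, 3, 4, 4, 4]
ratings_b = [1, 3, 3, 3, 3, 3, 3, 5, 5, 5]

p_b_wins = probability_b_beats_a(ratings_a, ratings_b)

print("mean A:", mean(ratings_a), "mean B:", mean(ratings_b))
print("median A:", median(ratings_a), "median B:", median(ratings_b))
print(f"chance a B rating beats an A rating: {p_b_wins:.2f}")
```

In this made-up case B has the higher average, yet the two designs share the same median and a B rating beats an A rating only slightly more often than a coin flip. The averages alone would have told a more confident story than the ratings support.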
I’m not trying to say that means and standard deviations aren’t useful. However, there are things that users of powerful big data buttons need to know in order to interpret the output. Some of this knowledge is simple, although often ignored even by researchers. Other bits of it are art, gained only through experience.
Regardless, the most important thing, with all due respect to Lewis Carroll, is: Beware the button, my son, the assumptions that bite, the findings that catch!
This blog first appeared on Harvard Business Review on 08/05/2014.