There are three easy ways to lie with statistics, ways that anybody can use and that will pass peer review. It is important for readers to know this, for the following reason:
It’s no secret that there’s a big problem with science nowadays. While many view “science” as a trump card for “true, no argument,” the evidence just keeps getting stronger that something is terribly wrong in modern scientific research.
Groundbreaking research, after years of being taken as gospel, has been found to violate a key scientific principle: replication. We can’t repeat the experiments to get the same result.
We’re told Jesus turned water into wine but…that’s not science, because there’s no way to repeat the experiment under controlled conditions. No problem, such a thing is filed under “religion” and we move on.
Every day we’re told about a new scientific wonder, particularly in the medical and psychological fields, as well as in the social sciences. We’re told of wonders in other fields, and those wonders are also drawing increasing suspicion, but I want to focus on those medical/psych and social wonders, because these almost exclusively rely upon statistical analysis.
Using statistics in these fields is perfectly understandable. We all realize each human being is unique, and even each ailment is to some extent unique. Even a simple headache can vary quite a bit, and the response of that headache to, say, aspirin, will vary.
For every statistical study, the key concept is the “p-value”: the probability that dumb luck alone would produce a result at least as strong as the one we saw, if there’s really nothing going on. Naturally you want your p-value to be very small. The results reported on Sunday morning coffee shows have p-values around 0.05, which is usually called “significant.” Done honestly, a p-value of 0.05 or less means that a treatment that does nothing would produce a result this good only about 5% of the time. If statistics were done honestly, about 1 in 20 studies of things that do nothing would still yield a “new study shows” story on a Sunday morning show (rubbish, of course, is much more common than 1 study in 20 on TV).
For p-values below 0.01, we’re talking a 1% chance, 1 time in 100, of pure luck giving us this result. This is a pretty significant result, the kind of thing that’s considered reliable, and it would be, if statistics were done honestly (I’ll stop with the “if statistics were done honestly” qualifier past this point; just assume I’m writing it in nearly every sentence, although it seldom seems to apply to the real world).
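To make the 1-in-20 and 1-in-100 figures concrete, here is a rough Python sketch, offered as an illustration only. It runs 2,000 imaginary honest studies of a “drug” that does nothing: each of 100 people gets a random pain rating before and an unrelated random rating after. The function names, the sample sizes, and the random pain ratings are all assumptions made up for this example, and the p-value uses a normal approximation, which is only reasonable for samples about this large.

```python
import math
import random
import statistics

def paired_p_value(before, after):
    """Two-sided p-value for 'average change is zero' (normal approximation)."""
    diffs = [b - a for b, a in zip(before, after)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        # Everyone changed by the same amount: no noise to compare against.
        return 1.0 if mean == 0 else 0.0
    z = mean / (sd / math.sqrt(len(diffs)))
    return math.erfc(abs(z) / math.sqrt(2))

def null_p_values(studies=2000, people=100, seed=1):
    """p-values from honest studies of a 'drug' that does nothing (hypothetical data)."""
    rng = random.Random(seed)
    results = []
    for _ in range(studies):
        before = [rng.randint(1, 10) for _ in range(people)]
        after = [rng.randint(1, 10) for _ in range(people)]  # unrelated to the drug
        results.append(paired_p_value(before, after))
    return results

p_values = null_p_values()
for alpha in (0.05, 0.01):
    share = sum(p < alpha for p in p_values) / len(p_values)
    print(f"p < {alpha}: {share:.1%} of do-nothing studies")
```

The two percentages should land near 5% and 1%, give or take some noise from the simulation itself.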
Before I get to the three ways I can manipulate the statistics to get my results, I really feel the need to point out: I’ve yet to see a single statistics textbook that goes over these illicit methods. I taught my first statistics course over 20 years ago…it’s never discussed, and yet, “somehow” these methods are the foundation of results in many fields, for many studies.
So suppose I have a new headache drug, and I want to show the drug works via statistical analysis. Being a statistician, I’m virtually guaranteed to do so, and I’d like to go over the three trivial ways I can show my drug works (and I won’t even need to use the placebo effect if I don’t want to!).
I’ll find people with headaches, and ask them to rate their pain on a 1 to 10 scale, give them the drug (a glass of water), wait a while, and ask them to again rate their pain on a 1 to 10 scale. I won’t bother describing the study in detail, but the previous sentence describes the basic way statistical studies are done.
(There is something called a “control” you really should use, but in many medical studies, it’s hard to establish a control—you can’t exactly give people headaches, or cancer, and not treat them, the better to compare to the people you’re treating, for example.)
I don’t care if the drug is “a glass of water”; I’m going to produce a study with a result and a p-value below 0.01. All I have to do is manipulate the data the way it’s done every day now.
There are three common ways to manipulate statistics; allow me to start with the first, easiest, method to lie with statistics:
Method One: Data Mining
I trust the gentle reader has filled out a survey before, and I assure you that such are quite common in serious experimental tests. Such a survey might not include name, but ethnicity, age, birthplace, religion, birthday, political leanings, income, number of siblings (and type, problematic nowadays with all the transgenderism brouhaha), gender (self assigned or otherwise), home ownership, car ownership, education level, education level of parents, and blood type might be on it…all sorts of questions can appear on a survey.
Let’s for the sake of argument assume the survey for my experiment has a mere 12 questions that might be relevant to headaches.
Now let’s get our significant result!
So, first I do the honest thing: take all the people in my study, and compare their pain ratings before and after they drink water. Again, I spare the calculation methods, but if, say, pain levels for the whole group drop from an average of “7” to an average of “1,” then I got lucky: my p-value is already below 0.01. Realize, I could just get lucky, which is why you never use the phrase “statistics prove.” Those two words should be close to each other as often as Trump and Clinton (either or both) share a shower. While statistics can prove nothing, we sure like to say we’ve proved something, so we use the bogus phrase, “clinically proven,” even though nothing has been proven, or can be proven, with statistics.
So I run the study the honest way, hoping to get lucky.
No luck? No problem. I now go along each variable. I check the males’ pain level (one variable), I check the “under the age of 18” pain level (another variable), and I keep going, with all 12 variables. Twelve more chances to get lucky! Each try has a 1% chance of getting lucky, and I’ve now tried 13 things.
No luck? No problem. I now compare two of my variables, say, gender and age. Again, avoiding the math here, there are 66 ways to pick two variables out of 12. Keep in mind, I’ve now identified 79 ways to get lucky and arrive at a significant result. Maybe I’ll hit my 1% chance somewhere in there (treating the tries as independent, I have a better than 50% chance of getting lucky at this point).
No luck? No problem! I now look at three of my variables, and compare to pain levels…still no luck, I’ll go to four variables. Skipping over the math, there are thousands of ways of checking for a result using data mining on my very small survey. I’m going to make that 1% chance at some point…and now I’m on a Sunday morning show.
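The whole “No luck? No problem” routine can be sketched as a small simulation. Everything here is hypothetical: 600 made-up people, 12 made-up yes/no survey questions, and a change in pain that is pure noise, because the “drug” is water. A subgroup is simply the people who answered yes to every question in the chosen set, the p-value is a normal approximation, and the names p_value and mine are invented for this example. The search stops at three variables, which already allows 299 tries.

```python
import itertools
import math
import random
import statistics

QUESTIONS = 12   # yes/no survey questions, as in the text
PEOPLE = 600     # hypothetical sample size
ALPHA = 0.01

def p_value(diffs):
    """Two-sided p-value for 'average change is zero' (normal approximation)."""
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        return 1.0 if mean == 0 else 0.0
    z = mean / (sd / math.sqrt(len(diffs)))
    return math.erfc(abs(z) / math.sqrt(2))

def mine(seed, max_vars=3):
    """Test subgroup after subgroup until one looks 'significant'.

    Returns (questions defining the subgroup, number of tests run, p-value),
    or (None, number of tests run, None) if nothing turned up.
    """
    rng = random.Random(seed)
    # Each person: 12 random answers plus a change in pain that is pure noise.
    people = [
        ([rng.random() < 0.5 for _ in range(QUESTIONS)],
         rng.randint(1, 10) - rng.randint(1, 10))
        for _ in range(PEOPLE)
    ]
    tests = 0
    for k in range(max_vars + 1):  # k = 0 is the honest whole-group test
        for subset in itertools.combinations(range(QUESTIONS), k):
            diffs = [d for answers, d in people
                     if all(answers[q] for q in subset)]
            if len(diffs) < 20:    # skip subgroups too small to test
                continue
            tests += 1
            p = p_value(diffs)
            if p < ALPHA:
                return subset, tests, p
    return None, tests, None

for seed in range(3):
    print(mine(seed))
```

Each printed line is one imaginary study. When a subgroup is reported, it is a “finding” about water, produced by nothing more than persistence.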
If this seems farfetched, I ask the gentle reader to simply watch TV, and wait until you hear a line like “for women under 40 who smoke this drug may…” and realize you’re listening to a result that came from data mining, simply running test after test among various variables (in this case, three: gender, age, and smoking status) until something significant and reasonable-sounding came up.
“Reasonable sounding” is the problem with this method.
Using this method I might get “Republican males under 30 who drink a glass of water will have their headache pain reduced,” but by the time my study makes it to the Sunday show, the talking heads will simply say “A study clinically proves some males can use water as a pain reliever.”
I won’t have much motivation to clarify what the media says, because it’s totally not in my best interest to hurt my own publicity, and they’re technically correct anyway. I’ll stand by my results, the data will pass peer review (I’ve seen the like enough times, and many doctoral theses in education/administration pass the doctoral review committee doing this). Yes, the result is rubbish, but here’s the kicker:
Nobody will check my work.
There’s no money in research for verifying my results the proper way (that is, by creating a new study just with Republican males and seeing what happens when they drink water). My groundbreaking study will last for decades, probably, and I can make a new career out of using water as a pain reliever. I might even set up a web page selling “Professor Doom’s Miracle Water” that, I promise you, “may” be even more effective (as shown by my statistical studies) in pain reduction for “some people.” Anyone will buy my water, given enough pain.
This is much of what passes as science in many fields today: huge data mining efforts, silly results that are just the result of dumb luck and repeated effort, and absolutely no attempt to verify because our scientific system wants results, not honesty.
Jesus turned water into wine, and this is not science because we can’t replicate the event. Every day we get the results of another study, one that is not replicated (and many studies simply cannot be replicated).
Why are these studies called science?
This trivial way to get a statistical result is why we have so many drugs that do nothing (as the CEO of a drug company admits), and why we waste so much time buying things “medical studies have shown to work” that do no such thing.
Data mining is the most common, most trivial way to get a result, but I promised my readers two more ways to lie with statistics, ways that never get mentioned in statistics textbooks even though they are (illegitimately) used often today.
I’ll cover those less popular ways next time.