
Understanding Statistics and Lies




Statistics lie; this we know for a fact. When both sides of any argument can quote competing statistics in order to prove their point, it becomes obvious that these prized facts and figures are nowhere near as reliable as we want them to be. Is the truth out there? Can statistics help us find it? Maybe, but we're going to need to completely change the way that we think about statistics if we're ever going to get there.

Statistics have a power over us. It's a power that we willingly give them when we throw critical thinking to the wind in favour of numbers that purport to be hard little nuggets of truth. We want statistics to give us firm answers that we can use to bolster our arguments. And while this helps statistics to mislead and confuse us, it's an understandable reaction. After all, who likes uncertainty?

But the world is filled with uncertainty, and our best efforts to bring light to an issue often amount to little more than a weak flashlight beam. And while the truth is that one can easily lie with statistics, the greater truth is that it's far easier to lie without them. So even if we're not always able to make perfect sense of an issue with statistics, we should still come up with some criteria for evaluating statistics rationally. By doing so we can decide what claims to give greater weight, even while admitting that the answer remains uncertain.

Ideally, what we want to do when assessing any study or statistic is to look at the original data. Find out how the numbers were calculated, what methods were used for collecting the data, what definitions were used, and all the other nitty-gritty little details so that we can judge how reliable the conclusions are. But, of course, that's not always possible. And even when it is, who has the time?

Fortunately, there are less demanding ways of assessing statistics. There are a number of tips and rules of thumb that one can use to help evaluate claims and statistics in the newspaper, on television, or quoted by friends and family. This article will lay out some of the most useful of these for your consideration.

1. Try To Put The Claims Into Perspective

Numbers are often presented to us without any attempt to put them into context. When numbers are given out of context, they can seem more dramatic than they really are. Big numbers often seem scary, and we may assume that a big number naturally suggests a big problem. So if we hear, for example, that 6,000 people a year die of reallyscarydiseaseitis, we might assume that this is a grave concern.

But if we put that number in perspective, we can notice that there are 2.4 million deaths in the US per year. And in addition, if we look at the list of the top causes of death in the US by the National Vital Statistics Report, it becomes clear that every cause listed has a far greater death toll than reallyscarydiseaseitis. The lowest on the list, homicide (#14), kills close to 17,000 Americans yearly. That's close to triple the number killed by reallyscarydiseaseitis.

Of course, that wouldn't be to say that reallyscarydiseaseitis should be belittled. However, if some article or report is trying to make the case that you should be very concerned about your chances of dying from this disease, you may want to show some healthy skepticism.

The Internet is a good source for finding the numbers to put these kinds of claims in context. However, it's best to make sure that the numbers you're using are from reliable sources. Try to find the most authoritative numbers possible. Here are a few authoritative numbers regarding the United States that may help you put some statistics you've heard into perspective:

- The US population is approximately 300 million.

- There are approximately 2.4 million deaths in the US per year.

- There are approximately 4 million births in the US per year. (Using this, you can estimate the number of people in most age cohorts as well - for example, the number of people aged under 14 is approximately 14 times 4 million, which equals 56 million.)

When you look at numbers and statistics in their proper context, things can be made clear that you wouldn't have realized otherwise.
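
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, and the figures are the approximate ones quoted in this article, not current official data.

```python
# Approximate figures quoted in the article (not current official data).
US_DEATHS_PER_YEAR = 2_400_000
US_BIRTHS_PER_YEAR = 4_000_000


def share_of_deaths(deaths_from_cause, total_deaths=US_DEATHS_PER_YEAR):
    """Return deaths from one cause as a fraction of all deaths."""
    return deaths_from_cause / total_deaths


def estimate_cohort(years, births_per_year=US_BIRTHS_PER_YEAR):
    """Crude estimate of people under a given age: years times annual births.

    Ignores deaths, migration, and changes in the number of births.
    """
    return years * births_per_year


scary_share = share_of_deaths(6_000)
print(f"Share of all deaths: {scary_share:.2%}")
print(f"Homicide vs. the disease: {17_000 / 6_000:.1f}x")
print(f"People under 14: about {estimate_cohort(14):,}")
```

With these figures, the hypothetical disease accounts for about a quarter of one percent of all deaths, which is the kind of context a bare "6,000 deaths" leaves out.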

2. Be Skeptical Of Numbers That Seem Too Large

Many organizations are trying to convince you that the problem they're fighting is extremely important. As such, these groups often feel pressure to supply the most alarming statistics they can in order to get your attention.

This is not to accuse such groups of lying. But as mentioned before, there's often a great deal of uncertainty when dealing with statistical information. Groups with a cause obviously feel that the problem they want to tackle is very alarming, and so it's not surprising that where uncertainty exists such groups will err towards interpreting the data in ways that make it look as alarming as they feel the issue actually is.

But often, this goes too far. And the key rule of thumb to remember here is that in general, the more alarming an incident of a given type is, the less common it is likely to be.

So, for example, murders are less common than assaults. Armed robbery is less common than petty theft. And multi-car pileups are less common than small-time fender benders.

This relates to putting the numbers into perspective. If you see a statistical claim that seems too large, especially if the number is larger than that of a related but less alarming figure, be suspicious.

Numbers that seem surprisingly large can also be the result of calculation errors, logical errors, and best guess estimates that amount to little more than speculation.

3. Ask Yourself "How Do They Know That?"

Often statistics are presented in a way that should cause immediate suspicion. Most of the time, you should be able to, in your mind, think through the process that would be required to find out the information presented. A statistic on the average height of Americans would involve gathering together a representative sample of Americans and measuring their heights, for example.

But often you will hear statistics that don't lend themselves to such easy measurements. For example, it is widely claimed that China is the source of 70% of the world's pirated goods. But this raises a number of questions. First and foremost, how did the people who came up with this statistic figure out the number of pirated goods in the world? The pirating industry does not release annual reports.

Obviously anybody making such an estimate will have to rely on more fragmented and less reliable data. In addition, such estimates are necessarily going to rely on a lot of choices and perhaps even more uncertain estimates in order to come to a final number.

When a statistic makes a claim that would be very difficult to substantiate, that's a clue that the statistic is likely to be far less certain than claimed.

4. Beware Of Long Term Forecasts Based On Short Term Trends

It's tempting to think that we can accurately predict the far future based on what we observe happening today. Unfortunately, predicting the future remains a very uncertain art.

Take for example the subject of religious affiliation in the United States. In 2001, a report was released on religious demographics in the United States by the American Religious Identification Survey. This report estimated that people who identified themselves as Christians made up 79.8% of the population.

The difficulty came when people noticed that this was lower than the 1990 estimate of 88.3%. When you do the math and round it off, that's a drop of about one percentage point per year. The message to a lot of people was obvious: Christianity was in decline! The estimates began coming in. Sometime around 2030, Christianity was going to become a minority religion in the United States!

The problem, of course, is that there is no reason to believe that this was a consistent trend, or one that would continue on into the future. If we were able to map out the changes in self-reported religious affiliation throughout the 1990s, we would probably not have seen a steady decline of about one percentage point per year. More likely, we would have seen an inconsistent graph that may have remained steady for several years, then taken a dip the next, perhaps climbing a little once in a while, but culminating at a low point in 2001. There is no reason to think that the average decline is going to continue ad infinitum.

In fact, we now have updated numbers from 2007, released in February 2008 in a study by the Pew Forum on Religion & Public Life. These updated figures estimate the percentage of Christians in the United States to be 78.4%. That's a decrease of only 1.4 percentage points in 6 years, a far cry from the average of about one point per year estimated for the 1990s. It appears that since 2001, Christianity in the US has experienced a decline of only about 0.2 percentage points per year. At this rate, Christianity won't become a minority religion in the US until sometime after 2149.

But, of course, there's no reason to expect that this new rate of 0.2% will remain steady either. In fact, the 2007 number is so close to the 2001 number that the difference may not be statistically significant. The lesson here may be that the percentage of Christians in America has levelled off and is showing no evidence of change.
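
To see how fragile this kind of projection is, here is a minimal Python sketch of the straight-line extrapolation described above. The function names are invented for the example, and the only inputs are the survey percentages quoted in this article. It assumes, as the naive forecasts did, that the average change between two surveys continues unchanged forever.

```python
def yearly_change(start_year, start_value, end_year, end_value):
    """Average change per year between two observations (percentage points)."""
    return (end_value - start_value) / (end_year - start_year)


def year_reaching(target, year, value, change_per_year):
    """Year at which a straight-line trend through (year, value) hits target."""
    return year + (target - value) / change_per_year


rate_1990s = yearly_change(1990, 88.3, 2001, 79.8)
rate_2000s = yearly_change(2001, 79.8, 2007, 78.4)

print(f"1990-2001: {rate_1990s:.2f} points per year")
print(f"2001-2007: {rate_2000s:.2f} points per year")
print(f"50% reached (1990s trend): {year_reaching(50, 2001, 79.8, rate_1990s):.0f}")
print(f"50% reached (2000s trend): {year_reaching(50, 2007, 78.4, rate_2000s):.0f}")
```

Using the unrounded rates, the crossover lands at roughly 2040 for the 1990s trend and roughly 2129 for the 2001 to 2007 trend. The rounded rates of 1 and 0.2 points per year used in the text give about 2030 and 2149 instead. A forecast that moves by a decade or more because of rounding is not one to lean on.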

If you play around with the concept of projecting current trends out into the future, you can come up with some decidedly silly results. For example, a man may want to train for the 100 meter dash. At first, he is able to run it in 35 seconds. A week later, he is able to do it in 30 seconds. A week after that, he is proud of his progress of being able to run it in 25 seconds. He does some calculations, and is excited to estimate that in 5 more weeks, he should be able to do the 100 meter dash in 0 seconds! He is very pleased at the prospect of learning how to teleport to the finish line instantaneously.

What's more, he reasons that if he keeps on practising after that, he should be able to cross the finish line even before he starts. He will have acquired the powers of teleportation and time travel simply by training for the 100 meter dash. How exciting!

And, of course, there's the classic example of Elvis Impersonators. In 1977, there were 37 Elvis Impersonators in the world. By 1993, there were an estimated 48,000 Elvis Impersonators in the world. When you do the math, this means that by 2010, one out of every three humans will be an Elvis Impersonator. At the time of this writing, 2010 is only a little over a year away. That's something to look forward to!

5. Ask How Strong Is The Case For Causation?

Often statistics will try to make the case that one phenomenon causes another. For example, one might argue that certain law enforcement policies cause a decrease in crime. The most common way to support this argument is with statistics that show that crime rates dropped at about the same time these law enforcement policies were being put in place.

The problem is that it's perfectly possible, even common, for two phenomena to be correlated with each other and not to be causally related. This is often embodied in the mantra "Correlation Does Not Imply Causation". We should ask for a little more evidence than mere coincidence before accepting statements of causation.

And make no mistake, it's remarkably easy to find correlations and make them look related. There's a famous example of how the decline of pirates has been linked to global warming. There is a graph showing that as piracy has declined, global temperatures have risen. The rise in temperature is matched pretty evenly with the decline of piracy in the graph, leading to the inevitable conclusion that pirates keep global warming at bay, and that their decline has been bad for the planet.

Of course, it's nonsense and was created for a laugh. But it was also created to illustrate a valid point. Any two events that show some sort of consistent change over a fixed period of time can be mashed together on a graph to make it look like one caused the other.
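
A short Python sketch makes the point concrete. The two series below are invented purely for illustration (they are not real pirate counts or temperature records); one simply falls over time and the other rises. The `pearson` helper is a hand-written version of the standard correlation coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented, purely illustrative series: one falls steadily, one rises steadily.
pirates = [5000, 4200, 3100, 2500, 1400, 900, 300]
temperature = [14.0, 14.1, 14.3, 14.4, 14.6, 14.7, 14.9]

print(f"correlation: {pearson(pirates, temperature):.2f}")
```

The coefficient comes out very close to -1, an almost perfect negative correlation, even though the numbers were made up with no connection between them. Any two quantities that each drift steadily in one direction will behave this way.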

Hopefully, with the recent pirate activity of hijacking that oil tanker, global temperatures will begin to recover.

Of course, if we're able to perform a study, and if we can control for as many variables as possible, then we may be able to find strong correlations that can be counted as good evidence for causation. The best kind of study to perform is called a double-blind, placebo-controlled clinical trial. This kind of study is usually done in order to determine the effectiveness of medicines. It recruits volunteers, who are put into two or more groups. One group is given a placebo, while the other(s) are given the specified treatment.

If these studies are carefully controlled for a number of factors, they can produce extremely reliable data. Of course, such studies are often not possible to perform for a number of reasons. So we're usually forced to perform observational studies of varying natures.

The problem here is that these studies are just not able to control for as many variables as a clinical trial. Therefore, the data they generate are bound to be more uncertain, and to carry more random noise and variation. As hard as it is to determine causation at the best of times, these studies make it even harder.

Imagine a study which finds that eating ambrosia is correlated with a 28% reduction in the incidence of werewolfism. Of course, for moral reasons it's clear that they didn't do a clinical trial where they gave some subjects ambrosia, some a placebo, and then subjected the lot of them to the werewolf virus. Instead, they performed an observational study. Likely, they surveyed a number of people. They asked each person how often they ate ambrosia, and whether they were afflicted with werewolfism. Then, when they crunched the data, they found this correlation.

Can we really conclude from this study that ambrosia has a real effect on werewolfism? How strong a correlation would be considered adequate evidence? Especially in epidemiological studies, many scientists don't feel that a case for causation can be made if the change in risk is below 100% (that is, less than a doubling of risk). Some set the bar at 200% or even 300%.

This rule of thumb is attested by such people as Marcia Angell, former editor of the New England Journal of Medicine; Dr. Kabat, IAQC epidemiologist; and Robert Temple, director of drug evaluation at the Food and Drug Administration.
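
As a rough sketch of how such a threshold might be applied, the Python below computes a relative risk from survey counts and compares it with a "doubling of risk" bar. The counts are hypothetical, chosen only to reproduce the 28% reduction in the ambrosia example. Treating a protective effect by its reciprocal (so that halving a risk counts the same as doubling it) is an assumption made for this illustration, not something the rule of thumb itself specifies.

```python
def relative_risk(cases_exposed, total_exposed, cases_unexposed, total_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (cases_exposed / total_exposed) / (cases_unexposed / total_unexposed)


def percent_change_in_risk(rr):
    """Express a relative risk as a percentage increase (+) or reduction (-)."""
    return (rr - 1) * 100


def clears_rule_of_thumb(rr, minimum_ratio=2.0):
    """Check a relative risk against a 'doubling of risk' style threshold.

    Assumption: a protective effect is judged by its reciprocal, so a
    relative risk of 0.5 counts the same as one of 2.0.
    """
    return max(rr, 1 / rr) >= minimum_ratio


# Hypothetical survey counts chosen to reproduce the 28% reduction.
rr = relative_risk(36, 1000, 50, 1000)
print(f"relative risk: {rr:.2f}")
print(f"change in risk: {percent_change_in_risk(rr):+.0f}%")
print(f"clears the doubling threshold: {clears_rule_of_thumb(rr)}")
```

Under these assumptions the ambrosia finding falls well short of the bar. Passing `minimum_ratio=3.0` or `4.0` corresponds to the stricter 200% and 300% versions of the rule.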

A study was done by Canadian researchers to prove just this point. They studied the injuries of hospital patients and looked for correlations with their astrological signs. It turns out that Sagittarians are 38% more likely to break their legs than people born under other star signs, and Leos are 15% more likely to suffer from internal bleeding. This wasn't a validation of astrology. Instead, it illustrates the folly of assuming that any correlation can be said to be a causal factor.
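
The simulation below is only a sketch of the general idea, not a reconstruction of the Canadian study. The number of patients, the number of diagnoses, and the base rate are all invented. Each simulated patient gets a random star sign and a random set of diagnoses, with no link between the two, and the code then searches every sign and diagnosis pair for the largest apparent excess risk.

```python
import random

SIGNS = 12          # number of star signs
CONDITIONS = 20     # number of diagnoses checked (illustrative)
PATIENTS = 10_000   # simulated patients (illustrative)
BASE_RATE = 0.02    # every diagnosis is equally likely for every sign


def largest_chance_excess(seed=0):
    """Return the largest relative excess risk found for any sign/diagnosis pair.

    Sign and diagnosis are generated independently, so any excess is noise.
    """
    rng = random.Random(seed)
    sign_totals = [0] * SIGNS
    cases = [[0] * CONDITIONS for _ in range(SIGNS)]
    for _ in range(PATIENTS):
        sign = rng.randrange(SIGNS)
        sign_totals[sign] += 1
        for c in range(CONDITIONS):
            if rng.random() < BASE_RATE:
                cases[sign][c] += 1

    best = 0.0
    for c in range(CONDITIONS):
        total_cases = sum(cases[s][c] for s in range(SIGNS))
        for s in range(SIGNS):
            others_cases = total_cases - cases[s][c]
            others_total = PATIENTS - sign_totals[s]
            if sign_totals[s] == 0 or others_cases == 0:
                continue  # skip pairs where a rate cannot be compared
            rate_sign = cases[s][c] / sign_totals[s]
            rate_others = others_cases / others_total
            best = max(best, rate_sign / rate_others - 1)
    return best


print(f"Largest excess found by chance: {largest_chance_excess():.0%}")
```

Because star sign and diagnosis are generated independently, whatever excess the function reports is pure noise. Yet checking 240 sign and diagnosis pairs will typically turn up at least one that looks striking, which is why a single modest correlation pulled from a large search deserves suspicion.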

So be suspicious of studies which find correlations that are this weak. Also, it is prudent to ask: if there really is a true link in this correlation, could it be something other than what's claimed? There could be a third factor that causes both events. For example, falling asleep with your shoes on may be strongly correlated with waking up with a headache. Does falling asleep with your shoes on cause a headache? Probably not. Falling asleep with your shoes on is also strongly correlated with coming home drunk, which causes you to fall asleep with your shoes on and also causes your headache.
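
The shoes-and-headache example can be worked through with a small table of counts. The numbers below are hypothetical, constructed so that drinking drives both falling asleep with shoes on and waking with a headache, while shoes have no effect of their own. The sketch compares headache rates overall and then separately for drunk and sober nights.

```python
# Hypothetical counts of nights: (drunk, shoes_on, headache) -> nights.
NIGHTS = {
    (True, True, True): 112,
    (True, True, False): 48,
    (True, False, True): 28,
    (True, False, False): 12,
    (False, True, True): 4,
    (False, True, False): 36,
    (False, False, True): 76,
    (False, False, False): 684,
}


def headache_rate(shoes_on, drunk=None):
    """Share of nights ending in a headache, optionally within one drinking group."""
    selected = {
        key: count
        for key, count in NIGHTS.items()
        if key[1] == shoes_on and (drunk is None or key[0] == drunk)
    }
    total = sum(selected.values())
    with_headache = sum(count for key, count in selected.items() if key[2])
    return with_headache / total


print(f"All nights, shoes on:  {headache_rate(True):.0%}")
print(f"All nights, shoes off: {headache_rate(False):.0%}")
for drunk in (True, False):
    label = "drunk" if drunk else "sober"
    print(f"{label}, shoes on:  {headache_rate(True, drunk):.0%}")
    print(f"{label}, shoes off: {headache_rate(False, drunk):.0%}")
```

Across all nights, headaches follow 58% of shoes-on nights but only 13% of shoes-off nights. Within the drunk nights the rate is 70% either way, and within the sober nights it is 10% either way. Once the third factor is held fixed, the apparent effect of the shoes vanishes.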

If one event does cause the other, it could be the case that the causal chain is the opposite of the one being argued. For example, a study by the University of Pennsylvania determined that children who slept with the lights on were more likely to develop myopia later in life. The obvious argument is that sleeping with the lights on causes myopia. However, later studies have shown that myopia is a highly heritable condition, and that parents with myopia are more likely to leave their children's lights on for their own visual needs. When genetics are accounted for, the correlation disappears.

So the causal relationship is reversed. Sleeping with the lights on doesn't make you more likely to develop myopia. Being genetically susceptible to myopia makes you more likely to sleep with the lights on.

When there are so many ways that correlations can lead us astray, we must be very careful before accepting them as evidence of causation. Make sure that the evidence is more substantial than mere coincidence.

-

Obviously, this article is not meant as an exhaustive guide to interpreting statistics. There are many additional factors, as well as guidelines and subtleties that can be analyzed in order to assess the statistics you find in the newspaper, magazines, and on television. Hopefully this article will serve as a starting point to get you thinking about data and understanding that statistics are products of human choices and not hard little nuggets of truth.

If you use these guidelines as a starting point, you will be well on your way towards becoming a critical consumer of information.

More about this author: Ken Smauthi
