Online: | |

Visits: | |

Stories: |

Story Views | |

Now: | |

Last Hour: | |

Last 24 Hours: | |

Total: |

Wednesday, April 5, 2017 13:25

% of readers think this story is Fact. Add your two cents.

**Paradoxes of probability and other statistical strangeness**

Stephen Woodcock, *University of Technology Sydney* *Statistics is a useful tool for understanding the patterns in the world around us. But our intuition often lets us down when it comes to interpreting those patterns. In this series we look at some of the common mistakes we make and how to avoid them when thinking about statistics, probability and risk.*

You don’t have to wait long to see a headline proclaiming that some food or behaviour is associated with either an increased or a decreased health risk, or often both. How can it be that seemingly rigorous scientific studies can produce opposite conclusions?

Nowadays, researchers can access a wealth of software packages that can readily analyse data and output the results of complex statistical tests. While these are powerful resources, they also open the door to people without a full statistical understanding to misunderstand some of the subtleties within a dataset and to draw wildly incorrect conclusions.

Here are a few common statistical fallacies and paradoxes and how they can lead to results that are counterintuitive and, in many cases, simply wrong.

One example of this paradox is where a treatment can be detrimental in all groups of patients, yet can appear beneficial overall once the groups are combined.

The overall results make it look like the treatment was beneficial to patients, with a higher recovery rate for patients with the treatment than for those without it.

However, when you drill down into the various groups that made up the cohort in the study, you see in all groups of patients, the recovery rate was 50% higher for patients who had *no* treatment.

But note that the size and age distribution of each group is different between those who took the treatment and those who didn’t. This is what distorts the numbers. In this case, the treatment group is disproportionately stacked with children, whose recovery rates are typically higher, with or without treatment.

If, for example, we hear that someone loves music, we might think it’s more likely they’re a professional musician than an accountant. However, there are many more accountants than there are professional musicians. Here we have neglected that the

Let’s say there is a test for the condition, but it’s not perfect. If someone has the condition, the test will correctly identify them as being ill around 92% of the time. If someone

So if we test a group of people, and find that over a quarter of them are diagnosed as being ill, we might expect that most of these people really do have the condition. But we’d be wrong.

According to our numbers above, of the 4% of patients who are ill, almost 92% will be correctly diagnosed as ill (that is, about 3.67% of the overall population). But of the 96% of patients who are not ill, 25% will be

What this means is that of the approximately 27.67% of the population who are diagnosed as ill, only around 3.67% actually are. So of the people who were diagnosed as ill, only around 13% (that is, 3.67%/27.67%) actually are unwell.

Worryingly, when a famous study asked general practitioners to perform a similar calculation to inform patients of the correct risks associated with mammogram results, just 15% of them did so correctly.

The name comes from the American comedian Will Rogers, who joked that “when the Okies left Oklahoma and moved to California, they raised the average intelligence in both states”.

Former New Zealand Prime Minister Rob Muldoon provided a local variant on the joke in the 1980s, regarding migration from his nation into Australia.

The patients who have life expectancies of 40 and 50 have been diagnosed with a medical condition; the other four have not. This gives an average life expectancy within diagnosed patients of 45 years and within non-diagnosed patients of 75 years.

If an improved diagnostic tool is developed that detects the condition in the patient with the 60-year life expectancy, then the average within both groups rises by 5 years.

This can occur when the subset is not an unbiased sample of the whole population. It has been frequently cited in medical statistics. For example, if patients only present at a clinic with disease A, disease B or both, then even if the two diseases are independent, a negative association between them may be observed.

If the school admits only students who are excellent academically, excellent at sport or excellent at both, then within this group it would appear that sporting ability is negatively correlated with academic ability.

To illustrate, assume that every potential student is ranked on both academic and sporting ability from 1 to 10. There are an equal proportion of people in each band for each skill. Knowing a person’s band in either skill does not tell you anything about their likely band in the other.

Assume now that the school only admits students who are at band 9 or 10 in at least one of the skills.

If we look at the whole population, the average academic rank of the weakest sportsperson and the best sportsperson are both equal (5.5).

However, within the set of admitted students, the average academic rank of the elite sportsperson is still that of the whole population (5.5), but the average academic rank of the weakest sportsperson is 9.5, wrongly implying a negative correlation between the two abilities.

While each pair is extremely unlikely to look dependent, the chances are that from the half million pairs, quite a few will look dependent.

In a group of 23 people (assuming each of their birthdays is an independently chosen day of the year with all days equally likely), it is more likely than not that at least two of the group have the same birthday.

People often disbelieve this, recalling that it is rare that they meet someone who shares their own birthday. If you just pick two people, the chance they share a birthday is, of course, low (roughly 1 in 365, which is less than 0.3%).

However, with 23 people there are 253 (23×22/2) pairs of people who might have a common birthday. So by looking across the whole group you are testing to see if any one of these 253 pairings, each of which independently has a 0.3% chance of coinciding, does indeed match. These many possibilities of a pair actually make it statistically very likely for coincidental matches to arise.

For a group of as few as 40 people, it is almost nine times as likely that there is a shared birthday than not.

Stephen Woodcock, Senior Lecturer in Mathematics, *University of Technology Sydney*

This article was originally published on The Conversation. Read the original article.

Source: http://gmopundit.blogspot.com/2017/04/logical-traps-that-can-snare-unwary.html