Statistics is mathematics, a way of representing relationships. Mathematics is an axiomatic system: it makes assumptions about basic units (numbers) and basic relationships (adding, subtracting), adopts rules of inference (deductive logic), and elaborates these to draw conclusions that are too intricate to reason out in other ways. But it is just reasoning, and when applied to the real world it is only as true or accurate as its assumptions. Sometimes that's very accurate indeed. But sometimes it verges on fiction; among those times are many applications of probability and statistics.
Anyone with R or SAS or SPSS can be a push-button scientist (and R and other stat programs are even free!). Anyone with a keyboard and SurveyMonkey or other freeware (or buyware) can do a survey. If you're a professional epidemiologist, you can propose a complex, intricate, jargon-dense, expansive, and expensive survey. You can call the results 'data', and you're off and running, sounding like an insightful pro. You may be very intelligent and well-trained. There's only one little problem: much of what you and your peers do is non-sense (and some of it nonsense).
The fault is not in the statistics, and often not even in the study design, at least among those currently recognized as legitimate. The problem is in the degree of fit (or not) to the assumptions, with the emphasis on the 'or not'. Statisticians in the know know this, but of course they might be out of business if they said so very clearly; as in other fields, their jobs depend on following the current mythology, because that's how you get funding, publications, and the like.
There is the problem of the problems: the problems we want to solve, such as understanding the causes of disease so we can do something about them. When causal factors fit the assumptions, statistical or survey studies work very well. But when causation is far from fitting the assumptions, we seem mainly to respond by increasing the sample size and scale (and cost and duration) of studies. There may be plenty of careful thought about refining statistical design, but basically that refinement stays within the boundaries of current methods and knowledge, and within the need to have big projects; that is, it stays safely inside the boundaries.
The BBC Radio 4 program called More Or Less keeps a watchful eye on public and scientific statistical claims, letting you know what is really known (or not) about them. Here is a recent installment of theirs on the efficacy (or believability, or neither) of dietary surveys, if you want to hear a brief discussion of the subject. And here is a FiveThirtyEight link on the same subject, on which the podcast was based.
Promissory science
(That is, using statistics to suggest things that aren't really accurate, in order to justify more funding, much as preachers have done for centuries.)
The American Statistical Association has noted at least one important part of the problem: the use and (mainly) misuse of p-values to try to identify causation. Here is a good discussion of the ASA's p-value statement on FiveThirtyEight, a statistical information, analysis, and watchdog site that is well worth reading.
Hey, trashing p-values is becoming a new cottage industry! Now JAMA is on the bandwagon, with an article showing a far disproportionate share of reported significant results. (Here is the reference, though the article is not yet publicly available; check the JAMA webpage:
JAMA. 2016;315(11):1141-1148. doi:10.1001/jama.2016.1952.)
Basically, the authors found in an extensive biomedical literature search that there was a heavy and increasing predominance of significant results in papers that published p-values, and that those papers generally failed to flesh out that result adequately. For example, they did not report confidence intervals or other measures of the degree of 'convincingness' that might illuminate the p-value. They reported a non-random effect but often didn't give the effect size, that is, say how large the effect was. A significant increase of risk from 1% to 1.01% is trivial, even if one could accept all the assumptions of the sampling and analysis.
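To make that last point concrete, here is a minimal sketch, with invented numbers of my own rather than anything from the JAMA paper, of how a trivially small effect becomes 'statistically significant' once the sample is large enough, which is exactly why a p-value without an effect size says so little:

```python
# Toy two-proportion z-test: a hypothetical risk increase from 1.00% to 1.01%
# in a very large cohort. All counts are invented for illustration.
from math import sqrt
from scipy.stats import norm

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return p1, p2, z, 2 * norm.sf(abs(z))

n = 10_000_000  # ten million subjects per group
p1, p2, z, p = two_proportion_ztest(100_000, n,   # "unexposed": 1.00% risk
                                    101_000, n)   # "exposed":   1.01% risk
print(f"risk difference = {p2 - p1:.4%}, z = {z:.2f}, p = {p:.3f}")
# The p-value comes out around 0.025 -- "significant" -- for an absolute risk
# difference of 0.01 percentage points, an effect size that is trivial.
```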
We have written about this before, in a series of posts (link). Here we make some different but complementary points.
BUT, we think all of this largely misses the point. The situation is much worse than the current public panic suggests. None of the recommended supplementary measures will address the deeper underlying issues with the shell game that is statistical analysis. In many ways the whole enterprise is bogus, a house largely made of cards (but you don't know which are bricks and which are cards). That is because of the nature of the statistical assumptions under which the math could be expected to mimic reality.
The problem is that we don't know which measures and which assumptions are violated, or how seriously. We can make guesses and do all sorts of auxiliary tests and the like, but as decades of experience in the social, behavioral, and biomedical (especially epidemiological) worlds, and even in evolutionary biology and ecology, have shown, we often don't have serious ways to check these things.
The p-values are often bogus because the assumptions are not met, for the reasons mentioned above, and the plea to include confidence intervals also is rather desperate, for the same reason. CIs do give some sense of the range of possible test values that might be compatible with the data, but they still depend on the self-same data and the relevant assumptions. They are very useful, but not a panacea (and, by the way, be alert: weak studies sometimes report 80%, or +/- 1 standard deviation, intervals rather than 95% CIs).
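As a small illustration of that parenthetical warning (the estimate and standard error below are invented, not from any study), the same result looks much tighter at 80% confidence than at 95%, which is one way a weak finding can be dressed up:

```python
# How the chosen confidence level changes the apparent precision of the same
# hypothetical estimate. Numbers are invented for illustration.
from scipy.stats import norm

estimate, std_err = 1.15, 0.10   # say, a relative-risk estimate and its SE

for level in (0.80, 0.95):
    z = norm.ppf(0.5 + level / 2)            # ~1.28 for 80%, ~1.96 for 95%
    lo, hi = estimate - z * std_err, estimate + z * std_err
    print(f"{level:.0%} CI: ({lo:.2f}, {hi:.2f})")
# 80% CI: (1.02, 1.28)  -- excludes 1.0, so the 'effect' looks convincing
# 95% CI: (0.95, 1.35)  -- includes 1.0, so it looks far less so
```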
But how 'accurate' are these estimates? That question has almost no meaning. If there were an analytic truth, derived from a rigorous causal theory, we could ask how many decimal places our answers are off from the truth. But we are far, usually very far (indeed, to be fair but not reassuring, unknowably far), from that situation. That is because we don't have such a theory.
If we did have a theory, then in confronting a new situation we would be able to apply that theory to our data, and estimate the theory's parameters and the like under that particular situation. If our fit was poor, and we'd know what 'poor' meant, we would then challenge the theory, the study design, or our understanding of the circumstances. But, at least, the theory comes to the problem from the outside, and the data are contrasted with that externally derived theory.
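Here is a minimal sketch of that external-theory situation, using Mendel's 3:1 segregation ratio as a stand-in theory (my example, not the post's): the prediction exists before the data are collected, so a goodness-of-fit test has a clear interpretation, and a poor fit points at the theory, the design, or the sample.

```python
# Goodness-of-fit against an externally derived prediction (Mendel's 3:1 ratio).
# The observed counts are hypothetical.
from scipy.stats import chisquare

observed = [705, 224]                          # dominant vs recessive phenotypes
total = sum(observed)
expected = [0.75 * total, 0.25 * total]        # prediction made before seeing data

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p:.2f}")
# A small chi-square (here ~0.39, p ~0.53) means the data are compatible with
# the external theory; a large one would give us something specific to challenge.
```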
In epidemiology, genetics, and evolution, we have a very different situation. We have no such theory, so we have to concoct a kind of data that will reveal whether 'something' of interest (often something we don't or cannot specify) is going on. Lacking external theory, we have to make an internal comparison, cases vs controls, for example, and that forces us to make statistical assumptions about the difference: for instance, that other than (say) exposure to coffee, our samples of diseased and normal subjects do not differ in any relevant way (or that the distribution of other variation is random with regard to coffee consumption). Such internally based comparisons are the problem, and a major reason why theory-poor fields do so, well, so poorly.
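To see why that internal-comparison assumption is so fragile, here is a hedged toy simulation (my construction, not anything from the post): an unmeasured factor, labelled 'smoking' here, drives both coffee drinking and disease, and the naive case/control comparison then 'finds' a coffee effect that does not exist.

```python
# Toy confounding simulation: coffee has no causal effect on disease, but a
# hidden factor raises both coffee drinking and disease risk. All probabilities
# are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

smoking = rng.random(n) < 0.3                              # hidden confounder
coffee = rng.random(n) < np.where(smoking, 0.8, 0.4)       # smokers drink more coffee
disease = rng.random(n) < np.where(smoking, 0.05, 0.01)    # smoking, not coffee, raises risk

print(f"coffee drinking among cases:    {coffee[disease].mean():.1%}")
print(f"coffee drinking among controls: {coffee[~disease].mean():.1%}")
# Cases drink noticeably more coffee than controls (roughly 67% vs 52% in
# expectation), an association any standard test would call 'significant',
# even though coffee causes nothing in this model.
```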
It is nobody's fault if we don't have adequate theory. The fault, dear Brutus, is in ourselves, that we use Promissory Science and feign far deeper knowledge than we actually have. There are many reasons for this: human foibles, economics, hopes, and so on. That we cannot refrain is blameworthy, and the huge hierarchy of automaton statistical approaches we have built up is the professional edifice that perpetuates the problem. It isn't just preachers who work a Promissory field.
But the bogus nature of the whole enterprise can be seen in a different way. If all studies were reported in the literature (including failed drug trials), then it is only right that the major journals should carry those findings that are most likely true and important. A top journal is, almost by definition, where the most important results are to be published.
The problem is that non-significant results are badly under-reported, and therefore we can't tell whether JAMA and the Lancet and Nature et al. are publishing rubbish. Studies have shown that these journals publish findings less likely to be replicated than those in 'lesser' journals, so there is certainly a bias in what they publish. This is hard to fix, because you don't know how important a positive result is until you've also seen the negative ones, and those often haven't been done, much less reported.
Something is rotten, and it's rotten in a lot more places than Denmark.
Now, if you want to know one way we should really be spending our public health research funds, on more than research welfare programs, here's another cogent broadcast link from the same series. There is a serious, perhaps loomingly acute, problem with antibiotic resistance. That's where properly constrained research (not just for mega profits) would warrant public investment. If you want a quick discussion, here is another link from the public watchdog program More Or Less on the economics of antibiotic resistance.