11-6-10 Hodgepodge
Saturday, November 06, 2010
Lies, Damned Lies, and ... Medical Research?
Years ago, in our lab's journal club, a colleague presented a curiously titled 2005 paper by one John Ioannidis, "Why Most Published Research Findings Are False." Insufficiently intrigued by the presentation and too busy at the time to read it or follow up on it, I all but forgot about the paper.
But Ioannidis hasn't gone away. He is now getting attention in the popular press for this work and a companion paper, "Contradicted and Initially Stronger Effects in Highly Cited Clinical Research" (PDF). The first of these papers gives a mathematical model of a suspicion I have had for some time about certain areas of research, and the second takes a look at the research literature as a check.
The Atlantic presents the above work for a lay audience. It's long, but worth thinking about.
Ioannidis was putting his contentions to the test not against run-of-the-mill research, or even merely well-accepted research, but against the absolute tip of the research pyramid. Of the 49 articles, 45 claimed to have uncovered effective interventions. Thirty-four of these claims had been retested, and 14 of these, or 41 percent, had been convincingly shown to be wrong or significantly exaggerated. If between a third and a half of the most acclaimed research in medicine was proving untrustworthy, the scope and impact of the problem were undeniable. That article was published in the Journal of the American Medical Association.

This being science, Ioannidis is not being taken at his word. For example, two authors argued in 2007 that Ioannidis's analysis is suspect.
We agree with the paper's conclusions and recommendations that many medical research findings are less definitive than readers suspect, that P-values are widely misinterpreted, that bias of various forms is widespread, that multiple approaches are needed to prevent the literature from being systematically biased and the need for more data on the prevalence of false claims. But calculating the unreliability of the medical research literature, in whole or in part, requires more empirical evidence and different inferential models than were used. The claim that "most research findings are false for most research designs and for most fields" must be considered as yet unproven.

Ioannidis has since answered them. The debate goes on, but the point is well taken.
Weekend Reading
"Don't forget who put you in office and why -- namely, the independent-minded Tea Party voters." -- Paul Hsieh in "GOP: Dance With The One Who Brung You" at Pajamas Media
"... [M]y message to conservatives this time is, 'I've given you a Republican voter... if you can keep me.'" -- Jared Rhodes in "A Republican Voter ..." at the web site of the Lucidicus Project (HT: Amit Ghate)
"What really frightens me -- as both an investor and a citizen of the [planet] -- is 'eco-horror,' a genre of storytelling now seen in fine art galleries, movie theaters and basic cable. Right now, there's a prime example on display inside the Museum of London." -- Jonathan Hoenig in "Forget Halloween: It's the Greens that Scare Me" at SmartMoney
Comment of the Week
A fellow blogger makes an important clarification.
"[T]he fact of morally condemning in and of itself is not sufficient to obligate explanation." -- Kendall J
What if ...
... you really needed to move an enormous generator?
-- CAV
4 comments:
Yo, Gus, another couple of good explanations of some aspects of Ioannidis' findings are available at a math and statistics blog I follow. From the first post:
Here’s an example that shows how p-values can be misleading. Suppose you have 1,000 totally ineffective drugs to test. About 1 out of every 20 trials will produce a p-value of 0.05 or smaller by chance, so about 50 trials out of the 1,000 will have a “significant” result, and only those studies will publish their results. The error rate in the lab was indeed 5%, but the error rate in the literature coming out of the lab is 100 percent!
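That arithmetic is easy to sanity-check with a quick simulation. Here's a rough sketch (my own illustration, not from the linked post) that runs 1,000 two-arm trials of drugs with no effect whatsoever and counts how many clear the p < 0.05 bar by chance; the group sizes and the normal measurements are arbitrary assumptions.

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)

    n_drugs = 1000      # every drug is truly ineffective (the null is true for all)
    n_per_arm = 50      # arbitrary number of patients per arm
    alpha = 0.05

    false_positives = 0
    for _ in range(n_drugs):
        # Treatment and control come from the same distribution, so any
        # "significant" difference is pure chance.
        treated = rng.normal(0.0, 1.0, n_per_arm)
        control = rng.normal(0.0, 1.0, n_per_arm)
        _, p = ttest_ind(treated, control, equal_var=False)
        if p < alpha:
            false_positives += 1

    print(f"{false_positives} of {n_drugs} trials were 'significant' by chance "
          f"(~{false_positives / n_drugs:.0%} of the lab's work, but 100% of what "
          f"would reach the literature).")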
The second link concerns a follow-up study that found that popular research areas produce more false results by the same mechanism as above: "The more people who test an idea, the more likely someone is going to find data in support of it by chance."
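That mechanism can also be quantified directly: if k independent groups each test a true null hypothesis at the 0.05 level, the chance that at least one of them stumbles onto a "significant" result is 1 - (1 - 0.05)^k. A few lines make the point (again my own back-of-the-envelope illustration, not the linked study's calculation):

    alpha = 0.05
    for k in (1, 5, 10, 20, 50):
        p_someone_finds_it = 1 - (1 - alpha) ** k
        print(f"{k:3d} independent teams -> P(at least one p < 0.05) = {p_someone_finds_it:.2f}")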
(And I'll point to a third blogpost there on the particular matter of microarray studies; the author does work as a medical statistician, I believe. It's more technical than the other posts.)
This is related to the strong tendency to rely almost exclusively on significance levels in interpreting results, which a number of statisticians have rightfully decried as a Procrustean abuse. One way to supplement the usual analyses is to report effect sizes, the power of the test, confidence intervals, and the like alongside significance levels. But the real problem lurking underneath all this is that scientists often aren't trained to think critically enough about the underlying statistical models, so they don't automatically test the assumptions behind their tests of choice against the data on hand, as professional statisticians are trained to do (checking the independence, equal variance, and normality of the samples, for example, before doing anything else -- and preferably in that order, since the reliability of the usual tests is much more sensitive to the first than the last).
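For the curious, here is a rough sketch of what that fuller reporting might look like; the data are made up, and the particular checks (Shapiro-Wilk for normality, Levene for equal variances) are just common choices, not the only reasonable ones. Independence can't be verified by a canned test -- it has to come from the study design.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    treated = rng.normal(0.3, 1.0, 40)   # made-up treatment measurements
    control = rng.normal(0.0, 1.0, 40)   # made-up control measurements

    # Check the distributional assumptions before leaning on the usual t-test.
    print("Normality p-values (Shapiro-Wilk):",
          stats.shapiro(treated).pvalue, stats.shapiro(control).pvalue)
    print("Equal-variance p-value (Levene):", stats.levene(treated, control).pvalue)

    # Report more than a bare p-value: an effect size and a confidence interval.
    _, p = stats.ttest_ind(treated, control, equal_var=False)
    diff = treated.mean() - control.mean()
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    cohens_d = diff / pooled_sd                      # standardized effect size
    se = np.sqrt(treated.var(ddof=1) / len(treated) + control.var(ddof=1) / len(control))
    ci = (diff - 1.96 * se, diff + 1.96 * se)        # approximate 95% CI for the difference

    print(f"p = {p:.3f}, Cohen's d = {cohens_d:.2f}, "
          f"95% CI for the difference = ({ci[0]:.2f}, {ci[1]:.2f})")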
And of course in the social sciences the problems are much worse and more deeply ingrained. I recently proofread an article for a friend on a phonetic study she performed, written up for a general journal (philological, you might say, rather than one devoted to phonetics), so besides running inapt statistical tests she had the problem of explaining the tests she used, which were the usual mix of ANOVA and discriminant analysis. It turns out that logistic regression was much better suited to the problem (though in the end the results were much the same ... only this time accurate enough for anything better than government work), but as she told me when we discussed it, even in phonetics people are taught to use statistics like "trained monkeys" -- that is, to use the ol' "plug'n'chug" approach, as it's usually described. Sad, but all too common.
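To make the contrast concrete, here is a toy version of that kind of analysis, with entirely made-up data and variable names (a continuous acoustic cue predicting a binary phonetic category). The point is only that logistic regression models the probability of a binary outcome directly, which ANOVA on 0/1 codes does not; it is in no way a reconstruction of her study.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)

    # Made-up data: a continuous acoustic cue (say, voice onset time in ms)
    # and a binary category (voiced vs. voiceless) that depends on it.
    vot = rng.normal(40.0, 15.0, 200)
    p_voiced = 1.0 / (1.0 + np.exp(-(-3.0 + 0.08 * vot)))   # true underlying relationship
    voiced = rng.binomial(1, p_voiced)

    # Logistic regression models P(voiced | vot) directly on the 0/1 outcome.
    X = sm.add_constant(vot)
    fit = sm.Logit(voiced, X).fit(disp=0)
    print(fit.summary())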
And on that score, for anyone interested, there's a good overview of the problems with statistical testing in the sciences, along with a worthwhile history of how the standard significance-level tests developed in the first half of the 20th century, in Gerd Gigerenzer's "Mindless Statistics" (2004), available to download here. It's worth contemplating just how much pernicious nonsense has been birthed and propagated in the social sciences by stupid, wrong-headed, dimwitted misuse of statistics by uncritical "trained monkeys."
Snedcat,
Thanks for this comment. The links are worthwhile and the first example is particularly striking.
Gus
Yo, Gus, there's another aspect to Ioannidis's research that I think needs to be addressed ASAP by the medical community: How many episodes of House are knocked into a cocked hat by his findings?
Hmmm. Does that quip make you a buzzkill -- or Ioannidis?
Also, the man is one patient for whom, "It's okay, we're using the latest techniques to save you," might not be the most reassuring thing to say!