What can be done about fraudulent data?

5/22/2015 12:40:00 PM
You've probably heard of Michael LaCour, the political science grad student who will probably never graduate now that everyone knows his provocative and surprisingly conclusive study used data he fabricated. Political science isn't my field, and I don't have as good an eye for context as Andrew Gelman, so when the study originally came out I didn't grasp how significant the results were--or rather would have been, had they been real. The paper described an experiment in which gay and straight canvassers went around trying to persuade people to support gay marriage, and found that gay canvassers persuaded far more people than straight ones, suggesting that personal contact has a big effect on people's political views. I was vaguely aware of previous (real) research showing that in-person canvassing was the most effective form of political persuasion, so I didn't think much of the new results. But LaCour's results were actually way out of place, showing a massively larger effect than anything previously reported. As Gelman said when the paper first came out:
"A difference of 0.8 on a five-point scale . . . wow! You rarely see this sort of thing. Just do the math. On a 1-5 scale, the maximum theoretically possible change would be 4. But, considering that lots of people are already at “4” or “5” on the scale, it’s hard to imagine an average change of more than 2. And that would be massive. So we’re talking about a causal effect that’s a full 40% of what is pretty much the maximum change imaginable. Wow, indeed. And, judging by the small standard errors (again, see the graphs above), these effects are real, not obtained by capitalizing on chance or the statistical significance filter or anything like that."
People more clued into political science research than I am spotted the implications right away, and this, along with the timeliness of gay marriage as a subject, catapulted LaCour and his paper to fame.

But it turns out the data was fake. I recommend reading the excellent documentation from the researchers who expertly caught the fraud. It really is an example not just of talent but of professionalism. Accusations of fraudulent data are thrown around with surprising frequency--mostly at results the accusers personally disagree with--but it's rare to see real investigative work uncover actual fraud while offering every reasonable benefit of the doubt along the way.

I suspect that there's a media bias towards fraudulent research. LaCour's paper became so famous precisely because the underlying data was fake, producing a result that was untrue but also wildly surprising and newsworthy. Because the media focuses on interesting results, it will be disproportionately likely to report fake results, which are inherently interesting because they don't match the real world. Given this, we need media outlets to cover new research more the way Gelman did--he did not suspect fraud, but he did emphasize how out-of-place the results were, and I think that goes a long way toward dampening the spread of misinformation when retractions do happen. And of course, not all retractions are fraud: honest mistakes happen, and it doesn't take any mistake or fraud to get the occasional odd-ball dataset that yields a false positive.

Still, I worry that we only learned of this fraud because LaCour knew just enough statistics to get himself into trouble (what if he had understood "heaping" effects better?) and because two researchers, Broockman and Kalla, wanted to run a similar extension of the study. What if no one had happened to try to replicate the methods? How many fraudulent papers and datasets are still out there?

Adam Marcus and Ivan Oransky tell the story of a much bigger fish than LaCour, whose fishy data, published in papers that went undetected for decades, produced a record-breaking 183 retractions, comprising about 7 percent of all retractions over the entire period from 1980 to 2011. Researchers eventually caught the fraudster, Yoshitaka Fujii, by using statistical methods to test whether the datasets Fujii produced matched what would be expected from sampling real-world data:
'Using these techniques, Carlisle concluded in a paper he published in Anaesthesia in 2012 that the odds of some of Fujii’s findings being experimentally derived were on the order of 10⁻³³, a hideously small number. As Carlisle dryly explained, there were “unnatural patterns” that “would support the conclusion that these data depart from those that would be expected from random sampling to a sufficient degree that they should not contribute to the evidence base.”'
Marcus and Oransky strongly imply that John Carlisle's statistical method should be applied automatically to every paper submitted to a journal, in order to weed out fraud. They express frustration with journal editors and others who resist this proposal.
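To make the idea concrete, here is a minimal sketch of the kind of check this style of forensics relies on--my own illustrative reconstruction, not Carlisle's actual procedure or data. The intuition: if treatment groups are genuinely randomized, the p-values from baseline-variable comparisons across a researcher's trials should look roughly uniform, so a strongly non-uniform distribution is a red flag. All of the numbers, sample sizes, and function names below are hypothetical.

```python
# Toy reconstruction of the general idea (not Carlisle's actual method or data).
# Under genuine randomization, p-values comparing a baseline variable between
# the two arms of many trials should be roughly uniform on [0, 1]. Fabricated
# arms that agree "too well" produce a p-value distribution that a uniformity
# test will flag.
import numpy as np
from scipy import stats

def baseline_pvalues(trials):
    """t-test a baseline variable (e.g. age) between the two arms of each trial."""
    return [stats.ttest_ind(a, b).pvalue for a, b in trials]

def departure_from_random_sampling(pvalues):
    """Kolmogorov-Smirnov test of the p-values against Uniform(0, 1).
    A tiny result means the baseline data depart from what random sampling
    would be expected to produce."""
    return stats.kstest(pvalues, "uniform").pvalue

rng = np.random.default_rng(0)

# Honest trials: both arms drawn from the same population.
honest = [(rng.normal(50, 10, 40), rng.normal(50, 10, 40)) for _ in range(30)]
print(departure_from_random_sampling(baseline_pvalues(honest)))      # unremarkable

# Fabricated trials: the second arm is a near-copy of the first, so the arms
# match each other far more closely than real random samples ever would.
fabricated = []
for _ in range(30):
    arm = rng.normal(50, 10, 40)
    fabricated.append((arm, arm + rng.normal(0, 0.01, 40)))
print(departure_from_random_sampling(baseline_pvalues(fabricated)))  # vanishingly small
```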

I think regular use of statistical forensics is unlikely to get us anywhere in the long run. Like LaCour, Fujii was caught because he knew just enough statistics to get himself into trouble. If we start screening all datasets with these methods, fraudsters will simply learn to simulate data that passes the tests. When you aren't constrained to real-world data, it is possible to pass any statistical test. That leaves us with a Bayesian problem: the vast majority of researchers don't commit fraud, and the ones who do will (if they adapt their methods) be the least likely to fail the test. So the proposal actually leaves us with a ton of false positives and plenty of false negatives as well. Statistical forensics is certainly a handy tool to keep in our arsenal, but I do not have great faith that, by itself, it will detect fraud with the degree of confidence we need. It's not fit for routine, universal screening--at best it would keep a few legitimate papers unpublished without finding much actual fraud, and at worst honest researchers would find their careers ruined by false positives.
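To put rough numbers on that intuition, here's a toy base-rate calculation. The sensitivity, specificity, and fraud rate are assumptions I've made up for illustration, not estimates:

```python
# Toy Bayes calculation: even a screen that catches 95% of fabricated papers and
# wrongly flags only 5% of honest ones yields mostly false alarms when fraud is rare.
base_rate   = 0.01   # assumed share of submissions with fabricated data
sensitivity = 0.95   # assumed P(flagged | fraudulent)
specificity = 0.95   # assumed P(not flagged | honest)

p_flagged = sensitivity * base_rate + (1 - specificity) * (1 - base_rate)
p_fraud_given_flag = sensitivity * base_rate / p_flagged
print(f"P(fraud | flagged) = {p_fraud_given_flag:.2f}")  # about 0.16
```

Under those made-up numbers, roughly five out of six flagged papers are honest. And if fraudsters learn to simulate data that passes the test, the sensitivity falls while the same honest papers keep getting flagged.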

Does anyone have other ideas for routine, universal fraud screening? I have one: journals should validate the data collection process. The most devastating evidence against LaCour, in my view, is the fact that the survey firm he claimed to use had never heard of him, did not have the capability to run the type of survey in question, and never employed the person LaCour claimed was his contact there. A simple phone call from the journal's editorial office could have spotted the fraud right away. There were other clues related to data collection: LaCour claimed to have paid the survey firm with a grant that--as best I can tell--he was never awarded. Journals should make phone calls to grant funders, survey firms, and other third parties with knowledge of the data collection process for every single paper they accept. Submissions should include not just a section describing what's in the data, but actual documentation of how the data was obtained, including independent references and their phone numbers.