The Unintended Consequences of Trying to Replicate Research (slate.com)
60 points by tokenadult on April 24, 2016 | 13 comments


Doesn't wash. If replication attempts are subject to publication bias, we are still better off because the subsequent meta-analyses overcome sampling error better and are able to show publication bias using funnel plots or other techniques like p-uniform, and sometimes can even correct the bias to give you a less biased and more accurate estimate. In contrast, you cannot show publication bias for a particular novel result nor can you easily correct for it. So given the choice between 5 studies on a single hypothesis (all afflicted by publication bias) and 5 studies on 5 hypotheses (with publication bias), you are better off in the former scenario. (In the latter scenario, you could try to use informative priors estimated from field-wide demonstrations of publication bias, but at least so far, this is an extremely unpopular approach.)
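
A toy sketch of that funnel-plot point, in Python (numpy/scipy; the 20%-publish rule and all numbers are invented): with several published replications of one hypothesis you can run an Egger-style asymmetry test and see the publication bias, which a single novel result never lets you do.

    # Hypothetical simulation of publication bias across replications of ONE hypothesis.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    true_effect = 0.2
    effects, ses = [], []

    while len(effects) < 20:
        n = rng.integers(20, 200)                  # per-group sample size
        se = float(np.sqrt(2 / n))                 # approx. SE of a standardized mean difference
        est = rng.normal(true_effect, se)          # one study's observed effect
        if est / se > 1.96 or rng.random() < 0.2:  # mostly significant results get published
            effects.append(est)
            ses.append(se)

    effects, ses = np.array(effects), np.array(ses)

    # Fixed-effect meta-analytic estimate (inverse-variance weighted mean)
    w = 1 / ses**2
    pooled = np.sum(w * effects) / np.sum(w)

    # Egger-style test: regress z-scores on precision; an intercept far from
    # zero indicates funnel-plot asymmetry, i.e. small-study/publication bias.
    reg = stats.linregress(1 / ses, effects / ses)
    t_int = reg.intercept / reg.intercept_stderr
    p_int = 2 * stats.t.sf(abs(t_int), df=len(effects) - 2)

    print(f"true effect {true_effect}, pooled estimate {pooled:.2f}")
    print(f"Egger intercept {reg.intercept:.2f}, p = {p_int:.3f}")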


Yes, a much better headline would be "The value of study replications is positive but suppressed without some combination of pre-registration and result-blind refereeing".


Pre-registration doesn't help because it simply gives the authors a narrative to fake. As pointed out in the article, result-blind refereeing is officially claimed by most journals, although from the rejections I have received, this does not appear to be the case in practice.


> Pre-registration doesn't help because it simply gives the authors a narrative to fake.

Like timroy, I don't know what your objection to pre-registration is. I'm definitely talking about pre-registering the analysis, not just the data collection.

> As pointed out in the article, result-blind refereeing is officially claimed by most journals,

Did I miss something? That doesn't seem to be what the article is saying:

> One proposed remedy is to modify the peer review process so that reviewers grade manuscripts on the quality of their introductions and methods sections rather than the novelty of the findings. Similarly, journals could assess submissions based solely on the rigor of their methods—as does Plos One.

PLoS One is an anomaly. This is not normal.


Can you explain what you mean by "it simply gives the authors a narrative to fake"?

At the very least, pre-registration helps guard against the "file-drawer" effect, where a negative result simply goes in the trash, and no one ever knows it occurred.

If you mean that pre-registration will not prevent jiggling the results until the researcher gets a significant p-value, then I agree that many forms of pre-registration which lack registration of the analysis and so on will allow the authors to fake the narrative.
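
For concreteness, here's a tiny simulation (Python; my own toy setup, nothing from the article) of one of those jiggles, optional stopping: under a true null, peeking after every batch and stopping at p < .05 inflates the false-positive rate well past the nominal 5%, which is exactly the degree of freedom that pre-registering the analysis takes away.

    # Toy simulation of optional stopping under a true null effect.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_sims, batch, max_batches = 2000, 10, 10
    false_positives = 0

    for _ in range(n_sims):
        a = np.empty(0)
        b = np.empty(0)
        for _ in range(max_batches):
            a = np.concatenate([a, rng.normal(0, 1, batch)])  # both groups drawn from
            b = np.concatenate([b, rng.normal(0, 1, batch)])  # the same distribution
            if stats.ttest_ind(a, b).pvalue < 0.05:           # peek, stop on "significance"
                false_positives += 1
                break

    print(f"false-positive rate with peeking: {false_positives / n_sims:.1%} (nominal 5%)")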


Tangential point: Because of frictions and biases, the need for high-powered, replicated research is even greater if you want to translate the science into real-world action. I highly recommend this post by Holden Karnofsky of GiveWell (and formerly of Bridgewater):

http://blog.givewell.org/2016/01/19/the-importance-of-gold-s...

GiveWell is an organization that tries to estimate the impact per philanthropic dollar of various charities, and they partially rely on controlled trials of health interventions in the developing world.

> Chris Blattman worries that there is too much of a tendency toward large, expensive, perfectionist studies, writing: "...each study is like a lamp post. We might want to have a few smaller lamp posts illuminating our path, rather than the world’s largest and most awesome lamp post illuminating just one spot. I worried that our striving for perfect, overachieving studies could make our world darker on average."

> My feeling – shared by most of the staff I’ve discussed this with – is that the trend toward “perfect, overachieving studies” is a good thing...

> Bottom line. Under the status quo, I get very little value out of literatures that have large numbers of flawed studies – because I tend to suspect the flaws of running in the same direction. On a given research question, I tend to base my view on the very best, most expensive, most “perfectionist” studies, because I expect these studies to be the most fair and the most scrutinized, and I think focusing on them leaves me in better position than trying to understand all the subtleties of a large number of flawed studies.

> If there were more diversity of research methods, I’d worry less about pervasive and correlated selection bias. If I trusted academics to be unbiased, I would feel better about looking at the overall picture presented by a large number of imperfect studies. If I had the time to understand all the nuances of every study, I’d be able to make more use of large and flawed literatures. And if all of these issues were less concerning to me, I’d be more interested in moving beyond a focus on internal validity to broader investigations of external validity. But as things are, I tend to get more value out of the 1-5 best studies on a subject than out of all others combined, and I wish that perfectionist approaches were much more dominant than they currently are.


Seems like replication attempts should have reverse publication bias by default - they are much more interesting when they contradict the original paper.


I really look forward to the day (because it seems it must come) when the whole publication system is restructured into an open system. I have no idea what it will look like (i.e., how merit will be calculated in such a system), but for the reasons mentioned in this article and several others, I think it will be a glorious day for science!


At the heart of the problem is that "replication" experiments face the same "novelty" or "sensationalist" demands as regular research papers.

It's like the difficulty in creating "anti-art" and finding it exhibited in a museum.

We need to get rid of this format and focus on deliverables, and on more modern forms of discourse such as collective knowledge curation: wiki pages, git repositories, or patents.


I was thinking that this might cause some sort of self-correcting mechanism to come into play. If a study is useful, and others want to build upon it, it will need to be replicated.

If, on the other hand, it's just a novelty study that doesn't lead to any further studies (and happens to be flawed), it will be forgotten about.

Curious if others think this would be the case.


An overaggressive pattern recognizer coupled with an overaggressive double-checker (something that pattern matches for incorrect results) is still a stronger system than the overaggressive pattern recognizer alone. And depending on the shape of the data, it can be far more effective than a single statistically correct pattern recognizer. There's an interesting bit of math here that I'm not sure has been laid out in any accessible fashion yet, but I'd have to check around a bit.

The human brain more or less works because of this - it overtrains on every signal it takes in, recognizing patterns that it has no statistical justification to pick out. But since it's simultaneously (over)training the process of rejecting those false patterns, it all works out okay, and it's actually far more effective than if it only made "proper" inferences from the data coming in, at least in the world we live in.
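
Not the full math, but a back-of-the-envelope version with invented numbers: two sloppy, conditionally independent tests in series can beat one well-calibrated test on both recall and precision.

    # Invented numbers: an overaggressive recognizer plus an overaggressive
    # rejector, assumed conditionally independent given whether a pattern is real.
    base_rate = 0.01                       # how often a real pattern is present

    def precision(sensitivity, false_alarm):
        """Probability a flagged pattern is real (Bayes' rule)."""
        tp = sensitivity * base_rate
        fp = false_alarm * (1 - base_rate)
        return tp / (tp + fp)

    combo_sens  = 0.95 * 0.90              # both must fire on a real pattern
    combo_alarm = 0.30 * 0.10              # both must misfire on noise

    print(f"single calibrated detector: recall 0.70, precision {precision(0.70, 0.05):.2f}")
    print(f"recognizer + checker:       recall {combo_sens:.2f}, precision {precision(combo_sens, combo_alarm):.2f}")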


> There's an interesting bit of math here that I'm not sure has been laid out in any accessible fashion yet, but I'd have to check around a bit.

I'd love it if you would.


This is more of a problem if you use a weak definition of replication such as: "statistically significant effect in the same direction". If instead the effect sizes need to be similar to each other (+/- some uncertainty), then a correspondence between the multiple results means much more.
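
Rough sketch of the difference (Python; the effect sizes and standard errors are invented): a replication can pass the weak "significant in the same direction" test while its effect size is clearly incompatible with the original's.

    # Hypothetical original study and replication (effect estimates and standard errors).
    import numpy as np
    from scipy import stats

    orig_effect, orig_se = 0.45, 0.15
    rep_effect,  rep_se  = 0.12, 0.05

    # Weak criterion: replication significant (p < .05) and in the same direction
    p_rep = 2 * stats.norm.sf(abs(rep_effect / rep_se))
    weak_pass = p_rep < 0.05 and np.sign(rep_effect) == np.sign(orig_effect)

    # Stronger criterion: the two effect sizes agree within their joint uncertainty
    z_diff = (orig_effect - rep_effect) / np.hypot(orig_se, rep_se)
    compatible = 2 * stats.norm.sf(abs(z_diff)) > 0.05

    print(f"weak criterion passed:   {weak_pass}")   # True for these numbers
    print(f"effect sizes compatible: {compatible}")  # False: the estimates disagree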

Of course, if people just game this system and only publish results that fit whatever criteria, then you will get a biased literature. Requiring a more precise range of results to qualify just makes that gaming more difficult.

I don't see how people gaming the system is a consequence of replicating each other's results, though. That seems like it is due to deeper cultural problems.



