The benefits of replication studies in science seem obvious and intuitive. Yet they are neither particularly prevalent nor encouraged. The typical reasoning is that there's no value in being the second scientist or group to observe a result. Some [1] take this to suggest that the current scientific publishing system is flawed and promotes papers with provocative results rather than technically sound methods. Journals like PLOS One that disregard perceived importance are the exception. There are, however, a number of advantages to the status quo.
The expectation of many journals for submitted papers to be in some way novel has three perhaps unintuitive benefits. First, it better develops scientific consensus on a given topic. Second, it limits the rate at which papers are published and, consequently, how many need to be read to keep up with the literature. Third, it can reduce biases that exist at the level of individual studies.
Building scientific consensus on a topic is like building a shelter in the woods. Because it's science, it doesn't come with a blueprint; the shelter's appearance is unknown in advance. As each log or stick is laid down, it reduces the gaps and influences the subsequent logs and sticks that need to be added. A replication study equates to placing another log where one already sits. This will ultimately add strength to the shelter, but do we want strength in one spot when holes exist elsewhere? Yes, some sticks will not carry much weight or will even ultimately break (studies found to be insignificant or clearly flawed). However, if there is no expectation for a stick to fill a gap, then the collective labour of the shelter builders is inefficient.
In this analogy [2], peer review equates to qualified shelter builders standing in front of the shelter, checking the proposed next stick for structural integrity and making sure the stick's owners (the paper's authors) have adequately considered the existing structure. That way, the owners know where their particular stick is best placed to fill the most gaps and be supported by the most existing sticks.
The flaw in my analogy so far is that any shelter will presumably benefit from additional sticks. There’s never harm in putting down more sticks, regardless of whether they add much to the shelter. In other words, why discourage or reject technically correct papers that simply repeat existing findings?
Requiring some novelty in a paper forces authors to be much more aware of the literature. If we could simply undertake whatever study personally interests us, we might all end up with tunnel vision, or at least be much less inclined to spend time learning what others are doing. This would increase the quantity of literature at the expense of quality. There are already enough long-winded, mediocre, or incoherent studies being published.
The other argument against replication is that it can actually provide a false sense of security. Simply repeating an experiment can increase the statistical significance of a correlation without any improvement in understanding of the causation. Biases, confounding variables, or tacit assumptions in the original experiment will remain in the new experiment. For these reasons, I agree with Stroebe and Strack, who encourage (at least within their field of psychology) conceptual replication: addressing the same fundamental or underlying mechanism with a different experiment or method. In my field of physical oceanography, this typically equates to checking whether phenomena can be observed or predicted by direct measurements, numerical simulations, laboratory studies, and mathematical theories.
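To make the point about false security concrete, here is a minimal numerical sketch (my own illustration, using numpy and scipy, not anything from the studies cited above): if two variables are correlated only through an unmeasured confounder, collecting more of the same data drives the p-value toward zero while the causal picture stays exactly as wrong as before.

```python
# Hypothetical sketch: X and Y are both driven by an unmeasured variable Z,
# so their correlation is spurious. Growing the sample (as if pooling exact
# replications) makes the correlation ever more "significant" without
# removing the confound.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def confounded_sample(n):
    z = rng.normal(size=n)          # unmeasured confounder
    x = z + rng.normal(size=n)      # "treatment" proxy, partly driven by z
    y = z + rng.normal(size=n)      # "outcome", also partly driven by z
    return x, y

for n in (50, 500, 5000):           # increasingly large samples
    x, y = confounded_sample(n)
    r, p = stats.pearsonr(x, y)
    print(f"n={n:5d}  r={r:+.2f}  p={p:.1e}")

# The correlation r hovers around 0.5 at every sample size, while p plummets:
# more of the same experiment buys certainty about the wrong causal story.
```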
Triangulation is an alternative name for what is ultimately the same fundamental idea as conceptual replication. The metaphor invokes the trigonometric technique of using two simpler measurements to calculate a desired, but more difficult, measurement. Or, as succinctly stated in a Nature Comment, "robust research needs many lines of evidence."
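To make the surveying metaphor concrete (my own illustration, not drawn from the Nature Comment): the height h of an inaccessible cliff, which is hard to measure directly, follows from two simple measurements, the horizontal distance d to its base and the elevation angle \theta to its top:

\[
h = d \tan\theta
\]

Two independent and individually easy observations pin down a quantity that neither could give alone, which is the same logic behind seeking several lines of evidence rather than repeating one.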
To conclude, I will note a curious suggestion I've come across in favour of replication studies: that such a study should be a required component of a PhD. It's an appealing and practical suggestion. Indeed, as I write this, I find myself thinking it may be a good idea despite my somewhat contrary arguments above. There are already various requirements; why not make one of them a replication study related to the student's thesis work? The replication study could even take the place of required coursework, so as to not make the whole thesis process take noticeably longer. Perhaps this really is the best compromise in the debate over whether replication studies should be undertaken: everyone who gets a PhD gets a chance to do one replication study and, in doing so, discover what reproducible science really means. After that, it's time to go help fill the gaps in the shelter.
Footnotes
1. Even John Oliver has chimed in on the issue. I highly recommend the linked episode and its reflection on the abuses of armchair science.
2. The analogy to shelter building can be stretched pretty far. Some studies are like thatching: they couldn't exist without the existing structure, and most people won't notice the individual contribution, but they are worthwhile nonetheless. By contrast, review studies are like a piece of plywood. They are robust, being composed of numerous wood grains (existing studies), and much more tempting to use to bolster the shelter, just as it's tempting to cite a review article to support a claim rather than the paper with the original findings.