Scientific Replication, Reproducibility, and R: An Example

Scientific replication, the verification of an experiment's results by an independent researcher, is one of the pillars of the scientific method and one of the strongest ways to support a scientific claim. In one of the earliest examples of replication, Christiaan Huygens traveled to the laboratory of Robert Boyle in 1663 to help Boyle and Robert Hooke replicate Huygens’s experiment related to the discovery of the vacuum¹:

It became clear that unless the phenomenon could be produced in England with one of the two pumps available, then no one in England would accept the claims Huygens had made, or his competence in working the pump.

In the modern era, many scientific fields are under criticism because a large number of their papers cannot be replicated. In one example, Ioannidis² examined 49 of the most highly cited clinical research studies from the previous 13 years:

Of 49 highly cited original clinical research studies, 45 claimed that the intervention was effective. Of these, 7 (16%) were contradicted by subsequent studies, 7 others (16%) had found effects that were stronger than those of subsequent studies, 20 (44%) were replicated, and 11 (24%) remained largely unchallenged.

In other words, of the 45 studies that claimed an effective intervention, 11 were never challenged and 34 had been retested. Of those 34, 14 (41%) had their claims shown to be false or exaggerated.
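
The arithmetic behind these figures can be checked in a few lines of R. This is only a sketch: the variable names are my own, and the counts are taken directly from the quoted summary above.

```r
# Counts quoted from the study summary (variable names are illustrative)
claimed_effective <- 45
contradicted      <- 7
stronger_initial  <- 7
replicated        <- 20
unchallenged      <- 11

# The four categories should account for every study that claimed an effect
stopifnot(contradicted + stronger_initial + replicated + unchallenged == claimed_effective)

# Retested studies are all those that did not remain unchallenged
retested <- claimed_effective - unchallenged

# Studies later contradicted, or whose initial effects were overstated
false_or_exaggerated <- contradicted + stronger_initial

# Share of retested studies that did not hold up, as a whole percentage
share <- round(100 * false_or_exaggerated / retested)

cat(retested, "retested,", false_or_exaggerated, "contradicted or exaggerated,",
    paste0(share, "%"), "\n")
```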

Similarly, the journal Social Psychology³ recently devoted an entire issue to scientific replicability within that field, and found a number of studies to be non-replicable.

And the state of scientific research in the field of education is so poor that almost no one even bothers to attempt to replicate another education study⁴:

Only 0.13 percent of education articles published in the field’s top 100 journals are replications, writes Matthew Makel, a gifted-education research specialist at Duke University, and Jonathan Plucker, a professor of educational psychology and cognitive science at Indiana University.

Replicability can be addressed at several points in the research chain, and one of them is reproducibility: the ability to recreate the exact steps of a particular experiment. In other words, can the exact sequence of transformations and operations on the data be performed again, by the original researcher or by someone else, at a later date? Scientists often do not keep detailed records of the exact steps they took to reach a particular outcome, and this sloppiness contributes to studies that cannot be replicated.

Literate programming was pioneered by computer scientist Donald Knuth⁵ and involves embedding actual computer code with human-readable language in one machine-compilable, human-readable document.

Building on this, and in an attempt to help solve the reproducibility problem in scientific research, literate statistical programming combines all the scripts and code that operate on a dataset with human-readable prose describing the analysis, so that the code and the report live in a single document.

This may seem like a subtle thing, but this idea is really powerful, for a variety of reasons:

At the push of a button, the entire report, including all the report text, as well as the outputs of the statistical code, including the generation of all figures, is immediately reproduced.
The resulting report is generated in (ideally) HTML, which can be immediately published onto a website so team members or anyone anywhere can read it.
No more cutting, resizing, and pasting of report figures and graphs is needed. No more wondering how a particular figure was generated or how to reproduce it. The code is all there.
The author has control over what code is visible versus hidden in the report, depending on the audience of the resulting document.
Anyone reading the document can see exactly what was done to get the results, so anyone can reproduce a particular experiment.
This may also help encourage scientists to publish scientific reports to the Internet and thereby to interact more with the general public, which can only help society as a whole.
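
To make these points concrete, here is a minimal sketch of a literate document woven with knitr. It assumes the knitr package is installed. The prose, the chunk labels, and the use of R's built-in cars dataset are my own illustrative choices and are not taken from the NOAA report. The echo chunk option is what controls whether a chunk's code appears in the output.

```r
library(knitr)  # assumes the knitr package is installed

# Chunk delimiter used by R Markdown, built here to keep the example tidy
fence <- strrep("`", 3)

# A tiny literate document: prose interleaved with two code chunks.
# The text, chunk labels, and dataset are illustrative assumptions.
doc <- c(
  "# Stopping distances",
  "",
  "The mean stopping distance is computed below, with the code shown.",
  "",
  paste0(fence, "{r mean-dist, echo=TRUE}"),
  "mean(cars$dist)",
  fence,
  "",
  "The next chunk also runs, but its code is hidden from the reader.",
  "",
  paste0(fence, "{r row-count, echo=FALSE}"),
  "nrow(cars)",
  fence
)

# Weave the prose and the evaluated results into one Markdown string
report <- knit(text = doc, quiet = TRUE)
cat(report)
```

In practice the document would live in its own .Rmd file rather than in a character vector. The vector is used here only so the sketch is self-contained.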

As an example, I used the programming language R along with the knitr package to write a report of a typical data analysis that is both human readable and machine compilable. The report is titled “Exploring Harmful Weather Events in the NOAA Storm Database”, and involves a study of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.

Here is the link to my report: “Exploring Harmful Weather Events in the NOAA Storm Database,” by Kendall Giles.

Note that this study was just an example data analysis–I did it mostly to demonstrate the use and power of literate (statistical) programming. The document was generated with one button click, and was published to the Internet with another.

Now anyone–including myself–can reproduce my study, can find errors within it, can suggest improvements, and can build on it.

Scientific reproducibility helps facilitate finding errors in research studies, helps disseminate knowledge to other scientists and to the general public, and helps solve the scientific replicability problem.

^{1. Shapin, S. and Schaffer, S. (1985). Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life. Princeton University Press, Princeton, NJ.}

^{2. Ioannidis JPA. Contradicted and Initially Stronger Effects in Highly Cited Clinical Research. JAMA. 2005;294(2):218-228.}

^{3. http://www.psycontent.com/content/l67413865317/?sortorder=asc.}

^{4. https://www.insidehighered.com/news/2014/08/14/almost-no-education-research-replicated-new-article-shows.}

^{5. Knuth, Donald E. (1984). “Literate Programming”. The Computer Journal (British Computer Society) 27 (2): 97–111.}