The Reproducible R Toolkit provides tools to ensure the results of R code are repeatable over time, by anyone. Most R scripts rely on packages, but new versions of packages are released daily. To ensure that results from R are reproducible, it’s important to run R scripts using exactly the same package version in use when the script was written.
The Reproducible R Toolkit provides an R function checkpoint, which ensures that all of the necessary R packages are installed with the correct version. This makes it easy to reproduce your results at a later date or on another system, and makes it easier to share your code with the confidence that others will get the same results you did.
The Reproducible R Toolkit also works in conjunction with the “checkpoint-server”, which makes a daily copy of all CRAN packages, to guarantee that every package version is available to all R developers thereby ensuring reproducibility.
RRT is a collection of R packages and the checkpoint server that together make your work with R packages more reproducible over time by anyone.
To achieve reproducibility, daily snapshots of CRAN are stored on our
checkpoint server. At midnight UTC each day, we refresh our mirror of CRAN is refreshed. When the rsync
process is complete, the checkpoint server takes and stores a snapshot
of the CRAN mirror as it was at that very moment. These daily snapshots
can then be accessed on the MRAN website or using
the checkpoint
package, which installs and consistently use
these packages just as they existed at midnight UTC on a specified
snapshot date. Daily snapshots are available as far back as
2014-09-17
. For more information, visit the checkpoint
server GitHub site.
The goal of the checkpoint
package is to solve the
problem of package reproducibility in R. Since packages get updated on
CRAN all the time, it can be difficult to recreate an environment where
all your packages are consistent with some earlier state. To solve this
issue, checkpoint
allows you to install packages locally as
they existed on a specific date from the corresponding snapshot (stored
on the checkpoint server) and it configures your R session to use only
these packages. Together, the checkpoint
package and the
checkpoint server act as a CRAN time machine so that anyone using
checkpoint()
can ensure the reproducibility of their
scripts or projects at any time.
The checkpoint
package has 3 main functions.
The create_checkpoint
function
~/.checkpoint
by default, but you can
change its location.library()
and
require()
calls, as well as the namespacing operators
::
and :::
.The use_checkpoint
function
options(repos)
create_checkpoint
, i.e. modifies
.libPaths()
This means the remainder of your script will run with the packages from your specified date.
Finally, the checkpoint
function serves as a unified
interface to create_checkpoint
and
use_checkpoint
. It looks for a pre-existing checkpoint
directory, and if not found, creates it with
create_checkpoint
. It then calls
use_checkpoint
to put the checkpoint into use.
To reset your session to the way it was before checkpointing, call
uncheckpoint()
. Alternatively, you can simply restart
R.
To update an existing checkpoint, for example if you need new
packages installed, call create_checkpoint()
again. Any
existing packages will remain untouched.
The functions delete_checkpoint()
and
delete_all_checkpoints()
allow you to remove checkpoint
directories that are no longer required. They check that the
checkpoint(s) in question are not actually in use before deleting.
Each time create_checkpoint()
is run, it saves a series
of json files in the main checkpoint directory. These are outputs from
the pkgdepends
package, which checkpoint
uses
to perform the actual package installation, and can help you debug any
problems that may occur.
<date>_<time>_refs.json
: Packages to be
installed into the checkpoint<date>_<time>_config.json
: Configuration
parameters for the checkpoint<date>_<time>_resolution.json
: Dependency
resolution result<date>_<time>_solution.json
: Solution to
package dependencies<date>_<time>_downloads.json
: Download
result<date>_<time>_install_plan.json
: Package
install plan<date>_<time>_installs.json
: Final
installation resultFor more information, see the help for
pkgdepends::pkg_installation_proposal
.
First, create a new folder and change your working directory to this folder. If you use an IDE like RStudio, this is identical to creating a new RStudio project. Otherwise, or alternatively, you can do it in code:
Next, add a script to the project folder, adding the snippet
mentioned above to the top. Here’s a simple example, using the
darts
package. Save this file in the
~/temp_project
folder as script1.R
.
library(checkpoint)
checkpoint("2020-01-01")
# Example from ?darts
library(darts)
x <- c(12,16,19,3,17,1,25,19,17,50,18,1,3,17,2,2,13,18,16,2,25,5,5,
1,5,4,17,25,25,50,3,7,17,17,3,3,3,7,11,10,25,1,19,15,4,1,5,12,17,16,
50,20,20,20,25,50,2,17,3,20,20,20,5,1,18,15,2,3,25,12,9,3,3,19,16,20,
5,5,1,4,15,16,5,20,16,2,25,6,12,25,11,25,7,2,5,19,17,17,2,12)
mod <- simpleEM(x, niter=100)
e <- simpleExpScores(mod$s.final)
oldpar <- par(mfrow=c(1, 2))
drawHeatmap(e)
drawBoard(new <- TRUE)
drawAimSpot(e, cex=5)
par(oldpar)
When you run this script, checkpoint
will create the
checkpoint by scanning your project for packages, and then downloading
them from the MRAN snapshot for 2020-01-01
. As this
happens, you should see a number of messages appear in your R window
that tell how the installation is proceeding. It then sets the library
search path and CRAN mirror for your session, to point to the local
checkpoint directory and MRAN snapshot respectively.
After running the above script, you can verify that the checkpointing has succeeded:
## CRAN
## "https://mran.microsoft.com/snapshot/2020-01-01"
## [1] "C:/Users/hongo/Documents/.checkpoint/2020-01-01/lib/x86_64-w64-mingw32/3.6.2"
## [2] "C:/Program Files/R/R-3.6.2/library"
## [1] "darts"