---
title: 'The drake R package: a pipeline toolkit for reproducibility and high-performance computing'
tags:
  - R
  - reproducibility
  - high-performance computing
  - pipeline
  - workflow
  - Make
authors:
  - name: William Michael Landau
    orcid: 0000-0003-1878-3253
    affiliation: 1
affiliations:
  - name: Eli Lilly and Company
    index: 1
date: 4 January 2018
bibliography: paper.bib
---

Summary

The drake R package [@drake] is a workflow manager and computational engine for data science projects. Its primary objective is to keep results up to date with the underlying code and data. When it runs a project, drake detects any pre-existing output and refreshes only the pieces that are outdated or missing. As a result, a rerun skips work that is already up to date instead of starting from scratch, and the final results remain reproducible. With a user-friendly R-focused interface, comprehensive documentation, and extensive implicit parallel computing support, drake surpasses the analogous functionality in similar tools such as Make [@Make], remake [@remake], memoise [@memoise], and knitr [@knitr].

In reproducible research, drake's role is to provide tangible evidence that a project's results are re-creatable. drake quickly detects whether the code, data, and output are synchronized. In other words, drake helps determine whether the starting materials would produce the expected output if the project were to start over and run from scratch. This approach decreases the time and effort it takes to evaluate research projects for reproducibility.
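The idea of checking whether code and output are synchronized can be illustrated with a small, language-agnostic sketch. The Python code below is not drake's implementation or API. It assumes a hypothetical `plan` that maps each step name to a command string and a list of upstream steps, listed in dependency order, and a hypothetical `cache` of fingerprints saved by a previous run. A step counts as outdated when its stored fingerprint no longer matches the one computed from its command and its dependencies. A real tool would also need to fingerprint input files and imported functions, which this sketch leaves out.

```python
import hashlib


def fingerprint(command, dep_fingerprints):
    """Hash a step's command together with the fingerprints of its dependencies."""
    h = hashlib.sha256()
    h.update(command.encode("utf-8"))
    for dep in dep_fingerprints:
        h.update(dep.encode("utf-8"))
    return h.hexdigest()


def current_fingerprints(plan):
    """Compute a fingerprint for every step.

    Assumes `plan` lists steps in dependency order, so each step's
    dependencies have already been fingerprinted when it is reached.
    """
    prints = {}
    for name, (command, deps) in plan.items():
        prints[name] = fingerprint(command, [prints[d] for d in deps])
    return prints


def outdated(plan, cache):
    """Return the steps whose cached fingerprint is missing or stale."""
    prints = current_fingerprints(plan)
    return [name for name in plan if cache.get(name) != prints[name]]


# Hypothetical three-step project: (command, upstream steps).
plan = {
    "data": ("read_csv('raw.csv')", []),
    "model": ("fit(data)", ["data"]),
    "report": ("render(model)", ["model"]),
}

print(outdated(plan, cache={}))    # nothing built yet: every step is outdated
cache = current_fingerprints(plan)  # pretend every step has now been built
print(outdated(plan, cache))       # code and output are synchronized

plan["model"] = ("fit(data, robust=True)", ["data"])
print(outdated(plan, cache))       # "model" changed, so "report" is stale too
```

Because each fingerprint folds in the fingerprints of upstream steps, a change to one command invalidates everything downstream of it while leaving unrelated steps untouched.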

Regarding high-performance computing, drake interfaces with a variety of technologies and scheduling algorithms to deploy the steps of a data analysis project. Here, parallel computing is implicit: the user does not specify which steps run in parallel. Instead, drake constructs the directed acyclic graph of the workflow and determines which steps can run simultaneously and which need to wait for dependencies. This automation eases the cognitive and computational burdens on the user, enhancing the readability of code and thus reproducibility.
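To make the notion of implicit parallelism concrete, here is a minimal Python sketch of one simple way to derive parallel stages from the dependency structure described above. It is an illustration under stated assumptions, not drake's scheduler: the `deps` mapping and the step names are hypothetical, every dependency is assumed to appear as a key of `deps`, and staged execution is only one of several possible scheduling strategies.

```python
def parallel_stages(deps):
    """Group steps into stages that can each run in parallel.

    `deps` maps each step to the steps it depends on. Every step in a
    stage depends only on steps from earlier stages, so the steps within
    one stage are independent of each other.
    Assumes every dependency is itself a key of `deps`.
    """
    remaining = {step: set(upstream) for step, upstream in deps.items()}
    stages = []
    while remaining:
        # A step is ready once all of its dependencies have been scheduled.
        ready = sorted(step for step, upstream in remaining.items() if not upstream)
        if not ready:
            raise ValueError("dependency cycle detected; the workflow must be acyclic")
        stages.append(ready)
        for step in ready:
            del remaining[step]
        for upstream in remaining.values():
            upstream.difference_update(ready)
    return stages


# Hypothetical workflow: two independent branches that merge into one model.
deps = {
    "raw_a": [],
    "raw_b": [],
    "clean_a": ["raw_a"],
    "clean_b": ["raw_b"],
    "model": ["clean_a", "clean_b"],
    "report": ["model"],
}

for number, stage in enumerate(parallel_stages(deps), start=1):
    print(number, stage)
```

In this example the two `raw_*` steps form the first stage and the two `clean_*` steps the second, so each pair could run simultaneously, while `model` and `report` must wait for their dependencies. The user only declares what each step needs; the grouping falls out of the graph.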

References