Opinionated Devops for R Data Products Strictly Without Magic

License

MIT (LICENSE.md); a second file, LICENSE, has an unrecognised license.
maxheld83/muggle

muggle

Main | Codecov test coverage | CRAN status | Lifecycle: experimental | muggle-buildtime Docker Pulls | muggle-runtime Docker Pulls

Overview

Reproducible DevOps Strictly Without Magic

{muggle} is an R package to implement DevOps best practices for R data products.

Data products are somewhere between one-off scripts and CRAN-bound packages. On the one hand, they are more complex than simple scripts and take longer to build. To ensure value for their creators, such data products must be reproducible and must scale better to accommodate more users or developers. On the other hand, data products face fewer requirements than mainstream packages: They can have more dependencies and they need not run in different computing environments.

{muggle} addresses the needs of such often domain-specific data products by implementing these priorities:

  1. No Magic (hence the name). {muggle} never infers intent; users have to state it explicitly. For example, dependencies are never inferred from a project's source files, but have to be listed in the DESCRIPTION file. This exposes more of the technical underpinnings, but makes it easier to reason about a project when things go wrong.

  2. Every R Data Product is an R Package. {muggle} organises every data product as an R package. For example, a shiny app can be a function in R/ and a report can be an RMarkdown document in vignettes/. This adds minimal overhead, but also enforces various best practices and structures projects in a single, familiar way.

  3. One Image to Rule Them All. {muggle} provides one fully versioned and reproducible compute environment as a docker image, including:

    • operating system,
    • R version,
    • system dependencies,
    • and R dependencies snapshotted by date (via RStudio Package Manager).

    Across

    • local development (via RStudio Server Open Source or vscode),
    • continuous integration / continuous delivery (CI/CD) scripts on GitHub Actions,
    • batch jobs on high-performance computing clusters,
    • and even shiny apps or plumber web APIs,

    the project will run in the exact same computing environment. This reduces flexibility, but minimises the time wasted on "but-it-works-on-my-machine" problems.

  4. Fast Iterations. {muggle} speeds up development iterations as much as possible, using

  5. Only Humans git commit. {muggle} is designed so that only human-edited source files are under version control. Copy-pasted boilerplate and compiled assets are avoided as much as possible (with the exception of man/, so as not to break remotes::install_github()). This requires a bit more discipline, but enhances reproducibility and cleans up git diffs.

  6. R Packages are for Code, not Data. {muggle} does not ship data with a package; it ships only wrapper functions that fetch the data from databases or git lfs storage. Only small or unchanging datasets can be stored inside packages.
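
To make the first priority above more concrete, here is a minimal sketch of what "dependencies listed in the DESCRIPTION file" looks like from R. It is not part of {muggle}: the package name `mydataproduct`, the listed dependencies and the helper `declared_deps()` are all hypothetical and chosen only for illustration. The sketch uses base R's `read.dcf()` to read the declared dependencies instead of scanning source files for `library()` calls.

```r
# Hypothetical DESCRIPTION of a data product; names are illustrative only.
desc_lines <- c(
  "Package: mydataproduct",
  "Title: An Example Data Product",
  "Version: 0.0.1",
  "Imports:",
  "    shiny,",
  "    dplyr (>= 1.0.0)",
  "Suggests:",
  "    testthat"
)
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(desc_lines, desc_path)

# Hypothetical helper: return the package names declared in one field.
declared_deps <- function(path, field = "Imports") {
  dcf <- read.dcf(path, fields = field)
  raw <- dcf[1, field]
  # A field that is absent from the file comes back as NA.
  if (is.na(raw)) {
    return(character(0))
  }
  entries <- trimws(strsplit(raw, ",", fixed = TRUE)[[1]])
  # Drop version constraints such as "(>= 1.0.0)".
  pkgs <- trimws(sub("\\(.*\\)", "", entries))
  pkgs[nzchar(pkgs)]
}

imports <- declared_deps(desc_path, "Imports")
suggests <- declared_deps(desc_path, "Suggests")
print(imports)
print(suggests)
```

Because the dependency list is read from a single declared source, anything a project needs but does not declare shows up as a visible omission in DESCRIPTION rather than being silently picked up from the code.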

Getting Started

make Targets

All DevOps steps are defined as make targets. To see all targets, simply run make.