This repository has been archived by the owner on Sep 18, 2019. It is now read-only.

bit002_tidying-lotr-data.md

---
title: "Data Carpentry lesson on tidy data"
output:
  html_document:
    toc: true
    toc_depth: 4
---

This is a lesson on tidying data, specifically on what to do when a conceptual variable is spread out over two or more variables in a data frame.

Data used: words spoken by characters of different races and genders in the Lord of the Rings movie trilogy.

  • directory of this lesson in the Data Carpentry GitHub repo
  • 01-intro shows untidy and tidy data. Then we demonstrate how tidy data is more useful for analysis and visualization. Includes references, resources, and exercises.
  • 02-tidy shows how to tidy data, using gather() from the tidyr package. Includes references, resources, and exercises.
  • 03-tidy-bonus-content is not part of the lesson but may be useful as learners try to apply the principles of tidy data in more general settings. Includes links to packages used.
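
As a rough illustration of the tidying step covered in 02-tidy, here is a minimal sketch of `gather()` on a toy data frame. The column names (`Race`, `Female`, `Male`, `Gender`, `Words`) and the counts are made up for this example and are not taken from the lesson files.

```r
library(tidyr)

# Toy untidy data: the conceptual variable "words spoken" is spread
# over two columns, one per gender. Names and counts are illustrative
# assumptions, not the lesson's actual data.
untidy <- data.frame(
  Race   = c("Elf", "Hobbit"),
  Female = c(10, 2),
  Male   = c(5, 30),
  stringsAsFactors = FALSE
)

# gather() stacks the Female and Male columns into key-value pairs:
# the old column names go into Gender, the cell values into Words.
tidy <- gather(untidy, key = "Gender", value = "Words", Female, Male)

print(tidy)
```

The result has one row per Race and Gender combination, which is the shape that plotting and summarising functions generally expect. In tidyr 1.0.0 and later, `gather()` still works but is superseded by `pivot_longer()`.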

Learner-facing dependencies:

  • files in the tidy-data sub-directory of the Data Carpentry data directory
  • tidyr package (only true dependency)
  • ggplot2 is used for illustration but is not mission critical
  • dplyr and reshape2 are used in the bonus content

Instructor dependencies:

  • curl if you execute the code that grabs the Lord of the Rings data used in the examples from GitHub. Note that the files are also included in the datacarpentry/data/tidy-data directory, so you can skip the download.
  • rmarkdown, knitr, and xtable if you want to compile the Rmd to md and html
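
For instructors who want to compile an Rmd to both md and html, one possible approach is sketched below. It assumes pandoc is available and uses a throwaway Rmd file written to a temporary path as a stand-in for a lesson file; the `toc` settings mirror the front matter of this document, and `keep_md = TRUE` is an assumption about how the md copy would be kept.

```r
library(rmarkdown)

# Write a tiny stand-in Rmd to a temporary file (hypothetical content,
# used here only so the example is self-contained).
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"demo\"",
  "---",
  "",
  "# A heading",
  "",
  "Some text."
), rmd)

# Render to HTML with a table of contents; keep_md = TRUE asks
# rmarkdown to keep the intermediate markdown file as well.
out <- render(
  rmd,
  output_format = html_document(toc = TRUE, toc_depth = 4, keep_md = TRUE),
  quiet = TRUE
)

print(out)  # path to the generated html file
```

To use this with a real lesson file, replace the temporary path with the path to that Rmd.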