materials to study and learn about principled data processing

HRDAG/training-docs

Repository files navigation

training-docs

This repo contains materials to study and learn about principled data processing, including:

  • onboarding materials to help you get set up with the right tools and procedures
  • demo-tasks that walk through how to use some of those tools in practice
  • templates to serve as outlines for routine files like Makefiles and scripts in Python or R
  • checklists to refer to as you work and contribute to projects (updating a repo, writing a script, adding a new task)
  • languages and language-specific tips to consider when writing scripts (like scalability, missingness)
  • notebooks that walk through various topics in the context of a specific language (e.g., set operations in Python)
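
As a taste of the kind of topic the notebooks cover, here is a minimal sketch of set operations in Python. The record IDs and variable names are made up for illustration and do not come from any notebook in this repo.

```python
# Hypothetical record IDs from two sources (illustrative data only).
source_a = {"r001", "r002", "r003", "r004"}
source_b = {"r003", "r004", "r005"}

in_both = source_a & source_b          # intersection: IDs present in both sources
only_a = source_a - source_b           # difference: IDs in A that are missing from B
in_either = source_a | source_b        # union: every ID seen in at least one source
in_exactly_one = source_a ^ source_b   # symmetric difference: IDs seen in only one source

for name, ids in [
    ("in_both", in_both),
    ("only_a", only_a),
    ("in_either", in_either),
    ("in_exactly_one", in_exactly_one),
]:
    # Sets are unordered, so sort before printing for stable output.
    print(name, sorted(ids))
```

Comparisons like these are a quick way to check which records overlap between two files before merging them.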

helpful repos

There are a few repos outside of this one that house various tools and/or guidance that may be useful.

  • sample-project
    • This is a dummy repo for testing out git functionality like cloning, pushing, and opening pull requests
  • resource-utils/faqs
    • There are a few help articles related to the HRDAG workflow, in particular:
      1. data-hacking-on-server.md includes instructions for making ssh keys and running Jupyter notebooks
      2. safe-logout.txt includes instructions for safely disconnecting from eleanor and notebooks
      3. data-work-faq.txt collects questions we've asked ourselves often enough to write down for others
  • resource-utils/notes
    • There's a useful document from a previous intern:
      1. internship_notes_2016.md includes some walk-throughs, suggested tools, frequently used commands
  • gnutools
    • A useful guide to using GNU tools more effectively with examples
  • record-hash-comparisons
    • An introduction and overview of creating unique identifiers with hashes
  • form-extraction
    • A place for some common tools and code we use to extract info from different kinds of forms
  • tool-suite
    • A home for some tools related to performance improvements and benchmarking
  • dotfiles
    • Examples of all kinds of dotfiles you might want to explore and use in your working environment, like vimrc, bash_profile, zshrc, and gitconfig
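
To give a rough idea of what "creating unique identifiers with hashes" can look like, here is a minimal sketch using Python's standard `hashlib`. It is not necessarily the approach taken in record-hash-comparisons: the `record_id` helper, the normalization steps, the choice of SHA-1, and the 16-character truncation are all assumptions made for illustration.

```python
import hashlib


def record_id(fields, length=16):
    """Return a short, deterministic identifier for a record.

    Hypothetical helper: the normalization and truncation choices here
    are illustrative assumptions, not an established convention.
    """
    # Normalize so that incidental whitespace or case differences
    # do not produce different identifiers for the same record.
    normalized = [str(f).strip().lower() for f in fields]
    # Join with a separator that is unlikely to appear in the data, so
    # ["ab", "c"] and ["a", "bc"] do not collapse to the same string.
    joined = "\x1f".join(normalized)
    digest = hashlib.sha1(joined.encode("utf-8")).hexdigest()
    # Truncating shortens the ID but raises the chance of collisions.
    return digest[:length]


print(record_id(["Ana", "Lopez", "1990-01-31"]))
```

The same input fields always yield the same ID, so the identifier can be regenerated from the data instead of being stored and passed around separately.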

more helpful topics

on vim:

on git:

on workflow:

other suggested reading

books

done.