Data Science Cookie Cutter

Note: This template uses poetry. If you prefer using pip, go to the pip branch instead.

What is this?

This repository is a Cookiecutter template for data science projects. It is the project structure I use most often for my own data science work.

Tools used in this project

Poetry: Dependency management
Hydra: Manage configuration files
pre-commit plugins: Automate code review and formatting
DVC: Data version control
pdoc: Automatically create API documentation for your project

Project Structure

.
├── config                      
│   ├── main.yaml                   # Main configuration file
│   ├── model                       # Configurations for training model
│   │   ├── model1.yaml             # First variation of parameters to train model
│   │   └── model2.yaml             # Second variation of parameters to train model
│   └── process                     # Configurations for processing data
│       ├── process1.yaml           # First variation of parameters to process data
│       └── process2.yaml           # Second variation of parameters to process data
├── data            
│   ├── final                       # data after training the model
│   ├── processed                   # data after processing
│   ├── raw                         # raw data
│   └── raw.dvc                     # DVC file of data/raw
├── docs                            # documentation for your project
├── dvc.yaml                        # DVC pipeline
├── .flake8                         # configuration for flake8, a Python linter
├── .gitignore                      # files that should not be committed to Git
├── Makefile                        # store useful commands to set up the environment
├── models                          # store models
├── notebooks                       # store notebooks
├── .pre-commit-config.yaml         # configurations for pre-commit
├── pyproject.toml                  # dependencies for poetry
├── README.md                       # describe your project
├── src                             # store source code
│   ├── __init__.py                 # make src a Python package
│   ├── process.py                  # process data before training model
│   └── train_model.py              # train model
└── tests                           # store tests
    ├── __init__.py                 # make tests a Python package
    ├── test_process.py             # test functions for process.py
    └── test_train_model.py         # test functions for train_model.py
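
As a quick sanity check after generating a project, the layout above can be compared against what is actually on disk. The following is only a minimal sketch: `EXPECTED_PATHS` and `missing_paths` are hypothetical names introduced here, and the list is a hand-picked subset of the tree above (the `data` folders are left out on the assumption that they may not exist until data is pulled with DVC).

```python
from pathlib import Path

# Hand-picked subset of the tree shown above; adjust it to your own project.
EXPECTED_PATHS = [
    "config/main.yaml",
    "config/model",
    "config/process",
    "src/__init__.py",
    "src/process.py",
    "src/train_model.py",
    "tests/test_process.py",
    "tests/test_train_model.py",
    "pyproject.toml",
    "Makefile",
]


def missing_paths(project_root, expected=EXPECTED_PATHS):
    """Return the entries of `expected` that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling `missing_paths(".")` from the root of a generated project should return an empty list if every listed file and folder is present; anything it returns is a path worth checking by hand.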

How to use this project

Install Cookiecutter:

pip install cookiecutter

Create a project based on the template:

cookiecutter https://github.com/khuyentran1401/data-science-template
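
The same step can also be scripted through Cookiecutter's Python API, which may be handy in automated setups. This is a sketch under stated assumptions, not part of the template: `build_cookiecutter_kwargs` and `generate_project` are hypothetical helpers, the pip variant is assumed to live on a branch literally named `pip`, and `directory_name` is assumed to be a variable defined in the template's `cookiecutter.json`.

```python
TEMPLATE_URL = "https://github.com/khuyentran1401/data-science-template"


def build_cookiecutter_kwargs(directory_name=None, use_pip_branch=False):
    """Assemble keyword arguments for cookiecutter.main.cookiecutter."""
    kwargs = {}
    if use_pip_branch:
        # Assumption: the pip variant is on a branch named "pip".
        kwargs["checkout"] = "pip"
    if directory_name is not None:
        # Assumption: cookiecutter.json defines a `directory_name` variable.
        # no_input=True skips the prompts and uses defaults for everything else.
        kwargs["no_input"] = True
        kwargs["extra_context"] = {"directory_name": directory_name}
    return kwargs


def generate_project(directory_name=None, use_pip_branch=False):
    """Generate a project from the template (needs cookiecutter and network access)."""
    # Imported lazily so the helper above works without cookiecutter installed.
    from cookiecutter.main import cookiecutter

    options = build_cookiecutter_kwargs(directory_name, use_pip_branch)
    return cookiecutter(TEMPLATE_URL, **options)
```

For example, `generate_project(directory_name="my-project")` would create the project without interactive prompts, provided the assumptions above hold for the version of the template you are using.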

Find a detailed explanation of this template here.
