A useful template to enable simple and efficient machine learning projects
Explore the docs »
View Demo
·
Report Bug
·
Request Feature
While developing our machine learning pipeline template, we wanted to create an efficient and purposeful environment that could resolve many of the annoyances and issues we have faced in the past.
Our goal with this ML pipeline template is to create a user-friendly utility that drastically speeds up the development and implementation of a machine learning model across a wide range of problems. Many of our past experiences with other templates and machine learning projects left us wanting a better working environment and a more efficient process.
This template enables fast experimentation, easy execution, and simple debugging for all components.
.
├── run.py # Entry point that runs the Flask server
├── static
│ └── img
│ ├── example_image.jpg # Example image for README
│ ├── iris_setosa.jpeg
│ ├── iris_versicolor.jpeg
│ └── iris_virginica.jpeg
└── templates
├── go.html
└── master.html # Main html file for front end
The app component of the directory contains the front-end Flask service, which provides the user-friendly environment for interacting with the model.
.
├── config.yaml # Main global configuration file
├── data_acquisition
│ └── config.yaml # Data acquisition configuration
├── data_processing
│ └── config.yaml # Data processing configuration
├── model_training
│ └── config.yaml # Model training configuration
└── model_validation
└── config.yaml # Model validation configuration
The config component of the directory is where most of the controls for this pipeline template reside. There is a config file for each of the main sections:
- Data Acquisition
- Data Processing
- Model Training
- Model Validation
There is also a top-level configuration file for general settings that are shared across these sections.
The configuration files are intended to be the primary point of access and control for this pipeline. Any changes or utility additions should be controlled from their corresponding configuration file in order to keep an organized and properly modularized codebase.
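As an illustration, a model-training configuration might look like the following. The key names here are hypothetical, shown only to convey the idea; they are not the template's actual schema:

```yaml
# config/model_training/config.yaml — illustrative sketch, not the real schema
train_test_split:
  test_size: 0.2
  random_seed: 42
models:
  - name: logistic_regression
    hyperparameters:
      C: [0.1, 1.0, 10.0]
  - name: random_forest
    hyperparameters:
      n_estimators: [50, 100]
```

Keeping choices like these in YAML means a reader can see every knob the pipeline exposes without opening a single source file.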
.
├── 1_data_acquisition
│ └── main.py # Main file for data acquisition step
├── 2_data_processing
│ └── main.py # Main file for data processing step
├── 3_model_training
│ └── main.py # Main file for model training step
├── 4_model_validation
│ └── main.py # Main file for model validation step
└── 5_model_registration
└── main.py # Main file for model registration step (Optional)
The pipeline_components folder hosts the main file for each step in the pipeline flow. Each step contains only a main.py, and the numeric prefixes indicate the order in which the steps run. These main files should not be altered unless an additional utility function or similar task requires it; changes to the pipeline should otherwise remain within the utility functions in the /src/ directory and the configuration files.
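To make that pattern concrete, a step's main.py typically does little more than read its config and delegate to utility functions. The sketch below is illustrative only; the function and config key names are hypothetical, not the template's actual API:

```python
# Illustrative sketch of a pipeline step's main.py (all names are hypothetical).

def drop_incomplete_rows(rows):
    """Stand-in for a utility that would live in src/data/processing/utils.py."""
    return [row for row in rows if all(value is not None for value in row.values())]

def main(config):
    # In the real template this would be parsed from config/data_processing/config.yaml.
    rows = [
        {"sepal_length": 5.1, "species": "setosa"},
        {"sepal_length": None, "species": "versicolor"},
    ]
    if config.get("drop_incomplete_rows", False):
        rows = drop_incomplete_rows(rows)
    return rows

if __name__ == "__main__":
    cleaned = main({"drop_incomplete_rows": True})
    print(len(cleaned))  # 1
```

The main file stays a thin coordinator; the behavior lives in the utility function and the decision lives in the config.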
.
├── __init__.py
├── data
│ ├── __init__.py
│ ├── acquisition
│ │ ├── __init__.py
│ │ └── utils.py # Data acquisition utility functions
│ ├── processing
│ │ └── utils.py # Data processing utility functions
│ └── utils.py # General utility functions related to data
├── model
│ ├── __init__.py
│ ├── training
│ │ ├── __init__.py
│ │ └── utils.py # Model training utility functions
│ └── utils.py # General utility functions related to models
└── utils.py # Main general utility functions
The src component of the directory is the core of our pipeline's functionality. This directory stores the utility functions for each of the pipeline steps. When running the pipeline, these utility functions are built as a package that can be imported and used in the main functions at runtime.
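For a feel of what such a utility might contain, here is a toy stand-in for a function that could live in src/model/training/utils.py. It is a hypothetical nearest-centroid example on one feature, not the template's actual code:

```python
# Hypothetical sketch of a training utility (not the template's real implementation).
from collections import defaultdict

def train_nearest_centroid(samples, labels):
    """Compute one centroid per class; a toy stand-in for real model training."""
    sums = defaultdict(lambda: [0.0, 0])
    for x, y in zip(samples, labels):
        sums[y][0] += x
        sums[y][1] += 1
    return {label: total / count for label, (total, count) in sums.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

# Petal lengths loosely inspired by the bundled iris sample data.
centroids = train_nearest_centroid([1.4, 1.5, 4.5, 4.7],
                                   ["setosa", "setosa", "versicolor", "versicolor"])
print(predict(centroids, 1.6))  # setosa
```

Because functions like this are packaged under src, each step's main file can import them instead of redefining logic inline.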
Here we will describe the necessary actions and steps that should be followed in order to run this pipeline.
There are only two prerequisites for running this pipeline: Conda / Anaconda must be installed, and you must be able to run Makefiles (i.e. have GNU Make available).
Follow the steps below to install and set up the pipeline. The template does not rely on any external dependencies or services.
-
Clone the repo
git clone https://github.com/zamaniali1995/ml-pipeline.git
-
Setup the conda environment using MakeFile
make create-env
-
Activate the newly created conda environment
conda activate ml-env
-
Create package
make create-package
Here we will describe how to use this ML pipeline template, as well as how to run each component and build the front end display at the end.
This pipeline was designed so that configuration files are the primary means of controlling and altering the pipeline. These configuration files control the paths to the data, what kind of data processing to perform, how to split the training and testing data, which models to train, the range of potential hyper parameters to search through, which evaluation methods to use on the models, and many other similar selections.
These configuration files allow for changes to be made in one place, not requiring someone to dig through code and alter each place where some variable could exist.
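For instance, changing the train/test split ratio should mean editing a single config value rather than hunting through code. A minimal sketch of how a utility might consume that value, using hypothetical key names and the standard library:

```python
# Sketch of a config-driven train/test split (key names are hypothetical).
import random

def split_data(rows, config):
    """Split rows into train/test sets using a ratio taken from the config."""
    test_size = config["train_test_split"]["test_size"]
    rng = random.Random(config["train_test_split"]["random_seed"])
    shuffled = rows[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

config = {"train_test_split": {"test_size": 0.2, "random_seed": 42}}
train, test = split_data(list(range(100)), config)
print(len(train), len(test))  # 80 20
```

Adjusting `test_size` in one YAML file would then flow through to every run, with the seed keeping splits reproducible.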
If there is a desire to implement some additional processing method or some specific functionality for a given dataset, we have created a simple process to add utility functions that can be used and connected with the configuration files easily.
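One common way to wire a new utility to the configuration is a dispatch table keyed by the name the config uses. The registry and function names below are illustrative of that pattern, not the template's actual mechanism:

```python
# Hypothetical registry mapping config strings to processing utilities.
PROCESSORS = {}

def register(name):
    """Decorator that makes a utility selectable from the config by name."""
    def wrap(fn):
        PROCESSORS[name] = fn
        return fn
    return wrap

@register("lowercase_labels")
def lowercase_labels(rows):
    return [{**row, "species": row["species"].lower()} for row in rows]

def run_processing(rows, config):
    # "processing_steps" would come from config/data_processing/config.yaml.
    for step in config.get("processing_steps", []):
        rows = PROCESSORS[step](rows)
    return rows

rows = [{"species": "Setosa"}]
print(run_processing(rows, {"processing_steps": ["lowercase_labels"]}))
# [{'species': 'setosa'}]
```

With this shape, adding a dataset-specific processing step is a new decorated function plus one line in the config, and no edits to the step's main file.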
To validate that our template is working, we have included a sample dataset which can be used to run each component of the pipeline and which will produce a usable front-end local server. If everything is working as intended, the following steps should produce a functioning predictor.
-
Acquire the data
make acquire-data
-
Process the data
make process-data
-
Train the model
make train-model
-
Evaluate the model
make evaluate-model
-
Generate the local Flask front-end
make run-server
-
Access the local Flask server
http://localhost:3001/
or
http://0.0.0.0:3001/
- Develop Base Pipeline Template
- Implement Example Dataset and Functional Front-End
- Add more data processing utility functions
- Implement test cases to validate the different pipeline steps
- Run the whole pipeline with a single command (e.g. make run-pipeline)
- Add more ways to load data
- AWS
- Google Cloud
- Microsoft Azure
See the open issues for a full list of proposed features (and known issues).
Ali Zamani - LinkedIn - [email protected]
Jacob Mish - LinkedIn - [email protected]
Project Link: https://github.com/zamaniali1995/ml-pipeline