Machine Learning Pipeline Template

A useful template to enable simple and efficient machine learning projects
Explore the docs »

View Demo · Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Directory Information
  3. Built With
  4. Getting Started
  5. Usage
  6. Roadmap
  7. Contact

About The Project


While developing this machine learning pipeline template, we wanted to create an efficient, purposeful environment that resolves many of the annoyances and issues we have faced in the past.

Our goal with this ML pipeline template is to provide a user-friendly utility that drastically speeds up the development and implementation of a machine learning model for a wide variety of problems. Many of our past experiences with other templates and machine learning projects left us hoping for a better working environment and a more efficient process.

This template enables fast experimentation, easy execution, and simple debugging for all components.

(back to top)

Directory Information

app/

.
├── run.py                        # Flask server entry point
├── static
│   └── img
│       ├── example_image.jpg     # Example image for README
│       ├── iris_setosa.jpeg
│       ├── iris_versicolor.jpeg
│       └── iris_virginica.jpeg
└── templates
   ├── go.html
   └── master.html                 # Main HTML file for the front end

The app component of the directory controls the front-end Flask service, which provides the user-friendly environment for interacting with the model.
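The actual contents of run.py are not shown here, but a minimal Flask front end of this shape could serve the two templates in the tree above. Everything below (the route names and the predict placeholder) is an illustrative assumption, not the template's real code:

```python
from flask import Flask, render_template, request

app = Flask(__name__)

def predict(query):
    """Placeholder for the trained model; the real pipeline would load
    and call the registered model here."""
    return "iris_setosa"

@app.route("/")
def index():
    # Landing page with the input form (templates/master.html).
    return render_template("master.html")

@app.route("/go")
def go():
    # Read the user's input and show the model's prediction.
    query = request.args.get("query", "")
    return render_template("go.html", query=query, prediction=predict(query))

if __name__ == "__main__":
    # Port matches the local server address referenced later in this README.
    app.run(host="0.0.0.0", port=3001)
```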

(back to top)

config/

.
├── config.yaml           # Main global configuration file 
├── data_acquisition      
│   └── config.yaml       # Data acquisition configuration 
├── data_processing
│   └── config.yaml       # Data processing configuration
├── model_training
│   └── config.yaml       # Model training configuration 
└── model_validation
   └── config.yaml       # Model validation configuration

The config component of the directory is where most of the controls for this pipeline template reside. There is a config file for each of the main sections:

  • Data Acquisition
  • Data Processing
  • Model Training
  • Model Validation

There is also a top-level configuration file for general settings that are shared across these sections.

The configuration files are intended to be the primary point of access and control for this pipeline. Any changes or utility additions should be controlled from their corresponding configuration file in order to keep an organized and properly modularized codebase.
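As a rough sketch of this pattern (assuming the configs are standard YAML loaded with PyYAML; the keys below are invented for illustration and are not the template's actual schema), a step might parse its configuration like this:

```python
import yaml  # PyYAML

# Illustrative YAML only; the template's real schema is defined by the
# config files listed above, not by these keys.
EXAMPLE_CONFIG = """
data_processing:
  drop_duplicates: true
  impute_strategy: median
model_training:
  test_size: 0.2
  models:
    - random_forest
    - logistic_regression
"""

def load_config(text):
    """Parse a YAML config document into a plain dict."""
    return yaml.safe_load(text)

config = load_config(EXAMPLE_CONFIG)
print(config["model_training"]["test_size"])  # 0.2
```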

(back to top)

pipeline_components/

.
├── 1_data_acquisition
│   └── main.py         # Main file for data acquisition step
├── 2_data_processing
│   └── main.py         # Main file for data processing step
├── 3_model_training
│   └── main.py         # Main file for model training step
├── 4_model_validation
│   └── main.py         # Main file for model validation step
└── 5_model_registration
   └── main.py         # Main file for model registration step (Optional)

The pipeline_components folder hosts the main file for each step in the pipeline flow, ordered numerically to reflect the run order. These main files should not be altered unless it is necessary to wire in an additional utility function or similar task; changes to the pipeline should otherwise remain within the utility functions in the /src/ directory and in the configuration files.
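As a hypothetical sketch of that separation (the function and key names below are assumptions, not the template's actual API), a step's main file would mostly just wire configuration values to utility functions:

```python
# Sketch of what a step's main.py (e.g. 2_data_processing/main.py) might
# look like: the step logic lives in a utility function, and main.py only
# connects it to the configuration.
def run_processing_step(config, process_fn):
    """Apply a configurable processing function to each record."""
    records = config.get("records", [])
    return {"processed": [process_fn(r) for r in records]}

# Stand-in for a helper that would live under src/data/processing/utils.py.
def lowercase_keys(record):
    return {k.lower(): v for k, v in record.items()}

result = run_processing_step({"records": [{"Sepal_Length": 5.1}]}, lowercase_keys)
print(result)  # {'processed': [{'sepal_length': 5.1}]}
```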

(back to top)

src/

.
├── __init__.py   
├── data
│   ├── __init__.py
│   ├── acquisition     
│   │   ├── __init__.py
│   │   └── utils.py    # Data acquisition utility functions
│   ├── processing
│   │   └── utils.py    # Data processing utility functions
│   └── utils.py        # General utility functions related to data
├── model
│   ├── __init__.py
│   ├── training
│   │   ├── __init__.py
│   │   └── utils.py    # Model training utility functions
│   └── utils.py        # General utility functions related to models
└── utils.py            # Main general utility functions

The src component of the directory is the core of our pipeline's functionality. This directory stores the utility functions for each of the pipeline steps. When the pipeline runs, these utility functions are built as a package that the main files can import and use at runtime.
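For example, a helper in src/data/utils.py might look like the sketch below. The name and signature are our own assumptions; the point is that pipeline steps import small, testable functions like this rather than inlining the logic:

```python
import random

def train_test_split(rows, test_size=0.2, seed=0):
    """Deterministically shuffle rows and split them into train/test lists."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(range(10), test_size=0.3)
print(len(train), len(test))  # 7 3
```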

(back to top)

Built With

(back to top)

Getting Started

Here we will describe the steps required to set up and run this pipeline.

Prerequisites

There are only two prerequisites for running this pipeline: Conda / Anaconda must be installed, and the make utility must be available to run the project's Makefile.

Installation

Follow the steps below to install and set up the pipeline. This template does not rely on any external dependencies or services.

  1. Clone the repo

    git clone https://github.com/zamaniali1995/ml-pipeline.git
  2. Set up the conda environment using the Makefile

    make create-env
  3. Activate the newly created conda environment

    conda activate ml-env
  4. Create package

    make create-package

(back to top)

Usage

Here we will describe how to use this ML pipeline template, how to run each component, and how to launch the front-end display at the end.

This pipeline was designed so that the configuration files are the primary means of controlling and altering the pipeline. They control the paths to the data, which data processing steps to perform, how to split the training and testing data, which models to train, the range of hyperparameters to search, which evaluation methods to apply to the models, and many other similar selections.

These configuration files allow changes to be made in one place, without requiring anyone to dig through the code and alter every spot where a variable might appear.

If there is a desire to implement some additional processing method or some specific functionality for a given dataset, we have created a simple process to add utility functions that can be used and connected with the configuration files easily.
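One common way to make such utilities selectable from the configuration is a small registry keyed by the name used in the YAML. This is a pattern sketch under our own naming assumptions, not the template's actual mechanism:

```python
# Registry pattern: utility functions register themselves under a name,
# and the config file selects them by that name.
PROCESSORS = {}

def register(name):
    """Decorator that makes a processing function selectable from config."""
    def wrap(fn):
        PROCESSORS[name] = fn
        return fn
    return wrap

@register("clip_negative")
def clip_negative(values):
    return [max(v, 0) for v in values]

def apply_from_config(config, values):
    # 'processor' would come from e.g. config/data_processing/config.yaml.
    return PROCESSORS[config["processor"]](values)

print(apply_from_config({"processor": "clip_negative"}, [-1, 2]))  # [0, 2]
```

Adding a new processing method is then a two-line change: define the function with the decorator, and reference its name in the relevant config file.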

Running Each Step

To validate that the template is working, we have included a sample dataset that can be used to run each component of the pipeline and produce a usable front-end local server. If everything works as intended, the following steps will produce a functioning predictor.

  1. Acquire the data

    make acquire-data
  2. Process the data

    make process-data
  3. Train the model

    make train-model
  4. Evaluate the model

    make evaluate-model
  5. Generate the local Flask front-end

    make run-server
  6. Access the local Flask server

    http://localhost:3001/
    

    or

    http://0.0.0.0:3001/
    

(back to top)

Roadmap

  • Develop Base Pipeline Template
  • Implement Example Dataset and Functional Front-End
  • Add additional data processing utility functions
  • Implement test cases to validate the different pipeline steps
  • Run the whole pipeline with a single command (e.g. run-pipeline)
  • Add more ways to load data
    • AWS
    • Google Cloud
    • Microsoft Azure

See the open issues for a full list of proposed features (and known issues).

(back to top)

Contact

Ali Zamani - LinkedIn - [email protected]

Jacob Mish - LinkedIn - [email protected]

Project Link: https://github.com/zamaniali1995/ml-pipeline

(back to top)
