
MLOps project that recommends movies to watch, implementing Data Engineering and MLOps best practices.


Next Watch: E2E MLOps Pipelines with Spark!

Prerequisites | Quick Start | Service Endpoints | Architecture | Project Organization | UI Showcase

Prerequisites

  • Python
  • Conda or Venv
  • Docker

Installation and Quick Start

  1. Clone the repo: `git clone https://github.com/brnaguiar/mlops-next-watch.git`
  2. Create the environment: `make env`
  3. Activate the conda environment: `source activate nwenv`
  4. Install requirements, dependencies, and assets: `make dependencies`
  5. Pull the datasets: `make datasets`
  6. Configure containers and secrets: `make init`
  7. Run Docker Compose: `make run`
  8. Populate the production database with users: `make users`
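
For convenience, the same steps can be chained as a single shell sequence (a minimal sketch mirroring the steps above; the `cd` into the cloned repo is an assumed extra step):

```sh
git clone https://github.com/brnaguiar/mlops-next-watch.git
cd mlops-next-watch   # assumed: run the remaining commands from the repo root
make env              # create the conda environment
source activate nwenv # activate it
make dependencies     # install requirements, dependencies, and assets
make datasets         # pull the datasets
make init             # configure containers and secrets
make run              # start the Docker Compose stack
make users            # populate the production database with users
```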

Useful Service Endpoints

- Jupyter `http://localhost:8888`
- MLFlow `http://localhost:5000`
- Minio Console `http://localhost:9001`
- Airflow `http://localhost:8080`
- Streamlit Frontend `http://localhost:8501`
- FastAPI Backend `http://localhost:8000`
- Grafana Dashboard `http://localhost:3000`
- Prometheus `http://localhost:9090`
- Pushgateway `http://localhost:9091`
- Spark UI `http://localhost:8081`
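
Once the stack is up, a few quick liveness checks can confirm the services are reachable (a minimal sketch using the ports above; `/docs`, `/health`, and `/-/healthy` are the FastAPI, Airflow, and Prometheus defaults and assume this project's configuration does not override them):

```sh
curl -s -o /dev/null -w "MLflow:     %{http_code}\n" http://localhost:5000
curl -s -o /dev/null -w "FastAPI:    %{http_code}\n" http://localhost:8000/docs       # FastAPI's auto-generated API docs
curl -s -o /dev/null -w "Airflow:    %{http_code}\n" http://localhost:8080/health     # Airflow webserver health endpoint
curl -s -o /dev/null -w "Prometheus: %{http_code}\n" http://localhost:9090/-/healthy  # Prometheus health endpoint
```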

Architecture

Note: in the "Monitoring and Analytics" block of the architecture diagram, it should be Grafana instead of Streamlit.

Project Organization


├── LICENSE
│
├── Makefile             <- Makefile with commands like `make env` or `make run`
│
├── README.md            <- The top-level README for developers using this project
│
├── data
│   ├── 01-external      <- Data from third party sources
│   ├── 01-raw           <- Data in a raw format
│   ├── 02-processed     <- The pre-processed data for modeling
│   └── 03-train         <- Split pre-processed data for model training
├── airflow
│   ├── dags             <- Airflow Dags
│   ├── logs             <- Airflow logging
│   ├── plugins          <- Airflow's default directory for plugins such as custom Operators, Sensors, etc. (we use the `include` dir inside `dags` for this purpose instead)
│   └── config           <- Airflow Configurations and Settings
│
├── assets               <- Project assets like jar files used in Spark Sessions
│
├── models               <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks            <- Jupyter notebooks used in experimentation 
│
├── docker               <- Docker data and configurations
│
├── images               <- Project images
│
├── requirements.local   <- Required Site-Packages 
│                         
├── requirements.minimal <- Required Dist-Packages 
│                         
├── setup.py             <- Makes project pip installable (pip install -e .) so src can be imported 
│
├── src                  <- Source code for use in this project.
│   │
│   ├── collaborative    <- Source code for the collaborative recommendation strategy
│   │   ├── models       <- Collaborative models
│   │   ├── nodes        <- Data processing, validation, training, etc. functions (or nodes) that represent units of work
│   │   └── pipelines    <- Collections of orchestrated nodes (data processing, validation, training, etc.), arranged in a sequence or a directed acyclic graph (DAG)
│   │
│   ├── conf           <- Configuration files and parameters for the project
│   │
│   ├── main.py        <- Main script, mostly to run pipelines
│   │
│   ├── scripts        <- Scripts, e.g., to create credential files and populate databases
│   │
│   ├── frontend       <- Source code for the Application Interface
│   │
│   └── utils          <- Project utils like Handlers and Controllers
│
├── tox.ini            <- Settings for flake8
│
└── pyproject.toml     <- Settings for the project and tools like isort, black, pytest, etc.

UI Showcase

Streamlit Frontend App

MLflow UI

Minio UI

Airflow UI

Grafana UI

Prometheus UI

Prometheus Drift Detection Example


Project based on the cookiecutter data science project template.