Airflow + Deeplake Example Pipeline for Machine Learning (Local Development)

This repository provides an example Apache Airflow pipeline that uses Deeplake for local development of machine learning projects. The pipeline demonstrates how to load a dataset and parse it into Deeplake.
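
"Parsing into Deeplake" here means creating a dataset with image and bounding-box tensors and appending samples to it. Below is a minimal, hypothetical sketch assuming the Deep Lake 3.x Python API; the dataset path, tensor names, and sample values are illustrative and not taken from this repository's code.

```python
# Hypothetical sketch (Deep Lake 3.x API); paths, tensor names, and values
# are illustrative, not the repository's actual code.
import deeplake
import numpy as np

# Create an empty local Deeplake dataset.
ds = deeplake.empty("./results/example_dataset", overwrite=True)

with ds:
    # Declare tensors for the images and their bounding-box annotations.
    ds.create_tensor("images", htype="image", sample_compression="jpeg")
    ds.create_tensor("boxes", htype="bbox")

    # Append one sample: an image file plus one example bounding box.
    ds.append({
        "images": deeplake.read("./data/images/example.jpg"),
        "boxes": np.array([[10, 20, 100, 80]], dtype=np.float32),
    })
```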

Overview

In this example, we leverage Apache Airflow to automate the following steps:

  1. create_deep_lake_data: Retrieves images and annotations from a folder and prepares them for loading into a Deeplake dataset.

  2. show_example_in_deeplake: Shows an image with its bounding box from the Deeplake dataset.

Both tasks are executed in separate Docker containers, which ensures efficient resource utilization and keeps the pipeline's execution environment modular.
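
The overall structure looks roughly like the sketch below. This is not the repository's actual dags/test_dag.py: the image names, commands, and Docker connection settings are assumptions, and a reasonably recent Airflow 2.x with the Docker provider is assumed.

```python
# Illustrative sketch only; image names, commands, and connection settings
# are assumptions, not the contents of dags/test_dag.py.
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator
from docker.types import Mount

HOST_REPO = "<absolute_path_to_your_airflow-ml_repo>"  # see step 2 in Usage

with DAG(
    dag_id="convert_to_deeplake",
    start_date=datetime(2023, 1, 1),
    schedule=None,       # triggered manually from the web UI
    catchup=False,
) as dag:
    # Task 1: read images/annotations from the mounted data folder and
    # write them into a Deeplake dataset under the mounted results folder.
    create_deep_lake_data = DockerOperator(
        task_id="create_deep_lake_data",
        image="create_deep_lake_data:latest",
        command="python create_deep_lake_data.py",
        docker_url="unix://var/run/docker.sock",
        mounts=[
            Mount(source=f"{HOST_REPO}/data", target="/data", type="bind"),
            Mount(source=f"{HOST_REPO}/results", target="/results", type="bind"),
        ],
    )

    # Task 2: open the Deeplake dataset and render an image with its bounding box.
    show_example_in_deeplake = DockerOperator(
        task_id="show_example_in_deeplake",
        image="show_example_in_deeplake:latest",
        command="python show_example_in_deeplake.py",
        docker_url="unix://var/run/docker.sock",
        mounts=[
            Mount(source=f"{HOST_REPO}/results", target="/results", type="bind"),
        ],
    )

    # Run the conversion task first, then the visualization task.
    create_deep_lake_data >> show_example_in_deeplake
```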

Usage

To use this pipeline for local development, follow the steps below:

  1. Ensure that your Docker Engine has sufficient memory allocated, as running the pipeline may require more memory in certain cases.

  2. Change the paths to your local repo in dags/test_dag.py: replace "<absolute_path_to_your_airflow-ml_repo>/data" and "<absolute_path_to_your_airflow-ml_repo>/results" with your own absolute paths.

  3. Before the first Airflow run, prepare the environment by executing the following steps:

    • If you are working on Linux, specify the AIRFLOW_UID by running the command:
    echo -e "AIRFLOW_UID=$(id -u)" > .env
    • Perform the database migration and create the initial user account by running the command:
    docker compose up airflow-init

    The created user account will have the login airflow and the password airflow.

  4. Start Airflow and build the custom images used to run tasks in Docker containers:

    docker compose up --build
  5. Access the Airflow web interface in your browser at http://localhost:8080.

  6. Trigger the DAG convert_to_deeplake to initiate the pipeline execution (a REST API alternative is sketched after these steps).


  7. When you are finished working and want to clean up your environment, run:

    docker compose down --volumes --rmi all
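
As an alternative to triggering the DAG from the web UI in step 6, Airflow's stable REST API can be used. A minimal sketch, assuming the default airflow/airflow account created by airflow-init and that basic-auth access to the API is enabled in this compose setup:

```python
# Hypothetical alternative to step 6: trigger the DAG via Airflow's stable REST API.
# Assumes the default airflow/airflow account and basic-auth API access.
import requests

response = requests.post(
    "http://localhost:8080/api/v1/dags/convert_to_deeplake/dagRuns",
    auth=("airflow", "airflow"),
    json={"conf": {}},   # optional run configuration
)
response.raise_for_status()
print(response.json())   # metadata for the newly created DAG run
```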

To do

  • Create API manual in Airflow_API folder
  • Trigger a DAG at a future execution time (e.g. in the next 5 minutes, 1 hour, ...) (completed)
  • XCom in DockerOperator

