- Ensure Docker Compose v1.29.1 or newer is installed on your workstation.
# in Ubuntu
sudo apt-get update
sudo apt-get install docker-compose-plugin
- echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
- sudo docker compose up airflow-init
- sudo docker compose up
- On the sign-in page, enter airflow as both the username and the password.
- The input DAG JSON lives at docker/airflow/fileinput/input_dag.json.
- Replace myip with your IP in 3human....py; find your IP by running:
# in Ubuntu
hostname -I
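localhost will not work here because the DAG code runs inside a container, so it must reach the FastAPI service through the host's IP. A minimal sketch of the idea (the MYIP value and helper are illustrative, not the repo's actual code):

```python
# Hypothetical sketch: reaching the host-side FastAPI service from inside a container.
# MYIP must be the host IP printed by `hostname -I`; localhost would resolve to the container itself.
import requests

MYIP = "192.168.1.23"  # placeholder: replace with your `hostname -I` output

def set_human_input(job_id: str, status: bool):
    # endpoint shape taken from the curl example later in this README
    resp = requests.get(f"http://{MYIP}:60001/sethumaninput/{job_id}", params={"status": status})
    resp.raise_for_status()
    return resp.json()
```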
- Ensure you run both docker-compose files:
# using docker-compose version 1.27.4
sudo docker-compose -f docker-compose19.yml build
sudo docker-compose -f docker-compose19.yml up
sudo docker compose up
docker-compose19.yml starts the API and Mongo; the default docker-compose.yml starts Airflow and RabbitMQ.
After the DAG is triggered, you will see it stays pending indefinitely: it is waiting for a human.
Copy the jobid from the XCom tab of mongo_task, then call the API, either by visiting http://localhost:60001/docs or by running:
curl -X 'GET' \
'http://localhost:60001/sethumaninput/22-0530-1314?status=false' \
-H 'accept: application/json'
Depending on the status value, either the accurate or the inaccurate flow is called.
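The infinitely-pending task is a polling sensor: it keeps poking until the human input lands in Mongo. A rough sketch of that pattern (collection and field names are assumptions, not the repo's actual code):

```python
# Hypothetical sketch of the "wait for human" polling pattern using a PythonSensor.
from airflow.sensors.python import PythonSensor
from pymongo import MongoClient

def _human_has_answered(job_id: str) -> bool:
    client = MongoClient("mongodb://mongo:27017/")
    doc = client.test.postcollection.find_one({"jobid": job_id})
    # assumed: /sethumaninput writes a status field onto the job document
    return doc is not None and "status" in doc

wait_for_human = PythonSensor(
    task_id="wait_for_human",
    python_callable=_human_has_answered,
    op_kwargs={"job_id": "{{ ti.xcom_pull(task_ids='mongo_task') }}"},
    poke_interval=30,  # re-check every 30 seconds until the API has been called
)
```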
ENDPOINT_URL="http://localhost:8080/"
curl -X GET \
--user "airflow:airflow" \
"${ENDPOINT_URL}/api/v1/pools"
Run the 4th DAG (4human_non_polling).
Visit the Graph view and get the details as highlighted.
Call the following endpoint to mark the task as success (i.e. an emulated human action):
ENDPOINT_URL="http://localhost:8080/"
curl -X POST \
--user "airflow:airflow" \
-H 'Content-Type: application/json' \
"${ENDPOINT_URL}api/v1/dags/4human_non_polling/updateTaskInstancesState" \
-d "{ \"dry_run\": false, \"task_id\": \"manual_sign_off\", \
\"include_upstream\": false, \"include_downstream\": false, \"include_future\": false, \"include_past\": false, \
\"dag_run_id\": \"manual__2022-05-31T13:01:13.894084+00:00\", \
\"new_state\": \"success\" }"
The next step will then execute.
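The same call from Python, if you prefer requests over curl; substitute the dag_run_id from your own run:

```python
# Same REST call as the curl above, via requests.
import requests

resp = requests.post(
    "http://localhost:8080/api/v1/dags/4human_non_polling/updateTaskInstancesState",
    auth=("airflow", "airflow"),
    json={
        "dry_run": False,
        "task_id": "manual_sign_off",
        "include_upstream": False,
        "include_downstream": False,
        "include_future": False,
        "include_past": False,
        "dag_run_id": "manual__2022-05-31T13:01:13.894084+00:00",  # use your run's id
        "new_state": "success",
    },
)
print(resp.status_code, resp.json())
```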
The concept is derived from this link.
- Add the AWS connection string like this (ref).
- I created a file s3://git-experiments/airflow-exp-p5-file-list.json with contents:
["plate_mh_p1.jpg", "rec_part2_p2.jpg", "survey_like_p2.png", "rec_appliance_p2.png", "rec_walmart_p1.jpg", "tax_us_p1.png"]
- Elyra, at this link, mentions that 'Apache Airflow version 2.x is currently not supported.'
- The CWL docs also refer to the Airflow 1.10 UI, and pip install of cwl-airflow on 2.x failed as well.
- So we will stop attempts on 2.x and try 1.x.
- Using Spiff we can run user-based workflows and standard BPMN diagrams (example project), and somehow merge that with Airflow.
- To view BPMN diagrams: https://demo.bpmn.io/
- To run:
git clone https://github.com/sartography/spiff-example-cli
cd spiff-example-cli/
sudo docker build . -t spiff_example_cli
docker run -it --rm spiff_example_cli
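What the example does, roughly, in SpiffWorkflow terms; a sketch against the 1.x-era API (file name and process id are placeholders, and method names may differ across versions):

```python
# Rough sketch of running a BPMN diagram with SpiffWorkflow (1.x-era API).
# 'diagram.bpmn' and 'my_process' are placeholders, not files from spiff-example-cli.
from SpiffWorkflow.bpmn.parser.BpmnParser import BpmnParser
from SpiffWorkflow.bpmn.workflow import BpmnWorkflow

parser = BpmnParser()
parser.add_bpmn_file("diagram.bpmn")
workflow = BpmnWorkflow(parser.get_spec("my_process"))

workflow.do_engine_steps()                      # run up to the first user task
for task in workflow.get_ready_user_tasks():
    print("waiting on human task:", task.task_spec.name)
    workflow.complete_task_from_id(task.id)     # emulate the human action
workflow.do_engine_steps()                      # continue past the user task
```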
- make JSON work
- wait for human action through the UI and then proceed with the next task
- generate a DAG from a drawing
- drawing rules (priority codes, process documents as per priority)
- parallel flow and merge, with one branch that waits for manual action (see the sketch after this list)
- approve/reject alternate flow
- queue - Jira or a library
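For the parallel-flow-and-merge item, the Airflow shape would look roughly like this (task names are made up; manual_sign_off is the task a human would mark as success via the API shown earlier):

```python
# Sketch of a fan-out/fan-in DAG where one branch waits for manual sign-off.
from datetime import datetime
from airflow import DAG
from airflow.operators.empty import EmptyOperator  # use DummyOperator on Airflow < 2.3

with DAG("parallel_merge_sketch", start_date=datetime(2022, 1, 1), schedule_interval=None) as dag:
    start = EmptyOperator(task_id="start")
    auto_branch = EmptyOperator(task_id="auto_branch")
    manual_branch = EmptyOperator(task_id="manual_sign_off")  # marked success by a human/API
    merge = EmptyOperator(task_id="merge")  # runs only after both branches finish

    start >> [auto_branch, manual_branch] >> merge
```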
docker compose down --volumes --rmi all
# using docker-compose version 1.27.4
- Productionalizing Data Pipelines with Apache Airflow by Axel Sirota | Pluralsight
- git study
- express YouTube guide, whose git repo is productionalizing-data-pipelines-airflow
sudo docker-compose -f docker-compose19.yml build
sudo docker-compose -f docker-compose19.yml up
- Hello world
- with hello_world.py
- docker-compose up
- view UI
- Follow steps 1-6 to trigger the run and view logs
- Drop files into the ./docker/airflow/fileinput folder to see the fresh file picked up by the FileSensor, which polls every 30 seconds (see the sketch below).
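A minimal sketch of that sensor, assuming the folder is mounted at /opt/airflow/fileinput inside the container:

```python
# Sketch of the 30-second-polling FileSensor watching the fileinput folder.
from airflow.sensors.filesystem import FileSensor

wait_for_file = FileSensor(
    task_id="wait_for_file",
    fs_conn_id="fs_default",            # filesystem connection pointing at the base path
    filepath="/opt/airflow/fileinput/", # assumed container-side mount of ./docker/airflow/fileinput
    poke_interval=30,                   # re-check every 30 seconds
)
```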
- Create an HTTP connection (optional: if not done, the Python code currently emulates it via an airflow add-connection command): Admin > Connections > Add new, and enter params like this image.
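One way the Python emulation can work, sketched against direct metadata-DB access (conn id, host, and port are assumptions; the repo may shell out to the airflow CLI instead):

```python
# Sketch: create the HTTP connection programmatically instead of via the UI.
from airflow import settings
from airflow.models import Connection

conn = Connection(conn_id="http_default", conn_type="http", host="myip", port=60001)
session = settings.Session()
if not session.query(Connection).filter_by(conn_id=conn.conn_id).first():
    session.add(conn)   # only add it if it does not exist yet
    session.commit()
```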
- After starting the http_rabbit DAG:
- To test the Mongo insert, run:
sudo docker exec -it airflow-experiments_mongo_1 sh
# mongosh
test> show collections
postcollection
test> db.postcollection.count()
test> db.postcollection.find({}).limit(1)
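The same check from Python with pymongo, if you'd rather not exec into the container (host/port assume Mongo is published on localhost:27017 by the compose file):

```python
# Sketch: verify the insert with pymongo instead of mongosh.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
coll = client.test.postcollection
print(coll.count_documents({}))  # number of inserted documents
print(coll.find_one({}))         # peek at one document
```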
- docs for 1.9/1.10 https://airflow.apache.org/docs/apache-airflow/1.10.3/api.html#endpoints
- test http://localhost:8080/api/experimental/test from a web browser
- get the latest runs:
http://localhost:8080/api/experimental/latest_runs
- trigger a DAG run using the API:
curl -X POST \
http://localhost:8080/api/experimental/dags/hello_world/dag_runs \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{"conf":"{\"key\":\"value\"}"}'
because for a few plugins, RabbitMQ for example, we run into operator incompatibility
- rabbit
- http - pip install 'apache-airflow-providers-http', sample dag, blog, example REST APIs that we could call
- mongo - pip install 'apache-airflow-providers-mongo' - connection troubleshooting, sample dag
- first dag
- parallel dag
- files detector link1, troubleshooting
- branch
- Stopping a DAG: you can set the running tasks to failed with Set a state of task instances. Alternatively, if you want the DAG to not schedule any more tasks, you can pause it with Update a DAG.
- trigger via API blog, start api blog
- create DAG via UI discussion; using CWL we can describe the workflow in YAML syntax
- Convert JSON to an Airflow DAG: one blog by Geek Culture
- share data between tasks in Airflow, e.g. from the Airflow docs: Push and Pull the same ID from several operators (see the sketch below)
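The push/pull pattern from that last reference, in sketch form (DAG, task, and key names are made up):

```python
# Sketch of sharing data between tasks via XCom; names are illustrative.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def push_value(ti):
    ti.xcom_push(key="jobid", value="22-0530-1314")

def pull_value(ti):
    jobid = ti.xcom_pull(task_ids="push_task", key="jobid")
    print("got jobid:", jobid)

with DAG("xcom_sketch", start_date=datetime(2022, 1, 1), schedule_interval=None) as dag:
    push_task = PythonOperator(task_id="push_task", python_callable=push_value)
    pull_task = PythonOperator(task_id="pull_task", python_callable=pull_value)
    push_task >> pull_task
```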