# EX 1. Preconditions: Variables, Tables and Connection 🔝


Now that both Scheduler and Webserver are running, we can start getting familiar with the Airflow User Interface at http://localhost:8080 and create the preconditions for orchestrating our ML pipelines.

## 1.1 Add Variables

Let's start customising Airflow by adding two Variables that store the table names for training and prediction:

✅ Go to Admin/Variables and create a new Key training_table with the value training.

✅ Do the same for prediction: add the Key prediction_table with the value prediction.

*(screenshot: add variables)*
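These Variables are what the DAG code will read to resolve the table names. A minimal sketch of how that lookup typically looks (the `default_var` fallbacks here are an assumption, not necessarily part of the exercise's code):

```python
from airflow.models import Variable

# Resolve the table names from the Variables created in the UI;
# default_var is used only if the Variable is missing.
TRAINING_TABLE = Variable.get("training_table", default_var="training")
PREDICTION_TABLE = Variable.get("prediction_table", default_var="prediction")
```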

## 1.2 Create SQLite DB Connection

✅ Go to Admin/Connections and search for the connection with Conn Id sqlite_default.

📌 The connections that you see are examples; they are not in use.

✏️ Edit the Conn Id value from sqlite_default to sqlite_ml.

*(screenshot: edit connection)*
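Under the hood, operators and hooks look connections up by their Conn Id. A quick way to check that the renamed connection resolves is to instantiate a hook; a sketch, using the Airflow 1.x import path (in Airflow 2 the hook lives under `airflow.providers.sqlite.hooks.sqlite`):

```python
from airflow.hooks.sqlite_hook import SqliteHook

# The hook resolves the 'sqlite_ml' Conn Id we just renamed in the UI
hook = SqliteHook(sqlite_conn_id="sqlite_ml")
print(hook.get_uri())  # prints the URI stored in the connection
```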

## 1.3 DAG Table Creation and Connection

Now that you have:

- saved the Variables with the table names, one for training and one for prediction
- created the SQLite DB connection

let's activate the DAG create_ml_tables so that the Scheduler picks it up and creates the database and the tables in it.

✅ Go to the DAGs section and toggle ON the create_ml_tables DAG.

*(screenshot: toggle dag)*

The Scheduler will pick up the DAG and run it (there are no dependencies preventing the execution).
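Why does it run immediately? A DAG with a start_date in the past and a one-shot schedule has nothing to wait for once it is unpaused. A hypothetical skeleton (the actual parameters in create_ml_tables.py may differ):

```python
from datetime import datetime
from airflow import DAG

# Hypothetical skeleton: a past start_date plus "@once" means the
# Scheduler triggers a single run as soon as the DAG is unpaused.
dag = DAG(
    dag_id="create_ml_tables",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@once",
)
```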

Click on the DAG name create_ml_tables: we are now in the Graph View, where you can visualise the task dependencies and their current status.

🕚 The create_ml_tables DAG is running. Refresh the status by clicking the 🔁 REFRESH button.

*(screenshot: create tables)*

✅ If you click on the Code button on the DAG menu, you can see (but not modify) the Python code:

*(screenshot: dag code)*

📌 Note: to run the SQL that creates the tables and the index, we instantiate the SqliteOperator, which uses the sqlite_ml connection we created earlier.

```python
create_training_table = SqliteOperator(
    task_id="create_training_table",
    # a triple-quoted string lets the SQL statement span multiple lines
    sql=f"""
        CREATE TABLE IF NOT EXISTS {TRAINING_TABLE}(
            timestamp DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
            mape_test REAL,
            rmse_test REAL,
            days_in_test REAL
        );
    """,
    # 'sqlite_ml' is the connection created from the Connections view of the UI
    sqlite_conn_id="sqlite_ml",
)
```
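The create_prediction_table task in the same file is defined analogously; here is a hedged sketch (only the date_to_predict column is confirmed by the exercise, the other columns are assumptions):

```python
create_prediction_table = SqliteOperator(
    task_id="create_prediction_table",
    sql=f"""
        CREATE TABLE IF NOT EXISTS {PREDICTION_TABLE}(
            timestamp DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
            date_to_predict DATE,  -- column confirmed by the exercise
            yhat REAL              -- hypothetical column for the predicted value
        );
    """,
    sqlite_conn_id="sqlite_ml",
)
```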

📌 Note: To define the execution order we use the bitshift operator >>. You can see it at the bottom of the code:

```python
[create_training_table, create_prediction_table] >> create_prediction_index
```

The create_prediction_index task is executed only after create_training_table and create_prediction_table have both finished successfully.
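The bitshift form is shorthand for Airflow's explicit dependency API; both of the following express the same ordering:

```python
# Equivalent to the bitshift expression above
create_training_table >> create_prediction_index
create_prediction_table >> create_prediction_index

# ...or with the explicit method
create_training_table.set_downstream(create_prediction_index)
create_prediction_table.set_downstream(create_prediction_index)
```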

❗ Let's go back to the Graph View: something bad happened! The task create_prediction_index is having some issues. The previous two tasks, create_training_table and create_prediction_table, finished successfully.

*(screenshot: failed task in the Graph View)*

Let's DEBUG using Airflow: click on the task create_prediction_index; a new window will open.

*(screenshot: task instance window)*

Click on View Log to see the error message:

*(screenshot: create index error log)*

✏️ We need to fix the bug in the code (we CAN'T do it from the Airflow UI).
Open the file /dags/create_ml_tables.py from the repository in your editor.
Go through the code and find the SQL that creates the index.

```sql
CREATE UNIQUE INDEX idx_date_to_predict
ON {PREDICTION_TABLE} (date_to_predict --!!! FIXME add a ) parenthesis
;
```

❗ The closing parenthesis ) after date_to_predict is missing. Add it: date_to_predict).
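After the fix, the statement reads:

```sql
CREATE UNIQUE INDEX idx_date_to_predict
ON {PREDICTION_TABLE} (date_to_predict);
```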

✅ Save the file and go back to the Web UI, to the Graph View of the create_ml_tables DAG.

Click again on the task create_prediction_index, but this time click the Clear button in the window that opens, then confirm the operation in the next window by clicking OK!.

*(screenshot: clear task to retry)*

📌 You are now resetting the task status. 🕚 Wait a few seconds to let the Scheduler pick up the task and re-run it.

🏆 Once all the tasks have finished successfully, we'll have two tables: training and prediction.

*(screenshot: success create tables)*

🏆 We have also created a unique index on the prediction table's date_to_predict column: this guarantees that only one prediction is stored per date we want to predict.
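To see what the unique index buys us, here is a self-contained toy demo (an in-memory database with a minimal one-column schema, not the pipeline's actual file):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # toy DB, not the pipeline's file
conn.execute("CREATE TABLE prediction (date_to_predict TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_date_to_predict ON prediction (date_to_predict)")

conn.execute("INSERT INTO prediction VALUES ('2021-06-01')")
try:
    # a second prediction for the same date violates the unique index
    conn.execute("INSERT INTO prediction VALUES ('2021-06-01')")
except sqlite3.IntegrityError as err:
    print(err)  # UNIQUE constraint failed: prediction.date_to_predict
```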

Go to EX 2. Train the model.