Emit Airflow metrics to support analysing Cosmos performance #991

Open
1 task
tatiana opened this issue May 21, 2024 · 2 comments
Labels
area:performance Related to performance, like memory usage, CPU usage, speed, etc area:rendering Related to rendering, like Jinja, Airflow tasks, etc parsing:custom Related to custom parsing, like custom DAG parsing, custom DBT parsing, etc parsing:dbt_ls Issues, questions, or features related to dbt_ls parsing parsing:dbt_manifest Issues, questions, or features related to dbt_manifest parsing

@tatiana
Collaborator

tatiana commented May 21, 2024

Context

We want more visibility into how much time Cosmos spends parsing the dbt project and building the Airflow DAG.

We'd like to leverage Airflow's metrics collection system by using a timer such as:

Stats.timer("ol.emit.attempts")

To collect the following metrics:

  • cosmos.load_method_custom.duration: time taken to run DbtGraph.load_via_custom_parser
  • cosmos.load_method_dbt_ls.duration: time taken to run DbtGraph.load_via_dbt_ls
  • cosmos.load_method_dbt_ls_file.duration: time taken to run DbtGraph.load_via_dbt_ls_file
  • cosmos.load_method_manifest.duration: time taken to run DbtGraph.load_from_dbt_manifest
  • cosmos.convert_to_airflow.duration: time taken to run build_airflow_graph
  • cosmos.dag_init.duration: time taken to initialise the Airflow DAG
  • cosmos.dag_new.duration: time taken to create the Airflow DAG
  • cosmos.task_group_init.duration: time taken to initialise the Airflow TaskGroup (__init__)
  • cosmos.task_group_new.duration: time taken to create the Airflow TaskGroup (__new__)
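One way the timers listed above could be attached to the load methods is a small decorator. This is a minimal sketch, not the Cosmos implementation: `InMemoryStats` and `timed` are hypothetical stand-ins so the example runs without Airflow installed. In Airflow, the equivalent call would presumably be `Stats.timer(...)` from `airflow.stats`, used as a context manager.

```python
import functools
import time
from contextlib import contextmanager


class InMemoryStats:
    """Hypothetical stand-in exposing a ``timer(name)`` context manager."""

    def __init__(self):
        self.timings = []  # list of (metric_name, elapsed_ms)

    @contextmanager
    def timer(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the wrapped call raises.
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.timings.append((name, elapsed_ms))


stats = InMemoryStats()


def timed(metric_name):
    """Hypothetical decorator that times every call to the wrapped function."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with stats.timer(metric_name):
                return func(*args, **kwargs)

        return wrapper

    return decorator


@timed("cosmos.load_method_dbt_ls.duration")
def load_via_dbt_ls():
    # Placeholder for the real DbtGraph.load_via_dbt_ls work.
    return ["model_a", "model_b"]


nodes = load_via_dbt_ls()
```

After the call, `stats.timings` holds one entry named `cosmos.load_method_dbt_ls.duration`. A decorator keeps the timing logic out of the method bodies, at the cost of fixing the metric name at definition time.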

Relevant parts of the code:

LoadMode.CUSTOM: self.load_via_custom_parser,
LoadMode.DBT_LS: self.load_via_dbt_ls,
LoadMode.DBT_LS_FILE: self.load_via_dbt_ls_file,
LoadMode.DBT_MANIFEST: self.load_from_dbt_manifest,

def build_airflow_graph(

https://github.com/astronomer/astronomer-cosmos/blob/main/cosmos/airflow/dag.py
https://github.com/astronomer/astronomer-cosmos/blob/main/cosmos/airflow/task_group.py
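The load-method dispatch shown above could be paired with a lookup from load mode to metric name. The sketch below is an assumption about how that mapping might look: `LoadMode` here is a stand-in enum covering only the four modes named in this issue, and `METRIC_BY_LOAD_MODE` and `metric_name_for` are hypothetical names. Note that the manifest metric is `load_method_manifest`, not `load_method_dbt_manifest`, so the name cannot simply be derived from the enum value.

```python
from enum import Enum


class LoadMode(Enum):
    # Stand-in for the Cosmos enum; only the modes listed in this issue.
    CUSTOM = "custom"
    DBT_LS = "dbt_ls"
    DBT_LS_FILE = "dbt_ls_file"
    DBT_MANIFEST = "dbt_manifest"


# Explicit mapping, using the metric names proposed in this issue.
METRIC_BY_LOAD_MODE = {
    LoadMode.CUSTOM: "cosmos.load_method_custom.duration",
    LoadMode.DBT_LS: "cosmos.load_method_dbt_ls.duration",
    LoadMode.DBT_LS_FILE: "cosmos.load_method_dbt_ls_file.duration",
    LoadMode.DBT_MANIFEST: "cosmos.load_method_manifest.duration",
}


def metric_name_for(load_mode):
    """Return the timer name to use for the given load mode."""
    return METRIC_BY_LOAD_MODE[load_mode]
```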

Acceptance criteria

  • All these metrics are sent to StatsD when running Cosmos DAGs, provided Airflow is configured to emit metrics
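To check this criterion, one option is to capture the lines Airflow sends to StatsD and confirm that every expected timer shows up. The sketch below assumes the standard StatsD timer format `name:value|ms` and assumes the metric names carry a prefix such as `airflow.` (Airflow's `statsd_prefix` setting). `parse_statsd_timer` and `missing_metrics` are hypothetical helpers, and the sample lines are made up for illustration.

```python
EXPECTED_METRICS = [
    "cosmos.load_method_custom.duration",
    "cosmos.load_method_dbt_ls.duration",
    "cosmos.load_method_dbt_ls_file.duration",
    "cosmos.load_method_manifest.duration",
    "cosmos.convert_to_airflow.duration",
    "cosmos.dag_init.duration",
    "cosmos.dag_new.duration",
    "cosmos.task_group_init.duration",
    "cosmos.task_group_new.duration",
]


def parse_statsd_timer(line):
    """Split a StatsD timer line (``name:value|ms``) into (name, milliseconds)."""
    name, _, rest = line.partition(":")
    parts = rest.split("|")
    if not name or len(parts) < 2 or parts[1] != "ms":
        raise ValueError(f"not a StatsD timer line: {line!r}")
    return name, float(parts[0])


def missing_metrics(lines, prefix="airflow."):
    """Return the expected metric names that never appear in the captured lines."""
    seen = set()
    for line in lines:
        name, _ = parse_statsd_timer(line)
        # Strip the assumed prefix so names match the list in this issue.
        if name.startswith(prefix):
            name = name[len(prefix):]
        seen.add(name)
    return [metric for metric in EXPECTED_METRICS if metric not in seen]


# Made-up sample of captured lines, for illustration only.
captured = [
    "airflow.cosmos.dag_init.duration:3.2|ms",
    "airflow.cosmos.load_method_dbt_ls.duration:1250|ms",
]
still_missing = missing_metrics(captured)
```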
@tatiana tatiana added this to the Cosmos 1.5.0 milestone May 21, 2024
@tatiana tatiana added area:performance Related to performance, like memory usage, CPU usage, speed, etc area:rendering Related to rendering, like Jinja, Airflow tasks, etc labels May 21, 2024
@dosubot dosubot bot added parsing:custom Related to custom parsing, like custom DAG parsing, custom DBT parsing, etc parsing:dbt_ls Issues, questions, or features related to dbt_ls parsing parsing:dbt_manifest Issues, questions, or features related to dbt_manifest parsing labels May 21, 2024
@tatiana tatiana changed the title Create Airflow metrics to evaluate Cosmos performance Emit Airflow metrics to support analysing Cosmos performance May 21, 2024
@dwreeves
Collaborator

A few questions:

  • Is it possible to inject the dag_id and task_group_id into the metric names, when appropriate?
    • DbtDags don't create task groups, right? In that case it may be necessary to replace task_group_id with something like self or dag, so the metric naming stays consistent.
  • You have it as cosmos.load_method_custom.duration, cosmos.load_method_dbt_ls.duration, etc. but would it make sense to do something more like cosmos.load_graph.duration or cosmos.graph.{dag_id}.{task_group_id}.duration? My thinking is:

@tatiana tatiana self-assigned this Jun 3, 2024
@tatiana
Collaborator Author

tatiana commented Jun 6, 2024

Hey, @dwreeves, these are very valid points.

I'm improving the logs on a per-DAG/TaskGroup basis as part of #1014 (e.g., https://github.com/astronomer/astronomer-cosmos/pull/1014/files#diff-61b585fb903927b6868b9626c95e0ec47e3818eb477d795ebd13b0276d4fd76cR293). These logs will probably be switched to DEBUG and improved further, but they should help address the granularity your suggestion calls for. I'll probably create a separate PR just for this :)

The goal of the metrics proposed here is to have a "group" view that gives an overview of the health of these numbers across multiple DAGs, and helps spot whether any of these metrics look more troublesome than others. WDYT?
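For reference, the two naming granularities discussed in this thread can be sketched side by side. This is only an illustration under stated assumptions: `graph_metric_name` is a hypothetical helper, the `dag` placeholder for a DbtDag without a task group follows the suggestion above, and replacing dots in identifiers with underscores is an added assumption to keep the dotted hierarchy intact.

```python
def graph_metric_name(dag_id=None, task_group_id=None):
    """Build a graph-loading metric name at group or per-DAG granularity."""
    if dag_id is None:
        # Group-level metric: one name shared across all DAGs.
        return "cosmos.load_graph.duration"

    def clean(identifier):
        # Dots would add extra levels to the metric hierarchy.
        return identifier.replace(".", "_")

    # A DbtDag has no task group, so fall back to a fixed placeholder.
    group_part = clean(task_group_id) if task_group_id else "dag"
    return ".".join(["cosmos", "graph", clean(dag_id), group_part, "duration"])


group_level = graph_metric_name()
per_dag = graph_metric_name("jaffle_shop")
per_task_group = graph_metric_name("jaffle_shop", "staging")
```

The group-level name keeps cardinality low and suits dashboards across many DAGs, while the per-DAG names trade a larger number of metrics for the ability to pinpoint a slow project.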
