New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use back_populates
instead of sqlalchemy.orm.backref
in bidirectional relationships
#39430
base: main
Are you sure you want to change the base?
Conversation
back_populates
instead of backref
in bidirectional relationshipsback_populates
instead of sqlalchemy.orm.backref
in bidirectional relationships
1a1e18d
to
168fc11
Compare
# This is never executed, but tricks static analyzers (PyDev, PyCharm,) | ||
# into knowing the types of these symbols, and what they contain. | ||
from airflow.models.dag import DAG # noqa: TCH004 | ||
from airflow.models.dataset import Dataset # noqa: TCH004 | ||
from airflow.models.xcom_arg import XComArg # noqa: TCH004 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think instead of doing noqa
you can add these to __all__
instead. Not sure. That would be a lot better if it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are already in __all__
, as result ruff complain that it should be moved outside of TYPE_CHECKING (TCH004) block because it unable to detect PEP 562 imports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought as an option to add this rule into the in exclusion list into the pyproject.toml
, by the same way as we already done for airflow/models/__init__.py
Lines 346 to 348 in 242a169
"airflow/__init__.py" = ["F401"] | |
"airflow/models/__init__.py" = ["F401", "TCH004"] | |
"airflow/models/sqla_models.py" = ["F401"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be done in a separate PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already: #39750
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As soon as we merge #39750 I will drop commit which add this fix from this pr
1e76071
to
3a0db83
Compare
…l.dag_owner_links`
3a0db83
to
9ed927a
Compare
# This is never executed, but tricks static analyzers (PyDev, PyCharm,) | ||
# into knowing the types of these symbols, and what they contain. | ||
from airflow.models.dag import DAG # noqa: TCH004 | ||
from airflow.models.dataset import Dataset # noqa: TCH004 | ||
from airflow.models.xcom_arg import XComArg # noqa: TCH004 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be done in a separate PR
|
||
Only makes sense for SchedulerJob and BackfillJob instances. | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove this? Let's move it to the task_instances_enqueued. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This do not required anymore, you could achieve it by explicit bidirectional relationships expressions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO
This construction looks more clear
primaryjoin="Job.id == foreign(DagRun.creating_job_id)"
Rather than this
primaryjoin=lambda: Job.id == foreign(_resolve_dagrun_model().creating_job_id)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm talking about the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh... my bad. It already moved into the L104
Line 104 in 9ed927a
dag_runs = relationship( # Only makes sense for SchedulerJob and BackfillJob instances. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Original comment stored information about two different relationships
"TaskInstances which have been enqueued by this Job." -> task_instances_enqueued
"Only makes sense for SchedulerJob and BackfillJob" -> dag_runs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some proofs
DAG
from __future__ import annotations
from datetime import datetime, timezone
from airflow.decorators import task
from airflow.models.dag import DAG
START_DATE = datetime(2024, 2, 1, tzinfo=timezone.utc)
with DAG("pr_39430", schedule="@once", tags=["pr", "39430", "bi-directional"], start_date=START_DATE):
@task
def do_nothing(): ...
do_nothing()
Run one time by scheduler and one manual run, by trigger DAG from the UI, results in Tables
Table job
column | ||||
---|---|---|---|---|
id | 3 | 4 | 1 | 2 |
dag_id | pr_39430 | pr_39430 | null | null |
state | success | success | running | running |
job_type | LocalTaskJob | LocalTaskJob | TriggererJob | SchedulerJob |
start_date | 2024-05-22 11:45:23.867096 +00:00 | 2024-05-22 11:45:29.782799 +00:00 | 2024-05-22 11:44:49.709191 +00:00 | 2024-05-22 11:44:49.813874 +00:00 |
end_date | 2024-05-22 11:45:24.143576 +00:00 | 2024-05-22 11:45:30.040894 +00:00 | null | null |
latest_heartbeat | 2024-05-22 11:45:23.853896 +00:00 | 2024-05-22 11:45:29.766046 +00:00 | 2024-05-22 11:45:35.885734 +00:00 | 2024-05-22 11:45:37.722701 +00:00 |
executor_class | null | null | null | null |
hostname | cd2591016c8c | cd2591016c8c | cd2591016c8c | cd2591016c8c |
unixname | root | root | root | root |
Table task_instance
column | ||
---|---|---|
task_id | do_nothing | do_nothing |
dag_id | pr_39430 | pr_39430 |
run_id | scheduled__2024-02-01T00:00:00+00:00 | manual__2024-05-22T11:45:28.453199+00:00 |
map_index | -1 | -1 |
start_date | 2024-05-22 11:45:23.921193 +00:00 | 2024-05-22 11:45:29.821869 +00:00 |
end_date | 2024-05-22 11:45:24.085409 +00:00 | 2024-05-22 11:45:29.979149 +00:00 |
duration | 0.164216 | 0.15728 |
state | success | success |
try_number | 1 | 1 |
max_tries | 0 | 0 |
hostname | cd2591016c8c | cd2591016c8c |
unixname | root | root |
job_id | 3 | 4 |
pool | default_pool | default_pool |
pool_slots | 1 | 1 |
queue | default | default |
priority_weight | 1 | 1 |
operator | _PythonDecoratedOperator | _PythonDecoratedOperator |
custom_operator_name | @task | @task |
queued_dttm | 2024-05-22 11:45:23.394469 +00:00 | 2024-05-22 11:45:29.546020 +00:00 |
queued_by_job_id | 2 | 2 |
pid | 2375 | 2377 |
executor | null | null |
executor_config | 0x80057D942E | 0x80057D942E |
updated_at | 2024-05-22 11:45:23.930163 +00:00 | 2024-05-22 11:45:29.831787 +00:00 |
rendered_map_index | null | null |
external_executor_id | null | null |
trigger_id | null | null |
trigger_timeout | null | null |
next_method | null | null |
next_kwargs | null | null |
task_display_name | do_nothing | do_nothing |
Table dag_run
column | ||
---|---|---|
id | 1 | 2 |
dag_id | pr_39430 | pr_39430 |
queued_at | 2024-05-22 11:45:23.351323 +00:00 | 2024-05-22 11:45:28.490357 +00:00 |
execution_date | 2024-02-01 00:00:00.000000 +00:00 | 2024-05-22 11:45:28.453199 +00:00 |
start_date | 2024-05-22 11:45:23.373351 +00:00 | 2024-05-22 11:45:29.516604 +00:00 |
end_date | 2024-05-22 11:45:24.476332 +00:00 | 2024-05-22 11:45:30.609079 +00:00 |
state | success | success |
run_id | scheduled__2024-02-01T00:00:00+00:00 | manual__2024-05-22T11:45:28.453199+00:00 |
creating_job_id | 2 | null |
external_trigger | false | true |
run_type | scheduled | manual |
conf | 0x80057D942E | 0x80057D942E |
data_interval_start | 2024-02-01 00:00:00.000000 +00:00 | 2024-05-22 11:45:28.453199 +00:00 |
data_interval_end | 2024-02-01 00:00:00.000000 +00:00 | 2024-05-22 11:45:28.453199 +00:00 |
last_scheduling_decision | 2024-05-22 11:45:24.474830 +00:00 | 2024-05-22 11:45:30.606623 +00:00 |
dag_hash | cd6f55b5c736bf50d2f341663c0ad83f | cd6f55b5c736bf50d2f341663c0ad83f |
log_template_id | 2 | 2 |
updated_at | 2024-05-22 11:45:24.477225 +00:00 | 2024-05-22 11:45:30.609766 +00:00 |
clear_number | 0 | 0 |
Relationships
LocalJob (id: 3) create TI (run_id: scheduled__2024-02-01T00:00:00+00:00
)
LocalJob (id: 4) create TI (run_id: manual__2024-05-22T11:45:28.453199+00:00
)
SchedulerJob (id: 2) queued TI (run_id: scheduled__2024-02-01T00:00:00+00:00
)
SchedulerJob (id: 2) queued TI (run_id: manual__2024-05-22T11:45:28.453199+00:00
) - Even if it Manual DAG Run
DagRun (run_id: scheduled__2024-02-01T00:00:00+00:00
) created by SchedulerJob (id: 2)
DagRun (run_id: manual__2024-05-22T11:45:28.453199+00:00
) doesn't created by any Job
queued_by_job = relationship( | ||
"Job", | ||
back_populates="task_instances_enqueued", | ||
primaryjoin="Job.id == foreign(TaskInstance.queued_by_job_id)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have uselist=False since it was removed from backref at the other end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was removed from backref at the other end?
TBH it the other way round, references right now setup in the both way.
I think we should have uselist=False
Maybe it should be setup uselist=False
in both way if it one-to-one relationship
backref
is a legacy approach for construct relationships, instead of that preferable way is useback_populates
keyword argument inrelationship
for built it explicitly^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.