Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Deterministic foreach task id's for Argo Workflows #1704

Merged
merged 16 commits into from May 6, 2024

Conversation

saikonen
Copy link
Collaborator

changes the Task ID generation for tasks in foreach branches to be deterministic. This is so that the following join steps do not require an enormous list of ID's to be passed in through the input-paths input Parameter, which has limited the number of foreach tasks we can run on Argo Workflows until now.

Note: Still requires rigorous testing for different split cases to confirm this is ready for prime time.

Closes #1538 properly. Supersedes #1655

@nflx-mf-bot
Copy link
Collaborator

Testing[608] @ d7a320b

@nflx-mf-bot
Copy link
Collaborator

Testing[608] @ d7a320b had 1 FAILURE.

@romain-intel
Copy link
Contributor

I'll check but failure is probably that card test that I need to increase timeout for. There is no change to common code.

@saikonen saikonen changed the title fix: Introduce deterministic task id's to Argo Workflows fix: Deterministic foreach task id's for Argo Workflows Jan 29, 2024
@nflx-mf-bot
Copy link
Collaborator

Testing[608] @ bcc2026

@@ -883,6 +885,34 @@ def _visit(node, exit_node=None, templates=None, dag_tasks=None):
)
)
]
# We need to add the split-index and root-input-path for the last step in any foreach to be able to generate
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add more comments in the code about the need for root-input-path. Given the complexity of this work, it will be non-trivial to ship further changes without full context on this change.

Copy link
Collaborator

@savingoyal savingoyal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still reviewing

from hashlib import md5


def generate_input_paths(step_name, timestamp, input_paths, max_splits):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input_paths have been flow/run/step/task previously but it seems that now input_paths are run/step/task?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what the join step used to receive as input-paths was the following

"argo-{{workflow.name}}/%s/{{tasks.%s.outputs.parameters}}"

It seems to be a case of mislabeled variables in the original script?

# For foreach splits generate the expected input-paths based on max_split and base_id via.
# newline required at the end due to 'echo' appending one in the shell side task_id creation.
task_str = "%s-%s\n" % (base, idx)
hash = md5(task_str.encode("utf-8")).hexdigest()[-8:]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be desirable to get task ids of the nature - t-<hash>-<idx> - in that scenario - we can probably get a much tighter compression of input_paths (assuming we change the input_paths compression/decompression logic) in addition to the UI looking a lot more nicer.

metaflow/plugins/argo/argo_workflows.py Outdated Show resolved Hide resolved
@savingoyal savingoyal merged commit cb200c5 into master May 6, 2024
26 checks passed
@savingoyal savingoyal deleted the fix/introduce-deterministic-task_id-to-argo branch May 6, 2024 15:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Argo Workflows template exceeds max size with preceding large foreach
4 participants