Question about async mode of DML #34

Closed
Sean58238 opened this issue Sep 27, 2023 · 3 comments


@Sean58238

We have some questions about the async mode of DML; thanks in advance for any comments:
(1) If one device is configured with two or more SWQs, does DML async mode submit jobs to every WQ? How are jobs selected and allocated to these WQs, and does each WQ receive a balanced share of the jobs?
(2) If a WQ has two or more engines, do jobs complete out of order or sequentially? In other words, is there any flag that makes dml_wait_job() preserve the jobs' execution sequence?

@mzhukova
Contributor

mzhukova commented Oct 9, 2023

Hi @Sean58238,
As for the first question: in short, DML iterates over all available DSA instances on a NUMA node (without crossing the node boundary), and within each instance it iterates over all the work queues until the job is submitted successfully. For instance, if everything is free, we submit to device 0, queue 0; if that queue is busy, we go to device 0, queue 1 ... queue N until submission happens. The position is then stored, and the next submission starts over from that place.
As for the second question, I didn't quite get what you mean by "disordered or sequential" in the case of multiple engines; could you please clarify? In any case, the balancing mechanism is not exposed to the user and cannot be altered.

Hope this clarifies things.
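
The submission order described in the first answer can be sketched as a small simulation. This is only an illustration of the description in this thread, not DML's actual code: `WorkQueue`, `Dispatcher`, `capacity`, and `try_submit` are hypothetical names, and the sketch assumes the search resumes at the stored queue itself (the thread does not say whether it resumes there or at the next one).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WorkQueue:
    """Hypothetical stand-in for a work queue with a fixed number of slots."""
    capacity: int
    pending: int = 0

    def try_submit(self) -> bool:
        # Accept the job only if the queue still has a free slot.
        if self.pending < self.capacity:
            self.pending += 1
            return True
        return False


@dataclass
class Dispatcher:
    """Walks devices, then queues within each device, remembering where it stopped."""
    devices: List[List[WorkQueue]]  # devices on one NUMA node, each with its queues
    pos: int = 0                    # index of the last (device, queue) slot that accepted a job

    def submit(self) -> Optional[Tuple[int, int]]:
        # Flatten to (device, queue) pairs in device-major order.
        slots = [(d, q) for d, qs in enumerate(self.devices) for q in range(len(qs))]
        for step in range(len(slots)):
            i = (self.pos + step) % len(slots)
            d, q = slots[i]
            if self.devices[d][q].try_submit():
                self.pos = i  # assumption: the next search starts from this slot
                return d, q
        return None  # every queue on the node is busy


if __name__ == "__main__":
    node = Dispatcher([[WorkQueue(2), WorkQueue(2)], [WorkQueue(2), WorkQueue(2)]])
    for _ in range(5):
        print(node.submit())
```

Under these assumptions a queue keeps receiving jobs until it is busy, and only then does the search move on, so the distribution follows queue availability rather than a strict even split.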

@mzhukova mzhukova self-assigned this Oct 9, 2023
@mzhukova
Contributor

Hi @Sean58238, does this answer your questions?

@Sean58238
Author

Sean58238 commented Oct 13, 2023

For the second question, I think it means this: to move a large amount of data (e.g. 100 MB), it can be split into smaller chunks (possibly of different sizes, such as 4 KB or 8 KB) that are submitted as separate jobs. The question is whether waiting on the jobs ensures that the transferred results are still in the original order.
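
The chunked transfer described above can be sketched as follows. This is a plain simulation that does not call DML, and it does not answer whether DML guarantees any completion order (the thread leaves that open). It only illustrates one general point: if each chunk is given its own source and destination offset, and the caller waits for every chunk before reading the destination, the final layout matches the source regardless of the order in which chunks finish. `split_into_chunks` and `copy_out_of_order` are hypothetical helper names, and the shuffle is a stand-in for an arbitrary completion order.

```python
import random
from typing import List, Sequence, Tuple


def split_into_chunks(total_size: int, chunk_sizes: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs covering total_size, cycling through chunk_sizes."""
    chunks = []
    offset = 0
    i = 0
    while offset < total_size:
        # The last chunk is trimmed so the pairs cover exactly total_size bytes.
        length = min(chunk_sizes[i % len(chunk_sizes)], total_size - offset)
        chunks.append((offset, length))
        offset += length
        i += 1
    return chunks


def copy_out_of_order(src: bytes, chunk_sizes: Sequence[int], seed: int = 0) -> bytearray:
    """Copy src chunk by chunk in a shuffled order, each chunk to its own offset."""
    dst = bytearray(len(src))
    chunks = split_into_chunks(len(src), chunk_sizes)
    random.Random(seed).shuffle(chunks)  # stand-in for arbitrary completion order
    for offset, length in chunks:
        dst[offset:offset + length] = src[offset:offset + length]
    # Returning only after the loop plays the role of waiting on every job.
    return dst


if __name__ == "__main__":
    data = bytes(range(256)) * 400  # 100 KiB of sample data
    result = copy_out_of_order(data, [4096, 8192])
    print(result == data)
```

The destination is only safe to read once all chunks have completed; reading it after waiting on a single chunk would say nothing about the others.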
