[Feature Request] Add equivalent of cudaDeviceSynchronize() #118
Comments
I don't see a problem with adding this. cudaDeviceSynchronize "Blocks until the device has completed all preceding requested tasks." https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g10e20b05a95f638a4071a655503df25d So it means we would just wait for all tasks, submitted to all engines, to complete. Does it need to be per context, or for all contexts running on the device? The CUDA function takes void, so I guess it is the latter for them.
BTW: @jbrodman, this is the repo for the loader. Please open this in the spec repo for level-zero.
@jbrodman If you added a ticket to the spec repo, could you close this one? (A link to the spec bug you filed would be nice too.)
Many applications written in CUDA rely on the whole device synchronization behavior of cudaDeviceSynchronize().
Migrating applications that use it to SYCL, for example, is not really possible without an equivalent.
Question: Why does this need to be in L0? Why can't you solve it at a higher level?
Answer: The higher-level layer may not have 100% visibility into how Level Zero is being used. If the L0 plugin in DPC++ tried to add something like this, programs could behave incorrectly if the user application ALSO uses L0 directly: the plugin has no knowledge of any queues created in the user application or libraries.
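To make the visibility gap concrete, here is a minimal sketch in plain C++. It does not use the real Level Zero or DPC++ APIs: `MockQueue`, `TrackingLayer`, and `pending_after_emulated_sync` are hypothetical names invented for illustration, and the "queue" is just a counter of pending work. It only models the argument above, that a layer can wait on the queues it created but not on queues created behind its back.

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-in for a command queue; NOT the Level Zero API.
struct MockQueue {
    int pending = 0;                     // commands submitted but not yet complete
    void submit() { ++pending; }
    void synchronize() { pending = 0; }  // models a blocking per-queue wait
};

// Hypothetical higher-level layer (for example, a runtime plugin) that
// remembers only the queues created through it.
class TrackingLayer {
public:
    MockQueue& create_queue() {
        queues_.emplace_back();          // std::deque keeps earlier references valid
        return queues_.back();
    }

    // Emulated "device synchronize": it can only wait on queues it knows about.
    void synchronize_known_queues() {
        for (MockQueue& q : queues_) {
            q.synchronize();
        }
    }

private:
    std::deque<MockQueue> queues_;
};

// Returns the work still pending on a queue the layer never saw.
int pending_after_emulated_sync() {
    TrackingLayer layer;
    MockQueue& tracked = layer.create_queue();
    MockQueue untracked;                 // created directly by the application or a library

    tracked.submit();
    untracked.submit();

    layer.synchronize_known_queues();
    assert(tracked.pending == 0);        // the layer's own queue has drained
    return untracked.pending;            // non-zero: this work was never waited on
}
```

In this model the untracked queue still has outstanding work after the emulated synchronize, which is the incorrect behavior described in the answer. A device-wide synchronize implemented below the layer would not depend on who created the queue.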