Demo for Drafting: On-GPU Data Access (CUDA, CuPy, PyTorch, DLPack) #2429
This is scratch work showcasing concepts for GPU data access coming together. It's a quick, half-working prototype.
#1986 #1985 #210 #2391 #120 (note: #120 includes opencl work!) #2426 #57
Demo (watch the text output: it attaches a vertex buffer to a program, then exports it to PyTorch and CuPy in-place):
I copied code from the DLPack repository to write a basic DLPack wrapper for vispy's GL buffers. I spent maybe six hours on this, and am sharing it now that it actually runs at all. I'm not sure whether I'll return to it, but I expect others would find it helpful (as well as frustrating and undocumented).
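For reference, the core of such a wrapper is populating DLPack's `DLTensor` struct around the mapped GL pointer. Here is a minimal ctypes sketch of the struct layout from `dlpack.h` (field names follow the DLPack spec; the buffer described at the bottom is purely illustrative):

```python
import ctypes

class DLDevice(ctypes.Structure):
    # Mirrors DLDevice from dlpack.h
    _fields_ = [("device_type", ctypes.c_int32),  # e.g. kDLCUDA == 2
                ("device_id", ctypes.c_int32)]

class DLDataType(ctypes.Structure):
    # Mirrors DLDataType: code 0 = int, 1 = uint, 2 = float
    _fields_ = [("code", ctypes.c_uint8),
                ("bits", ctypes.c_uint8),
                ("lanes", ctypes.c_uint16)]

class DLTensor(ctypes.Structure):
    _fields_ = [("data", ctypes.c_void_p),
                ("device", DLDevice),
                ("ndim", ctypes.c_int32),
                ("dtype", DLDataType),
                ("shape", ctypes.POINTER(ctypes.c_int64)),
                ("strides", ctypes.POINTER(ctypes.c_int64)),
                ("byte_offset", ctypes.c_uint64)]

kDLCUDA = 2  # keep this a plain Python int when handing it to consumers

# Illustrative: describe a (4, 3) float32 buffer on CUDA device 0.
shape_arr = (ctypes.c_int64 * 2)(4, 3)  # must stay alive with the tensor
t = DLTensor(data=None,                 # would be the mapped GL device pointer
             device=DLDevice(kDLCUDA, 0),
             ndim=2,
             dtype=DLDataType(2, 32, 1),  # float32
             shape=ctypes.cast(shape_arr, ctypes.POINTER(ctypes.c_int64)),
             strides=None,                # NULL => compact row-major
             byte_offset=0)
```

The real exchange additionally wraps this in a `DLManagedTensor` with a deleter and ships it inside a PyCapsule named `"dltensor"`, which is what `torch.from_dlpack` and `cupy.from_dlpack` consume.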
EDIT: I first posted this without accounting for the striding of the merged program data. It is now manually unstrided, and the output data is correct.
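For concreteness: vispy packs a program's merged vertex data into one interleaved (structured) array, so each attribute is a strided view rather than a contiguous block. A small numpy sketch of the kind of unstriding involved (the attribute names and sizes here are just examples):

```python
import numpy as np

# Hypothetical interleaved vertex buffer: each record packs a vec3
# position and a vec4 color, so one vertex occupies 28 bytes.
vbo = np.zeros(4, dtype=[("a_position", np.float32, 3),
                         ("a_color",    np.float32, 4)])
vbo["a_position"] = np.arange(12, dtype=np.float32).reshape(4, 3)

# The attribute view is strided: rows are 28 bytes apart, not 12.
pos = vbo["a_position"]
print(pos.strides)                # (28, 4)
print(pos.flags["C_CONTIGUOUS"])  # False

# "Unstriding" = copying into a compact array before handing the
# pointer to a consumer that assumes contiguous data.
compact = np.ascontiguousarray(pos)
print(compact.strides)            # (12, 4)
```

Alternatively, the strides can be passed through in the `DLTensor` itself, since DLPack supports non-contiguous layouts; copying is just the simpler first step.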
EDIT: I first posted this with an outstanding CuPy crash. I've now addressed byte-offset quirks for both torch and CuPy, and passed the device type as a plain Python int rather than a ctypes value. CuPy now loads the data correctly.
EDIT: The next remaining issue is integrating this code with vispy's GL pipeline. I'm not sure how to flush GL commands from the client without breaking the pipeline (which I haven't studied). A similar but separate issue may be synchronizing the CUDA stream with the GLIR queue.
But once I got this far, it was pretty exciting to see the same data come out of the torch tensor as I had passed into vispy's program.
EDIT: Current output is: