How does CuPy work? #8228
Comments
Thanks for the feedback @tornikeo, indeed it's better to have docs covering CuPy internals. Here are quick answers:
This depends on the function. Some are backed by
The block size is 128 for ElementwiseKernels and 512 for ReductionKernels.
For most functions, NVRTC (take it as a library version of |
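To make the block sizes quoted above concrete, here is a minimal sketch of the ceiling-division arithmetic that relates an element count to a number of thread blocks. The helper `blocks_needed` and the two constants are hypothetical names introduced for illustration; they are not part of CuPy's API. The sketch assumes one thread per element, and CuPy's real launch logic may differ (for example by capping the grid size or looping inside the kernel).

```python
# Block sizes as quoted in the answer above (assumed constants, not CuPy API names).
ELEMENTWISE_BLOCK_SIZE = 128
REDUCTION_BLOCK_SIZE = 512


def blocks_needed(n_elements: int, block_size: int) -> int:
    """Return the smallest block count such that blocks * block_size >= n_elements.

    Hypothetical helper: plain ceiling division, assuming one thread per element.
    """
    if n_elements < 0:
        raise ValueError("n_elements must be non-negative")
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return (n_elements + block_size - 1) // block_size


if __name__ == "__main__":
    n = 1000
    print(blocks_needed(n, ELEMENTWISE_BLOCK_SIZE))  # blocks of 128 threads
    print(blocks_needed(n, REDUCTION_BLOCK_SIZE))    # blocks of 512 threads
```

With a block size of 128, an array of 1000 elements needs 8 blocks (1024 threads, of which the last 24 would have no element to process under the one-thread-per-element assumption).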
Description
There is no documentation on how CuPy works, end-to-end. Explanations for...
- ... __global__ function occur?
- ... cuda.jit-ed functions? How are they made available to each thread?
- ... nvcc get called? With what args?

would greatly help incoming developers to see the kernel issues before they arise. Like, why do I get an 1024 blocksize, but not at 512 blocksize?
Idea or request for content
No response