New CUDA changes completely break rwkv.cpp #272

Closed
LoganDark opened this issue Jun 20, 2023 · 1 comment

@LoganDark (Contributor)

Reposting this here with no changes because we're too upset to perfect it, sorry.


It looks like the latest GPU changes have completely broken rwkv.cpp inference. Here is a pull request that seems to reproduce the issue: RWKV/rwkv.cpp#103

without cuBLAS:

[screenshot]

with cuBLAS:

[screenshot]

Removing the calls to ggml_cuda_assign_buffers fixes the issue...

[screenshot]

...but of course then it might not actually be doing anything with cuBLAS~

(In practice it probably still is, because the precision seems slightly off, but I can't tell whether it's getting the full acceleration or not.)

AIUI, the usage contract for cuBLAS acceleration has changed, but I can't figure out what the new contract is.

Any help would be much appreciated~

-Emily

@LoganDark (Contributor, Author)

OK, it looks like the API usage has just changed. You now have to manually set tensor->backend to GGML_BACKEND_GPU or GGML_BACKEND_GPU_SPLIT before calling ggml_cuda_transform_tensor on the tensor. The data pointer you pass to ggml_cuda_transform_tensor should point to wherever the weights actually live, e.g. the tensor's own data or a region of an mmap'd file.

Thanks to @JohannesGaessler for giving us this info~

-Emily
