New CUDA changes completely break rwkv.cpp #272

LoganDark · 2023-06-20T16:55:08Z

Reposting this here with no changes because we're too upset to perfect it, sorry.

It looks like the latest GPU changes have completely broken rwkv.cpp inference - here is a pull request that seems to reproduce the issue: RWKV/rwkv.cpp#103

without cuBLAS: (attachment not preserved)

with cuBLAS: (attachment not preserved)

Removing the calls to ggml_cuda_assign_buffers fixes the issue...

...but of course then it might not actually be doing anything with cuBLAS~

(In practice, I know it probably is, because the precision seems slightly messed up, but I don't know if this is making use of the full acceleration or not.)

AIUI, the usage contract for cuBLAS acceleration has changed, but I can't seem to figure out how it has changed.

Any help would be much appreciated~

-Emily


LoganDark · 2023-06-20T18:13:14Z

OK, it looks like the API usage has just changed. You now have to manually set tensor->backend to GGML_BACKEND_GPU or GGML_BACKEND_GPU_SPLIT before calling ggml_cuda_transform_tensor on the tensor. The pointer you pass to that call should point to the tensor data, or to wherever the weights actually are (say, in a mmap'd file).
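A minimal sketch of that ordering, in case it helps anyone else. It compiles by itself because the ggml types and ggml_cuda_transform_tensor are replaced with stand-ins: the stub's signature (data pointer first, tensor second) and the enum values are assumptions, and offload_weight is a hypothetical helper, not something from ggml or rwkv.cpp.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles by itself. Real code would include
 * ggml.h and ggml-cuda.h instead; the enum values here are placeholders. */
enum ggml_backend {
    GGML_BACKEND_CPU,
    GGML_BACKEND_GPU,
    GGML_BACKEND_GPU_SPLIT
};

struct ggml_tensor {
    enum ggml_backend backend;
    void * data;
};

/* Set by the stub so a caller can see what it received. */
static int stub_saw_gpu_backend = 0;
static const void * stub_saw_data = NULL;

/* Stub with an ASSUMED signature (data pointer, then tensor). The real
 * function copies the weights to the GPU; this one only records its inputs. */
static void ggml_cuda_transform_tensor(void * data, struct ggml_tensor * tensor) {
    stub_saw_gpu_backend = tensor->backend == GGML_BACKEND_GPU ||
                           tensor->backend == GGML_BACKEND_GPU_SPLIT;
    stub_saw_data = data;
}

/* Hypothetical helper: mark the tensor as GPU-resident first, then hand over
 * a pointer to wherever its weights live (e.g. inside an mmap'd file). */
static void offload_weight(struct ggml_tensor * tensor, void * weights) {
    assert(tensor != NULL && weights != NULL);
    tensor->backend = GGML_BACKEND_GPU;          /* must happen before the call */
    ggml_cuda_transform_tensor(weights, tensor);
}
```

With the real headers, the stand-in declarations and the stub would go away and only the two lines inside offload_weight would remain. GGML_BACKEND_GPU_SPLIT could be set instead of GGML_BACKEND_GPU in the same place.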

Thanks to @JohannesGaessler for giving us this info~

-Emily

LoganDark closed this as completed Jun 20, 2023
