How to accelerate inference? #16
Comments
Take a look at #15. The MiniGPT-4 model is composed of two models (vision and text). The vision model does not support GPU usage, but the text model (Vicuna) does. Try enabling LLAMA_CUBLAS and see if you can run part of the model on the GPU. I haven't tested these flags before, but I would assume that they would work.
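As a minimal sketch, the flag could also be passed at configure time instead of editing CMakeLists.txt. The exact option names (`MINIGPT4_CUBLAS`, `LLAMA_CUBLAS`) are taken from this thread; verify them against the project's CMakeLists.txt before relying on them:

```
# Hedged sketch: enable cuBLAS at configure time rather than editing CMakeLists.txt.
# Option names are assumptions from this thread; confirm them in the project's CMakeLists.txt.
cmake -B build -DMINIGPT4_CUBLAS=ON -DLLAMA_CUBLAS=ON
cmake --build build --config Release
```

Passing `-D<option>=ON` on the configure line sets the same CMake cache variable that `option(... ON)` would, without modifying the tracked file.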
I tried setting `option(MINIGPT4_CUBLAS "minigpt4: use cuBLAS" ON)` in the CMakeLists.txt, but when I run `cmake --build . --config Release`, I unfortunately get the error below. Any advice on how to deal with it would be highly appreciated.
Hi,
I enabled the cuBLAS compilation option.
The problem is that it does not load or process everything in GPU memory.
What is the best command line to build and run each model as fast as possible on a CUDA 3090 with 24 GB of VRAM?