
How to accelerate inference? #16

Open
dengtianbi opened this issue Sep 14, 2023 · 2 comments

Comments

@dengtianbi

Hi,

I enabled the cublas compilation option.

The problem is that it doesn't load and process everything in GPU memory (VRAM).

What is the best command line to build and run each model as fast as possible on a CUDA 3090 with 24 GB of VRAM?

@Maknee
Owner

Maknee commented Sep 14, 2023

Take a look at #15. The MiniGPT-4 model is composed of two models (vision and text). The vision model does not support GPU usage, but the text model (Vicuna) does.

Try enabling LLAMA_CUBLAS and see if you can run part of the model on the GPU. I haven't tested these flags before, but I would assume that they would work.
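The suggestion above can be sketched as a build invocation. This is a hedged sketch of a build/config fragment, not taken from the repository's README: it assumes a standard out-of-source CMake build and uses the `MINIGPT4_CUBLAS` / `LLAMA_CUBLAS` option names mentioned in this thread; verify the exact names against `CMakeLists.txt` in your checkout.

```shell
# Hedged sketch: out-of-source CMake build with cuBLAS enabled.
# MINIGPT4_CUBLAS / LLAMA_CUBLAS are the option names mentioned in this
# thread; check CMakeLists.txt for the names your checkout actually uses.
mkdir build
cd build
cmake -DMINIGPT4_CUBLAS=ON -DLLAMA_CUBLAS=ON ..
cmake --build . --config Release
```

Note that even if the build succeeds, only the Vicuna (text) half of the pipeline would use the GPU; the vision model still runs on the CPU, so the end-to-end speedup is limited to the text generation stage.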

@deadpipe

@Maknee

I tried setting `option(MINIGPT4_CUBLAS "minigpt4: use cuBLAS" ON)` in the CMakeLists.txt.

But when I run `cmake --build . --config Release`,

I get the error below, unfortunately:

[screenshot of the build error in cmd.exe, 24/11/2023]

Any advice on how to deal with this is highly appreciated.
