
How to accelerate inference? #16

Open
dengtianbi opened this issue Sep 14, 2023 · 2 comments

Comments

@dengtianbi

Hi,

I enabled the cublas compilation option.

The problem is that it doesn't load and process everything in GPU memory (VRAM).

What is the best command line to build and run each model as fast as possible on a CUDA 3090 with 24 GB of VRAM?

@Maknee
Owner

Maknee commented Sep 14, 2023

Take a look at #15. The MiniGPT-4 model is composed of two models (vision and text). The vision model does not support GPU usage, but the text model (Vicuna) does.

Try enabling LLAMA_CUBLAS and see if you can run part of the model on the GPU. I haven't tested these flags before, but I would assume that they would work.
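The suggestion above can be sketched as a build invocation. This is a hedged sketch of a build/config fragment, not taken from the repository's README: it assumes a standard out-of-source CMake build and uses the `MINIGPT4_CUBLAS` / `LLAMA_CUBLAS` option names mentioned in this thread; verify the exact names against `CMakeLists.txt` in your checkout.

```shell
# Hedged sketch: out-of-source CMake build with cuBLAS enabled.
# MINIGPT4_CUBLAS / LLAMA_CUBLAS are the option names mentioned in this
# thread; check CMakeLists.txt for the names your checkout actually uses.
mkdir build
cd build
cmake -DMINIGPT4_CUBLAS=ON -DLLAMA_CUBLAS=ON ..
cmake --build . --config Release
```

Note that even if the build succeeds, only the Vicuna (text) half of the pipeline would use the GPU; the vision model still runs on the CPU, so the end-to-end speedup is limited to the text generation stage.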

@deadpipe

@Maknee

I tried setting `option(MINIGPT4_CUBLAS "minigpt4: use cuBLAS" ON)` in the CMakeLists.txt.

But when I run `cmake --build . --config Release`,

I get the error below, unfortunately:

[screenshot of the build error in cmd.exe, 24/11/2023]

Any advice on how to deal with this is highly appreciated.
