AIO - memory issue - embedding #2126
Comments
Hello shuther,

Thank you for reporting this issue. I apologize for the inconvenience you're experiencing with the LocalAI container image. Based on the information you've provided, this looks like a memory allocation issue on your VM.

Could you please provide the output of the following command to help us gather more details about your GPU and system capabilities?

nvidia-smi

This command will give us information about GPU usage and memory. Additionally, please re-run the request you provided with CUDA_LAUNCH_BLOCKING=0 set; this disables CUDA launch blocking, which might help to prevent the GPU memory error:

CUDA_LAUNCH_BLOCKING=0 curl http://linuxmain.local:8445/embeddings \
  -X POST -H "Content-Type: application/json" \
  -d '{"input": "Your text string goes here", "model": "text-embedding-ada-002"}'

I understand that this issue is crucial for your project, and I assure you that we'll do our best to investigate this problem and provide a solution as soon as possible.

Best regards,
The GitHub Bot (Experiment of @mudler)
Works fine if I stop the Docker container and start it again; likely we need a better eviction approach when we switch models?
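In case it helps others hitting this, the workaround above amounts to a plain container restart, which drops every backend that was loaded on the GPU. A minimal sketch, assuming the container is named local-ai (adjust to your own container name):

# restart the LocalAI container to release GPU memory held by previously loaded models
docker stop local-ai
docker start local-ai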
extra logs
I also spotted this error:
Now with LOCALAI_SINGLE_ACTIVE_BACKEND=true we get the embedding working. nvidia-smi:
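For reference, LOCALAI_SINGLE_ACTIVE_BACKEND=true is passed as an environment variable when starting the container, so that only one backend stays loaded at a time instead of keeping every model resident in GPU memory. A minimal sketch; the image tag and the 8445:8080 port mapping are assumptions based on the AIO CUDA 12 setup reported below, so adjust them to your deployment:

# start the AIO image with only one active backend at a time
docker run -d --gpus all \
  -p 8445:8080 \
  -e LOCALAI_SINGLE_ACTIVE_BACKEND=true \
  localai/localai:latest-aio-gpu-nvidia-cuda-12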
LocalAI version:
container image: AIO Cuda12-latest
Environment, CPU architecture, OS, and Version:
VM running Ubuntu 22.04 (latest)
NVIDIA 2600 GPU
Describe the bug
Getting a GPU memory error while switching between models and testing multiple prompts.
The error occurs for embeddings, while image generation works fine.
To Reproduce
Run all the curl tests published in the documentation (a minimal sequence is sketched below).
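The documentation is not quoted here, but the failing sequence boils down to sending an image-generation request and then an embeddings request to the same instance, so the second model has to load while the first is still resident. A minimal sketch against the OpenAI-compatible endpoints; the host/port and the embedding model name come from the report above, while the image prompt and size are placeholders:

# image generation works fine
curl http://linuxmain.local:8445/v1/images/generations \
  -X POST -H "Content-Type: application/json" \
  -d '{"prompt": "a cute baby sea otter", "size": "256x256"}'

# the embeddings request then fails with a GPU memory error
curl http://linuxmain.local:8445/embeddings \
  -X POST -H "Content-Type: application/json" \
  -d '{"input": "Your text string goes here", "model": "text-embedding-ada-002"}'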
Expected behavior
No error; old models should be evicted when memory pressure is too high.
Logs
Additional context