
Modes tts, gpt4 or vision freeze from time to time without timeout #2258

Open
DavidGOrtega opened this issue May 7, 2024 · 5 comments
Labels
bug Something isn't working unconfirmed

Comments

@DavidGOrtega

DavidGOrtega commented May 7, 2024

LocalAI version:
2.14.0

Environment, CPU architecture, OS, and Version:
Linux Ubuntu SMP PREEMPT_DYNAMIC x86_64 x86_64 x86_64 GNU/Linux

90GB RAM 22 vcores
nvidia L4 24GB

Describe the bug
Requests freeze from time to time.
My logs continuously produce:

2024-05-06T14:09:55.116298748Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
api_1  | 2024-05-06T14:09:55.116344987Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
api_1  | 2024-05-06T14:09:55.116358667Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
api_1  | 2024-05-06T14:09:55.116369597Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
api_1  | 2024-05-06T14:09:55.116380867Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
api_1  | 2024-05-06T14:09:55.116394727Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 8
api_1  | 2024-05-06T14:09:55.116405866Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 4
api_1  | 2024-05-06T14:09:55.116433847Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 2
api_1  | 2024-05-06T14:09:55.116445007Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1
@DavidGOrtega DavidGOrtega added bug Something isn't working unconfirmed labels May 7, 2024
@localai-bot
Contributor

Thank you for reporting this issue, DavidGOrtega. We will investigate and try to reproduce the problem in a controlled environment. In the meantime, if you have any other details or find anything that could help, please let us know. We'll get back to you as soon as we have more information or a resolution to the issue. This is an ongoing experiment by @mudler, and we're here to help improve LocalAI.

@mudler
Owner

mudler commented May 7, 2024

This happens when the prompt exceeds the context size and there is no space left for the response. It looks like something we could handle on our side, failing cleanly instead.
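A minimal sketch of the "fail cleanly" idea, assuming the server knows the tokenized prompt length and the requested completion length before dispatching to the backend. `check_context_budget` and `ContextOverflowError` are hypothetical names, not part of LocalAI's API; the real check would sit wherever the request is forwarded to the llama.cpp gRPC backend.

```python
class ContextOverflowError(ValueError):
    """Hypothetical error returned to the client instead of letting the
    backend loop on a full KV cache."""


def check_context_budget(prompt_tokens: int, max_tokens: int, n_ctx: int) -> int:
    """Return how many tokens remain for the response, or raise if the prompt
    plus the requested completion cannot fit in the context window.

    Hypothetical helper: the name and where it would be called are illustrative.
    """
    if prompt_tokens < 0 or max_tokens < 0 or n_ctx <= 0:
        raise ValueError("token counts must be non-negative and n_ctx positive")
    if prompt_tokens >= n_ctx:
        # The prompt alone fills the window: nothing is left to generate into.
        raise ContextOverflowError(
            f"prompt uses {prompt_tokens} tokens but the context window is {n_ctx}"
        )
    if prompt_tokens + max_tokens > n_ctx:
        # The prompt fits, but the requested completion would overflow the cache.
        raise ContextOverflowError(
            f"prompt ({prompt_tokens}) + max_tokens ({max_tokens}) exceeds n_ctx ({n_ctx})"
        )
    return n_ctx - prompt_tokens


# With a 2048-token window, a 1900-token prompt leaves room for 148 tokens.
print(check_context_budget(1900, 100, 2048))  # 148
```

Raising an error rather than silently shrinking `max_tokens` is one design choice; clamping the completion to the remaining budget would be an equally plausible alternative.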

What is your context window size? Can you share your model config/setup?

@DavidGOrtega
Author

@mudler I'm not even using that model, since I use my own, and nothing is apparently requesting it. The only thing I did with that model was install it, try it, and then delete it. Is that model gpt-4?

@DavidGOrtega
Author

An easy way to hang the system is to send several requests in a row to the tts endpoint (in my case, no more than three) to generate speech for a longer text. It hangs and never times out.

Tested with piper and bark.
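Until the server side fails cleanly, a client can at least avoid waiting forever. The sketch below uses only the Python standard library and assumes LocalAI's `/tts` endpoint accepts a JSON body with `model` and `input` fields on `http://localhost:8080`; check these against your deployment. The voice name in the usage comment is a hypothetical placeholder. Note that `urlopen`'s `timeout` applies to each blocking socket operation (connect, each read), not to the total request duration.

```python
import json
import urllib.request
from typing import Optional


def build_tts_request(text: str, model: str,
                      base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST to the /tts endpoint (assumed JSON fields: model, input)."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_tts(req: urllib.request.Request, timeout: float = 60.0) -> Optional[bytes]:
    """Send the request and return the audio bytes, or None if the connection
    fails or a socket operation blocks longer than `timeout` seconds."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except OSError as exc:
        # URLError, socket timeouts and connection resets are all OSError subclasses.
        print(f"tts request failed or timed out: {exc}")
        return None


# Usage (hypothetical voice name; requires a running server):
# audio = post_tts(build_tts_request("Some long text...", model="voice.onnx"), timeout=120)
```

This only keeps the caller from blocking indefinitely; the backend may still be stuck, so later requests can keep timing out until it is restarted.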

@netandreus

netandreus commented May 9, 2024

I can confirm this issue. I use LocalAI (v2.14.0) with Orca2. Here are the logs:

5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_kv_cache_init:      Metal KV buffer size =  1600.00 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model: KV self size  = 1600.00 MiB, K (f16):  800.00 MiB, V (f16):  800.00 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model:        CPU  output buffer size =     0.14 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model:      Metal compute buffer size =   204.00 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model:        CPU compute buffer size =    14.01 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model: graph nodes  = 1286
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model: graph splits = 2
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stdout {"timestamp":1715263031,"level":"INFO","function":"initialize","line":502,"message":"initializing slots","n_slots":1}
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stdout {"timestamp":1715263031,"level":"INFO","function":"initialize","line":514,"message":"new slot","slot_id":0,"n_ctx_slot":2048}
5:57PM INF [llama-cpp] Loads OK
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stdout {"timestamp":1715263031,"level":"INFO","function":"launch_slot_with_data","line":887,"message":"slot is processing task","slot_id":0,"task_id":0}
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stdout {"timestamp":1715263031,"level":"INFO","function":"update_slots","line":1787,"message":"kv cache rm [p0, end)","slot_id":0,"task_id":0,"p0":0}
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr Context exhausted. Slot 0 released (0 tokens in cache)
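The KV-cache sizes in this excerpt are consistent with a cache allocated once for a 2048-token context (`n_ctx_slot` is 2048 with a single slot). The check below assumes Orca 2 13B keeps the Llama 2 13B shape (40 layers, 5120-wide keys and values, no grouped-query attention) and f16 storage at 2 bytes per element, as the `K (f16)` / `V (f16)` labels indicate. Because the cache has a fixed number of positions, a prompt plus response that needs more than 2048 tokens has nowhere to go, which matches the explanation given earlier in the thread.

```python
MIB = 1024 * 1024


def kv_cache_bytes(n_layers: int, n_ctx: int, n_embd_kv: int, bytes_per_elem: int = 2) -> int:
    """Size of a K+V cache: one key and one value vector of width n_embd_kv
    per layer per context position."""
    per_tensor = n_layers * n_ctx * n_embd_kv * bytes_per_elem  # K alone (or V alone)
    return 2 * per_tensor


# Assumed Llama-2-13B shape for Orca 2 13B: 40 layers, 5120-wide K/V, f16.
total = kv_cache_bytes(n_layers=40, n_ctx=2048, n_embd_kv=5120)
print(total / MIB, total / 2 / MIB)  # 1600.0 800.0
```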
...
...
...
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 8
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 4
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 2
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to decode the batch, n_batch = 1, ret = 1
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
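The second excerpt shows the slot halving `n_batch` from 256 down to 1, reporting `failed to decode the batch, n_batch = 1, ret = 1`, and then starting over at 256, which would explain a request that never returns. Below is a simplified Python model of that loop, not llama.cpp's actual code: the halving follows the log (assuming a starting `n_batch` of 512, since the first retry is 256), while `max_rounds` and `KVCacheFullError` are hypothetical additions showing how a cache that never frees space could produce an error instead of spinning.

```python
from typing import Callable, List


class KVCacheFullError(RuntimeError):
    """Hypothetical error surfaced to the client instead of retrying forever."""


def decode_with_backoff(n_batch: int, fits: Callable[[int], bool],
                        trace: List[int]) -> int:
    """Try n_batch, then halve it (256, 128, ..., 1) until `fits` accepts a size.
    Returns the accepted size, or 0 if even n_batch = 1 does not fit."""
    size = n_batch
    while size >= 1:
        if fits(size):
            return size
        size //= 2
        if size >= 1:
            trace.append(size)  # mirrors "retrying with smaller n_batch = <size>"
    return 0


def run_slot(n_batch: int, fits: Callable[[int], bool], max_rounds: int = 3) -> int:
    """Repeat the whole backoff at most `max_rounds` times, then fail cleanly.
    Without this cap, a KV cache that never frees space loops indefinitely."""
    for _ in range(max_rounds):
        accepted = decode_with_backoff(n_batch, fits, [])
        if accepted:
            return accepted
    raise KVCacheFullError(f"no KV cache space after {max_rounds} rounds")
```

A cap like this (or a per-request deadline) matters mainly because the resulting error can reach the HTTP caller; where LocalAI or llama.cpp should enforce it is not settled in this thread.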
