Mode tts, gpt4 or vision freezes from time to time without timeout #2258

DavidGOrtega · 2024-05-07T13:46:08Z

LocalAI version:
2.14.0

Environment, CPU architecture, OS, and Version:
Linux Ubuntu SMP PREEMPT_DYNAMIC x86_64 x86_64 x86_64 GNU/Linux

90GB RAM 22 vcores
nvidia L4 24GB

Describe the bug
Requests freeze from time to time.
My logs continuously produce:

2024-05-06T14:09:55.116298748Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
api_1  | 2024-05-06T14:09:55.116344987Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
api_1  | 2024-05-06T14:09:55.116358667Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
api_1  | 2024-05-06T14:09:55.116369597Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
api_1  | 2024-05-06T14:09:55.116380867Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
api_1  | 2024-05-06T14:09:55.116394727Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 8
api_1  | 2024-05-06T14:09:55.116405866Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 4
api_1  | 2024-05-06T14:09:55.116433847Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 2
api_1  | 2024-05-06T14:09:55.116445007Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1


localai-bot · 2024-05-07T13:47:40Z

Thank you for reporting this issue, DavidGOrtega. We will investigate and try to reproduce the problem in a controlled environment. In the meantime, if you have any other details or find anything that could help, please let us know. We'll get back to you as soon as we have more information or a resolution to the issue. This is an ongoing experiment by @mudler, and we're here to help improve LocalAI.

mudler · 2024-05-07T14:30:45Z

This happens when the prompt exceeds the context size and there is no more space for the response - looks like something we could handle on our side and fail cleanly instead.

What's your context window size? Can you share your model config/setup?
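To illustrate what "fail cleanly" could look like, here is a minimal sketch of a pre-flight check that rejects a request when the prompt leaves no room in the context window for a response. This is not LocalAI's or llama.cpp's code: `check_context_budget`, `ContextOverflowError` and `min_response_tokens` are hypothetical names, and the token counts are assumed to come from whatever tokenizer the backend uses.

```python
class ContextOverflowError(ValueError):
    """Hypothetical error: the prompt leaves no room for a response."""


def check_context_budget(n_prompt_tokens, n_ctx, min_response_tokens=1):
    """Return how many tokens remain for the response, or raise.

    n_prompt_tokens: tokens in the already-tokenized prompt (assumed input).
    n_ctx: the context window size configured for the model.
    min_response_tokens: smallest response budget we are willing to accept.
    """
    remaining = n_ctx - n_prompt_tokens
    if remaining < min_response_tokens:
        raise ContextOverflowError(
            f"prompt uses {n_prompt_tokens} of {n_ctx} context tokens; "
            f"{remaining} left, need at least {min_response_tokens}"
        )
    return remaining
```

Called before the prompt is handed to the backend, a check like this would turn an endless retry into an immediate error that the API can return to the client.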

DavidGOrtega · 2024-05-07T14:34:47Z

@mudler I'm not even using that model, as I use my own, and nothing is apparently requesting it. The only thing I did with that model was install it and then delete it after trying it. Is that model gpt-4?

DavidGOrtega · 2024-05-07T18:54:13Z

An easy way to hang the system is to make several requests in a row to the tts endpoint (in my case no more than three) to generate speech for a longer text. It hangs and never times out.

Tested with piper and bark
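
For anyone trying to reproduce this without blocking forever, a client-side timeout at least turns the hang into a visible failure. The following is a sketch under assumptions, not a confirmed reproduction script: the `/tts` path and the `model`/`input` JSON fields are assumed from typical LocalAI usage, `tts_request` is a hypothetical helper, and the `opener` parameter exists only so the function can be exercised without a running server.

```python
import json
import socket
import urllib.error
import urllib.request


def tts_request(base_url, model, text, timeout_s=30.0,
                opener=urllib.request.urlopen):
    """POST one TTS request; return (status, audio_bytes_or_None).

    status is "ok", "timeout" or "error". The endpoint path and payload
    fields are assumptions; adjust them to match your setup.
    """
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(req, timeout=timeout_s) as resp:
            return "ok", resp.read()
    except urllib.error.URLError as err:
        # Connect-phase timeouts arrive wrapped in URLError.
        if isinstance(err.reason, (TimeoutError, socket.timeout)):
            return "timeout", None
        return "error", None
    except (TimeoutError, socket.timeout):
        # Read-phase timeouts (server accepted but never answered).
        return "timeout", None
```

Calling `tts_request` three times in a row with a long `text` and checking for a `"timeout"` status would show whether the server stopped responding, without leaving the client stuck.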

netandreus · 2024-05-09T13:54:58Z

I can confirm this issue. I am using LocalAI (v2.14.0) with Orca2. Here are the logs:

5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_kv_cache_init:      Metal KV buffer size =  1600.00 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model: KV self size  = 1600.00 MiB, K (f16):  800.00 MiB, V (f16):  800.00 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model:        CPU  output buffer size =     0.14 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model:      Metal compute buffer size =   204.00 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model:        CPU compute buffer size =    14.01 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model: graph nodes  = 1286
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model: graph splits = 2
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stdout {"timestamp":1715263031,"level":"INFO","function":"initialize","line":502,"message":"initializing slots","n_slots":1}
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stdout {"timestamp":1715263031,"level":"INFO","function":"initialize","line":514,"message":"new slot","slot_id":0,"n_ctx_slot":2048}
5:57PM INF [llama-cpp] Loads OK
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stdout {"timestamp":1715263031,"level":"INFO","function":"launch_slot_with_data","line":887,"message":"slot is processing task","slot_id":0,"task_id":0}
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stdout {"timestamp":1715263031,"level":"INFO","function":"update_slots","line":1787,"message":"kv cache rm [p0, end)","slot_id":0,"task_id":0,"p0":0}
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr Context exhausted. Slot 0 released (0 tokens in cache)
...
...
...
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 8
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 4
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 2
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to decode the batch, n_batch = 1, ret = 1
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
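
The sequence 256, 128, ..., 1 in these logs is consistent with a retry loop that halves the batch size after each failed decode and gives up once a batch of 1 also fails. The sketch below models only that pattern as it appears in the logs; it is not llama.cpp's implementation. `try_decode` is a hypothetical callback, and the starting value of 512 is an assumption inferred from the first retry being 256.

```python
def decode_with_backoff(try_decode, n_batch=512):
    """Retry a decode with a halved batch size until it succeeds or n_batch is 1.

    try_decode: hypothetical callable taking a batch size and returning
    True when the batch fit in the KV cache, False otherwise.
    Returns (succeeded, list_of_batch_sizes_tried).
    """
    attempts = []
    while True:
        attempts.append(n_batch)
        if try_decode(n_batch):
            return True, attempts
        if n_batch == 1:
            # Even a single-token batch did not fit: give up on this pass.
            return False, attempts
        n_batch //= 2


# A cache with no free space at all rejects every batch size.
ok, tried = decode_with_backoff(lambda n: False)
print(ok, tried)
```

If the cache is completely full, shrinking the batch never helps, so each pass ends in failure. A caller that simply starts over after that failure would produce the repeating cycle shown above.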

DavidGOrtega added the bug and unconfirmed labels May 7, 2024

mudler mentioned this issue May 10, 2024

ERROR: stderr update_slots : failed to find free space in the KV cache #2282

Open
