
llamacpp chat/completions response unrelated to prompt on cpu local deploy #1919

Closed · Answered by mudler
semsion asked this question in Q&A

Don't use the model file name as the model in the request unless you want to handle the prompt template yourself.

Just use model names as you would with OpenAI. For instance, gpt-4-vision-preview and gpt-4 are already present in the AIO images; just pass those as the model when making the curl calls.
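As a minimal sketch of what such a request might look like (the host and port localhost:8080 are assumptions based on the default LocalAI setup, not taken from this thread):

```sh
# Chat completion against a local AIO deployment, using a built-in
# model name instead of the raw model file so the prompt template
# is applied automatically.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "How are you?"}]
  }'
```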

This discussion was converted from issue #1918 on March 28, 2024 10:58.