[Question]: Using llama.cpp serve hosted model #13107
Comments
To address the issues you're facing with connecting to a self-hosted remote LLM using …

For detailed implementation, refer to the …
@dosu why should I install …
The issue with the … Regarding the … For your question about installing …
Hi @JulianOestreich90, in addition to the API you provided, llama.cpp also supports an OpenAI-compatible Chat Completions API (search …).
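(Not part of the original comment, just an illustration of that endpoint: assuming the server started by the docker command in the question below, the OpenAI-compatible route can be exercised with curl. The model name is a placeholder; the server answers with whatever model it has loaded.)

```sh
# Hit the OpenAI-compatible chat completions route exposed by the llama.cpp server.
curl http://ipadress:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistral-7b-instruct-v0.2",
        "messages": [{"role": "user", "content": "Say hello"}]
      }'
```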
Question Validation
Question
I am using the llama.cpp HTTP server to serve a model from a Docker container:
```sh
docker run -d -p ipadress:8080:8080 -v ~/models:/models --gpus all ghcr.io/ggerganov/llama.cpp:server-cuda -m models/mistral-7b-instruct-v0.2.Q4_0.gguf -c 32000 --host 0.0.0.0 --port 8080 --n-gpu-layers 99
```
I can talk with the model via curl, e.g.:
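(The exact curl command did not survive above; a representative request against the llama.cpp server's /completion endpoint, using the same host and port as the docker command and an illustrative prompt, would look like this.)

```sh
# Plain llama.cpp completion request; n_predict caps the number of generated tokens.
curl --request POST \
  --url http://ipadress:8080/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'
```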
And it returns a response.
However, integrating it into LlamaIndex doesn't seem so easy. I tried these two options:

The OpenLLMAPI option errors with

AttributeError: 'HTTPStatusError' object has no attribute 'message'

and the OpenAILike class also gives connection errors. What can I do to connect properly to a self-hosted remote LLM?
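(A minimal sketch of one way to wire this up, not taken from the thread: since the llama.cpp server exposes an OpenAI-compatible API under /v1, OpenAILike can point straight at it. The model name and api_key values are placeholders, and the import path assumes the llama-index-llms-openai-like package is installed.)

```python
# Sketch: pointing LlamaIndex's OpenAILike at the llama.cpp server's
# OpenAI-compatible endpoint. Assumes `pip install llama-index-llms-openai-like`
# and that the server from the docker command above is reachable.
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="mistral-7b-instruct-v0.2",    # placeholder; the server serves whatever model it loaded
    api_base="http://ipadress:8080/v1",  # the server's OpenAI-compatible route
    api_key="not-needed",                # any non-empty string; llama.cpp does not check it
    is_chat_model=True,                  # use /v1/chat/completions rather than /v1/completions
)

print(llm.complete("Hello from llama.cpp via LlamaIndex"))
```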