[Question]: Using llama.cpp serve hosted model #13107
Comments
To address the issues you're facing with connecting to a self-hosted remote LLM using …

For detailed implementation, refer to the …
@dosu why should I install …
The issue with the … Regarding the … For your question about installing …
Hi @JulianOestreich90, in addition to the API you provided, llama.cpp also supports an OpenAI-compatible Chat Completions API (search …).
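(Not part of the original comment, just an illustration of that endpoint: assuming the server started by the docker command in the question below, the OpenAI-compatible route can be exercised with curl. The model name is a placeholder; the server answers with whatever model it has loaded.)

```sh
# Hit the OpenAI-compatible chat completions route exposed by the llama.cpp server.
curl http://ipadress:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistral-7b-instruct-v0.2",
        "messages": [{"role": "user", "content": "Say hello"}]
      }'
```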
Question Validation
Question
I am using the llama.cpp HTTP server to serve a model from a Docker container:
```sh
docker run -d -p ipadress:8080:8080 -v ~/models:/models --gpus all ghcr.io/ggerganov/llama.cpp:server-cuda -m models/mistral-7b-instruct-v0.2.Q4_0.gguf -c 32000 --host 0.0.0.0 --port 8080 --n-gpu-layers 99
```
I can talk with the model via curl, e.g.:
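(The exact curl command did not survive above; a representative request against the llama.cpp server's /completion endpoint, using the same host and port as the docker command and an illustrative prompt, would look like this.)

```sh
# Plain llama.cpp completion request; n_predict caps the number of generated tokens.
curl --request POST \
  --url http://ipadress:8080/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'
```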
And it returns a response.
However, integrating it into LlamaIndex doesn't seem so easy. I tried these two options:

The OpenLLMAPI option errors with

AttributeError: 'HTTPStatusError' object has no attribute 'message'

and the OpenAILike class also gives connection errors. What can I do to connect properly to a self-hosted remote LLM?
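(A minimal sketch of one way to wire this up, not taken from the thread: since the llama.cpp server exposes an OpenAI-compatible API under /v1, OpenAILike can point straight at it. The model name and api_key values are placeholders, and the import path assumes the llama-index-llms-openai-like package is installed.)

```python
# Sketch: pointing LlamaIndex's OpenAILike at the llama.cpp server's
# OpenAI-compatible endpoint. Assumes `pip install llama-index-llms-openai-like`
# and that the server from the docker command above is reachable.
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="mistral-7b-instruct-v0.2",    # placeholder; the server serves whatever model it loaded
    api_base="http://ipadress:8080/v1",  # the server's OpenAI-compatible route
    api_key="not-needed",                # any non-empty string; llama.cpp does not check it
    is_chat_model=True,                  # use /v1/chat/completions rather than /v1/completions
)

print(llm.complete("Hello from llama.cpp via LlamaIndex"))
```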