[Question]: Using llama.cpp serve hosted model #13107

JulianOestreich90 opened this issue Apr 25, 2024 · 4 comments
Labels: question (Further information is requested)

Comments


JulianOestreich90 commented Apr 25, 2024

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

I am using the llama.cpp HTTP server to serve a model from a Docker container:
docker run -d -p ipadress:8080:8080 -v ~/models:/models --gpus all ghcr.io/ggerganov/llama.cpp:server-cuda -m models/mistral-7b-instruct-v0.2.Q4_0.gguf -c 32000 --host 0.0.0.0 --port 8080 --n-gpu-layers 99

I can talk with the model via curl, e.g.:

curl --request POST \                        
    --url http://ipadress:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'

And it returns a response.

However, integrating it into LlamaIndex doesn't seem so easy. I tried these two options:

import os
from llama_index.llms.openllm import OpenLLMAPI

os.environ["OPENLLM_ENDPOINT"] = "http://ipadress:8080"  # replace with your server address
remote_llm = OpenLLMAPI()

response = remote_llm.complete("Hello World!")
print(str(response))

##########################################################

from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(api_base="http://ipadress:8080")

response = llm.complete("Hello World!")
print(str(response))

The OpenLLMAPI option fails with AttributeError: 'HTTPStatusError' object has no attribute 'message', and the OpenAILike class also gives connection errors.

What can I do to connect properly to a self-hosted remote LLM?

JulianOestreich90 added the question label on Apr 25, 2024

dosubot bot commented Apr 25, 2024

To address the issues you're facing with connecting to a self-hosted remote LLM using llama.cpp within a Docker container, here are some focused steps and clarifications:

  1. Error Handling Update: The AttributeError: 'HTTPStatusError' object has no attribute 'message' occurs because exception objects in Python 3 no longer carry a .message attribute. Instead of exception.message, use str(exception) or exception.args to access the error message. This adjustment applies to exception handling around both the OpenLLMAPI and OpenAILike classes.

  2. Using LlamaCPP with Docker: Ensure you have the llama-cpp-python package installed in your environment. This package is crucial for interfacing with llama.cpp models. If it's not installed, you can add it using pip install llama-cpp-python.

  3. Model Configuration: When setting up your model, you can specify either a model_path for a pre-downloaded model or a model_url for dynamic downloading. Adjust your Docker setup accordingly to ensure the model is accessible or the container can download the model.

  4. Text Generation Parameters: Customize parameters like temperature, max_new_tokens, and context_window to control the model's output. Use the LlamaCPP class to set these parameters and generate text (see the sketch below).

  5. Docker Container Setup: Verify that your Docker container is correctly configured for model access (either through a local file or internet access for downloading) and that the llama-cpp-python package is installed within the container.

  6. Advanced Features: Explore LlamaCPP class's advanced features for applications like real-time chat interactions.

For detailed implementation, refer to the LlamaCPP class documentation and ensure your code is updated to handle exceptions correctly. This should help you successfully connect to and interact with your self-hosted LLM.
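
For reference, here is a minimal sketch of the local, in-process route dosu describes, assuming the llama-index-llms-llama-cpp integration and llama-cpp-python are installed and the GGUF file is on the local filesystem; the path and generation parameters are illustrative, not taken from the issue:

from llama_index.llms.llama_cpp import LlamaCPP

# Loads the model in-process via llama-cpp-python; no HTTP server is involved.
llm = LlamaCPP(
    model_path="/models/mistral-7b-instruct-v0.2.Q4_0.gguf",  # illustrative path
    temperature=0.1,
    max_new_tokens=256,
    context_window=32000,
    model_kwargs={"n_gpu_layers": 99},  # mirrors the --n-gpu-layers flag from the docker command
)

print(llm.complete("Hello World!"))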


JulianOestreich90 (Author) commented:

@dosu why should I install llama-cpp-python on my server? Does that mean LlamaIndex cannot handle the API spawned by the llama.cpp HTTP server? It is supposed to be an OpenAI-compatible API as well.


dosubot bot commented Apr 25, 2024

The issue with the AttributeError: 'HTTPStatusError' object has no attribute 'message' you're encountering is due to the way exceptions are handled in the OpenLLMAPI class. Modern Python versions and HTTP client libraries don't use a .message attribute on exception objects. Instead, you should use str(exception) or access specific attributes like .status_code or .detail to get the error information. This adjustment is necessary for proper error handling in your code.
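
To illustrate that point, here is a small sketch of inspecting an httpx.HTTPStatusError (the exception type named in the traceback); the endpoint and payload simply reuse the curl example from the question:

import httpx

try:
    resp = httpx.post(
        "http://ipadress:8080/completion",
        json={"prompt": "Hello World!", "n_predict": 8},
    )
    resp.raise_for_status()  # raises httpx.HTTPStatusError on 4xx/5xx responses
except httpx.HTTPStatusError as exc:
    # This exception has no .message attribute; use str() or the response object instead.
    print(str(exc))
    print(exc.response.status_code, exc.response.text)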

Regarding the OpenAILike class connection errors, these could stem from various configuration issues such as incorrect API keys, missing dependencies, or incompatible model metadata. Ensure all configurations and dependencies are correctly set up.

For your question about installing llama-cpp-python on your server, the LlamaCPP class within llama_index is designed to work directly with the llama-cpp library, not just through an HTTP server API. This direct integration requires the llama-cpp-python package to be installed on your server for the LlamaCPP class to function correctly. It uses the Llama class from the llama_cpp library for model initialization, completions, and other tasks, which is why the package is necessary even if the llama.cpp HTTP server provides an OpenAI-compatible API.


RussellLuo (Contributor) commented:

Hi @JulianOestreich90, in addition to the API you provided, llama.cpp also supports an OpenAI-compatible Chat Completions API (search for POST /v1/chat/completions). If you use that API, I guess the OpenAI class should suffice.
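
Building on that suggestion, a minimal sketch of pointing OpenAILike at the server's OpenAI-compatible endpoint; the /v1 base path, placeholder API key, and model name are assumptions to adapt to your own setup:

from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="mistral-7b-instruct-v0.2",    # llama.cpp serves a single model; the name is mostly informational
    api_base="http://ipadress:8080/v1",  # note the /v1 prefix for the OpenAI-compatible routes
    api_key="sk-no-key-required",        # placeholder; llama.cpp only checks keys if --api-key is set
    is_chat_model=True,                  # send requests to /v1/chat/completions
)

print(llm.complete("Hello World!"))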
