
[Bug]: GPTCache server: with OpenAI embedding, the cache does not seem to work properly. #612

Open
swatataidbuddy opened this issue Mar 6, 2024 · 7 comments


@swatataidbuddy

Current Behavior

After I start the GPTCache server with the command below (server.py is from https://github.com/zilliztech/GPTCache/tree/main/gptcache_server):

python server.py -s 0.0.0.0 -p 8000 -of gptcache.yml -o True

the service is up and running.
Now, when I make a request from a client program, for example:

  1. The user question is "President of Pakistan", and I get a correct response.
  2. But when I then post a new question, say "Capital of India?", the response to it is the answer to "President of Pakistan"; it is being retrieved from the cache. No matter how many new questions I ask, I get the same cached answer.

This happens when I set embedding: openai in the gptcache.yml file.

Also, irrespective of the embedding, there is an issue with the semantic cache:

  1. For "President of Pakistan" and for "President of India", the answer from the cache is the same.

Expected Behavior

  1. When a client makes a request to the running GPTCache server, the server should check whether there is an exact or similar entry in the cache. If there is, the answer should come from the cache; otherwise the answer should come from OpenAI and the response should be stored in the cache (roughly the flow sketched below).
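
A minimal sketch of that expected lookup flow, in illustrative pseudocode (the names below are hypothetical stand-ins, not GPTCache's actual API):

def answer(question, cache, llm, threshold=0.8):
    emb = cache.embed(question)              # embed the incoming question
    hit = cache.nearest(emb)                 # closest stored entry, if any
    if hit is not None and hit.similarity >= threshold:
        return hit.answer                    # exact/similar match: serve from cache
    result = llm(question)                   # cache miss: fall through to OpenAI
    cache.store(emb, question, result)       # persist for future requests
    return result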

Steps To Reproduce

1) Copy server.py from the gptcache_server folder into a directory of your choice.
2) Configure the gptcache.yml file (a roughly equivalent programmatic setup is sketched after the config):
embedding:
    openai
embedding_config:
    # Set embedding model params here
storage_config:
    data_dir:
        /Users/swathinarayanan/tolka_feedback_sep/gptdocker/gptcache_server/gptcache_data
    manager:
        sqlite,faiss
    vector_params:
        # Set vector storage related params here
evaluation:
    distance
evaluation_config:
    # Set evaluation metric kws here
pre_function:
    last_content
post_function:
    first
config:
    similarity_threshold: 0.8
    # Set other config here
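
For reference, the YAML above corresponds roughly to the following programmatic setup with GPTCache's Python API (a sketch; parameter names may differ slightly across GPTCache versions):

# Rough programmatic equivalent of the YAML config above.
from gptcache import cache, Config
from gptcache.embedding import OpenAI
from gptcache.manager import manager_factory
from gptcache.processor.pre import last_content
from gptcache.processor.post import first
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

openai_emb = OpenAI()  # reads OPENAI_API_KEY from the environment
data_manager = manager_factory(
    "sqlite,faiss",
    data_dir="gptcache_data",
    vector_params={"dimension": openai_emb.dimension},
)
cache.init(
    pre_embedding_func=last_content,
    embedding_func=openai_emb.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
    post_process_messages_func=first,
    config=Config(similarity_threshold=0.8),
)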

3) Start the server:

python server.py -s 0.0.0.0 -p 8000 -of gptcache.yml -o True

4) Create a client program, or make an API call, that sends a request to the GPTCache server.

Example program:

import requests
import json
import time

def call_chat_completions_endpoint(base_url, api_key, user_question):
    # Endpoint URL
    url = f"{base_url}/v1/chat/completions"
    
    # Headers including the authorization token
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {api_key}'
    }
    
    # Request payload
    payload = {
        'model': 'gpt-3.5-turbo',
        'messages': [
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'user', 'content': user_question},
        ],
        'top_k': 10,
    }
    
    # Send POST request
    start_time = time.time()
    response = requests.post(url, headers=headers, data=json.dumps(payload))
    
    # Check if the request was successful
    if response.status_code == 200:
        # Process the successful response
        print("Success:", response.json())
        print("Time Consumed: {:.2f}s".format(time.time() - start_time))
    else:
        # Handle errors
        print(f"Error: {response.status_code}, Message: {response.text}")

# Example usage
if __name__ == "__main__":
    # Define the base URL of your FastAPI application
    BASE_URL = "http://localhost:8000"
    
    # Your API key for authorization (if needed)
    API_KEY = "****************************"
    
    # User question to be sent to the chat completions endpoint
    USER_QUESTION = "what is coral reef ?"
    
    call_chat_completions_endpoint(BASE_URL, API_KEY, USER_QUESTION)
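
To reproduce the reported behaviour, the helper above can be called with two unrelated questions in a row, then a repeat (a sketch; only the repeated question should come back quickly from the cache):

    # Repro sketch: the second question is unrelated and must NOT reuse the
    # first answer; only the third call (a repeat) should be a cache hit.
    for q in ["President of Pakistan", "Capital of India?", "Capital of India?"]:
        call_chat_completions_endpoint(BASE_URL, API_KEY, q)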

Environment

macOS
M1 chip

Anything else?

The Docker image you provide is not working; when running the container I get the error below:

successfully installed package: openai
Traceback (most recent call last):
  File "/usr/local/bin/gptcache_server", line 5, in <module>
    from gptcache_server.server import main
  File "/usr/local/lib/python3.8/site-packages/gptcache_server/server.py", line 8, in <module>
    from gptcache.adapter import openai
  File "/usr/local/lib/python3.8/site-packages/gptcache/adapter/openai.py", line 31, in <module>
    class ChatCompletion(openai.ChatCompletion, BaseCacheLLM):
  File "/usr/local/lib/python3.8/site-packages/openai/lib/_old_api.py", line 39, in __call__
    raise APIRemovedInV1(symbol=self._symbol)
openai.lib._old_api.APIRemovedInV1:

You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run openai migrate to automatically upgrade your codebase to use the 1.0.0 interface.

Alternatively, you can pin your installation to the old version, e.g. pip install openai==0.28
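
For context, the incompatibility is that openai 1.x removed the module-level resource classes that GPTCache's adapter builds on; a minimal before/after sketch:

# openai < 1.0 style (what GPTCache's adapter currently subclasses):
import openai
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hi"}],
)

# openai >= 1.0 style (what the adapter would need after migrating):
from openai import OpenAI
client = OpenAI()
client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hi"}],
)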

@SimFG
Collaborator

SimFG commented Mar 6, 2024

@swatataidbuddy

  1. As you have discovered, the effectiveness of the cache depends largely on the choice of embedding, because the embedding model is the key to extracting the semantics of strings.
  2. Regarding the openai version issue, I will update GPTCache soon and upgrade the openai dependency to 1.x.

@swatataidbuddy
Author

@SimFG, thanks for the response.
I get your point. My concern is that, irrespective of the choice of embedding model, the basic functionality should be:

  1. A user posts a question and gets an answer from OpenAI.
  2. The user posts another, new question (not related to the first). Since there is no relation between the first and second questions, the call should go to OpenAI and the answer should be sent to the client program as the response. But what I am seeing is that the answer to the first question is taken from the cache and sent as the answer to the client program.

@SimFG
Collaborator

SimFG commented Mar 6, 2024

@swatataidbuddy
Because the core factor in deciding whether two questions are similar is the choice of embedding model. If you want very high accuracy, the model has to be very large. At the same time, the vectors produced by an embedding model are not always an accurate basis for judging similarity: the model only captures the rough composition of a sentence, so it can fail to notice that two sentences differing by a single word have completely different meanings.
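
To see this concretely, one can compare the embeddings of the two questions directly (a sketch assuming openai==0.28 and numpy; the exact value will vary):

# Cosine similarity between two near-identical questions, which is roughly
# what the cache's distance evaluation sees.
import numpy as np
import openai

def embed(text):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

a = embed("President of Pakistan")
b = embed("President of India")
cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cos:.3f}")
# Typically well above a 0.8 similarity_threshold, so the cache treats the
# two questions as the same entry.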

@swatataidbuddy
Author

OK, the behaviour seems to be very inconsistent. When I retested, the earlier issue did not occur, but I am seeing a different one: even when I ask the same question multiple times, it is never served from the cache; the call always goes to OpenAI.

Please see the output below:

(virtual_env) swathinarayanan@Swathis-MacBook-Air tolka_feedback_sep % /Users/swathinarayanan/virtual_env/bin/python /Users/swathinarayanan/tolka_feedback_sep/testgptcacheAPI.py
Success: {'id': 'chatcmpl-8zkFc8lCl72egNxbKRLHma66rAOmj', 'object': 'chat.completion', 'created': 1709726488, 'model': 'gpt-3.5-turbo-0125', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': "The President of India is the head of state and the highest constitutional office in India. The current President of India (as of September 2021) is Ram Nath Kovind. The President's role is largely ceremonial, but they have certain executive powers, such as the power to appoint the Prime Minister and dissolve the Parliament. The President is elected by an Electoral College consisting of the elected members of both houses of Parliament as well as the elected members of the Legislative Assemblies of the States."}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 20, 'completion_tokens': 98, 'total_tokens': 118}, 'system_fingerprint': 'fp_2b778c6b35'}
Time Consumed: 2.81s

(virtual_env) swathinarayanan@Swathis-MacBook-Air tolka_feedback_sep % /Users/swathinarayanan/virtual_env/bin/python /Users/swathinarayanan/tolka_feedback_sep/testgptcacheAPI.py
Success: {'id': 'chatcmpl-8zkILW9ogFbVXMOrycPdyIbgCRMES', 'object': 'chat.completion', 'created': 1709726657, 'model': 'gpt-3.5-turbo-0125', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': "The President of India is the head of state and the supreme commander of the Indian Armed Forces. The current President of India is Ram Nath Kovind, who has been in office since July 25, 2017. The President's role is largely ceremonial, representing the nation both domestically and internationally."}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 20, 'completion_tokens': 60, 'total_tokens': 80}, 'system_fingerprint': 'fp_b9d4cef803'}
Time Consumed: 2.40s

(virtual_env) swathinarayanan@Swathis-MacBook-Air tolka_feedback_sep % /Users/swathinarayanan/virtual_env/bin/python /Users/swathinarayanan/tolka_feedback_sep/testgptcacheAPI.py
Success: {'id': 'chatcmpl-8zkIPwwvfYTXqtJgZLVUjryomNwyE', 'object': 'chat.completion', 'created': 1709726661, 'model': 'gpt-3.5-turbo-0125', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'The President of India is the head of state and the supreme commander of the Indian Armed Forces. The current President of India is Ram Nath Kovind, who took office on July 25, 2017. The President is elected by an electoral college consisting of the elected members of both houses of Parliament and the elected members of the Legislative Assemblies of the States. The President serves a term of five years and can be re-elected for a maximum of two terms. The role of the President is largely ceremonial, with executive powers being exercised by the Prime Minister and the Council of Ministers.'}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 20, 'completion_tokens': 117, 'total_tokens': 137}, 'system_fingerprint': 'fp_2b778c6b35'}
Time Consumed: 3.28s

Earlier this was not the case: if my first question was "president of india" and my second question was "what is coral reef", I would get the answer from the cache that had been derived for the first question.

@SimFG
Collaborator

SimFG commented Mar 7, 2024

Which version of openai are you using?

@swatataidbuddy
Author

Version: 0.28.0

@SimFG
Collaborator

SimFG commented Mar 7, 2024

@swatataidbuddy
Based on the return value from openai, this does not appear to be openai 0.28; for example, the response contains a system_fingerprint field.
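
A quick way to confirm which openai package the active virtualenv actually resolves (standard library only, so it works regardless of client version):

# Print the installed openai package version from the active environment.
from importlib.metadata import version
print(version("openai"))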
