A wrong answer from Cache record #388

Closed

SimFG opened this issue May 25, 2023 · Discussed in #385 · 3 comments

Comments

@SimFG
Collaborator

SimFG commented May 25, 2023

Discussed in #385

Originally posted by terryweijian May 25, 2023
Hi

My script contains the following questions, and it runs them in several loops within one session:
1. 'what is TV ?'
2. 'can you explain what function of TV is ?'
3. 'can you tell me more about TV ?'
4. 'what is the function of money ?'

Questions 1-3 are all about TV, so in the first loop their cache answers are all linked to the answer of the first question. From the second loop onward, however, the second question gets linked to the fourth question, "what is the function of money", I guess because both contain the keyword "function". Is there a parameter that can control the weight of different keywords in the vector calculation?

The first loop:
Question: what is TV ?
local answer: TV is a television channel that broadcasts live television.
Local Time Spent = 0.2
Cache answer: TV is a television channel that broadcasts live television.
Cache Hit Time Spent = 0.39


Question: can you explain what function of TV is ?
local answer: a television channel
Local Time Spent = 0.09
Cache answer: TV is a television channel that broadcasts live television. # the answer is reasonable
Cache Hit Time Spent = 0.04


Question: can you tell me more about TV ?
local answer: a tv show
Local Time Spent = 0.11
Cache answer: TV is a television channel that broadcasts live television.
Cache Hit Time Spent = 0.03


Question: what is the function of money ?
local answer: money is a currency
Local Time Spent = 0.1
Cache answer: money is a currency
Cache Hit Time Spent = 0.11


The second loop:

Question: what is TV ?
local answer: TV is a television channel that broadcasts live television.
Local Time Spent = 0.17
Cache answer: TV is a television channel that broadcasts live television.
Cache Hit Time Spent = 0.03


Question: can you explain what function of TV is ?
local answer: a television channel
Local Time Spent = 0.1
Cache answer: money is a currency # The cache answer is incorrect: it links to question 4 just because both questions contain the keyword "function"
Cache Hit Time Spent = 0.04


Question: can you tell me more about TV ?
local answer: a tv show
Local Time Spent = 0.12
Cache answer: TV is a television channel that broadcasts live television.
Cache Hit Time Spent = 0.03


Question: what is the function of money ?
local answer: money is a currency
Local Time Spent = 0.11
Cache answer: money is a currency
Cache Hit Time Spent = 0.03


@SimFG
Collaborator Author

SimFG commented May 25, 2023

Answer:
There is currently no way to control the weight of different keywords in the vector calculation. What you can do is skip the cache search when you think the cached answer doesn't meet your requirements; the LLM result will still be saved to the cache for that call, so the next time you ask the same question you will get the accurate answer.

cache_skip param usage:

from gptcache.adapter import openai  # GPTCache's openai adapter, which understands cache_skip

openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "what's github"}],
    cache_skip=True,  # skip the cache lookup; the LLM answer is still written to the cache
)
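
For completeness, here is a minimal sketch of the full flow, assuming the global cache has been initialized (the embedding function and data manager will depend on your own setup): skip the lookup once to overwrite a bad hit, then ask again normally and get the corrected answer from the cache.

from gptcache import cache
from gptcache.adapter import openai

cache.init()            # default init; plug in your own embedding_func / data_manager as needed
cache.set_openai_key()  # reads OPENAI_API_KEY from the environment

question = "can you explain what function of TV is ?"

# 1. The cached answer was wrong, so skip the search; the fresh LLM answer is still stored.
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": question}],
    cache_skip=True,
)

# 2. Ask again without cache_skip: this time the lookup returns the corrected answer.
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": question}],
)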

@ht0rohit

Hi @SimFG, can I only use the cache_skip parameter inside the openai create functions, or will it work with LangChain agents too? I would like to do something like the code below:

# Imports assumed for this snippet; CONF, model, tokenizer, temperature, question,
# get_content_func, cache_huggingface, data_manager and le() are defined elsewhere in the script.
from transformers import pipeline
from langchain.llms import HuggingFacePipeline
from gptcache import Cache
from gptcache.adapter.langchain_models import LangChainLLMs
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

pipe = pipeline(
    CONF['MODEL']['pipeline'], model=model, tokenizer=tokenizer,
    max_length=CONF['MODEL']['max_length'], temperature=temperature,
    top_p=CONF['MODEL']['top_p'], num_beams=CONF['MODEL']['num_beams'],
    early_stopping=le(CONF['MODEL']['early_stopping'])
)
llm = HuggingFacePipeline(pipeline=pipe)
cached_llm = LangChainLLMs(llm=llm)

llm_cache = Cache()
llm_cache.init(
    pre_embedding_func=get_content_func,
    embedding_func=cache_huggingface.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(
        max_distance=CONF['GPTCACHE']['SDE_max_distance'], positive=False
    )
)

response = cached_llm(question, cache_obj=llm_cache, cache_skip=True)

@SimFG
Collaborator Author

SimFG commented Jul 10, 2023

@ht0rohit Yes, that will work. Are you running into any problems?
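
For anyone landing here later, a small sketch of how cache_skip might be combined with the workaround above on the LangChain path. It builds on ht0rohit's snippet (cached_llm and llm_cache as defined there); looks_wrong is a hypothetical placeholder for whatever check you use to decide the cached answer is off.

def ask(question, force_refresh=False):
    # When force_refresh is True the cache search is skipped, but the fresh
    # LLM answer is still written back to the cache for future calls.
    return cached_llm(question, cache_obj=llm_cache, cache_skip=force_refresh)

answer = ask("can you explain what function of TV is ?")
if looks_wrong(answer):  # looks_wrong is your own validation, not a GPTCache API
    answer = ask("can you explain what function of TV is ?", force_refresh=True)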
