To address the issue of the chat model being invoked twice, particularly when dealing with follow-up questions that involve the chain's memory, you can adjust the logic to bypass the initial invocation that condenses the chat history and follow-up question into a standalone question. This adjustment involves directly using the chat history and question for generating the context and synthesizing the answer, thus avoiding the unnecessary first call to the chat model. Here's a suggested approach:
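A minimal sketch of that approach, using a plain-Python counting stub in place of the real chat model so the call count is visible (all names here are illustrative assumptions, not LangChain APIs):

```python
class CountingChatModel:
    """Stub chat model that records how many times it is invoked."""
    def __init__(self):
        self.calls = 0

    def invoke(self, prompt: str) -> str:
        self.calls += 1
        return f"response to: {prompt}"

def answer_directly(model, chat_history, question, retrieve):
    """Answer a follow-up question without a condensation step.

    Retrieval runs on the raw follow-up question, and the chat history
    is passed straight into the answer prompt, so the chat model is
    invoked exactly once per turn.
    """
    context = retrieve(question)
    prompt = (
        f"History: {chat_history}\n"
        f"Context: {context}\n"
        f"Question: {question}"
    )
    return model.invoke(prompt)  # the only model invocation this turn

model = CountingChatModel()
answer_directly(model, "hi / hello", "what about memory?", lambda q: "docs")
print(model.calls)  # 1
```

In a real LangChain pipeline the equivalent change is to drop the condense-question step from the chain composition and feed the chat-history variable directly into the answer prompt; the stub above only demonstrates the resulting single-call control flow.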
This approach eliminates the redundant invocation of the chat model for condensing purposes, ensuring that the model is called only once to generate the final response from the provided context and question. Implementing the fix requires adjusting your chain's logic so that no intermediate condensation step runs: by using the chat history and follow-up question directly, you avoid the second call to the chat model and handle operations involving the chain's memory more efficiently.
Example Code
Description
I am trying to implement the new way of creating a RAG chain with memory, since ConversationalRetrievalChain is deprecated. When invoked, the chain outputs the correct and expected answer.
However, when I ask something that requires some kind of operation involving the chain's memory, I get back the expected answer, but it is returned twice. Somehow the chat model is being called twice and I just can't figure out why.
@dosu suggested that the model is being invoked twice because of the logic of my implementation. I would like to bypass the chat-history condensation step and use the chat history directly for the follow-up question.
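For illustration, the double invocation can be reproduced schematically with a counting stub in place of the real chat model (names are hypothetical, not LangChain APIs): the condense-question step and the answer-synthesis step each invoke the model once per turn.

```python
class CountingChatModel:
    """Stub chat model that records how many times it is invoked."""
    def __init__(self):
        self.calls = 0

    def invoke(self, prompt: str) -> str:
        self.calls += 1
        return f"response to: {prompt}"

def condense_then_answer(model, chat_history, question, retrieve):
    """Schematic two-step pipeline, mirroring the condense-then-answer pattern."""
    # Call 1: rewrite the follow-up into a standalone question.
    standalone = model.invoke(f"Condense: {chat_history} | {question}")
    context = retrieve(standalone)
    # Call 2: synthesize the final answer from the retrieved context.
    return model.invoke(f"Answer using {context}: {standalone}")

model = CountingChatModel()
condense_then_answer(model, "hi / hello", "what about memory?", lambda q: "docs")
print(model.calls)  # 2 -- the model is invoked twice per turn
```

If streaming or logging is attached at the model level rather than the chain level, both invocations surface, which matches the answer appearing twice.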
System Info
python 3.11