Hi, I'm currently evaluating danswer. Is there a way to limit the input tokens fed into the LLM to decrease the cost per query?
I already searched GitHub and saw mentions of an env var called NUM_DOCUMENT_TOKENS_FED_TO_GENERATIVE_MODEL, but apparently that got removed? I couldn't find it in danswer's code anymore.
I tried lowering GEN_AI_MAX_TOKENS, but that led to errors.
So is there currently some way to decrease the tokens spent on a query by reducing the amount of context being fed into the model? Maybe some limit I can lower regarding the selection of relevant chunks?
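To illustrate the kind of limit I mean, here's a rough sketch: keep retrieved chunks in rank order until a token budget is spent. (This is just an illustration, not danswer's actual code; the whitespace split is a crude stand-in for the model's real tokenizer.)

```python
def truncate_chunks(chunks: list[str], max_tokens: int) -> list[str]:
    """Keep retrieved chunks, in rank order, until a token budget is used up.

    Uses a whitespace split as a crude approximation of a token count;
    a real implementation would use the model's tokenizer instead.
    """
    kept, used = [], 0
    for chunk in chunks:
        n = len(chunk.split())  # approximate token count of this chunk
        if used + n > max_tokens:
            break  # adding this chunk would exceed the budget
        kept.append(chunk)
        used += n
    return kept
```

Something like this, applied before the chunks are stuffed into the prompt, is what I'm hoping is already configurable.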
Thanks in advance and greetings from Austria :)