Hi, I'm currently evaluating danswer. Is there a way to limit the input tokens fed into the LLM to decrease the cost per query?
I already searched GitHub and saw mentions of an env var called NUM_DOCUMENT_TOKENS_FED_TO_GENERATIVE_MODEL, but apparently that got removed? I couldn't find it in danswer's code anymore.
I tried lowering GEN_AI_MAX_TOKENS, but that led to errors.
So is there currently some way to decrease the tokens spent on a query by reducing the amount of context being fed into the model? Maybe some limit I can lower regarding the selection of relevant chunks?
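To illustrate the kind of limit I mean, here's a rough sketch: keep retrieved chunks in rank order until a token budget is spent. (This is just an illustration, not danswer's actual code; the whitespace split is a crude stand-in for the model's real tokenizer.)

```python
def truncate_chunks(chunks: list[str], max_tokens: int) -> list[str]:
    """Keep retrieved chunks, in rank order, until a token budget is used up.

    Uses a whitespace split as a crude approximation of a token count;
    a real implementation would use the model's tokenizer instead.
    """
    kept, used = [], 0
    for chunk in chunks:
        n = len(chunk.split())  # approximate token count of this chunk
        if used + n > max_tokens:
            break  # adding this chunk would exceed the budget
        kept.append(chunk)
        used += n
    return kept
```

Something like this, applied before the chunks are stuffed into the prompt, is what I'm hoping is already configurable.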
Thanks in advance and greetings from Austria :)