Token counting and litellm provider customization #1421
Conversation
…rations for max input and output tokens, pulling from litellm when available.
… token-counting # Conflicts: # agenthub/monologue_agent/agent.py # opendevin/config.py
if self.model_info is not None and 'max_output_tokens' in self.model_info:
    self.max_output_tokens = self.model_info['max_output_tokens']
else:
    self.max_output_tokens = 1024
Just curious: where does this number come from? I guess 4096 is because it's the limit of GPT-3.5, but where does this one come from?
I don't have a significant justification for either of these defaults, and I am interested to hear opinions on them. I regularly experienced overruns with a 512 output token limit, and therefore I usually use 1024 or higher locally.
I don't have a strong opinion either. I just feel like it would be better to have some comments explaining where these numbers are from.
I have added comments documenting this:
# Max input tokens for gpt3.5, so this is a safe fallback for any potentially viable model
self.max_input_tokens = 4096
# Enough tokens for most output actions, and not too many for a bad llm to get carried away responding
# with thousands of unwanted tokens
self.max_output_tokens = 1024
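For readers following along, here is a minimal sketch of how these fallbacks can sit next to litellm's per-model metadata. The class shape and attribute names are illustrative rather than a copy of this PR's code; litellm.get_model_info is litellm's model-metadata lookup, and models it doesn't know about simply keep the 4096/1024 defaults discussed above.

# Illustrative sketch, not the PR's exact code: derive token limits from
# litellm's model metadata when available, otherwise keep safe defaults.
import litellm


class TokenLimits:
    def __init__(self, model_name: str):
        try:
            # litellm ships a per-model metadata map (context window, output cap, ...).
            self.model_info = litellm.get_model_info(model_name)
        except Exception:
            # Unknown or custom models (e.g. local providers) have no entry.
            self.model_info = None

        # Max input tokens for gpt-3.5, so a safe fallback for any potentially viable model.
        self.max_input_tokens = 4096
        # Enough for most output actions, without letting a bad LLM run away
        # with thousands of unwanted tokens.
        self.max_output_tokens = 1024

        if self.model_info is not None:
            if 'max_input_tokens' in self.model_info:
                self.max_input_tokens = self.model_info['max_input_tokens']
            if 'max_output_tokens' in self.model_info:
                self.max_output_tokens = self.model_info['max_output_tokens']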
… token-counting # Conflicts: # opendevin/llm/llm.py # opendevin/schema/config.py
… conflict with recent command-r-plus commit.
Co-authored-by: Engel Nyst <[email protected]>
… token-counting # Conflicts: # opendevin/llm/llm.py
Codecov Report
Attention: Patch coverage is …

@@          Coverage Diff           @@
##            main    #1421   +/-   ##
=======================================
  Coverage        ?   60.83%
=======================================
  Files           ?       88
  Lines           ?     3738
  Branches        ?        0
=======================================
  Hits            ?     2274
  Misses          ?     1464
  Partials        ?        0
@@ -32,7 +32,7 @@
if config.get(ConfigType.AGENT_MEMORY_ENABLED):
    from agenthub.monologue_agent.utils.memory import LongTermMemory

MAX_MONOLOGUE_LENGTH = 20000
MAX_TOKEN_COUNT_PADDING = 512
Is this roughly a similar number? 40 chars per token? That seems like a lot to me.
Ah, never mind. I see now that it's being used differently.
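To spell out the difference this thread converged on: MAX_TOKEN_COUNT_PADDING is not a smaller replacement for the 20000-character MAX_MONOLOGUE_LENGTH cap, it is headroom reserved below the model's input-token limit. A rough sketch, with a hypothetical helper name:

# Hypothetical helper, for illustration only: the padding acts as headroom
# below the model's input-token limit, not as a cap on monologue size.
MAX_TOKEN_COUNT_PADDING = 512


def needs_condense(prompt_token_count: int, max_input_tokens: int) -> bool:
    # Condense once the prompt gets within PADDING tokens of the model's limit,
    # instead of comparing a character count against a fixed character cap.
    return prompt_token_count + MAX_TOKEN_COUNT_PADDING > max_input_tokens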
This LGTM!
@enyst looks like we need your 👍
@computer-whisperer looks like it just needs a rebase. Feel free to ping me when it's ready!
… token-counting # Conflicts: # opendevin/core/config.py # opendevin/core/schema/config.py # opendevin/llm/llm.py
@rbren should be ready to go
This will make the behavior much better, thank you! And sorry for the delay here.
Schedule monologue compression using token counting rather than character counting, preventing the occasional input-token overruns that can happen when there is a lot of non-textual output (scenarios where the same number of characters requires more tokens).
(This has a minor conflict with #1417 and will be updated after that PR is merged.)
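As a rough sketch of the counting side of this change (the function name and event-to-message mapping here are assumptions, not the PR's exact code): litellm's token_counter lets the agent measure the monologue with a tokenizer appropriate for the configured model, and the result can then be checked against max_input_tokens minus the padding discussed above.

# Sketch only: count monologue tokens with litellm instead of counting characters.
import litellm


def count_monologue_tokens(events: list[dict], model: str) -> int:
    # litellm.token_counter picks a tokenizer suited to `model`
    # (tiktoken for OpenAI models, with fallbacks for others).
    messages = [{'role': 'user', 'content': str(event)} for event in events]
    return litellm.token_counter(model=model, messages=messages)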