
Token counting and litellm provider customization #1421

Merged: 17 commits merged into OpenDevin:main on May 5, 2024

Conversation

computer-whisperer (Contributor)

Schedule monologue compression based on token counting rather than character counting, preventing the occasional input token overruns that can happen when a lot of non-textual output occurs (the same number of characters can require more tokens).

(This has a minor conflict with #1417 and will be updated after that PR gets merged.)
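For illustration only (not the code in this PR): litellm's token_counter helper can show how similar character counts map to different token counts depending on content, which is why the compression trigger is scheduled on tokens rather than characters. The model name and sample strings below are arbitrary assumptions.

```python
# Sketch: similar character counts, different token counts.
# litellm.token_counter picks an appropriate tokenizer for the given model name.
import litellm

samples = {
    "plain prose": "the quick brown fox jumps over the lazy dog " * 20,
    "non-textual output": "=+/*(){}[]<>|\\#@$%^&~` " * 40,
}

for label, text in samples.items():
    chars = len(text)
    tokens = litellm.token_counter(model="gpt-3.5-turbo", text=text)
    print(f"{label}: {chars} chars -> {tokens} tokens")
```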

…rations for max input and output tokens, pulling from litellm when available.
… token-counting

# Conflicts:
#	agenthub/monologue_agent/agent.py
#	opendevin/config.py
if self.model_info is not None and 'max_output_tokens' in self.model_info:
    self.max_output_tokens = self.model_info['max_output_tokens']
else:
    self.max_output_tokens = 1024
Collaborator
Just curious: where does this number come from? I guess 4096 is because it's the limit of GPT 3.5, but how about this one?

Contributor Author

I don't have a significant justification for either of these defaults, and I am interested to hear opinions on them. I regularly experienced overruns with a 512 output token limit, and therefore I usually use 1024 or higher locally.

Collaborator

I don't have a strong opinion either. I just feel like it would be better to have some comments explaining where these numbers are from.

Contributor Author

I have added comments documenting this:

# Max input tokens for gpt3.5, so this is a safe fallback for any potentially viable model
self.max_input_tokens = 4096

# Enough tokens for most output actions, and not too many for a bad llm to get carried away responding
# with thousands of unwanted tokens
self.max_output_tokens = 1024
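As a hedged sketch of the fallback logic being discussed: litellm.get_model_info is a real litellm call, but the standalone function, its exception handling, and the exact key handling below are illustrative, not the code merged in this PR.

```python
import litellm

def resolve_token_limits(model: str) -> tuple[int, int]:
    """Return (max_input_tokens, max_output_tokens) for a model, with fallbacks."""
    try:
        model_info = litellm.get_model_info(model)
    except Exception:
        # Unknown or self-hosted models may not be in litellm's registry.
        model_info = None

    # Max input tokens for gpt-3.5, so a safe fallback for any potentially viable model.
    max_input_tokens = 4096
    # Enough for most output actions, without letting a bad LLM run away
    # with thousands of unwanted tokens.
    max_output_tokens = 1024

    if model_info is not None:
        max_input_tokens = model_info.get('max_input_tokens') or max_input_tokens
        max_output_tokens = model_info.get('max_output_tokens') or max_output_tokens
    return max_input_tokens, max_output_tokens
```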

opendevin/config.py (outdated review thread, resolved)
@computer-whisperer changed the title from "Token counting" to "Token counting and litellm provider customization" on Apr 28, 2024
opendevin/config.py (outdated review thread, resolved)
opendevin/llm/llm.py (two outdated review threads, resolved)
computer-whisperer and others added 3 commits April 29, 2024 21:40
… token-counting

# Conflicts:
#	opendevin/llm/llm.py

codecov-commenter commented Apr 30, 2024

Codecov Report

Attention: Patch coverage is 86.66667%, with 4 lines in your changes missing coverage. Please review.

❗ No coverage uploaded for the pull request base (main@4e84aac).

Files                  Patch %   Lines
opendevin/llm/llm.py   81.81%    4 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1421   +/-   ##
=======================================
  Coverage        ?   60.83%           
=======================================
  Files           ?       88           
  Lines           ?     3738           
  Branches        ?        0           
=======================================
  Hits            ?     2274           
  Misses          ?     1464           
  Partials        ?        0           


@@ -32,7 +32,7 @@
if config.get(ConfigType.AGENT_MEMORY_ENABLED):
    from agenthub.monologue_agent.utils.memory import LongTermMemory

MAX_MONOLOGUE_LENGTH = 20000
MAX_TOKEN_COUNT_PADDING = 512
Collaborator

Is this roughly a similar number? 40 chars per token? That seems like a lot to me.

Collaborator

Ahh nvm--I see how it's being used differently
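To make that difference concrete: a minimal, assumed sketch of how a padding constant like this is typically used as token headroom on top of the counted prompt, rather than as a chars-per-token ratio. The helper name and the limit it compares against are hypothetical, not taken from this PR.

```python
MAX_TOKEN_COUNT_PADDING = 512  # headroom reserved on top of the counted prompt tokens

def needs_condensation(prompt_token_count: int, max_input_tokens: int) -> bool:
    # Condense the monologue once the prompt plus the reserved headroom
    # would overrun the model's input limit.
    return prompt_token_count + MAX_TOKEN_COUNT_PADDING > max_input_tokens

# Example: with a 4096-token input limit, a 3700-token prompt triggers
# condensation because 3700 + 512 > 4096.
print(needs_condensation(3700, 4096))  # True
```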

@rbren (Collaborator) left a comment

This LGTM!

@rbren (Collaborator) commented May 2, 2024

@enyst looks like we need your 👍

@rbren (Collaborator) commented May 3, 2024

@computer-whisperer looks like it just needs a rebase. Feel free to ping me when it's ready!

… token-counting

# Conflicts:
#	opendevin/core/config.py
#	opendevin/core/schema/config.py
#	opendevin/llm/llm.py
@computer-whisperer (Contributor Author)

@rbren should be ready to go

@enyst (Collaborator) commented May 5, 2024

This will make the behavior much better, thank you! And sorry for the delay here.

@enyst merged commit 27e13fa into OpenDevin:main on May 5, 2024
23 checks passed