
Token Per Minute (TPM) Limiter #494

Open
bhancockio opened this issue Apr 23, 2024 · 5 comments

@bhancockio

It would be awesome if crewAI had a tokens-per-minute (TPM) property that we could set when defining the crew, so that we don't get rate limited by services such as Groq.

Here's an example rate limit from Groq that I frequently get inside of my crews:

```
groq.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama3-70b-8192` in organization `org_123` on tokens per minute (TPM): Limit 5000, Used 5747, Requested ~4251. Please try again in 59.977s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
```

CrewAI is already tracking how many tokens we use during the crew's session, so hopefully this wouldn't be too large a lift.

@gadgethome

Hi, did you try using Max RPM (optional)? It is the maximum requests per minute the crew adheres to during execution.
https://docs.crewai.com/core-concepts/Crews/#crew-attributes
Also
https://docs.crewai.com/core-concepts/Agents/#what-is-an-agent

@bhancockio (Author)

Hey Paul!

Good suggestion! I did adjust the crew's RPM to 5 and was able to get the crew to run. However, things were super slow and the crew still hit rate limit issues.

I think Tokens Per Minute would make for a great addition to the crew because Requests Per Minute is not the same as Tokens Per Minute.

Here's the problem with the current RPM approach: as a developer, I have zero control over the token size of each request.

So even if I set the crew's RPM to 10, the token sizes of those 10 requests could be drastically different.

For example, if each request is 500 tokens, I will use 5K tokens per minute, which puts me right at the Groq limit.

However, if each request is 2K tokens, I will use 20K tokens per minute, which puts me way over the Groq limit and causes my crew to crash.
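To make the idea concrete, a TPM limiter could be sketched as a simple 60-second token-budget window that blocks before a request would exceed the budget. This is a minimal, hypothetical sketch, not crewAI's API; the class name `TPMLimiter`, the `acquire` method, and the per-request token estimates are all assumptions for illustration:

```python
import time


class TPMLimiter:
    """Hypothetical tokens-per-minute limiter (not part of crewAI).

    Tracks token usage inside a rolling 60-second window and sleeps
    until the window resets if the next request would exceed max_tpm.
    """

    def __init__(self, max_tpm: int):
        self.max_tpm = max_tpm
        self.window_start = time.monotonic()
        self.tokens_used = 0

    def acquire(self, estimated_tokens: int) -> None:
        now = time.monotonic()
        # Start a fresh window once 60 seconds have elapsed.
        if now - self.window_start >= 60:
            self.window_start = now
            self.tokens_used = 0
        # If this request would blow the budget, wait out the window.
        if self.tokens_used + estimated_tokens > self.max_tpm:
            time.sleep(max(0.0, 60 - (now - self.window_start)))
            self.window_start = time.monotonic()
            self.tokens_used = 0
        self.tokens_used += estimated_tokens


# Usage sketch: with a 5K TPM budget (the Groq limit in the error above),
# a 2K-token request would force a wait instead of a 429 crash.
limiter = TPMLimiter(max_tpm=5000)
limiter.acquire(2000)  # call before each LLM request
```

The hard part in practice is the estimate: token counts aren't known until after the request, so a real implementation would have to predict from the prompt (e.g. with a tokenizer) or reconcile against the usage crewAI already tracks.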

@ranzhang

Agreed. For LLM providers (e.g. Groq) that limit TPM, max_rpm does not provide enough control; you can still hit their limits even with a tiny RPM. Something like max_tpm would be a good addition. Developers could then choose one or both depending on the provider.

@Seneko commented May 7, 2024

Agreed as well.

@jeeanribeiro

I hope that this gets added soon 🤞
