Token Per Minute (TPM) Limiter #494
Comments
Hi, did you try using Max RPM (optional)? It sets the maximum requests per minute the crew adheres to during execution.
Hey Paul! Good suggestion! I did adjust the crew's RPM to 5 and I was able to get the crew to run. However, things were super slow and the crew still hit rate-limit issues. I think tokens per minute would make a great addition to the crew, because requests per minute is not the same as tokens per minute. Here's the problem with the current RPM approach: even if I set the crew's RPM to 10, the token size of those 10 requests could be drastically different. For example, if each request is 500 tokens, I will use 5K tokens per minute, which puts me at the limit for Groq. But if each request is 2K tokens, I will use 20K tokens per minute, which puts me way over the Groq limit and causes my crew to crash.
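To illustrate the arithmetic from that comment: effective token throughput is RPM times tokens per request, so a fixed RPM cap alone cannot bound token usage. (The 5K TPM limit below is just the figure used in the comment, not an authoritative Groq number.)

```python
# Effective TPM = RPM * tokens per request, so the same RPM cap
# can land either under or far over a provider's TPM limit.
rpm = 10
tpm_limit = 5_000  # Groq limit figure used in the comment above (illustrative)

for tokens_per_request in (500, 2_000):
    tpm = rpm * tokens_per_request
    status = "over the TPM limit" if tpm > tpm_limit else "within the TPM limit"
    print(f"{tokens_per_request} tokens/request -> {tpm} TPM ({status})")
```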
Agreed. For LLM providers (e.g. Groq) that limit TPM, max_rpm does not provide enough control; you can still hit their limits even with a tiny RPM. Something like max_tpm would be a good addition. Developers could then choose either one or both depending on the provider.
Agreed as well. |
I hope that this gets added soon 🤞 |
It would be awesome if crewAI had a tokens-per-minute property that we could set when defining the crew, so that we don't get rate-limited by services such as Groq.
Here's an example rate-limit error from Groq that I frequently hit inside my crews:
CrewAI is already tracking how many tokens we are using during the crew's session so hopefully this wouldn't be too large of a lift.
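Since the crew already tracks token usage, a minimal sketch of what a crew-level TPM limiter could look like is below. Note this is purely illustrative: the `TPMLimiter` name, the `max_tpm` parameter, and the idea that callers report an estimated token count before each LLM call are all hypothetical, not existing crewAI API.

```python
import time
from collections import deque


class TPMLimiter:
    """Sliding-window tokens-per-minute limiter (hypothetical sketch).

    Before each LLM call, the caller reports an estimated token count;
    acquire() blocks until that call fits under the per-minute budget.
    """

    def __init__(self, max_tpm: int):
        self.max_tpm = max_tpm
        self.events = deque()  # (timestamp, tokens) pairs within the last 60s

    def _used(self, now: float) -> int:
        # Drop entries that have aged out of the 60-second window.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()
        return sum(tokens for _, tokens in self.events)

    def acquire(self, tokens: int) -> None:
        # Block until `tokens` fits under the per-minute budget.
        while self._used(time.monotonic()) + tokens > self.max_tpm:
            time.sleep(0.25)
        self.events.append((time.monotonic(), tokens))


# Usage: with a 5K TPM budget, two 2K-token requests are admitted
# immediately; a third would block until the window frees up.
limiter = TPMLimiter(max_tpm=5_000)
limiter.acquire(2_000)
limiter.acquire(2_000)
```

A sliding window (rather than a fixed per-minute reset) matches how providers typically meter usage, and it degrades gracefully: large requests simply wait longer instead of failing the whole crew.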