
Token Per Minute (TPM) Limiter #494

Open
bhancockio opened this issue Apr 23, 2024 · 5 comments

@bhancockio

It would be awesome if crewAI had a tokens-per-minute (TPM) property that we could set when defining the crew, so that we don't get rate limited by services such as Groq.

Here's an example rate limit from Groq that I frequently get inside of my crews:

```
groq.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama3-70b-8192` in organization `org_123` on tokens per minute (TPM): Limit 5000, Used 5747, Requested ~4251. Please try again in 59.977s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
```

CrewAI is already tracking how many tokens we use during the crew's session, so hopefully this wouldn't be too large a lift.

@gadgethome

Hi, did you try using Max RPM (optional)? It is the maximum requests per minute the crew adheres to during execution.
https://docs.crewai.com/core-concepts/Crews/#crew-attributes
Also
https://docs.crewai.com/core-concepts/Agents/#what-is-an-agent

@bhancockio (Author)

Hey Paul!

Good suggestion! I did adjust the crew's RPM to 5 and was able to get the crew to run. However, things were super slow and the crew still hit rate limit issues.

I think Tokens Per Minute would make for a great addition to the crew because Requests Per Minute is not the same as Tokens Per Minute.

Here's the problem with the current RPM approach: as a developer, I have zero control over the token size of each request.

So even if I set the crew's RPM to 10, the token sizes of those 10 requests could be drastically different.

For example, if each request is 500 tokens, I will use 5K tokens per minute, which puts me right at the Groq limit.

However, if each request is 2K tokens, I will use 20K tokens per minute, which puts me way over the Groq limit and causes my crew to crash.
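To make the idea concrete, a TPM limiter could be sketched as a simple 60-second token-budget window that blocks before a request would exceed the budget. This is a minimal, hypothetical sketch, not crewAI's API; the class name `TPMLimiter`, the `acquire` method, and the per-request token estimates are all assumptions for illustration:

```python
import time


class TPMLimiter:
    """Hypothetical tokens-per-minute limiter (not part of crewAI).

    Tracks token usage inside a rolling 60-second window and sleeps
    until the window resets if the next request would exceed max_tpm.
    """

    def __init__(self, max_tpm: int):
        self.max_tpm = max_tpm
        self.window_start = time.monotonic()
        self.tokens_used = 0

    def acquire(self, estimated_tokens: int) -> None:
        now = time.monotonic()
        # Start a fresh window once 60 seconds have elapsed.
        if now - self.window_start >= 60:
            self.window_start = now
            self.tokens_used = 0
        # If this request would blow the budget, wait out the window.
        if self.tokens_used + estimated_tokens > self.max_tpm:
            time.sleep(max(0.0, 60 - (now - self.window_start)))
            self.window_start = time.monotonic()
            self.tokens_used = 0
        self.tokens_used += estimated_tokens


# Usage sketch: with a 5K TPM budget (the Groq limit in the error above),
# a 2K-token request would force a wait instead of a 429 crash.
limiter = TPMLimiter(max_tpm=5000)
limiter.acquire(2000)  # call before each LLM request
```

The hard part in practice is the estimate: token counts aren't known until after the request, so a real implementation would have to predict from the prompt (e.g. with a tokenizer) or reconcile against the usage crewAI already tracks.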

@ranzhang

Agreed. For LLM providers (e.g. Groq) that limit TPM, max_rpm does not provide enough control; you can still hit their limits even with a tiny RPM. Something like max_tpm would be a good addition. Developers could then choose one or both depending on the provider.

@Seneko commented May 7, 2024

Agreed as well.

@jeeanribeiro

I hope that this gets added soon 🤞
