PX (P90) for inference Cold start #127

Open
tshrjn opened this issue Oct 29, 2023 · 1 comment

Comments


tshrjn commented Oct 29, 2023

Describe the bug
Please provide a clear and concise description of what to expect from cold starts.
I see the docs mention a couple of methods to speed up model load time; it would be great if objective numbers could be added. Ray also provides methods to combat cold starts, and I see the library is being used here, but do you use those methods?

For example, in the image below from this article, most providers' cold starts are below 100s (see img), and most providers list P90/P70/P50 values to help frame the cold start problem and its solutions in those terms.
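To make the P90/P70/P50 request concrete, here is a minimal sketch of how such numbers could be derived from repeated cold start timings. The sample values are invented for illustration and are not measurements of Runhouse or any provider; the `percentile` helper is a hypothetical name and uses the nearest-rank definition, which is only one of several common percentile conventions.

```python
def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it.
    `p` is an integer in 1..100."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, avoiding float rounding surprises.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical cold start durations in seconds (illustrative only).
cold_start_seconds = [12.1, 9.8, 15.3, 11.0, 48.7, 10.4, 13.9, 10.9, 22.5, 11.6]

summary = {f"P{p}": percentile(cold_start_seconds, p) for p in (50, 70, 90)}
print(summary)
```

A single slow outlier (48.7s here) barely moves P50 but shows up in the higher percentiles, which is why reporting several of them is more informative than a mean.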

Other relevant stuff:
https://news.ycombinator.com/item?id=35738072
https://www.banana.dev/blog/turboboot

dongreenberg (Contributor) commented

Hi Tusher, this is a great suggestion. There are really three cold start concepts with Runhouse: the cold start for a new service sent to an existing cluster, the cold start for creating a cluster on existing infra, and the cold start when the infra itself needs to come up. I think we'd like to show each of them broken out for many infra types. So for example, we'd show the cold start for sending a function to a fresh EC2 instance, then the cold start for sending a function to an existing EC2 instance (which would include starting Ray and the Runhouse HTTP server), and then the cold start for sending a function to an existing cluster on that EC2 instance. The point is that Runhouse gives you quite a bit of control over how you structure your deployments, so each of these is important. Does that make sense?
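One way to picture the three-way breakdown is a small timing harness like the sketch below. It is not Runhouse code: the stage names, the `time_stage` and `scenario_totals` helpers, and the no-op stubs are all hypothetical, and it assumes each scenario's cold start is simply the sum of the stages that still have to run.

```python
import time


def time_stage(fn):
    """Return the wall-clock seconds taken by calling fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def scenario_totals(stage_seconds):
    """Given stage durations ordered from outermost (infra) to innermost
    (service), return the cold start for each possible starting point,
    i.e. the sum of that stage and every stage after it."""
    names = list(stage_seconds)
    return {
        f"from_{name}": sum(stage_seconds[n] for n in names[i:])
        for i, name in enumerate(names)
    }


# Hypothetical stubs; real ones would launch an instance, start Ray and
# the HTTP server, and send the function, respectively.
stages = {
    "provision_infra": lambda: None,
    "start_cluster": lambda: None,
    "deploy_service": lambda: None,
}

measured = {name: time_stage(fn) for name, fn in stages.items()}
print(scenario_totals(measured))
```

With stage durations of 60s, 20s, and 5s, for instance, this would report 85s for fresh infra, 25s for existing infra without a cluster, and 5s for an existing cluster.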
