PX (P90) for inference Cold start #127

tshrjn · 2023-10-29T21:42:56Z

Describe the bug
Please provide a clear and concise description of what cold start looks like.
I see the docs mention a couple of methods to speed up the load time for models; it would be great if objective numbers could be added. Ray also provides methods to combat cold start, and I see the library is being used, but do you use such methods?

For example, if you look at the image below from this article, most providers' cold starts are below 100 s (see image), and most providers list P90, P70, or P50 values to help frame the cold start problem and its solutions in those terms.
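To make the P50/P70/P90 framing concrete, here is a minimal sketch of turning a list of measured cold start durations (in seconds) into those percentiles. It assumes the nearest-rank definition of a percentile; other definitions (for example, the linear interpolation that `numpy.percentile` uses by default) give slightly different values on small samples. The function names are hypothetical and not part of any library mentioned in this thread.

```python
def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p% of
    all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one cold start sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range 1..100")
    ordered = sorted(samples)
    # 1-based rank = ceil(p * n / 100), computed with integers to avoid float rounding
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


def cold_start_summary(samples: list[float]) -> dict[str, float]:
    """Report the P50/P70/P90 cold start values (hypothetical helper)."""
    return {f"P{p}": percentile(samples, p) for p in (50, 70, 90)}
```

A P90 of, say, 80 s then reads as "90% of cold starts finished within 80 s", which is the kind of objective number the issue asks for.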

Other relevant stuff:
https://news.ycombinator.com/item?id=35738072
https://www.banana.dev/blog/turboboot

dongreenberg · 2023-11-02T14:28:03Z

Hi Tusher, this is a great suggestion. There are really three cold start concepts with Runhouse: the cold start for a new service sent to an existing cluster, the cold start for creating a cluster on existing infra, and the cold start if the infra itself needs to come up. I think we'd like to show each of them broken out for many infra types. For example, we'd show the cold start for sending a function to a fresh EC2 instance, then the cold start for sending a function to an existing EC2 instance (which would include starting Ray and the Runhouse HTTP server), and then the cold start for sending a function to an existing cluster on that EC2 instance. The point is that Runhouse gives you quite a bit of control over how you structure your deployments, so each of these is important. Does that make sense?
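One way to report the three cold start concepts described above is to time each phase once and derive the scenario totals from them. The sketch below assumes the phases run sequentially (bring up infra, then start the cluster, i.e. Ray plus the Runhouse HTTP server, then deploy the service), so each scenario is the sum of the phases it still has to run. The phase names `infra_up`, `cluster_start`, and `service_deploy` are hypothetical labels, not Runhouse APIs; the code inside each `timed` block would be whatever actually provisions, starts, or deploys in your setup.

```python
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def timed(phases: dict[str, float], name: str) -> Iterator[None]:
    """Record the wall-clock duration (seconds) of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        phases[name] = time.perf_counter() - start


def scenario_cold_starts(phases: dict[str, float]) -> dict[str, float]:
    """Derive the three cold start figures from per-phase durations.

    Assumes sequential phases: a fresh instance pays for every phase, an
    existing instance skips provisioning, and an existing cluster only pays
    for deploying the service.
    """
    infra = phases["infra_up"]            # hypothetical: provision the EC2 instance
    cluster = phases["cluster_start"]     # hypothetical: start Ray + HTTP server
    service = phases["service_deploy"]    # hypothetical: send the function
    return {
        "fresh_instance": infra + cluster + service,
        "existing_instance": cluster + service,
        "existing_cluster": service,
    }
```

Repeating each scenario several times per infra type and reporting percentiles over the totals would give P50/P90-style numbers for every row of such a breakdown.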
