
GPU Isolation and flexible deployment strategies [FEA] #243

Open

vikashg opened this issue Jan 21, 2022 · 2 comments

Labels: enhancement (New feature or request)

vikashg (Collaborator) commented Jan 21, 2022

Is your feature request related to a problem? Please describe.
Consider a few scenarios where we need to:

  • deploy multiple models for a single application,
  • deploy multiple models on the same machine across GPUs of different architectures,
  • lock in resources for deployment so that training can use the remaining resources.

In all these examples, we want to assign a GPU to a model rather than let the inference service take up the entire system. Being able to isolate a GPU and pin it to a particular deployment would be very useful. It would also future-proof our deployments: imagine we add new GPUs with a new architecture, and the deployed model or its PyTorch version does not work on that architecture. In that case we could add the new GPUs without disturbing the existing deployments.
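
A minimal sketch of one way to pin a deployment to a single GPU today at the process level, assuming each deployment runs in its own process and a hypothetical GPU_ID environment variable chosen per deployment (CUDA_VISIBLE_DEVICES must be set before CUDA is initialized, i.e. before importing torch):

```python
import os

# Hypothetical per-deployment setting; must be set before any CUDA
# initialization (e.g. before importing torch) to take effect.
gpu_id = os.environ.get("GPU_ID", "0")
os.environ["CUDA_VISIBLE_DEVICES"] = gpu_id

import torch

# Inside this process, the pinned GPU is now visible as cuda:0,
# so the rest of the app can keep using the default device.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = torch.jit.load("model.ts", map_location=device)  # path is illustrative
```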

Describe alternatives you've considered
@slbryson has tried GPU isolation using the Clara CLI tools.

Additional context

@vikashg added the enhancement (New feature or request) label on Jan 21, 2022
vikashg (Collaborator, Author) commented Jan 21, 2022

This also ties in loosely with what @MMelQin mentioned about supporting multiple models deployed in a MAP.

MMelQin (Collaborator) commented Jan 22, 2022

This is definitely a good request for a much-needed capability, though it is more a concern for the deployment platform. For example, Clara inference operators/applications use the remote Triton Inference Server, which supports model-to-GPU affinity, the number of instances per model, etc., so the Triton configuration can be used to distribute model instances across GPUs.
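
For reference, a minimal sketch of the relevant part of a Triton model config.pbtxt (the instance count and GPU index are illustrative) that pins instances of a model to a specific GPU:

```
# config.pbtxt (sketch): run two instances of this model, pinned to GPU 0
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```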

The App SDK already has an issue for utilizing a remote Triton inference service: #212.

As for multi-model support (#244), when all the inference operators use in-proc inference, it is possible to

  • link the operators in the app (via application.add_flow()) so that only one inference operator runs at any given time and the GPU is not overloaded;
  • potentially enhance the model-loading logic in the App SDK base Application to use a specific GPU if so configured (see the sketch after this list), although this becomes moot if remote Triton is used.
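
A rough sketch of what the second option could look like for in-proc PyTorch inference, assuming a hypothetical gpu_id setting passed to the operator; this is not an existing App SDK API:

```python
import torch

class PinnedInferenceOperator:
    """Illustrative only: loads and runs a TorchScript model on a configured GPU."""

    def __init__(self, model_path: str, gpu_id: int = 0):
        # Fall back to CPU if the requested GPU is not available.
        if torch.cuda.is_available() and gpu_id < torch.cuda.device_count():
            self.device = torch.device(f"cuda:{gpu_id}")
        else:
            self.device = torch.device("cpu")
        self.model = torch.jit.load(model_path, map_location=self.device)
        self.model.eval()

    @torch.no_grad()
    def compute(self, image: torch.Tensor) -> torch.Tensor:
        # Move the input to the pinned device, run inference, return result on CPU.
        return self.model(image.to(self.device)).cpu()
```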
