[IMP] Setting up MONAI Model Server #458

Open

vikashg opened this issue Oct 4, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

vikashg (Collaborator) commented Oct 4, 2023

Is your enhancement request related to a problem? Please describe.
Over the last six months, we have been building towards packaging MONAI models in different flavors. The focus of these efforts is model sharing:

  1. Models can be packaged as a MONAI Bundle and shared through the Model Zoo.
  2. We can now make Pythonic calls to a MONAI Bundle for further fine-tuning or inferencing.

MONAI Bundles can also be used to create MONAI Application Packages (MAPs), which are self-contained entities aimed at inference. However, it should be pointed out that a MAP does not need to run a MONAI Bundle. MAPs also support DICOM data loaders (DICOM I/O), which MONAI Bundles do not (at least natively).

The core focus of the MONAI 1.3 release has been giving access to MONAI Bundles through Python function calls.
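
As a rough illustration, a minimal sketch of such Pythonic calls, assuming the monai.bundle download/load API (exact call signatures and return values may differ across MONAI versions) and using a placeholder bundle name and input shape:

```python
import torch
from monai.bundle import download, load

# Download a bundle into a local bundle directory
# ("spleen_ct_segmentation" is only an illustrative bundle name).
download(name="spleen_ct_segmentation", bundle_dir="./bundles")

# Assumption: load() here returns the instantiated network with pretrained weights,
# ready for inference or further fine-tuning.
model = load(name="spleen_ct_segmentation", bundle_dir="./bundles")
model.eval()

with torch.no_grad():
    # Placeholder input; a real workflow would use the bundle's own pre-processing transforms.
    dummy_input = torch.rand(1, 1, 96, 96, 96)
    prediction = model(dummy_input)
```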

We can now standardize model training and inference using the MONAI Bundle.

In essence, there are various flavors in which models can be shared and re-used. However, one key use case is missing: we do not have a mechanism to set up a model server with a model store (a directory) into which models (MONAI Bundles) are dropped, so that inference can be made by calling the model store with a model name.

Describe the solution you'd like
An ideal solution would be to set up a model server where models can be dropped into a model store and inference can be made by calling the model using its model_name.
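
Purely to illustrate the desired workflow (this is not an existing MONAI API), a hypothetical server could look like the sketch below; Flask, the route, the model-store layout, and the request format are all assumptions:

```python
import torch
from flask import Flask, jsonify, request
from monai.bundle import load

MODEL_STORE = "./model_store"  # directory where MONAI Bundles are dropped
app = Flask(__name__)
_models = {}  # in-memory cache of loaded bundles


def get_model(model_name: str):
    """Load a bundle from the model store on first use and cache it."""
    if model_name not in _models:
        # Assumption: load() returns the instantiated network for this bundle name.
        model = load(name=model_name, bundle_dir=MODEL_STORE)
        model.eval()
        _models[model_name] = model
    return _models[model_name]


@app.route("/predict/<model_name>", methods=["POST"])
def predict(model_name: str):
    # Hypothetical contract: the client posts a tensor serialized with torch.save().
    tensor = torch.load(request.files["input"].stream)
    with torch.no_grad():
        output = get_model(model_name)(tensor)
    return jsonify({"output_shape": list(output.shape)})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```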

Describe alternatives you've considered
An alternative would be to use TorchServe and the .mar format (model archive). PyTorch provides a CLI, torch-model-archiver, to generate a .mar file. These .mar files contain a TorchScript model along with Python handlers for data pre- and post-processing. The models can then be invoked through a REST API (e.g., curl) using the model name and the IP address of the host machine.
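
For example, a call to TorchServe's prediction endpoint from Python might look roughly like this (the host address, model name, and input file are placeholders; 8080 is TorchServe's default inference port):

```python
import requests

# Post an input file to the prediction endpoint of a registered model.
# Replace 127.0.0.1 with the host IP and "spleen_seg" with the registered model name.
with open("image.nii.gz", "rb") as f:
    response = requests.post("http://127.0.0.1:8080/predictions/spleen_seg", data=f)

print(response.status_code)
print(response.text)
```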

However, TorchServe only supports a REST API (AFAIK), which might be fine for small images but is probably not a great idea for large DICOM images and files, since the model inference will fail if the connection drops partway through.
An alternative to the REST API might be the tus resumable-upload protocol via tus-py-client: https://github.com/tus/tus-py-client.
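
For reference, a resumable upload with tus-py-client looks roughly like this (the endpoint URL, chunk size, and file are placeholders):

```python
from tusclient import client

# Point the client at a tus-enabled upload endpoint (placeholder URL).
tus_client = client.TusClient("http://127.0.0.1:1080/files/")

# Upload a large DICOM archive in chunks; the upload can resume if the connection drops.
uploader = tus_client.uploader("patient_series.zip", chunk_size=2 * 1024 * 1024)
uploader.upload()

print("uploaded to:", uploader.url)
```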

We could write a thin wrapper on top of the MONAI Bundle so that we can utilize the TorchServe model server, or repackage our MONAI Bundles as .mar files so that they can be used in TorchServe. For medical imaging applications, where the dataset size is quite large even for a single patient, we can set up a data store with a unique identifier (UID) for each patient. Instead of sending the data in a RESTful call, we can pass the unique identifier, and the model server can pull the appropriate data from the data store and complete the inference.
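
A sketch of that UID-based flow on the server side, with the datastore layout, path convention, and loader choice all hypothetical:

```python
import os
import torch
from monai.transforms import LoadImage

DATA_STORE = "/data/store"  # hypothetical datastore mounted on the model server


def infer_by_uid(model, patient_uid: str):
    """Resolve a patient UID to data in the datastore, then run inference server-side."""
    series_dir = os.path.join(DATA_STORE, patient_uid)  # e.g. a folder holding a DICOM series
    image = LoadImage(image_only=True, ensure_channel_first=True)(series_dir)
    with torch.no_grad():
        return model(image.unsqueeze(0))  # add a batch dimension
```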

Additional context
Continuing with the idea that each release has a theme: for 1.3 the main theme was Pythonic support for MONAI Bundles, so maybe for 1.4 the main theme could be a model server for MONAI Bundles.

Unclear to me
One aspect of TorchServe that is unclear to me is whether the models in the model server are loaded onto the GPUs (i.e., whether the models are hot) and ready to serve. If we can achieve this, it would be really great.

jordimassaguerpla commented:

Given you can package models as containers (MAP), would using a container registry fit your needs?

MMelQin (Collaborator) commented Nov 28, 2023

> Given you can package models as containers (MAP), would using a container registry fit your needs?

Absolutely, though it is up to the adopter of the SDK and the creator of the MAPs to pick a (secure) container registry. Some of the examples in this project use the GitHub Container Registry.

MMelQin (Collaborator) commented Apr 15, 2024

Triton Inference Server has supported PyTorch models for the longest time, supporting tensor input for simple inference (i.e., the client needs to chunk the input if it is larger than the model's input size), as well as a Python backend that passes the whole encoded input to the backend app for pre-processing, inference, and post-processing, where the app can be built with this App SDK.
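
For the tensor-input path, a Triton client call might look roughly like the sketch below (the server address, model name, tensor names, and shapes are placeholders that depend on how the PyTorch model is deployed):

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server on its default HTTP port (placeholder address).
triton = httpclient.InferenceServerClient(url="127.0.0.1:8000")

# Placeholder input tensor; for large volumes the client would need to chunk the input.
image = np.random.rand(1, 1, 96, 96, 96).astype(np.float32)
inp = httpclient.InferInput("INPUT__0", list(image.shape), "FP32")
inp.set_data_from_numpy(image)

result = triton.infer(model_name="spleen_seg", inputs=[inp])
output = result.as_numpy("OUTPUT__0")
print(output.shape)
```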

NVIDIA has also announced NIM, which delivers inference as a microservice with Triton as the default inference engine. NIM supports containerized workloads, in the sense that the service implementation is realized with a container, such as a MAP, plus a Helm chart if necessary. We'll be working on examples and recipes for turning a MAP into a NIM.
