
Unable to use pytorch library with libtorch backend when using Triton Inference Server In-Process Python API #7222

Open
sivanantha321 opened this issue May 15, 2024 · 9 comments
Labels: help wanted, question

@sivanantha321

Description
I am trying to use the newly introduced Triton Inference Server In-Process Python API to serve PyTorch models using the libtorch backend. I am using the pytorch and torchvision libraries to do some pre- and post-processing of the input data before sending it to the Triton server for prediction. But when I try to use pytorch or torchvision I get the following error:

failed to load 'cifar10' version 1: Not found: unable to load shared library: /opt/tritonserver/backends/pytorch/libtorchtrt_runtime.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

Triton Server logs:

I0515 09:22:40.092038 265 cache_manager.cc:480] Create CacheManager with cache_dir: '/opt/tritonserver/caches'
W0515 09:22:40.092110 265 pinned_memory_manager.cc:271] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I0515 09:22:40.092129 265 cuda_memory_manager.cc:117] CUDA memory pool disabled
E0515 09:22:40.092267 265 server.cc:243] CudaDriverHelper has not been initialized.
I0515 09:22:40.093620 265 model_config_utils.cc:680] Server side auto-completed config: name: "cifar10"
platform: "pytorch_libtorch"
max_batch_size: 1
input {
  name: "INPUT__0"
  data_type: TYPE_FP32
  dims: 3
  dims: 32
  dims: 32
}
output {
  name: "OUTPUT__0"
  data_type: TYPE_FP32
  dims: 10
}
default_model_filename: "model.pt"
backend: "pytorch"

I0515 09:22:40.093699 265 model_lifecycle.cc:469] loading: cifar10:1
I0515 09:22:40.093820 265 backend_model.cc:502] Adding default backend config setting: default-max-batch-size,4
I0515 09:22:40.093847 265 shared_library.cc:112] OpenLibraryHandle: /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
I0515 09:22:40.098713 265 backend_manager.cc:138] unloading backend 'pytorch'
E0515 09:22:40.098758 265 model_lifecycle.cc:638] failed to load 'cifar10' version 1: Not found: unable to load shared library: /opt/tritonserver/backends/pytorch/libtorchtrt_runtime.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
I0515 09:22:40.098775 265 model_lifecycle.cc:773] failed to load 'cifar10'
I0515 09:22:40.098860 265 server.cc:607] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0515 09:22:40.098880 265 server.cc:634] 
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I0515 09:22:40.098907 265 server.cc:677] 
+---------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model   | Version | Status                                                                                                                                                                 |
+---------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| cifar10 | 1       | UNAVAILABLE: Not found: unable to load shared library: /opt/tritonserver/backends/pytorch/libtorchtrt_runtime.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKc |
|         |         | S2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE                                                                                                             |
+---------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0515 09:22:40.099027 265 metrics.cc:770] Collecting CPU metrics
I0515 09:22:40.099151 265 tritonserver.cc:2538] 
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                  |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                 |
| server_version                   | 2.45.0                                                                                                                                                 |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memo |
|                                  | ry binary_tensor_data parameters statistics trace logging                                                                                              |
| model_repository_path[0]         | models_dir                                                                                                                                             |
| model_control_mode               | MODE_NONE                                                                                                                                              |
| strict_model_config              | 1                                                                                                                                                      |
| rate_limit                       | OFF                                                                                                                                                    |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                              |
| min_supported_compute_capability | 6.0                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                     |
| cache_enabled                    | 0                                                                                                                                                      |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+

I0515 09:22:40.099172 265 server.cc:307] Waiting for in-flight requests to complete.
I0515 09:22:40.099176 265 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
I0515 09:22:40.099204 265 server.cc:338] All models are stopped, unloading models
I0515 09:22:40.099210 265 server.cc:347] Timeout 30: Found 0 live models and 0 in-flight non-inference requests

Triton Information
What version of Triton are you using?

$ pip show tritonserver

Name: tritonserver
Version: 2.45.0
Summary: Triton Inference Server In-Process Python API
Home-page: https://developer.nvidia.com/nvidia-triton-inference-server
Author: NVIDIA Inc.
Author-email: [email protected]
License: BSD
Location: /usr/local/lib/python3.10/dist-packages
Requires: numpy
Required-by: 
$ pip show torch
Name: torch
Version: 2.3.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: torchvision
$ pip show torchvision
Name: torchvision
Version: 0.18.0
Summary: image and video datasets and models for torch deep learning
Home-page: https://github.com/pytorch/vision
Author: PyTorch Core Team
Author-email: [email protected]
License: BSD
Location: /usr/local/lib/python3.10/dist-packages
Requires: numpy, pillow, torch
Required-by: 

Are you using the Triton container or did you build it yourself?
I am using the nvcr.io/nvidia/tritonserver:24.04-py3 container to serve the model using the In-Process Python API.

To Reproduce
A simple script to reproduce the error:

import time
import tritonserver
from torchvision import transforms  # importing this leads to errors
import torch  # importing this leads to errors


def start():
    server = tritonserver.Server(model_repository="python/models",
                                 log_error=True,
                                 log_info=True,
                                 log_verbose=True,
                                 )
    print("tritonserver version : ", tritonserver.__version__)
    server.start()
    print("server started")
    model = server.model("cifar10")


if __name__ == "__main__":
    start()
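
As a side note for anyone debugging this, below is a minimal diagnostic sketch (a sketch only; it assumes a Linux container, since it reads /proc/self/maps) that lists which torch shared libraries are already mapped into the process once torch is imported, showing that pip's copies load before the backend brings in its own:

import torch  # the pip-installed torch, which maps its own libtorch/libc10


def loaded_torch_libs():
    # /proc/self/maps lists every file mapped into this process (Linux only).
    libs = set()
    with open("/proc/self/maps") as maps:
        for line in maps:
            path = line.split()[-1]
            if ".so" in path and ("torch" in path or "libc10" in path):
                libs.add(path)
    return sorted(libs)


for lib in loaded_torch_libs():
    print(lib)
# Paths under site-packages/torch/lib printed here are the copies that later
# clash with the backend's own libraries in /opt/tritonserver/backends/pytorch.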

The model is a PyTorch (libtorch) CIFAR-10 classifier; its model configuration file:

name: "cifar10"
platform: "pytorch_libtorch"
max_batch_size: 1
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [3,32,32]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [10]
  }
]

Expected behavior
PyTorch and torchvision should work with the tritonserver In-Process Python API.

@sivanantha321
Author

/CC @yuzisun

@nnshah1 nnshah1 self-assigned this May 15, 2024
@nnshah1 nnshah1 added the help wanted and question labels May 15, 2024
@nnshah1
Contributor

nnshah1 commented May 18, 2024

@sivanantha321 - is it possible to provide the .pt file / instructions on recreating it?

@nnshah1
Contributor

nnshah1 commented May 18, 2024

nvcr.io/nvidia/tritonserver:24.04-py3

never mind - I was able to find a model to reproduce it locally.

I believe the issue is that the latest public pytorch version, as installed via pip, conflicts with the torch libraries used by the libtorch backend.

I will experiment with some potential workarounds.
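
For reference, a minimal sketch that demangles the missing symbol (assuming binutils' c++filt is present in the container) and shows it is a core libc10 entry point, which is why the two libtorch builds collide:

import subprocess

# The mangled symbol from the error message above.
SYMBOL = ("_ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112"
          "basic_stringIcSt11char_traitsIcESaIcEEE")

# c++filt (from binutils) turns the mangled name back into its C++ signature.
demangled = subprocess.run(
    ["c++filt", SYMBOL], capture_output=True, text=True, check=True
).stdout.strip()
print(demangled)
# -> c10::detail::torchCheckFail(char const*, char const*, unsigned int,
#    std::__cxx11::basic_string<...> const&)
# libtorchtrt_runtime.so expects this from the backend's libc10.so, but the
# libc10.so that pip's torch loaded first resolves instead and does not
# export a compatible version.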

@sivanantha321
Author


Thanks for looking into this.

@nnshah1
Contributor

nnshah1 commented May 22, 2024

I made some progress by using the NGC PyTorch image as a base and then copying the tritonserver binaries into it:

https://github.com/triton-inference-server/tutorials/blob/nnshah1-meetup-04-2024/Triton_Inference_Server_Python_API/docker/Dockerfile.pytorch

However, when doing that with pre-built libraries I still ran into an issue with torchvision: the shared library was loaded twice, and that caused conflicts (I think that is a fundamental issue with libtorchvision.so).

I then rebuilt the Triton PyTorch backend without torchvision support (see the Dockerfile above).

However, I haven't been able to confirm it with a full use case; I was testing a resnet50 model but didn't get to the stage where the results looked correct to me.

I'm posting this as an update here in case you have time to try/test it on your end.

@nnshah1
Contributor

nnshah1 commented May 30, 2024

@sivanantha321 - were you able to try the workaround?

@sivanantha321
Author

@nnshah1 Thanks for the big help! Yes, I tried the workaround and it worked successfully. There is one more thing I'd like to know: is there a way to use a custom PyTorch version other than the one that comes with the NGC PyTorch image?

@nnshah1
Contributor

nnshah1 commented Jun 1, 2024

@sivanantha321 - I believe you would just need to rebuild the pytorch backend with the custom version of pytorch you want to use:

https://github.com/triton-inference-server/pytorch_backend?tab=readme-ov-file#build-the-pytorch-backend-with-custom-pytorch

@nnshah1
Contributor

nnshah1 commented Jun 1, 2024

@Tabrizian , @rmccorm4 , @tanmayv25 for visibility.

In this workaround I searched for the backend's pytorch libraries and replaced them with symlinks to the system ones.

That may be a simple recipe for installing pytorch and the pytorch backend in the same container without duplicating the libraries, but it needs further review and testing.
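
Roughly what that search-and-replace looks like as a script - a sketch only, assuming the backend libraries live under /opt/tritonserver/backends/pytorch and the pip-installed torch libraries under site-packages/torch/lib (paths may differ per image):

import os

import torch

BACKEND_DIR = "/opt/tritonserver/backends/pytorch"
# Directory holding the pip-installed libtorch shared objects.
TORCH_LIB_DIR = os.path.join(os.path.dirname(torch.__file__), "lib")

for name in os.listdir(BACKEND_DIR):
    backend_copy = os.path.join(BACKEND_DIR, name)
    system_copy = os.path.join(TORCH_LIB_DIR, name)
    # Only touch .so files that exist in both places, so that only the
    # duplicated torch libraries become symlinks to the system copies;
    # backend-specific files like libtriton_pytorch.so are left alone.
    if name.endswith(".so") and os.path.isfile(system_copy):
        os.remove(backend_copy)
        os.symlink(system_copy, backend_copy)
        print(f"linked {backend_copy} -> {system_copy}")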
