
Questions about input and output shape in model configuration when batch size is 1 #7227

Closed
jackylu0124 opened this issue May 16, 2024 · 3 comments

@jackylu0124

Hey all, I have a question regarding the input and output shape settings in the model configuration file. I have a model that takes images in NCHW layout (specifically C = 3, with H and W being variable positive integers) and outputs tensors in the same NCHW layout (again C = 3, with variable H and W). Because the model is relatively large and my GPU memory is limited, I want to use a batch size of 1 for both the input and the output tensor.

Based on my understanding of the following paragraph in the documentation https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_configuration.html:

Input and output shapes are specified by a combination of max_batch_size and the dimensions specified by the input or output dims property. For models with max_batch_size greater-than 0, the full shape is formed as [ -1 ] + dims. For models with max_batch_size equal to 0, the full shape is formed as dims.

my questions are:
1. Are the following 2 configurations for the model input and output shapes equivalent, and do they have identical effects for specifying the input and output shape for the model?
2. From the inference server's point of view, are these two configurations treated in any different ways, or are they indistinguishable?
3. If the following 2 configurations are equivalent, are they considered to be equivalent by all the backends supported by the Triton Inference Server (e.g. onnxruntime backend, python backend, etc.)?

Configuration 1:

max_batch_size: 1

input [
    {
        name: "input"
        data_type: TYPE_FP32
        dims: [3, -1, -1]
    }
]

output [
    {
        name: "output"
        data_type: TYPE_FP32
        dims: [3, -1, -1]
    }
]

Configuration 2:

max_batch_size: 0

input [
    {
        name: "input"
        data_type: TYPE_FP32
        dims: [1, 3, -1, -1]
    }
]

output [
    {
        name: "output"
        data_type: TYPE_FP32
        dims: [1, 3, -1, -1]
    }
]

Thank you very much for your time and help in advance!

@statiraju

@tanmayv25 can you help here?

@tanmayv25

  1. Are the following 2 configurations for the model input and output shapes equivalent, and do they have identical effects for specifying the input and output shape for the model?

Both configurations are identical. In both cases the client has to provide an input with shape [1, 3, -1, -1], where each -1 can be any positive integer, and the received output will have shape [1, 3, -1, -1].

They have an identical impact in that the Triton core will forward requests with input shape [1, 3, -1, -1] to the backend and receive an output of shape [1, 3, -1, -1] from the backend.
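For illustration, here is a minimal client-side sketch (assuming a hypothetical model name "my_model", the Triton HTTP endpoint on localhost:8000, and the tritonclient Python package); under either configuration the request tensor has the same [1, 3, H, W] shape:

import numpy as np
import tritonclient.http as httpclient

# Connect to the Triton HTTP endpoint (assumed to be localhost:8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# A single NCHW image with batch dimension 1; H and W are arbitrary here.
image = np.random.rand(1, 3, 512, 512).astype(np.float32)

# Tensor names match the config above ("input" / "output").
infer_input = httpclient.InferInput("input", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

# "my_model" is a placeholder for the actual model name.
response = client.infer(model_name="my_model", inputs=[infer_input])
output = response.as_numpy("output")  # NCHW output with batch dimension 1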

  2. From the inference server's point of view, are these two configurations treated in any different ways, or are they indistinguishable?

From the server's point of view they are identical. The only difference arises when the dynamic_batching field is enabled: with max_batch_size = 1 and dynamic batching set, a request goes into an additional queue and is picked up as soon as an instance is available for execution. When dynamic batching is disabled, there is no difference even in the request control flow.
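As a rough sketch (the queue delay value below is just an illustrative placeholder), enabling dynamic batching on Configuration 1 would look like this; it does not apply to Configuration 2, since the dynamic batcher is only usable for models with max_batch_size greater than 0:

max_batch_size: 1

dynamic_batching {
    max_queue_delay_microseconds: 100
}

# input and output blocks unchanged from Configuration 1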

  3. If the following 2 configurations are equivalent, are they considered to be equivalent by all the backends supported by the Triton Inference Server (e.g. onnxruntime backend, python backend, etc.)?

They will be completely identical from the backend's perspective. In fact, the max_batch_size value is not even propagated to the backend during inference execution. However, a backend may, during auto-completion of the model config, enable the dynamic_batching setting, which can introduce an extra queue transaction in the control flow.
To my knowledge, none of the standard backends do that for max_batch_size = 1 (only when max_batch_size > 1). The TensorFlow backend's behavior is documented here: https://github.com/triton-inference-server/tensorflow_backend?tab=readme-ov-file#dynamic-batching

@jackylu0124

@tanmayv25 Got it, thank you very much for your detailed explanation and clarification! I really appreciate it!
