Questions about input and output shape in model configuration when batch size is 1 #7227
@tanmayv25 can you help here?
Both configurations are identical. In both cases the client has to provide input with shape [1,3,-1,-1], where each -1 can be any positive integer, and the received output will have shape [1,3,-1,-1]. Their effect is the same: the Triton core forwards requests with input shape [1,3,-1,-1] to the backend and receives output of shape [1,3,-1,-1] from the backend.
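The shape contract described above can be sketched with a small helper. Note that `matches` is purely illustrative and not part of any Triton client library; it just checks a concrete tensor shape against a Triton-style dims spec:

```python
import numpy as np

def matches(shape, dims):
    """Check a concrete tensor shape against a Triton-style dims spec,
    where -1 matches any positive dimension size."""
    return len(shape) == len(dims) and all(
        d == -1 or d == s for s, d in zip(shape, dims)
    )

spec = [1, 3, -1, -1]                          # full client-visible shape
img = np.zeros((1, 3, 480, 640), np.float32)   # H=480, W=640 chosen freely

print(matches(img.shape, spec))        # True: batch=1, C=3, any H and W
print(matches((2, 3, 480, 640), spec)) # False: batch dimension must be 1
```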
From the server's point of view they are identical. The only difference arises when the dynamic_batching field is enabled. When that field is set with max_batch_size = 1, the request goes into an additional queue and is picked up as soon as an instance is available for execution. When dynamic batching is disabled, there is no difference even in the request control flow.
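For concreteness, a minimal pbtxt sketch of the case described above (field names follow Triton's model configuration schema; this is an illustration, not a config taken from the thread):

```
# Sketch only: max_batch_size = 1 together with dynamic_batching routes
# each request through the scheduler's extra queue before execution.
max_batch_size: 1
dynamic_batching { }
```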
They will be completely identical from the backend's perspective. In fact, the max_batch_size value is not even propagated to the backend during inference execution. However, the backend, during auto-completion of the model config, may enable the dynamic_batching setting, which can introduce an extra queue transaction in the control flow.
@tanmayv25 Got it, thank you very much for your detailed explanation and clarification! I really appreciate it!
Hey all, I have a question regarding the input and output shape configuration in the model configuration file. Basically, I have a model that takes images in NCHW layout (specifically C=3, with H and W allowed to be any positive integers) and outputs tensors in the same NCHW layout (again C=3, with variable H and W). Due to the relatively large size of this model and the limited memory on my GPU, I want to set a batch size of 1 for both the input and output tensors.
Based on my understanding of the following paragraph in the documentation https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_configuration.html:
my questions are:
1. Are the following 2 configurations for the model input and output shapes equivalent, and do they have identical effects for specifying the input and output shape for the model?
2. From the inference server's point of view, are these two configurations treated differently in any way, or are they indistinguishable?
3. If the following 2 configurations are equivalent, are they considered to be equivalent by all the backends supported by the Triton Inference Server (e.g. onnxruntime backend, python backend, etc.)?
Configuration 1:
Configuration 2:
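(The two configuration blocks did not survive in this transcript. Based on the [1,3,-1,-1] shape discussed in the replies, a plausible reconstruction would be the following pbtxt fragments; the tensor names and data type here are assumptions, not taken from the original post:)

```
# Configuration 1 (assumed): implicit batch dimension via max_batch_size.
max_batch_size: 1
input  [ { name: "INPUT__0"  data_type: TYPE_FP32 dims: [ 3, -1, -1 ] } ]
output [ { name: "OUTPUT__0" data_type: TYPE_FP32 dims: [ 3, -1, -1 ] } ]

# Configuration 2 (assumed): explicit batch dimension, batching disabled.
max_batch_size: 0
input  [ { name: "INPUT__0"  data_type: TYPE_FP32 dims: [ 1, 3, -1, -1 ] } ]
output [ { name: "OUTPUT__0" data_type: TYPE_FP32 dims: [ 1, 3, -1, -1 ] } ]
```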
Thank you very much for your time and help in advance!