[int8 quantization] rules for correct Q/DQ node placement with add & concat operations, unclear documentation #3861
Could you try removing the Q/DQ between
Both of the attached files are screenshots, one of the ONNX file and one of the TRT graph. The screenshots are fuzzy, so I am not sure what those PointWise ops are. Could you upgrade your TRT to the latest 10.0 to see if anything changes?
Because your ONNX has this pattern
Tried this, but it is not getting fused.
So does this mean that no Q/DQ node is required at the input? Is a Q/DQ node only required for computational operations?
So as you can see, from the PointWise on the left to the last Concat/Conv there is again a Scale introduced. What is the reason for this, and how is it fused? I assumed that adding a Q/DQ node after the Add operation would give an int8 output, but a Scale is still introduced. How can I avoid this? Or is this again because of the Q/DQ before the Split?
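One plausible explanation (an assumption on my part, not confirmed for this graph) is that the Q node before the Split and the Q node after the Add carry different scales, so the engine must insert an explicit Scale layer to convert between the two int8 representations. A minimal pure-Python sketch of that rescale, assuming symmetric int8 quantization with no zero-point:

```python
def quantize(x, scale):
    """Quantize a float to int8 with the given scale (symmetric, no zero-point)."""
    return max(-128, min(127, round(x / scale)))

def dequantize(q, scale):
    """Map an int8 value back to float."""
    return q * scale

# Hypothetical scales: one from the Q/DQ before the Split,
# one from the Q/DQ after the Add.
scale_split = 0.05
scale_add = 0.08

x = 1.23
q_split = quantize(x, scale_split)  # int8 in the "split" representation

# To feed this int8 value into a layer expecting the "add" scale,
# the engine must rescale: multiply by scale_split / scale_add.
q_rescaled = max(-128, min(127, round(q_split * scale_split / scale_add)))

# This is equivalent to dequantize + requantize, which is exactly
# what a materialized Scale layer does.
assert q_rescaled == quantize(dequantize(q_split, scale_split), scale_add)
```

If the two scales were identical, the rescale would be the identity and no extra layer would be needed, which is why aligning the scales of Q/DQ pairs around Split/Add regions can remove these layers.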
Both ONNX and engine graphs were added already in the first post, see relevant files. I cannot try TRT 10 because 8.6 is the latest version supported by the latest JetPack version for Jetson AGX Orin.
Adding @nzmora for vis.
When I open your attachment, both images are the same.
No Q/DQ is required for Concat/Slice since they commute with Q/DQ.
I am not sure if this is an 8.6-only issue. Have you tried running your model on TRT 10.0 with a desktop GPU?
@ttyio, just a small update from my side (with updated ONNX and engine graphs included). As you can see from the graph, most of the (double) Q/DQ operations are fixed and/or properly placed, which resulted in a reduction in reformat layers and a model that is, in my opinion, fully int8. But there are 3 strange things happening in the model which I do not understand:
I need to deploy models on Jetson systems, which only support TRT 8.6. How would testing on TRT 10.0 help, since it will not be possible to deploy the model on a Jetson, if I am correct?
I want to achieve a fully int8 model with maximum int8 speed optimizations, but the documentation is very unclear.
- Fusion of nodes: Conv, Sigmoid, Mul, Add?
- How should I handle Q/DQ nodes with Concat?
- How should I handle Q/DQ nodes with Split?
- Why are the Scale & PointWise operations introduced in my graph? All these operations introduce some extra latency; is there an option to omit them?
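For background on the Scale/PointWise question: a Q/DQ pair in an explicit-quantization ONNX graph is "fake quantization", a quantize immediately followed by a dequantize. Where the builder cannot absorb such a pair into a neighboring kernel, it can surface as an explicit pointwise layer, which is one plausible source of the extra latency (an assumption for this particular graph, not a confirmed diagnosis). A minimal sketch of what a Q/DQ pair computes:

```python
def fake_quant(x, scale):
    """Q followed by DQ: snaps x onto the int8 grid while staying in float."""
    q = max(-128, min(127, round(x / scale)))  # QuantizeLinear (symmetric)
    return q * scale                            # DequantizeLinear

scale = 0.1
x = 0.537
y = fake_quant(x, scale)  # snapped to the nearest multiple of scale

# The round-trip error is bounded by half a quantization step
# (unless the value saturates at the int8 limits).
assert abs(y - x) <= scale / 2

# Values outside the representable range clamp to the int8 limits.
assert fake_quant(-100.0, scale) == -128 * scale
```

The usual way to avoid stray pointwise layers is to place Q/DQ pairs only where the deployment docs recommend (at the inputs of quantizable compute ops) so that each pair can fuse with an adjacent kernel instead of being materialized on its own.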
Screenshots: scale_pointwise (TRT graph), scale_pointwise_onnx (ONNX graph)
Strange behavior in the first "bottleneck" layer
Why are extra reformat layers added in the first bottleneck layer, and why are the operations not fused? As you can see, the Q/DQ nodes are correctly placed...
Environment
TensorRT Version:
8.6.3
ONNX Version:
1.15.0, opset 17
Relevant Files
full trt graph
full onnx graph
@ttyio