
Question about output type of QuantizeLinear #3844

Open
cc-sketch opened this issue May 6, 2024 · 6 comments
Assignees
Labels
triaged Issue has been triaged by maintainers

Comments

@cc-sketch

cc-sketch commented May 6, 2024

Hello, why does the log show that the output type of QuantizeLinear is Float?
[screenshot: 20240506-211227]

@cc-sketch
Author

Also, when I serialize the manually fused QDQ ONNX model with only the int8 in/out type supported in the plugin, like
bool supportsFormatCombination(int32_t pos, const PluginTensorDesc* inOut, int32_t nbInputs, int32_t nbOutputs) NOEXCEPT override
{
    switch (pos)
    {
    case 0:
        return (inOut[0].type == DataType::kINT8)
            && (inOut[0].format == TensorFormat::kCHW4 || inOut[0].format == TensorFormat::kCHW32);
    case 1:
        return (inOut[pos].type == inOut[0].type) && (inOut[pos].format == inOut[0].format);
    default:
        return false;
    }
    return false;
}

it fails with a log like

[05/06/2024-21:04:56] [TRT] [E] 9: [pluginV2Builder.cpp::reportPluginError::23] Error Code 9: Internal Error (LayerNorm_4: could not find any supported formats consistent with input/output data types)
[screenshot: 20240506-211947]
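
A related detail worth checking: an int8-in/int8-out plugin also reports its output type through getOutputDataType. A minimal sketch of what that override could look like, assuming a single output whose type should match input 0 (this code is not from the issue):

// Sketch only: report the plugin output type so the builder does not expect a float output.
// Assumes one output with the same data type as input 0.
DataType getOutputDataType(int32_t index, const DataType* inputTypes, int32_t nbInputs) const NOEXCEPT override
{
    // For an int8-in/int8-out plugin this returns DataType::kINT8.
    return inputTypes[0];
}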

@zerollzeng
Collaborator

> Hello, why does the log show that the output type of QuantizeLinear is Float?

What does it look like in the ONNX model?

@zerollzeng
Collaborator

You implemented a LayerNorm plugin? TRT has native support for it.
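
For reference, a minimal sketch of using the native normalization layer through the C++ network API, assuming TensorRT 8.6 or newer where INetworkDefinition::addNormalization is available (the tensor names and the axes mask below are placeholders, not from this issue):

// Sketch only: native LayerNorm via INetworkDefinition::addNormalization (TRT >= 8.6).
// `input`, `scale`, and `bias` are assumed to be ITensor* already added to `network`.
uint32_t const axesMask = 1u << 2; // e.g. normalize over the last axis of a 3-D tensor
nvinfer1::INormalizationLayer* ln = network->addNormalization(*input, *scale, *bias, axesMask);
ln->setEpsilon(1e-5f); // match the epsilon used during training

On older releases such as 8.5 this layer is not available, so a plugin or an upgrade would still be needed there.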

@zerollzeng zerollzeng self-assigned this May 12, 2024
@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label May 12, 2024
@cc-sketch
Author

cc-sketch commented May 14, 2024

@zerollzeng Thank you, and sorry for the late reply. I will try the LayerNorm plugin later. But what I really want to ask now is how to serialize the "manually fused QDQ ONNX". As seen below, the left is the ONNX model with a couple of Q/DQ nodes, and the right is the fused ONNX model.
[screenshots: qlayernorm, qdq]

First, when I serialize the model with only the kINT8 DataType supported in the TRT plugin, as
[screenshot: int8_only]

It produced the error:

[pluginV2Builder.cpp::reportPluginError::23] Error Code 9: Internal Error (MyOp_3: could not find any supported formats consistent with input/output data types)

That is why I asked "why is the output type of QuantizeLinear float".

And if I serialize the model with the kFLOAT DataType also supported in the TRT plugin, as
[screenshot: fp16]

It produced the error:

[optimizer.cpp::filterQDQFormats::4422] Error Code 2: Internal Error (Assertion !n->candidateRequirements.empty() failed. All of the candidates were removed, which points to the node being incorrectly marked as an int8 node.)

This is with TensorRT 8.5.3.1 and CUDA 11.3. Thank you very much.
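
For context, a sketch of what "kFLOAT DataType supported also" is assumed to mean in supportsFormatCombination, since the screenshot is not reproduced here (the linear format for float is an assumption, not taken from the issue):

// Sketch only: accepting both int8 (CHW4/CHW32) and float (linear) at the plugin boundary.
bool supportsFormatCombination(int32_t pos, const PluginTensorDesc* inOut, int32_t nbInputs, int32_t nbOutputs) NOEXCEPT override
{
    switch (pos)
    {
    case 0:
        return (inOut[0].type == DataType::kINT8
                   && (inOut[0].format == TensorFormat::kCHW4 || inOut[0].format == TensorFormat::kCHW32))
            || (inOut[0].type == DataType::kFLOAT && inOut[0].format == TensorFormat::kLINEAR);
    case 1:
        // Output must match the input type and format.
        return (inOut[pos].type == inOut[0].type) && (inOut[pos].format == inOut[0].format);
    default:
        return false;
    }
}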

@zerollzeng
Collaborator

zerollzeng commented May 19, 2024

Your Q/DQ placement looks wrong to me; try Q -> MyOp -> DQ if your plugin runs in int8.

@cc-sketch
Author

Sorry, I didn't understand what this means.
[screenshot]
I manually fused the model from Q -> DQ -> MyOp -> Q -> DQ into Q -> MyOp -> DQ, following https://forums.developer.nvidia.com/t/where-do-tensorrt-plugin-determine-whether-fuse-qdq-or-not/226705/4
[screenshot]
So, regarding my question, what is wrong? Is something wrong with the quantized model generation, or with the MyOp plugin? It has confused me for a long time.
