
How to get a real int8 quantized ONNX model? #2816

Open
JiliangNi opened this issue Mar 8, 2024 · 3 comments

Comments

@JiliangNi

Hi~

After calling sim.export(path=config.logdir, filename_prefix='quant_model'), I get three files: quant_model.encodings, quant_model.encodings.yaml, and quant_model.onnx.

However, when quant_model.onnx is visualized in Netron, it appears as an fp32 ONNX model, and its file size is the same as the original fp32 model's.

How can I get a real int8 ONNX model that contains quantized ops and has a smaller file size than the original fp32 model?
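
For reference, a minimal sketch of the flow that produces these files, assuming the PyTorch variant of AIMET (aimet_torch.quantsim.QuantizationSimModel) and a stand-in model; the actual model and config.logdir from the original setup are not shown:

```python
import os
import torch
from aimet_torch.quantsim import QuantizationSimModel

# Hypothetical fp32 model standing in for the model used in the original post
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
dummy_input = torch.randn(1, 3, 32, 32)

# Wrap the model with quantize-dequantize simulation ops (int8 by default)
sim = QuantizationSimModel(model, dummy_input=dummy_input,
                           default_param_bw=8, default_output_bw=8)

# Calibration pass: run representative data so AIMET can compute encodings
def forward_pass(sim_model, _):
    with torch.no_grad():
        sim_model(dummy_input)

sim.compute_encodings(forward_pass, forward_pass_callback_args=None)

# Export: writes quant_model.onnx (still an fp32 graph) together with the
# quant_model.encodings files holding the int8 scale/offset parameters
os.makedirs('./export', exist_ok=True)
sim.export(path='./export', filename_prefix='quant_model',
           dummy_input=dummy_input)
```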

@quic-mangal
Contributor

Hi @JiliangNi, AIMET offers quantization simulation, which simulates the quantization noise by quantizing and then dequantizing. So, to get actual INT8 tensors, you will have to run your model on quantized hardware.
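
In other words, the QuantSim nodes apply quantize-dequantize ("fake quantization"): the tensors in the exported graph stay fp32 and only carry int8 rounding noise, while the int8 scale/offset parameters are stored in the .encodings files. A toy numpy sketch of that behaviour (an illustration only, not AIMET's actual implementation):

```python
import numpy as np

def fake_quantize(x, bitwidth=8):
    """Symmetric per-tensor quantize-dequantize, as a QuantSim node simulates it."""
    qmax = 2 ** (bitwidth - 1) - 1                       # 127 for int8
    scale = np.abs(x).max() / qmax                       # encoding found during calibration
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)    # values on the int8 grid
    return (q * scale).astype(np.float32)                # dequantized back to fp32

w = np.random.randn(64, 64).astype(np.float32)
w_sim = fake_quantize(w)  # still fp32 -> same dtype and file size, but carries int8 noise
```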

@escorciav

Can we simulate inference with onnx-runtime prior to running on target hardware via QNN/SNPE?

@quic-mangal
Contributor

@escorciav, we don't have a workflow available for it currently.
