
How to get a real int8 quantized ONNX model? #2816

Open
JiliangNi opened this issue Mar 8, 2024 · 3 comments

Comments

@JiliangNi

Hi~

After calling sim.export(path=config.logdir, filename_prefix='quant_model'), I get three files: quant_model.encodings, quant_model.encodings.yaml, and quant_model.onnx.

However, when quant_model.onnx is visualized in Netron, it appears as an fp32 ONNX model, and its file size is the same as the original fp32 model's.

How can I get a real int8 ONNX model that contains quantized ops and has a smaller file size than the original fp32 model?
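
For reference, a minimal sketch of the flow that produces these files, assuming the PyTorch variant of AIMET (aimet_torch.quantsim.QuantizationSimModel) and a stand-in model; the actual model and config.logdir from the original setup are not shown:

```python
import os
import torch
from aimet_torch.quantsim import QuantizationSimModel

# Hypothetical fp32 model standing in for the model used in the original post
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
dummy_input = torch.randn(1, 3, 32, 32)

# Wrap the model with quantize-dequantize simulation ops (int8 by default)
sim = QuantizationSimModel(model, dummy_input=dummy_input,
                           default_param_bw=8, default_output_bw=8)

# Calibration pass: run representative data so AIMET can compute encodings
def forward_pass(sim_model, _):
    with torch.no_grad():
        sim_model(dummy_input)

sim.compute_encodings(forward_pass, forward_pass_callback_args=None)

# Export: writes quant_model.onnx (still an fp32 graph) together with the
# quant_model.encodings files holding the int8 scale/offset parameters
os.makedirs('./export', exist_ok=True)
sim.export(path='./export', filename_prefix='quant_model',
           dummy_input=dummy_input)
```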

@quic-mangal
Contributor

Hi @JiliangNi, AIMET offers quantization simulation, which simulates the quantization noise by quantizing and then dequantizing. So, to get actual INT8 tensors, you will have to run your model on quantized hardware.
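
In other words, the QuantSim nodes apply quantize-dequantize ("fake quantization"): the tensors in the exported graph stay fp32 and only carry int8 rounding noise, while the int8 scale/offset parameters are stored in the .encodings files. A toy numpy sketch of that behaviour (an illustration only, not AIMET's actual implementation):

```python
import numpy as np

def fake_quantize(x, bitwidth=8):
    """Symmetric per-tensor quantize-dequantize, as a QuantSim node simulates it."""
    qmax = 2 ** (bitwidth - 1) - 1                       # 127 for int8
    scale = np.abs(x).max() / qmax                       # encoding found during calibration
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)    # values on the int8 grid
    return (q * scale).astype(np.float32)                # dequantized back to fp32

w = np.random.randn(64, 64).astype(np.float32)
w_sim = fake_quantize(w)  # still fp32 -> same dtype and file size, but carries int8 noise
```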

@escorciav

Can we simulate inference with onnx-runtime prior to running on target hardware via QNN/SNPE?

@quic-mangal
Contributor

@escorciav, we don't have a workflow available for it currently.
