How to get a real INT8 quantized ONNX model? #2816
Hi~
After calling `sim.export(path=config.logdir, filename_prefix='quant_model')`, I get three files: quant_model.encodings, quant_model.encodings.yaml, and quant_model.onnx.
However, when quant_model.onnx is visualized in Netron, it shows up as an FP32 ONNX model, and its file size is the same as the original FP32 model's.
How can I get a real INT8 ONNX model that contains quantized ops and has a smaller file size than the original FP32 model?
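For context, a minimal sketch of the QuantSim flow that produces those three files, assuming the aimet_torch 1.x API; the tiny model, input shape, and calibration callback are placeholders, not the poster's actual setup:

```python
import torch
from aimet_torch.quantsim import QuantizationSimModel

# Placeholder FP32 model and input shape (assumptions for illustration).
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
dummy_input = torch.randn(1, 3, 224, 224)

sim = QuantizationSimModel(model, dummy_input=dummy_input,
                           default_param_bw=8, default_output_bw=8)

def forward_pass(model, _):
    # Calibration: run representative data through the model (random here).
    with torch.no_grad():
        model(dummy_input)

sim.compute_encodings(forward_pass, forward_pass_callback_args=None)

# Writes quant_model.onnx (weights still FP32) plus quant_model.encodings,
# which carries the scale/offset values a quantized runtime would consume.
sim.export(path='.', filename_prefix='quant_model', dummy_input=dummy_input)
```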
Comments

Hi @JiliangNi, AIMET offers quantization simulation, which simulates quantization noise by quantizing and then dequantizing. So, to get INT8 tensors, you will have to run your model on quantized hardware.
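To make "quantizing and dequantizing" concrete, here is a rough sketch of asymmetric 8-bit fake quantization (illustrative only, not AIMET's actual implementation): the values are snapped to an INT8 grid and then mapped straight back, so the output tensor, and therefore the exported model, remains float32.

```python
import torch

def fake_quantize(x: torch.Tensor, bitwidth: int = 8) -> torch.Tensor:
    """Quantize-dequantize: injects INT8 rounding noise but returns float32."""
    qmin, qmax = 0, 2 ** bitwidth - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    offset = torch.round(x.min() / scale)
    x_int = torch.clamp(torch.round(x / scale) - offset, qmin, qmax)  # INT8 grid
    return (x_int + offset) * scale                                   # back to float

x = torch.randn(4)
print(fake_quantize(x).dtype)  # torch.float32 -- the weights never become int8
```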
Can we simulate inference with …
@escorciav, we don't have a workflow available for it currently.
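For readers who just need a self-contained INT8 .onnx file, one possible route outside AIMET is ONNX Runtime's post-training static quantization. A hedged sketch follows; note that it re-calibrates with ORT's own scheme rather than reusing the AIMET .encodings, and the input name and shape are assumptions:

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, quantize_static

class RandomCalibrationReader(CalibrationDataReader):
    """Feeds a few calibration batches (random here; use real data in practice)."""
    def __init__(self, input_name="input", shape=(1, 3, 224, 224), batches=8):
        self._batches = iter(
            {input_name: np.random.rand(*shape).astype(np.float32)}
            for _ in range(batches)
        )

    def get_next(self):
        return next(self._batches, None)

# Rewrites the FP32 graph with QuantizeLinear/DequantizeLinear ops and INT8
# weight initializers, so the output file is genuinely smaller.
quantize_static(
    "quant_model.onnx",       # the FP32 model that sim.export wrote
    "quant_model_int8.onnx",  # real INT8 model
    RandomCalibrationReader(),
)
```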