How to perform int8 quantisation (not uint8) using ONNX? #1610
Comments
Hi @paul-ang, we only support U8S8 (uint8 activations, int8 weights) by default because on x86-64 machines with the AVX2 and AVX512 extensions, ONNX Runtime uses the VPMADDUBSW instruction for U8S8 for performance. Sorry, but to use S8S8 you will need to update the code yourself: please add 'int8' to the activations' dtype list in https://github.com/intel/neural-compressor/blob/master/neural_compressor/adaptor/onnxrt.yaml.
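To make the U8S8 point concrete, here is a minimal pure-Python sketch of what a VPMADDUBSW-style operation computes: unsigned bytes from one operand are multiplied by signed bytes from the other, and adjacent products are summed with signed 16-bit saturation. The function names and sample values are hypothetical and only illustrate the arithmetic; this is not ONNX Runtime's actual kernel.

```python
def saturate_int16(x):
    """Clamp an integer to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def vpmaddubsw(a_u8, b_s8):
    """Emulate VPMADDUBSW on Python lists (illustrative helper, not a real API).

    a_u8: unsigned bytes (0..255), e.g. uint8 activations.
    b_s8: signed bytes (-128..127), e.g. int8 weights.
    Returns one saturated int16 value per adjacent pair of products.
    """
    assert len(a_u8) == len(b_s8) and len(a_u8) % 2 == 0
    assert all(0 <= a <= 255 for a in a_u8)
    assert all(-128 <= b <= 127 for b in b_s8)
    out = []
    for i in range(0, len(a_u8), 2):
        pair_sum = a_u8[i] * b_s8[i] + a_u8[i + 1] * b_s8[i + 1]
        out.append(saturate_int16(pair_sum))
    return out


activations = [200, 3, 255, 255]   # uint8 side
weights = [-2, 7, 127, 127]        # int8 side
print(vpmaddubsw(activations, weights))
```

Because the instruction treats one operand as unsigned and the other as signed, a U8S8 layout maps onto it directly, whereas S8S8 inputs do not fit the unsigned operand without extra handling.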
Hi team, I am having an issue quantizing a network consisting of Conv and Linear layers with int8 weights and int8 activations in ONNX. I have tried setting this using op_type_dict, but it doesn't work: the activations are still uint8. I am using neural compressor version 2.3.1.
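For readers following along, a setting of this kind is usually written as a nested dictionary keyed by operator type. The sketch below is an assumption about what such a request might look like, not the poster's actual code; the operator names (Conv, plus MatMul and Gemm as likely ONNX exports of Linear layers) are illustrative. In Neural Compressor 2.x a dictionary like this would be passed as `op_type_dict` to `PostTrainingQuantConfig`.

```python
# Hypothetical op_type_dict asking for int8 weights and int8 activations.
# Operator names are assumptions; Linear layers often export to ONNX as
# MatMul or Gemm, depending on the exporter.
int8_both = {
    "weight": {"dtype": ["int8"]},
    "activation": {"dtype": ["int8"]},
}

op_type_dict = {
    "Conv": int8_both,
    "MatMul": int8_both,
    "Gemm": int8_both,
}

# Assumed usage (requires neural_compressor to be installed):
#   from neural_compressor import PostTrainingQuantConfig
#   conf = PostTrainingQuantConfig(op_type_dict=op_type_dict)
for op_type, cfg in op_type_dict.items():
    print(op_type, cfg["weight"]["dtype"], cfg["activation"]["dtype"])
```

Per the maintainer's reply in this thread, a request like this is not honored for activations unless 'int8' is also listed in the activations' dtype list in onnxrt.yaml.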