
Supporting inference with EETQ quantized model #391

Open
thincal opened this issue Apr 5, 2024 · 0 comments · May be fixed by #393
Labels: enhancement (New feature or request)

Comments

@thincal (Contributor) commented Apr 5, 2024

Feature request

An EETQ-quantized model performs with very good quality in my case, but loading it is quite slow. If the base model is already quantized with EETQ, LoRAX should load it directly without JIT quantization; currently, however, it fails to find the related layers.
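
Since the request hinges on detecting a pre-quantized checkpoint at load time, here is a minimal sketch of the idea. The tensor names (`qweight`, `weight_scales`) and the `load_linear_weights` / `quantize_jit` helpers are assumptions for illustration, not LoRAX's or EETQ's actual API:

```python
# Minimal sketch (not LoRAX's actual loader): if the checkpoint already
# contains EETQ-style int8 tensors, load them directly; otherwise fall
# back to JIT quantization. Tensor names are assumptions.
import torch
from safetensors import safe_open

def load_linear_weights(path: str, prefix: str):
    """Return (int8 weight, per-channel scales) for one linear layer."""
    with safe_open(path, framework="pt") as f:
        names = set(f.keys())
        if f"{prefix}.qweight" in names:
            # Pre-quantized checkpoint: load tensors as-is (fast path).
            qweight = f.get_tensor(f"{prefix}.qweight")
            scales = f.get_tensor(f"{prefix}.weight_scales")
            return qweight, scales
        # Plain fp16 checkpoint: quantize at load time (slow path).
        weight = f.get_tensor(f"{prefix}.weight")
        return quantize_jit(weight)

def quantize_jit(weight: torch.Tensor):
    # Stand-in for the EETQ quantization kernel: simple symmetric
    # per-output-channel int8 quantization, for illustration only.
    scales = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    qweight = torch.round(weight / scales).to(torch.int8)
    return qweight, scales.squeeze(1)
```

The fast path is what this issue asks for: when the quantized tensors are already present, the loader should consume them directly instead of failing to find a plain `weight` tensor.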

Motivation

Speed up EETQ model loading.

Your contribution

I will prepare a PR for review; I may also need some help with parts of the implementation.

thincal linked a pull request on Apr 5, 2024 that will close this issue (3 tasks)
tgaddair added the enhancement (New feature or request) label on May 23, 2024