
Phi-3 mini 4k instruct with MICROSOFT's quantization #2273

Open
federicoparra opened this issue May 4, 2024 · 3 comments
Labels: help wanted (Looking for community help), new-models

Comments

@federicoparra

⚙️ Request New Models

Additional context

I know others have made this request already (#2246, #2222, #2238, #2205).

But I am requesting something different: I am suggesting that you do not quantize or modify the model's weights yourselves, but that you instead use Microsoft's already 4-bit quantized weights.

The reason is that I suspect (although it is not explicit in their repo) that they used quantization-aware training to build these GGUF files.
I have tested the regular 32-bit model against the 4-bit GGUF one, and the performance is almost equivalent, which is not what I've seen so far with MLC's quantized models (they tend to be less accurate than their 32-bit counterparts).

Is there a way to use Microsoft's own quantized weights?

Thank you!
Federico

@tqchen (Contributor) commented May 4, 2024

Thanks for the suggestion. We are still focusing on a major refactoring push to stabilize the universal-deployment use case, so we cannot quickly add support for new formats right now.

This is something that I think would be good to explore as a community effort. The main things needed here are a customized loader that reads the weights, and a quantization scheme (which maps the loaded weights onto the target weights).
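
As a rough illustration of the "loader" half, here is a minimal sketch that reads a GGUF file with the `gguf` Python package (published from the llama.cpp repo) and dequantizes Q4_0 tensors to float32. Q4_0 is shown only because its block layout is simple; Microsoft's Phi-3 files may use other quant types (e.g. Q4_K) with more involved layouts, and the file name below is a hypothetical local path.

```python
# Sketch only: assumes the `gguf` Python package and the standard Q4_0
# block layout: 32 weights per block, stored as one fp16 scale followed
# by 16 bytes of packed 4-bit values (18 bytes per block in total).
import numpy as np
from gguf import GGUFReader, GGMLQuantizationType

def dequantize_q4_0(raw: np.ndarray, n_elements: int) -> np.ndarray:
    """Expand raw Q4_0 bytes into float32 weights."""
    blocks = raw.reshape(-1, 18)                  # 18 bytes per 32-weight block
    d = blocks[:, :2].copy().view(np.float16).astype(np.float32)  # per-block scale
    qs = blocks[:, 2:]                            # 16 bytes of packed nibbles
    lo = (qs & 0x0F).astype(np.int8) - 8          # elements 0..15 of each block
    hi = (qs >> 4).astype(np.int8) - 8            # elements 16..31 of each block
    w = np.concatenate([lo, hi], axis=1).astype(np.float32) * d
    return w.reshape(-1)[:n_elements]

reader = GGUFReader("Phi-3-mini-4k-instruct-q4.gguf")  # hypothetical path
for t in reader.tensors:
    if t.tensor_type == GGMLQuantizationType.Q4_0:
        w = dequantize_q4_0(np.asarray(t.data, dtype=np.uint8).ravel(),
                            int(t.n_elements))
        print(t.name, w.shape)  # floats, ready for a target quantization scheme
```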

tqchen added the help wanted label May 4, 2024
@federicoparra (Author)

Perhaps a converter? So far, contributors generally produce GGUF-quantized versions of models via post-training quantization; but if, like Microsoft, other large vendors begin providing quantization-aware-trained weights in GGUF format, it would be great to be able to import them.

@tqchen (Contributor) commented May 4, 2024

Right, the loader and the quantization scheme combined would effectively be the converter you mentioned.
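
To make the converter idea concrete, here is a minimal sketch of the second half: repacking the dequantized floats into a group-quantized 4-bit layout. The group size (32), zero point (7), and uint32 packing below are illustrative assumptions loosely modeled on MLC's q4f16_1 scheme, not its actual storage format.

```python
# Sketch of the "quantization scheme" half: repack float weights into a
# group-wise 4-bit layout. All layout details are assumptions, loosely
# modeled on MLC's q4f16_1, not the real implementation.
import numpy as np

def requantize_q4(w: np.ndarray, group_size: int = 32):
    groups = w.reshape(-1, group_size).astype(np.float32)
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0 + 1e-8
    q = np.clip(np.round(groups / scale) + 7, 0, 15).astype(np.uint32)
    # pack eight consecutive 4-bit codes into each uint32
    q8 = q.reshape(q.shape[0], group_size // 8, 8)
    packed = np.zeros(q8.shape[:2], dtype=np.uint32)
    for i in range(8):
        packed |= q8[:, :, i] << (4 * i)
    return packed, scale.astype(np.float16)
```

In principle, a GGUF loader plus a repacking step like this could then be wired into the existing `mlc_llm convert_weight` flow as a new source format; exactly where it would hook in is for whoever picks this up to work out.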
