Phi-3 mini 4k instruct with MICROSOFT's quantization #2273
Comments
Thanks for the suggestion. We are still focused on a major refactoring push to stabilize the universal-deployment use case, so we cannot quickly add new format support right now. This is something that I think would be good to explore as a community effort. The main things needed here are a customized loader that loads the weights, and a quantization scheme (which maps the loaded weights onto the target weights).
Perhaps a converter? So far, contributors have generally produced GGUF-quantized versions of models via post-training quantization, but if other large vendors, like Microsoft, begin providing quantization-aware-trained weights in GGUF format, it would be great to be able to import them.
Right, the loader and quantization scheme combined would effectively be a converter, as you mentioned.
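The "loader + scheme = converter" idea discussed above could be sketched roughly as follows. This is a minimal illustration only: all function, tensor, and parameter names here are hypothetical, and none of this reflects actual MLC LLM or GGUF APIs.

```python
# Hypothetical sketch of a converter for vendor-prequantized weights:
# a loader that reads already-quantized tensors, plus a mapping scheme
# that renames/relays them onto the runtime's expected parameters,
# WITHOUT re-quantizing (preserving the vendor's QAT weights).
# All names are illustrative, not real MLC LLM APIs.

def load_prequantized(source):
    """Loader: return pre-quantized 4-bit weights as packed byte buffers.

    In a real converter this step would parse the GGUF file; here
    `source` is a plain dict standing in for the parsed tensors."""
    return {name: bytes(data) for name, data in source.items()}

def map_to_target(weights, name_map):
    """Quantization scheme: map loaded weights onto the target parameter
    names expected by the runtime, passing the packed data through as-is."""
    return {name_map[name]: w for name, w in weights.items() if name in name_map}

# Example: two 4-bit values packed per byte, already quantized by the vendor.
src = {"blk.0.attn_q.weight": [0x12, 0x34]}
loaded = load_prequantized(src)
converted = map_to_target(
    loaded, {"blk.0.attn_q.weight": "model.layers.0.q_proj.q_weight"}
)
```

The key design point is that the mapping step only reinterprets names and layouts; the quantized values themselves are carried through untouched, which is what preserves any benefit of the vendor's quantization-aware training.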
⚙️ Request New Models
Additional context
I know others have made this request already (#2246, #2222, #2238, #2205).
But I am requesting something different: rather than quantizing or modifying the model's weights yourselves, I am suggesting you use Microsoft's already-quantized 4-bit weights directly.
The reason is that I suspect (although it is not explicit in their repo) that they used quantization-aware training to build these GGUF files.
I have tested the regular 32-bit model against the GGUF 4-bit one, and the performance is almost equivalent, which is not what I've seen so far with MLC's quantized models (they tend to be less accurate than their 32-bit counterparts).
Is there a way to use Microsoft's own quantized weights?
Thank you!
Federico