Replies: 3 comments
-
So, my concept is that […] We already have […] When we go to 5 bpw, there is very little difference between […] At 6 bpw, basically nothing I have tried is better than […]

Anyone else apart from @Nexesenex interested in such additional quants?
-
The point of having as many GGML types available as possible is to let interested folks build their own quantization strategies, tensor by tensor. Why not share the GGML types you created that work as intended with the community, even if you don't develop new quant strategies on top of them, instead of leaving that amazing work to sleep in your private repo? Trust the folks trying them to determine whether they are useful. Only by exploring can we find the combinations which work best, and more people exploring multiplies the chances.

I didn't find a ready-made formula determining the interaction between the nine different kinds of tensors in a model like Llama 2 and the ideal quantization strategy at a given overall bpw, so trial and error it is. But your IQ1_S quant strategy can already be improved with the available GGML types, and an IQ1_M GGML type at 1.8125 bpw would make it possible to build 2.0x bpw and even sub-2 bpw overall quants (for 34b+ models, and MoEs) which are actually usable beyond a demo, without toying too much with sketchy combos where the ffn tensors are quantized partly in IQ2_XXS and partly in IQ1_S.

As for IQ2_M: we do have a quant strategy named like that, but no GGML_Type IQ2_M. The IQ2_M strategy is based on GGML_Type IQ2_S (2.5625 bpw).
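To make "tensor by tensor" concrete, here is a minimal C++ sketch of such a strategy, in the spirit of llama.cpp's per-tensor type selection. Everything here is hypothetical: the enum is a stand-in for ggml_type, MY_IQ1_M is the requested 1.8125 bpw type (it does not exist upstream), and tensor names follow GGUF conventions.

```cpp
#include <string>

// Hypothetical stand-in for ggml_type; MY_IQ1_M (1.8125 bpw) is the type
// requested in this thread and does not exist upstream.
enum my_ggml_type { MY_IQ1_S, MY_IQ1_M, MY_IQ2_XXS, MY_IQ2_S, MY_IQ3_XXS };

// One possible sub-2 bpw mix: spend bits where errors hurt most
// (output, attn_v, ffn_down) and starve the more forgiving tensors
// (ffn_up, ffn_gate, attn_q).
static my_ggml_type pick_type(const std::string & name) {
    auto has = [&](const char * s) { return name.find(s) != std::string::npos; };
    if (name == "output.weight")          return MY_IQ3_XXS; // output head: keep quality
    if (has("attn_output"))               return MY_IQ2_S;
    if (has("attn_v"))                    return MY_IQ2_S;
    if (has("ffn_down"))                  return MY_IQ2_XXS;
    if (has("ffn_up") || has("ffn_gate")) return MY_IQ1_M;   // the requested 1.8125 bpw type
    if (has("attn_q"))                    return MY_IQ1_M;
    return MY_IQ2_XXS;                    // everything else (norms stay f32 regardless)
}
```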
-
@Nexesenex Why don't we start with you sharing that improvement with us?
-
@ikawrakow,
I don't know how much work what follows would require if you haven't already done it on your private repo, but I think it would be great to have a 1.8125 bpw (and maybe a 2.8125 bpw) GGML type, in order to improve the granularity of the quantization of the FFN and attn.q.weight tensors, and to establish refined strategies for sub-IQ2_XXS and sub-IQ3_XXS quantized models. Beyond 3 bpw, the larger bpw intervals between GGML types are less "problematic".
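As a back-of-the-envelope check of what such a type buys, here is a small C++ computation of overall bpw for one possible mix. The parameter fractions are my own rough figures for a Llama-2-7B-shaped model (d_model = 4096, d_ffn = 11008; embeddings and norms ignored), and the type assignments are just an illustration, not a recommendation:

```cpp
#include <cstdio>

// Weighted-average bpw of a per-tensor quant mix. Fractions are assumed
// (Llama-2-7B-like shapes: attention = 4*d^2, each ffn matrix = d*d_ffn).
int main() {
    struct slice { const char * tensors; double frac; double bpw; };
    const slice mix[] = {
        { "ffn_up + ffn_gate",    0.446, 1.8125 }, // hypothetical IQ1_M
        { "ffn_down",             0.223, 2.5625 }, // IQ2_S
        { "attn_q + attn_k",      0.166, 1.8125 }, // hypothetical IQ1_M
        { "attn_v + attn_output", 0.165, 2.5625 }, // IQ2_S
    };
    double total = 0.0;
    for (const slice & s : mix) {
        total += s.frac * s.bpw;
    }
    printf("overall: %.2f bpw\n", total); // prints: overall: 2.10 bpw
    return 0;
}
```

With these choices the mix lands around 2.1 bpw overall; getting under 2.0 bpw means moving more slices onto the cheaper type, which is exactly where finer bpw granularity would help.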
I don't know where the mathematical breaking point of catastrophic quality loss lies (you have obviously already pushed it below 1.5 bpw with your IQ1_S_"EvenBetter" GGML type), but the attn.q.weight and ffn tensors (notably .up and .gate) might even be able to endure a sub-1.5 bpw quant while still allowing a quant strategy that uses it to remain on the same "curve" as the one presented in Artefact's graph.
Also, considering the massive improvements your IQ quants have brought in terms of quality/size ratio, higher IQ GGML types in the 4.5-6 bpw range are highly awaited, especially for the attn.v.weight, attn.output.weight, ffn.down, and output tensors.