GGUF quantization meta-data format #797

mobicham · 2024-04-14T12:12:38Z

Hello!

Are there some resources that explain how the quantized parameters are structured in a GGUF file?
We are interested in porting HQQ-quantized models into GGUF format, but in order to do that, we need to know exactly how it is stored.
We basically need to know:

The bit-packing logic
The axis along which quantization is done
The group sizes associated with the different quant types

Thanks!

phymbert · 2024-04-14T13:59:28Z

Hi, your best bet is to look at the llama.cpp source:

https://github.com/ggerganov/llama.cpp/blob/f184dd920852d6d372b754f871ee06cfe6f977ad/llama.cpp#L13599

crimson-knight · 2024-04-15T19:26:31Z

@mobicham here is the spec for GGUF for you to use: https://github.com/ggerganov/ggml/blob/master/docs/gguf.md
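
To make the three questions concrete, here is a minimal sketch of how one block of the Q4_0 type appears to be unpacked, based on my reading of the reference dequantization code in ggml. Treat the details as assumptions to verify against the source linked above: a block is assumed to hold 32 weights (the group size) as a little-endian fp16 scale followed by 16 bytes of packed 4-bit values, with the low nibble of byte j holding weight j and the high nibble holding weight j + 16, and blocks laid out along each row of the tensor. The function name `dequantize_q4_0_block` is hypothetical and not part of any library.

```python
import struct
from typing import List

QK4_0 = 32                      # assumed number of weights per Q4_0 block
BLOCK_BYTES = 2 + QK4_0 // 2    # fp16 scale (2 bytes) + 16 bytes of nibbles


def dequantize_q4_0_block(block: bytes) -> List[float]:
    """Unpack one assumed Q4_0 block into 32 floats (illustrative only)."""
    if len(block) != BLOCK_BYTES:
        raise ValueError(f"expected {BLOCK_BYTES} bytes, got {len(block)}")

    # The block starts with the per-block scale, stored as little-endian fp16.
    (scale,) = struct.unpack_from("<e", block, 0)
    packed = block[2:]

    out = [0.0] * QK4_0
    half = QK4_0 // 2
    for j, byte in enumerate(packed):
        # Each 4-bit value is unsigned (0..15); subtracting 8 recenters it.
        out[j] = ((byte & 0x0F) - 8) * scale          # low nibble: weight j
        out[j + half] = ((byte >> 4) - 8) * scale     # high nibble: weight j + 16
    return out


if __name__ == "__main__":
    # Hand-built block: scale 0.5, every byte 0x98 (low nibble 8, high nibble 9).
    example = struct.pack("<e", 0.5) + bytes([0x98] * 16)
    print(dequantize_q4_0_block(example))
```

Other quant types use different block sizes and extra per-block fields (for example a minimum alongside the scale), so a porting tool would need one such routine per type rather than a single generic one.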
