This repository contains three files:
- one for fine-tuning Mistral 7B (a minimal LoRA sketch follows this list)
- one for running inference with the model uploaded to the Hugging Face Hub (see the second sketch below)
- one for running inference with the saved model in GGUF format (see the example after the conversion command)
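As a rough orientation for the first file, a minimal LoRA fine-tuning setup with transformers and peft could look like the sketch below. The model id, hyperparameters, and output path are illustrative assumptions, not the script's actual values:

```python
# Minimal LoRA fine-tuning sketch -- illustrative, not the repo's actual script.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "mistralai/Mistral-7B-v0.1"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,   # 16-bit, i.e. higher precision than 8-bit
    device_map="auto",           # requires the accelerate package
)

# Attach LoRA adapters to the attention projections
config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()

# ... run your training loop (e.g. trl.SFTTrainer) here ...

# Save the adapter weights separately from the base model
model.save_pretrained("mistral7b-lora-adapter")
tokenizer.save_pretrained("mistral7b-lora-adapter")
```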
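For the second file, loading the uploaded model back from the Hugging Face Hub for inference can be sketched as follows; the repo id and prompt are placeholders:

```python
# Inference with a model pushed to the Hugging Face Hub -- repo id is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/mistral7b-finetuned"  # placeholder Hub repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Explain LoRA in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```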
To convert the model to GGUF format:
- Load the model at a precision higher than 8-bit, such as 16-bit or 32-bit.
- Fine-tune the model using LoRA adapters.
- Save both the model and the adapter.
- Convert the model to GGUF using the following commands:
```
pip install -r llama.cpp/requirements.txt
python llama.cpp/convert.py vicuna-hf \
  --outfile vicuna-13b-v1.5.gguf \
  --outtype q8_0
```

Here `vicuna-hf` is the directory containing the saved Hugging Face checkpoint and `--outfile` names the resulting GGUF file; replace both with your own paths (e.g. the merged Mistral 7B checkpoint). `--outtype q8_0` quantizes the weights to 8-bit during conversion.
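One detail worth noting for the save step above: convert.py works on a plain Hugging Face checkpoint, so the LoRA adapter is usually merged back into the base model before conversion. A hedged sketch with peft, where all paths are placeholders:

```python
# Merge the LoRA adapter into the base model so the conversion script sees a
# plain Hugging Face checkpoint -- all paths here are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, "mistral7b-lora-adapter").merge_and_unload()

merged.save_pretrained("mistral7b-merged")  # directory to pass to convert.py
AutoTokenizer.from_pretrained(base_id).save_pretrained("mistral7b-merged")
```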
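Once the GGUF file exists, the third file's inference can be reproduced with, for example, llama-cpp-python; the model path and prompt below are placeholders:

```python
# Load the converted GGUF file with llama-cpp-python (pip install llama-cpp-python);
# the model path and prompt are placeholders.
from llama_cpp import Llama

llm = Llama(model_path="vicuna-13b-v1.5.gguf", n_ctx=2048)
output = llm("Q: What is LoRA? A:", max_tokens=100, stop=["Q:"])
print(output["choices"][0]["text"])
```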