FEAT / Trainer: Experimental feature - add GGUF conversion when pushing the model to Hub #30928
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
def __init__(
    self,
    quantization_method: str,
    space_name: Optional[str] = "ggml-org/gguf-my-repo",
```

Suggested change:

```diff
- space_name: Optional[str] = "ggml-org/gguf-my-repo",
+ space_name: Optional[str] = "ggml-org/gguf-my-repo-trainer",
```
Would love to get a first round of review before adding tests and documentation! cc @amyeroberts @muellerzr

Example generated GGUF repo: https://huggingface.co/ybelkada/test-gguf-trainer-Q8_0-GGUF
Overall this makes sense from an "it works" perspective, but is there a reason why we are dedicating a Space to this rather than using something like a Hub util? (Or baking this in here, as an offline util instead of an online one.)
E.g. I'd personally rather this be applicable locally than on the Hub... (ran into this same thought process with safetensors the other day.)
Going transformers -> gradio -> hf_hub seems a bit middle-man to me.
This is possible but would require quite a lot of work: the community-maintained conversion script (https://github.com/ggerganov/llama.cpp/blob/master/convert.py) evolves quickly, which would make that feature hard to maintain. On the other hand, the official Space uses Docker and is built on that conversion script: https://huggingface.co/spaces/ggml-org/gguf-my-repo/blob/main/app.py#L102

cc @LysandreJik wdyt?
This sounds like a good approach to me too! With quantisation we have seen that the script sometimes evolves way too quickly, from day to day.
That's why the initial recommendation was for it to be used via the Space! That said, for offline use-cases perhaps we just use the convert script and update it periodically via a cron job - maybe every day?
This way the only script we'd need to keep updated would be this: https://github.com/ggerganov/llama.cpp/blob/master/convert-hf-to-gguf.py
Then we use this to convert to fp16, and the quantisation code remains the same.
Thanks for opening this!
```python
space_name (`str`, *optional*, defaults to `"ggml-org/gguf-my-repo"`):
    Hub ID of the conversion space to use, modify it at your own risk!
private (`bool`, *optional*, defaults to `False`):
    Whether to push the final GGUF file on a private repository.
duplicate_space (`bool`, *optional*, defaults to `False`):
    Whether to duplicate the conversion space or not. This is useful to avoid queue issues if one uses the official
    Space.
```
I really don't think either of these arguments makes sense as part of a "quantization config" - they're not configuring quantization, they're configuring a Space. It also creates a clash of arguments with the current Trainer: if I have `private=True` in this config and pass `hub_private_repo=False` in the training arguments, which one should be respected?
I would move these somewhere else (how configurable do we need them to be?)
Hmm, yes, makes sense - let me think about how best we could address that.
```python
if not isinstance(gguf_config, GgufConfig):
    raise ValueError("Please pass an instance of `GgufConfig`")

from gradio_client import Client
```
We need some protection around whether `gradio_client` is available before importing it.
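A minimal sketch of such a guard, assuming no existing transformers availability util covers `gradio_client` (the helper names here are hypothetical, not from the PR):

```python
import importlib.util


def is_gradio_client_available() -> bool:
    # True if the optional gradio_client package can be imported
    return importlib.util.find_spec("gradio_client") is not None


def require_gradio_client() -> None:
    # Raise a clear, actionable error up front instead of an opaque
    # ImportError deep inside push_to_hub()
    if not is_gradio_client_available():
        raise ImportError(
            "GGUF conversion requires the `gradio_client` package. "
            "Install it with `pip install gradio_client`."
        )
```

This mirrors the `is_*_available` pattern transformers already uses for other optional dependencies.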
```python
if gguf_config.duplicate_space:
    client = Client.duplicate(gguf_config.space_name, hf_token=get_token_to_send(token))
else:
    client = Client(gguf_config.space_name)

job = client.submit(
    model_id=self.hub_model_id,
    q_method=gguf_config.quantization_method,
    private_repo=gguf_config.private,
    token=get_token_to_send(token),
    api_name="/predict",
)

logger.info(
    f"GGUF conversion Space is running with the status {job.status().code} - please monitor your HF profile to see if the converted model has been pushed"
)
```
Hmmmm... I'm really not sure about this; my gut feeling is that piggy-backing on gradio here is pretty dirty.
I'd like this general logic to be moved out into a function rather than being injected directly into a core Trainer method.
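One way to pull that logic out of the Trainer into a standalone helper, sketched from the diff above (the function name `submit_gguf_conversion` is hypothetical, and the `gradio_client` import is kept local so the Trainer still works without it installed):

```python
from typing import Optional


def submit_gguf_conversion(
    hub_model_id: str,
    gguf_config,
    token: Optional[str] = None,
):
    """Submit a GGUF conversion job for `hub_model_id` to the configured Space.

    Fire-and-forget: the Space itself pushes the converted model to the Hub.
    """
    # Local import keeps gradio_client an optional dependency
    from gradio_client import Client

    if gguf_config.duplicate_space:
        # Duplicating the Space avoids queueing behind other users
        client = Client.duplicate(gguf_config.space_name, hf_token=token)
    else:
        client = Client(gguf_config.space_name)

    return client.submit(
        model_id=hub_model_id,
        q_method=gguf_config.quantization_method,
        private_repo=gguf_config.private,
        token=token,
        api_name="/predict",
    )
```

The Trainer method would then reduce to a single call plus the log line, keeping the gradio dependency out of core trainer code.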
Makes sense, will move that out into a separate method!
What does this PR do?
Introduces a new `quantization_config` that is intended to be used only for `trainer.push_to_hub()`. It calls a GGUF conversion Space under the hood (for now: https://huggingface.co/spaces/ybelkada/gguf-my-repo, which we will move to a more "official" org). The API looks like the following:
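The API snippet itself is not preserved in this page, but based on the `GgufConfig` fields shown in the diffs above, usage would look roughly like this (a sketch; the exact `push_to_hub` kwarg name is an assumption):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GgufConfig:
    # Fields mirror the docstring shown in the review diff above
    quantization_method: str
    space_name: Optional[str] = "ggml-org/gguf-my-repo"
    private: bool = False
    duplicate_space: bool = False


gguf_config = GgufConfig(quantization_method="Q8_0")
# trainer.push_to_hub(gguf_config=gguf_config)  # hypothetical kwarg name
```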
cc @amyeroberts @Vaibhavs10