convert.py: Outfile default name change and additional metadata support #4858

Merged: 4 commits into ggerganov:master from outfile-default-name-change, May 13, 2024

Conversation

mofosyne (Collaborator) commented Jan 10, 2024

I was working on llamafile and exploring how people commonly name their files on Hugging Face. Based on that, I'm suggesting a naming convention that I think can also apply to GGUF, so I'm applying it to the convert.py that I am currently using.

This commit adds a --metadata option which reads from a file like the one below, giving convert.py enough context to figure out a reasonable default file name.

{
    "general.name": "TinyLLama",
    "general.version": "v0",
    "general.author": "mofosyne",
    "general.url": "https://huggingface.co/mofosyne/TinyLLama-v0-llamafile",
    "general.description": "This gguf is ported from a first version of Maykeye attempt at recreating roneneldan/TinyStories-1M but using Llama architecture",
    "general.license": "apache-2.0",
    "general.source.url": "https://huggingface.co/Maykeye/TinyLLama-v0",
    "general.source.huggingface.repository": "https://huggingface.co/Maykeye/TinyLLama-v0"
}

When applied to the Hugging Face model Maykeye/TinyLLama-v0, the above will generate tinystories-v0-5M-F16.gguf. The key thing to note is that it is able to estimate the total parameter count (the version and name are determined by the metadata JSON and by context).
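
For illustration, here is a minimal sketch of how the parameter-count part of the name could be derived; the function names and the tensor-dict shape are assumptions for the example, not the exact code in convert.py:

import numpy as np

def model_parameter_count(tensors: dict[str, np.ndarray]) -> int:
    # The total parameter count is just the sum of every tensor's element count.
    return sum(int(t.size) for t in tensors.values())

def parameter_count_to_string(count: int) -> str:
    # Round to the nearest K/M/B/T scale suffix, e.g. 4_620_000 -> "5M".
    if count >= 1_000_000_000_000:
        return f"{round(count / 1_000_000_000_000)}T"
    if count >= 1_000_000_000:
        return f"{round(count / 1_000_000_000)}B"
    if count >= 1_000_000:
        return f"{round(count / 1_000_000)}M"
    return f"{round(count / 1_000)}K"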

I'm also proposing that we add a field general.version to the GGUF standard, which would be handy for models that come from the same group and are effectively the same model but trained further. People have been attaching the version to the model name, so it would be better to let them split it into model name and version.

The above metadata KV store key names are based on https://github.com/ggerganov/ggml/blob/master/docs/gguf.md, which appears to be the canonical reference for GGUF key names.

Proposed GGUF Naming Convention

GGUF files follow a naming convention of <Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf (a sketch of assembling such a name follows the component list below).

The components are:

  1. Model: A descriptive name for the model type or architecture.
  2. Version (Optional): Denotes the model version number, starting at v1 if not specified, formatted as v<Major>.<Minor>.
    • Best practice is to include the model version number only if the model has multiple versions; assume an unversioned model is the first version, and/or check the model card.
  3. ExpertsCount: Indicates the number of experts found in a Mixture of Experts based model.
  4. Parameters: Indicates the number of parameters and their scale, represented as <count><scale-prefix>:
    • T: Trillion parameters.
    • B: Billion parameters.
    • M: Million parameters.
    • K: Thousand parameters.
  5. Quantization: This part specifies how the model parameters are quantized or compressed. The notation is influenced by the ./quantize --help command in llama.cpp.
    • Uncompressed formats:
      • F16: 16-bit floats per weight
      • F32: 32-bit floats per weight
    • Quantization (Compression) formats:
      • Q<X>: X bits per weight, where X could be 4 (for 4 bits) or 8 (for 8 bits) etc...
      • Variants provide further details on how the quantized weights are interpreted:
        • _K: k-quant models, which further have specifiers like _S, _M, and _L for small, medium, and large, respectively; if none is specified, it defaults to medium.
        • _<num>: Different approaches, with even numbers indicating the model weights as a scaling factor multiplied by the quantized weight, and odd numbers indicating the model weights as a combination of an offset factor plus a scaling factor multiplied by the quantized weight. This convention comes from a llama.cpp issue ticket on QX_4.
          • Even Number (0 or 2): <model weights> = <scaling factor> * <quantised weight>
          • Odd Number (1 or 3): <model weights> = <offset factor> + <scaling factor> * <quantised weight>
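
As a concrete illustration, here is a minimal sketch that assembles a name under this convention; default_outfile_name is a hypothetical helper, not the exact function merged in convert.py:

def default_outfile_name(model: str, version: str | None,
                         experts_count: int | None,
                         parameters: str, quantization: str) -> str:
    # <Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf
    parts = [model.strip().replace(" ", "-")]
    if version:
        parts.append(version)
    size = f"{experts_count}x{parameters}" if experts_count else parameters
    parts.append(size)
    parts.append(quantization)
    return "-".join(parts) + ".gguf"

# default_outfile_name("TinyLLama", "v0", None, "5M", "F16")
# -> "TinyLLama-v0-5M-F16.gguf"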

mofosyne (Collaborator, Author) commented Jan 16, 2024

Any thoughts about this proposal, @cebtenzzre? (Also on the addition of an extra field general.version to GGUF?)


ggerganov (Owner) left a comment:

Would like some extra review on the Python implementation before merging

mofosyne force-pushed the outfile-default-name-change branch from 3a10cec to 840e7bf on April 5, 2024
mofosyne (Collaborator, Author) commented Apr 5, 2024

No problems. I just rebased so that there is only one file to review.

mofosyne (Collaborator, Author) commented Apr 6, 2024

Should I also add a flag to tell the user what default file name it will generate? This may be required for automated pipeline scripts so they know what .gguf artifact was generated. If so, this is what I plan to add:

    parser.add_argument("--get-outfile",      action="store_true",    help="get calculated default outfile format")
...
    if args.get_outfile:
        # Silence normal log output so that only the computed name is printed
        logging.basicConfig(level=logging.CRITICAL)
        model_plus = load_some_model(args.model)
        params = Params.load(model_plus)

        model   = model_plus.model
        model   = convert_model_names(model, params, args.skip_unknown)
        ftype   = pick_output_type(model, args.outtype)
        model   = convert_to_output_type(model, ftype)

        model_params_count = model_parameter_count(model_plus.model)
        # metadata here is the object loaded earlier from the --metadata file
        print(f"{default_convention_outfile(model_plus.paths, ftype, params, model_params_count, metadata)}")
        return

mofosyne (Collaborator, Author) commented Apr 6, 2024

Also, if it would help, here is https://huggingface.co/mofosyne/TinyLLama-v0-5M-F16-llamafile/blob/main/llamafile-creation.sh, which shows me converting from safetensors to GGUF using the new metadata format. Note this command:

./llama.cpp/convert.py maykeye_tinyllama --outtype f16 --metadata maykeye_tinyllama-metadata.json

The metadata used above is also in https://huggingface.co/mofosyne/TinyLLama-v0-5M-F16-llamafile/blob/main/maykeye_tinyllama-metadata.json and looks like:

{
    "general.name": "TinyLLama",
    "general.version": "v0",
    "general.author": "mofosyne",
    "general.url": "https://huggingface.co/mofosyne/TinyLLama-v0-llamafile",
    "general.description": "This gguf is ported from a first version of Maykeye attempt at recreating roneneldan/TinyStories-1M but using Llama architecture",
    "general.license": "apache-2.0",
    "general.source_url": "https://huggingface.co/Maykeye/TinyLLama-v0",
    "general.source_hf_repo": "https://huggingface.co/Maykeye/TinyLLama-v0"
}
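
For context, this is roughly how such a file could be read into a metadata object; a minimal sketch assuming the JSON keys above, not necessarily the exact Metadata class in convert.py:

import json
from dataclasses import dataclass

@dataclass
class Metadata:
    name: str | None = None
    version: str | None = None
    author: str | None = None
    url: str | None = None
    description: str | None = None
    license: str | None = None
    source_url: str | None = None
    source_hf_repo: str | None = None

    @staticmethod
    def load(path: str) -> "Metadata":
        # Map the dotted general.* keys onto plain attribute names.
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        return Metadata(
            name=data.get("general.name"),
            version=data.get("general.version"),
            author=data.get("general.author"),
            url=data.get("general.url"),
            description=data.get("general.description"),
            license=data.get("general.license"),
            source_url=data.get("general.source_url"),
            source_hf_repo=data.get("general.source_hf_repo"),
        )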

So the other factor you may want to consider when reviewing is whether I included enough metadata as well.

mofosyne (Collaborator, Author) commented:

I also realize that we may need to adjust the other conversion scripts to match this as well. But on a cursory look at the others, I wasn't exactly sure where to get all the values. Plus, we should first make sure we agree on the naming convention, at least in this file, before touching the rest.

mofosyne (Collaborator, Author) commented:

Also, heads up that for me to add --get-outfile I'll need someone to merge PR #6511 so I can suppress other messages when dumping the generated filename.

mofosyne marked this pull request as draft on May 6, 2024
mofosyne force-pushed the outfile-default-name-change branch 2 times, most recently from ddfaad9 to 4f99da4 on May 6, 2024
mofosyne (Collaborator, Author) commented May 6, 2024

Now that the Python logging has been refactored, I've taken the opportunity to refactor this PR to include --get-outfile, so now when you use this switch you can see how it selects the default outfile name based on the model name and internal specifications.

~/huggingface/TinyLLama-v0-llamafile$ ./llama.cpp/convert.py maykeye_tinyllama --outtype f16 --metadata maykeye_tinyllama-metadata.json --get-outfile
TinyLLama-v0-5M-F16

mofosyne marked this pull request as ready for review on May 6, 2024
mofosyne force-pushed the outfile-default-name-change branch from 4f99da4 to 74fe2ea on May 6, 2024
mofosyne (Collaborator, Author) commented May 6, 2024

Okay, now testing again the usage-flow implications of adding the --get-outfile flag in https://huggingface.co/mofosyne/TinyLLama-v0-5M-F16-llamafile. This is what my bash script looks like, where I grab the generated outfile name via OUTFILE=$(./llama.cpp/convert.py ${MODEL_DIR} --metadata ${METADATA_FILE} --outtype f16 --get-outfile):

#!/bin/bash

MODEL_DIR="maykeye_tinyllama"
METADATA_FILE="maykeye_tinyllama-metadata.json"

###############################################################################
# Pull the model folder, llamafile (for the engine) and llama.cpp (for the conversion script)
echo == Prep Environment ==
git submodule update --init

###############################################################################
echo == Build and prep the llamafile engine executable ==
pushd llamafile
make -j8
make
popd

###############################################################################
echo == What is our llamafile name going to be? ==
OUTFILE=$(./llama.cpp/convert.py ${MODEL_DIR} --metadata ${METADATA_FILE} --outtype f16 --get-outfile)
echo We will be aiming to generate $OUTFILE.llamafile

###############################################################################
echo == Convert from safetensor to gguf ==
./llama.cpp/convert.py ${MODEL_DIR} --metadata ${METADATA_FILE} --outtype f16
mv ${MODEL_DIR}/${OUTFILE}.gguf ${OUTFILE}.gguf

###############################################################################
echo == Generating Llamafile ==
cp ./llamafile/o/llama.cpp/main/main ${OUTFILE}.llamafile

# Create an .args file with settings defaults
cat >.args <<EOF
-m
${OUTFILE}.gguf
EOF

# zip align engine, gguf and default args
./llamafile/o/llamafile/zipalign -j0 ${OUTFILE}.llamafile ${OUTFILE}.gguf .args

###############################################################################
echo == Test Output ==
./${OUTFILE}.llamafile --cli -p "hello world the gruff man said"

It would be good to hear from other ggml packagers/maintainers whether this workflow makes sense.

compilade (Collaborator) left a comment:

This looks good to me. Some minor things to correct, but it's overall fine :)

mofosyne merged commit b1f8af1 into ggerganov:master on May 13, 2024
22 checks passed
mofosyne deleted the outfile-default-name-change branch on May 13, 2024
mofosyne (Collaborator, Author) commented May 13, 2024

Okay, I double-checked in my Hugging Face repo that my replication script now works on the merged master and can confirm no problems.

https://huggingface.co/mofosyne/TinyLLama-v0-5M-F16-llamafile

It's now ready for use. @compilade, thanks for the review; feel free to suggest improvements for metadata override in #7165 if you feel we should adjust the behavior to make it easier to use.

compilade (Collaborator) commented:

@mofosyne Consider updating the example in the description of this PR to use valid metadata keys according to what was merged.

Note that general.source.url and general.source.huggingface.repository are also part of the GGUF spec (if you want somewhere else to link for the naming scheme)

mofosyne (Collaborator, Author) commented May 13, 2024

Good point. I've updated the metadata example in both the issue ticket and this PR description.

So https://github.com/ggerganov/ggml/blob/master/docs/gguf.md is the canonical source? Gotcha. I've already merged the changes, but I'll note it in the description at least, and hopefully we can adjust the source comments later as needed.

teleprint-me pushed a commit to teleprint-me/llama.cpp that referenced this pull request on May 17, 2024:
convert.py: Outfile default name change and additional metadata support (ggerganov#4858)

* convert.py: Outfile default name change and additional metadata support

* convert.py: don't stringify Metadata load method output

* convert.py: typo fix

* convert.py: fix metadata format to sync with LLM_KV_NAMES in llama.cpp
Labels: need feedback (Testing and feedback with results are needed)
3 participants