chore: Add model vocab support #7117

Closed

wants to merge 70 commits into master from add-stablelm-hash

Changes from 4 commits

Commits (70)
1a9cf92
feat: Add stablelm vocab to gguf update
teleprint-me May 7, 2024
1355c24
chore: Apply update to get_vocab_base_pre method
teleprint-me May 7, 2024
e71789e
feat: Add stablelm vocab
teleprint-me May 7, 2024
8490705
feat: Add generate vocab shell script
teleprint-me May 7, 2024
d8694af
refactor: Clean up and organize url and dir paths
teleprint-me May 8, 2024
9d2fcd0
tests: Add test for qwen tokenizer
teleprint-me May 8, 2024
b8f8a96
feat: Add qwen pattern and tokenizer
teleprint-me May 8, 2024
3ae6c17
chore: Add missing command-r gguf vocab
teleprint-me May 8, 2024
4155e86
feat: Add support for qwen tokenizer
teleprint-me May 8, 2024
cbfed5b
chore: Update generate-vocab.sh script
teleprint-me May 8, 2024
f7dda38
note: Time of check to time of use
teleprint-me May 8, 2024
670e1c3
fix: Attempt to remove potential TOCTOU
teleprint-me May 8, 2024
69efb59
fix: Apply proper paths for handling qwen
teleprint-me May 8, 2024
906c3f7
fix: Apply fix to generate-vocab.sh script
teleprint-me May 8, 2024
0478552
chore: Add tiktoken to convert requirements
teleprint-me May 8, 2024
ccafb87
chore: Add model vocab
teleprint-me May 8, 2024
a6c5d5d
Merge branch 'master' into add-stablelm-hash
teleprint-me May 8, 2024
ca8acea
chore: Group qwen models together
teleprint-me May 8, 2024
c05d2a2
chore: Fix enumeration for qwen, olmo, and dbrx
teleprint-me May 8, 2024
17f2243
patch: Apply patch to fix config and SPM retrieval
teleprint-me May 8, 2024
de3d9e3
patch: Apply fix for downloading related model files
teleprint-me May 8, 2024
bc924e0
Merge branch 'master' into add-stablelm-hash
teleprint-me May 8, 2024
fc0007e
Merge branch 'master' into add-stablelm-hash
teleprint-me May 13, 2024
932ab05
Remove qwen and fix mauled imports
teleprint-me May 13, 2024
58551d0
chore: Apply updates to vocab models
teleprint-me May 13, 2024
4067536
change default temperature of OAI compat API from 0 to 1 (#7226)
Kartoffelsaft May 13, 2024
cfeb962
convert.py: Outfile default name change and additional metadata suppo…
mofosyne May 13, 2024
eaa8457
llama : rename jina tokenizers to v2 (#7249)
JoanFM May 13, 2024
3fa36ac
[SYCL] rm wait() (#7233)
arthw May 13, 2024
89550bb
perplexity: add BF16 vs. FP16 results (#7150)
JohannesGaessler May 13, 2024
d8b6869
llava-cli: fix base64 prompt (#7248)
Adriankhl May 13, 2024
7d85ea8
llama : less KV padding when FA is off (#7257)
ggerganov May 13, 2024
3dfaa1f
convert-hf : support direct Q8_0 conversion (#7234)
compilade May 13, 2024
95390eb
docs: Fix typo and update description for --embeddings flag (#7026)
louixs May 14, 2024
c7b8254
Add left recursion check: quit early instead of going into an infinit…
nuchi May 14, 2024
a94019b
move ndk code to a new library (#6951)
eltonkola May 14, 2024
e30a369
llama : disable pipeline parallelism with nkvo (#7265)
slaren May 14, 2024
7a2f768
ggml : add RPC backend (#6829)
rgerganov May 14, 2024
04a7f32
Revert "move ndk code to a new library (#6951)" (#7282)
mofosyne May 14, 2024
58962a2
server: free sampling contexts on exit (#7264)
stevegrubb May 14, 2024
37e2593
ggml : optimize for ppc64le using VSX intrinsics (ggml/784)
penghongbo May 12, 2024
b95c202
ggml : expose SSE3 and SSSE3 for MSVC when AVX is available (whisper/…
przemoc May 8, 2024
da894f9
ggml : try fix ppc64 (whisper/0)
ggerganov May 12, 2024
48296bf
metal : tune soft_max number of threads (whisper/0)
ggerganov May 13, 2024
2022675
sync : ggml
ggerganov May 14, 2024
4bc6f6e
metal : support FA without mask + add asserts (#7278)
ggerganov May 14, 2024
02f4122
script : sync ggml-rpc
ggerganov May 14, 2024
53332ff
server bench: fix bench not waiting for model load (#7284)
JohannesGaessler May 15, 2024
79bc1ea
ggml : add `ggml_upscale_ext` (ggml/814)
balisujohn May 15, 2024
4aae3a5
sync : ggml
ggerganov May 15, 2024
f3e8fc1
embedding : free the batch after execution (#7297)
dm4 May 15, 2024
da26e4d
Add missing " (#7303)
AidanBeltonS May 15, 2024
6fb91c1
ggml : tag ggml_tensor::backend as deprecated (#7290)
slaren May 15, 2024
dda1347
Avoid unnecessarily disabling CUDA graphs (#7302)
agray3 May 15, 2024
d1e2b6e
ggml : use dynamic thread scheduling for matrix multiplication (#6915)
kunnis May 15, 2024
41b9e5c
readme : remove stray double quote (#7310)
danbev May 15, 2024
b953ca3
Add support for properly optimized Windows ARM64 builds with LLVM and…
max-krasnyansky May 16, 2024
ad34bee
ci: fix bin/Release path for windows-arm64 builds (#7317)
max-krasnyansky May 16, 2024
a8d948c
doc: add references to hugging face GGUF-my-repo quantisation web too…
Vaibhavs10 May 16, 2024
d0a9c31
grammar, json, llama: replace push on emplace if it possible (#7273)
GermanAizek May 16, 2024
c7a926f
convert : get general.name from model dir, not its parent (#5615)
cebtenzzre May 16, 2024
3d210da
rpc : add command line arg for specifying backend memory
rgerganov May 15, 2024
99d5b28
rpc : get available mem for the CPU backend
rgerganov May 15, 2024
657f980
Revert "server bench: fix bench not waiting for model load (#7284)" (…
phymbert May 16, 2024
cd0e3d5
[Server] Added --verbose option to README [no ci] (#7335)
reuank May 17, 2024
e7c7ae8
patch: Add pre-tokenizer metadata to phi-2
teleprint-me May 17, 2024
9a81faf
patch: Fix jina vocab generation
teleprint-me May 17, 2024
8aa4937
feat: Make number of experts configurable
teleprint-me May 17, 2024
a7e0042
chore: Update gguf vocabularies
teleprint-me May 17, 2024
9269594
Merge branch 'master' into add-stablelm-hash
teleprint-me May 17, 2024
29 changes: 21 additions & 8 deletions convert-hf-to-gguf-update.py
@@ -23,14 +23,14 @@
# TODO: automate the update of convert-hf-to-gguf.py
#

import json
import logging
import os
import requests
import sys
import json

from hashlib import sha256
from enum import IntEnum, auto
from hashlib import sha256

import requests
from transformers import AutoTokenizer

logging.basicConfig(level=logging.DEBUG)
@@ -65,6 +65,13 @@ class TOKENIZER_TYPE(IntEnum):
{"name": "mpt", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/mosaicml/mpt-7b", },
{"name": "starcoder", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/bigcode/starcoder2-3b", },
{"name": "gpt-2", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/openai-community/gpt2", },
{"name": "phi", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/microsoft/phi-1", },
{"name": "stablelm", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b", },
{"name": "qwen", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/Qwen/Qwen-tokenizer", },
{"name": "mistral-bpe", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2", },
{"name": "mistral-spm", "tokt": TOKENIZER_TYPE.SPM, "repo": "https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2", },
{"name": "mixtral-bpe", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1", },
{"name": "mixtral-spm", "tokt": TOKENIZER_TYPE.SPM, "repo": "https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1", },
{"name": "refact", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/smallcloudai/Refact-1_6-base", },
{"name": "command-r", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/CohereForAI/c4ai-command-r-v01", },
]
@@ -290,12 +297,18 @@ def get_vocab_base_pre(self, tokenizer) -> str:
logger.info(f"Tests for {name} written in ./models/ggml-vocab-{name}.gguf.*")

# generate commands for creating vocab files

logger.info("\nRun the following commands to generate the vocab files for testing:\n")
shscript = "#!/usr/bin/env bash\n\n"

for model in models:
name = model["name"]
tmpline = f"python3 convert-hf-to-gguf.py models/tokenizers/{name}/ --outfile models/ggml-vocab-{name}.gguf --vocab-only\n"
shscript += tmpline
logging.info(tmpline.strip())

print(f"python3 convert-hf-to-gguf.py models/tokenizers/{name}/ --outfile models/ggml-vocab-{name}.gguf --vocab-only") # noqa: NP100
with open("generate-vocab.sh", "w", encoding="utf-8") as f:
f.writelines(shscript)
logging.info(f"Wrote {len(shscript)} bytes to generate-vocab.sh")

logger.info("\n")
logging.info("Run the following command to generate the vocab files for testing:")
logging.info("Enable execution: chmod +x generate-vocab.sh")
logging.info("Execute with ./generate-vocab.sh")
30 changes: 27 additions & 3 deletions convert-hf-to-gguf.py
@@ -2,18 +2,27 @@

from __future__ import annotations

import logging
import argparse
import contextlib
import json
import logging
import os
import re
import sys
from abc import ABC, abstractmethod
from enum import IntEnum
from pathlib import Path
from hashlib import sha256
from typing import TYPE_CHECKING, Any, Callable, ContextManager, Iterator, Sequence, TypeVar, cast
from pathlib import Path
from typing import (
TYPE_CHECKING,
Any,
Callable,
ContextManager,
Iterator,
Sequence,
TypeVar,
cast,
)

import numpy as np
import torch
@@ -308,6 +317,21 @@ def get_vocab_base_pre(self, tokenizer) -> str:
if chkhsh == "3ce83efda5659b07b1ad37ca97ca5797ea4285d9b9ab0dc679e4a720c9da7454":
# ref: https://huggingface.co/openai-community/gpt2
res = "gpt-2"
if chkhsh == "fcace8b9cac38ce847670c970cd5892031a753a1ef381abd1d9af00f713da085":
# ref: https://huggingface.co/microsoft/phi-1
res = "phi"
if chkhsh == "32d85c31273f8019248f2559fed492d929ea28b17e51d81d3bb36fff23ca72b3":
# ref: https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b
res = "stablelm"
if chkhsh == "e636dc30a262dcc0d8c323492e32ae2b70728f4df7dfe9737d9f920a282b8aea":
# ref: https://huggingface.co/Qwen/Qwen-tokenizer
res = "qwen"
if chkhsh == "e750a9b14dfed9b73287639bd1ecda50c38fa6011138f2f609804c6dab9ed5c2":
# ref: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
res = "mistral-bpe"
if chkhsh == "e750a9b14dfed9b73287639bd1ecda50c38fa6011138f2f609804c6dab9ed5c2":
# ref: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
res = "mixtral-bpe"
if chkhsh == "6221ad2852e85ce96f791f476e0b390cf9b474c9e3d1362f53a24a06dc8220ff":
# ref: https://huggingface.co/smallcloudai/Refact-1_6-base
res = "refact"
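One detail worth noticing in the hunk above: the mistral-bpe and mixtral-bpe branches test the same checksum (e750a9b1...), which follows from Mistral and Mixtral shipping the same BPE tokenizer; in a plain if-chain the later assignment wins, so res ends up as "mixtral-bpe" for both. A table form makes such collisions easier to spot; a minimal, self-contained sketch (the dict and helper are hypothetical, the hashes are copied from the diff):

# Hypothetical table form of the if-chain in get_vocab_base_pre.
# A linter flags duplicate dict keys here, whereas the if-chain
# silently resolves to whichever branch runs last.
CHKHSH_TO_PRE: dict[str, str] = {
    "fcace8b9cac38ce847670c970cd5892031a753a1ef381abd1d9af00f713da085": "phi",
    "32d85c31273f8019248f2559fed492d929ea28b17e51d81d3bb36fff23ca72b3": "stablelm",
    "e636dc30a262dcc0d8c323492e32ae2b70728f4df7dfe9737d9f920a282b8aea": "qwen",
}

def resolve_pre(chkhsh: str) -> str:
    try:
        return CHKHSH_TO_PRE[chkhsh]
    except KeyError:
        # mirrors the converter's behaviour for unrecognized tokenizers
        raise NotImplementedError(f"unknown pre-tokenizer checksum: {chkhsh}")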
21 changes: 21 additions & 0 deletions generate-vocab.sh
@@ -0,0 +1,21 @@
#!/usr/bin/env bash

python3 convert-hf-to-gguf.py models/tokenizers/llama-spm/ --outfile models/ggml-vocab-llama-spm.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/llama-bpe/ --outfile models/ggml-vocab-llama-bpe.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/phi-3/ --outfile models/ggml-vocab-phi-3.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/deepseek-llm/ --outfile models/ggml-vocab-deepseek-llm.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/deepseek-coder/ --outfile models/ggml-vocab-deepseek-coder.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/falcon/ --outfile models/ggml-vocab-falcon.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/bert-bge/ --outfile models/ggml-vocab-bert-bge.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/mpt/ --outfile models/ggml-vocab-mpt.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/starcoder/ --outfile models/ggml-vocab-starcoder.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/gpt-2/ --outfile models/ggml-vocab-gpt-2.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/phi/ --outfile models/ggml-vocab-phi.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/stablelm/ --outfile models/ggml-vocab-stablelm.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/qwen/ --outfile models/ggml-vocab-qwen.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/mistral-bpe/ --outfile models/ggml-vocab-mistral-bpe.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/mistral-spm/ --outfile models/ggml-vocab-mistral-spm.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/mixtral-bpe/ --outfile models/ggml-vocab-mixtral-bpe.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/mixtral-spm/ --outfile models/ggml-vocab-mixtral-spm.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/refact/ --outfile models/ggml-vocab-refact.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/command-r/ --outfile models/ggml-vocab-command-r.gguf --vocab-only
Binary file modified models/ggml-vocab-stablelm.gguf
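
Once generate-vocab.sh has run, a generated vocab-only file can be spot-checked by reading the pre-tokenizer metadata back. A hedged sketch using the gguf-py reader vendored in this repository (the field-access pattern is an assumption about the reader API of this period):

from gguf import GGUFReader  # gguf-py package, vendored under gguf-py/

reader = GGUFReader("models/ggml-vocab-stablelm.gguf")
field = reader.fields.get("tokenizer.ggml.pre")
if field is None:
    print("no pre-tokenizer metadata in this vocab file")
else:
    # string fields keep their payload in parts; data[0] indexes the value
    print(bytes(field.parts[field.data[0]]).decode("utf-8"))  # e.g. "stablelm"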