
add chatglm3-6b model support [help wanted] #6999

Open

mnlife wants to merge 4 commits into master from chatglm3

Conversation
Conversation

@mnlife commented Apr 30, 2024

Text generation has been implemented.

The following known features have not been implemented yet (compared with the PyTorch version):

  • The model input is not wrapped with the prefix tokens {"[gMASK]", "sop", "<|user|>", "_", "<0x0A>"} and the suffix token {"<|assistant|>"}.
    • For example, when we input "hi", the tokenizer output should be {"[gMASK]", "sop", "<|user|>", "_", "<0x0A>", "hi", "<|assistant|>"}. What do we need to change in llama.cpp to implement this?
    • When I add 9a8db6b and run the command below, the changes do not take effect:
./build/bin/main -m ~/models/chatglm3-6b-Q4_K_M.gguf --verbose-prompt -p 你好
  • The inference results are incorrect with the CUDA build.
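
The prefix/suffix wrapping described above can be sketched in a few lines. This is a minimal illustration using token strings rather than real token IDs; `wrap_chatglm3_prompt` is a hypothetical helper for discussion, not an existing llama.cpp function:

```python
# Minimal sketch of the ChatGLM3 chat wrapping described above.
# `wrap_chatglm3_prompt` is a hypothetical helper; a real implementation in
# llama.cpp would operate on token IDs produced by the model's tokenizer.
def wrap_chatglm3_prompt(user_tokens):
    """Surround an already-tokenized user message with the special
    prefix/suffix tokens that the PyTorch version emits."""
    prefix = ["[gMASK]", "sop", "<|user|>", "_", "<0x0A>"]
    suffix = ["<|assistant|>"]
    return prefix + user_tokens + suffix

print(wrap_chatglm3_prompt(["hi"]))
# → ['[gMASK]', 'sop', '<|user|>', '_', '<0x0A>', 'hi', '<|assistant|>']
```

In llama.cpp this kind of wrapping is typically handled either by special-token handling during tokenization or by a chat template applied before tokenization, which is why the question above asks where the change belongs.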

Below are some links for the chatglm model:
Hugging Face model page for chatglm3-6b: https://huggingface.co/THUDM/chatglm3-6b
gguf model: https://modelscope.cn/api/v1/models/mnlife/chatglm3-6b-gguf/repo?Revision=master&FilePath=chatglm3-6b-Q4_K_M.gguf

github-actions bot commented Apr 30, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 544 iterations 🚀

Details:
  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8601.62ms p(95)=21255.69ms fails=, finish reason: stop=497 truncated=47
  • Prompt processing (pp): avg=99.69tk/s p(95)=440.7tk/s
  • Token generation (tg): avg=36.76tk/s p(95)=48.73tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=chatglm3 commit=f4a6c2fe271b4937e118d66517bde1f5e706ba24

[Chart: llamacpp:prompt_tokens_seconds — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 544 iterations]
[Chart: llamacpp:predicted_tokens_seconds — same benchmark]
[Chart: llamacpp:kv_cache_usage_ratio — same benchmark]
[Chart: llamacpp:requests_processing — same benchmark]

@mofosyne added labels help wanted, enhancement, review complexity: medium on May 9, 2024
Review comments on convert-hf-to-gguf.py (resolved)
@mofosyne mofosyne self-assigned this May 10, 2024
@mofosyne mofosyne removed their assignment May 10, 2024
@mofosyne mofosyne marked this pull request as ready for review May 15, 2024 03:12
@mnlife force-pushed the chatglm3 branch 2 times, most recently from 8aee20e to cb324f4 on May 15, 2024 05:28
@mnlife force-pushed the chatglm3 branch 2 times, most recently from ed1d3ff to 9226518 on May 23, 2024 08:14
@github-actions bot added labels testing, python on May 23, 2024
Labels: enhancement, help wanted, python, review complexity: medium, testing

4 participants