
Question: how much GPU memory is needed to quantize the Qwen 72B model to 4-bit? I used 4 * A40 (4 * 48GB) and hit OOM at 46% of the quantization progress. #3663

Closed
camposs1979 opened this issue May 9, 2024 · 2 comments
Labels
solved This problem has been already solved.

Comments

@camposs1979

Reminder

  • I have read the README and searched the existing issues.

Reproduction

CUDA_VISIBLE_DEVICES=0,1,2,3 python3.10 export_model.py \
    --model_name_or_path /hy-tmp/models/Qwen1.5-72B-Chat-sft \
    --export_quantization_bit 4 \
    --export_quantization_dataset ../data/c4_demo.json \
    --template qwen \
    --export_dir ../../models/Qwen1.5-72B-Chat-sft-INT4 \
    --export_size 2 \
    --export_device cpu \
    --export_legacy_format False
......
Quantizing model.layers blocks : 46%|███████████████████████████████████████▎ | 37/80 [38:12<44:24, 61.96s/it]
Traceback (most recent call last):
File "/hy-tmp/LLaMA-Factory-main/src/export_model.py", line 8, in
main()
File "/hy-tmp/LLaMA-Factory-main/src/export_model.py", line 4, in main
export_model()
File "/hy-tmp/LLaMA-Factory-main/src/llmtuner/train/tuner.py", line 57, in export_model
model = load_model(tokenizer, model_args, finetuning_args) # must after fixing tokenizer to resize vocab
File "/hy-tmp/LLaMA-Factory-main/src/llmtuner/model/loader.py", line 128, in load_model
model = AutoModelForCausalLM.from_pretrained(**init_kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3592, in from_pretrained
hf_quantizer.postprocess_model(model)
File "/usr/local/lib/python3.10/dist-packages/transformers/quantizers/base.py", line 195, in postprocess_model
return self._process_model_after_weight_loading(model, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/quantizers/quantizer_gptq.py", line 85, in _process_model_after_weight_loading
self.optimum_quantizer.quantize_model(model, self.quantization_config.tokenizer)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/gptq/quantizer.py", line 506, in quantize_model
scale, zero, g_idx = gptq[name].fasterquant(
File "/usr/local/lib/python3.10/dist-packages/auto_gptq/quantization/gptq.py", line 117, in fasterquant
H = torch.cholesky_inverse(H)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.25 GiB. GPU 1 has a total capacty of 47.33 GiB of which 943.44 MiB is free. Process 2716737 has 46.40 GiB memory in use. Of the allocated memory 44.18 GiB is allocated by PyTorch, and 1.88 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
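
Editorial aside (not part of the original report): the tail of the error message suggests trying max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of how that could be applied is below, assuming it takes effect before the first CUDA allocation (for example at the very top of export_model.py, or as an environment variable in the launch shell). Since only ~1.88 GiB is reserved-but-unallocated in the report above, this would at best reduce fragmentation rather than fix the underlying shortage of memory.

# Hedged sketch (editor's addition, 512 MB is an assumed value): apply the
# allocator hint mentioned in the OOM message. It must run before PyTorch
# initialises the CUDA caching allocator.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:512")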

Expected behavior

I would like to understand how much GPU memory is actually needed to quantize the 72B Qwen1.5 model.
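
For rough orientation only (editor's estimate; the figures below are assumptions, not measurements from this issue): the fp16 weights of a 72B-parameter model alone are on the order of 134 GiB, while 4 * A40 provide roughly 189 GiB in total per the capacity reported in the OOM message, so the headroom for GPTQ's per-layer buffers and CUDA overhead is thin and not necessarily spread evenly across the cards.

# Hedged back-of-the-envelope estimate (editor's addition, not from the issue)
params = 72e9                         # approximate Qwen1.5-72B parameter count
weights_gib = params * 2 / 1024**3    # fp16 weights, 2 bytes per parameter (~134 GiB)
pool_gib = 4 * 47.33                  # capacity of 4 x A40 as reported in the OOM message
print(f"weights ~{weights_gib:.0f} GiB, GPU pool ~{pool_gib:.0f} GiB")
# The failed allocation was a ~2.25 GiB Hessian-inverse buffer on a single
# GPU that was already ~46 GiB full, so one device can run out of memory
# even while others still have room.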

System Info

(base) root@I19c2837ff800901ccf:/# python3.10 -m pip list
Package Version


accelerate 0.28.0
addict 2.4.0
aiofiles 23.2.1
aiohttp 3.9.3
aiosignal 1.3.1
aliyun-python-sdk-core 2.15.0
aliyun-python-sdk-kms 2.16.2
altair 5.2.0
annotated-types 0.6.0
anyio 4.3.0
async-timeout 4.0.3
attrs 23.2.0
auto_gptq 0.7.1
bitsandbytes 0.43.0
certifi 2019.11.28
cffi 1.16.0
chardet 3.0.4
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.0.0
cmake 3.29.2
coloredlogs 15.0.1
contourpy 1.2.0
crcmod 1.7
cryptography 42.0.5
cupy-cuda12x 12.1.0
cycler 0.12.1
datasets 2.18.0
dbus-python 1.2.16
deepspeed 0.14.0
dill 0.3.8
diskcache 5.6.3
distro 1.4.0
distro-info 0.23ubuntu1
docstring_parser 0.16
einops 0.7.0
exceptiongroup 1.2.0
fastapi 0.110.0
fastrlock 0.8.2
ffmpy 0.3.2
filelock 3.13.3
fire 0.6.0
fonttools 4.50.0
frozenlist 1.4.1
fsspec 2024.2.0
galore-torch 1.0
gast 0.5.4
gekko 1.0.7
gradio 4.10.0
gradio_client 0.7.3
h11 0.14.0
hjson 3.1.0
httpcore 1.0.4
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.22.0
humanfriendly 10.0
idna 2.8
importlib_metadata 7.1.0
importlib_resources 6.4.0
interegular 0.3.3
Jinja2 3.1.3
jmespath 0.10.0
joblib 1.3.2
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
lark 1.1.9
llvmlite 0.42.0
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.8.3
mdurl 0.1.2
modelscope 1.13.3
mpmath 1.3.0
msgpack 1.0.8
multidict 6.0.5
multiprocess 0.70.16
nest-asyncio 1.6.0
networkx 3.2.1
ninja 1.11.1.1
numba 0.59.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.4.99
nvidia-nvtx-cu12 12.1.105
optimum 1.16.0
orjson 3.9.15
oss2 2.18.4
outlines 0.0.34
packaging 24.0
pandas 2.2.1
peft 0.10.0
pillow 10.2.0
pip 24.0
platformdirs 4.2.0
prometheus_client 0.20.0
protobuf 5.26.0
psutil 5.9.8
py-cpuinfo 9.0.0
pyarrow 15.0.2
pyarrow-hotfix 0.6
pycparser 2.21
pycryptodome 3.20.0
pydantic 2.6.4
pydantic_core 2.16.3
pydub 0.25.1
Pygments 2.17.2
PyGObject 3.36.0
pynvml 11.5.0
pyparsing 3.1.2
python-apt 2.0.1+ubuntu0.20.4.1
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.1
ray 2.10.0
referencing 0.34.0
regex 2023.12.25
requests 2.31.0
requests-unixsocket 0.2.0
rich 13.7.1
rouge 1.0.1
rpds-py 0.18.0
safetensors 0.4.2
scipy 1.12.0
semantic-version 2.10.0
sentencepiece 0.2.0
setuptools 69.2.0
shellingham 1.5.4
shtab 1.7.1
simplejson 3.19.2
six 1.14.0
sniffio 1.3.1
sortedcontainers 2.4.0
sse-starlette 2.0.0
ssh-import-id 5.10
starlette 0.36.3
sympy 1.12
termcolor 2.4.0
tiktoken 0.6.0
tokenizers 0.15.2
tomli 2.0.1
tomlkit 0.12.0
toolz 0.12.1
torch 2.1.2
tqdm 4.66.2
transformers 4.39.1
triton 2.1.0
trl 0.8.1
typer 0.12.3
typing_extensions 4.10.0
tyro 0.7.3
tzdata 2024.1
unattended-upgrades 0.1
urllib3 2.2.1
uvicorn 0.29.0
uvloop 0.19.0
vllm 0.4.0
watchfiles 0.21.0
websockets 11.0.3
wheel 0.34.2
xformers 0.0.23.post1
xxhash 3.4.1
yapf 0.40.2
yarl 1.9.4
zipp 3.18.1

Others

No response

@camposs1979
Author

Update: I switched to 6 * A40 GPUs and the run is still in progress, but judging from the GPU utilization below, the load is distributed very unevenly across the cards. Could anyone explain why?
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A40 On | 00000000:35:00.0 Off | Off |
| 0% 77C P0 290W / 300W | 30522MiB / 49140MiB | 100% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A40 On | 00000000:39:00.0 Off | Off |
| 0% 37C P0 75W / 300W | 25530MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A40 On | 00000000:3D:00.0 Off | Off |
| 0% 36C P0 73W / 300W | 25530MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A40 On | 00000000:9C:00.0 Off | Off |
| 0% 34C P0 72W / 300W | 25530MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA A40 On | 00000000:9D:00.0 Off | Off |
| 0% 36C P0 72W / 300W | 25530MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA A40 On | 00000000:A0:00.0 Off | Off |
| 0% 37C P0 75W / 300W | 16118MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

@hiyouga
Owner

hiyouga commented May 11, 2024

I suggest using GPUs with more VRAM.

@hiyouga hiyouga added the solved This problem has been already solved. label May 11, 2024
@hiyouga hiyouga closed this as completed May 11, 2024