
The recommended finetune command for ChatGLM OOMs on a 3090 (24 GB); the code's default 8-bit quantization also causes OOM #118

Open
StarrickLiu opened this issue May 9, 2023 · 1 comment


StarrickLiu commented May 9, 2023

Issue 1:

python3 uniform_finetune.py   --model_type chatglm --model_name_or_path THUDM/chatglm-6b \
    --data alpaca-belle-cot --lora_target_modules query_key_value \
    --lora_r 32 --lora_alpha 32 --lora_dropout 0.1 --per_gpu_train_batch_size 2 \
    --learning_rate 2e-5 --epochs 1

Running the above command OOMs during the training phase:

RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 23.69 GiB total capacity; 22.48 GiB already allocated; 6.06 MiB free; 22.55 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
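As a workaround hinted at by the error message itself, the allocator's split size can be capped to reduce fragmentation. A minimal sketch (the 128 MiB value is a guess to tune, not a value from this repo; equivalently, `export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128` in the shell before launching):

```python
# Set the allocator hint *before* torch initializes CUDA.
# max_split_size_mb:128 is an illustrative starting point, not a recommendation
# from the repo; lower values trade allocation speed for less fragmentation.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# ...then import torch and run uniform_finetune.py's entry point as usual.
```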

Training GLM with the following command enters the training phase successfully, with no OOM so far:

python3 uniform_finetune.py   --model_type chatglm --model_name_or_path /workspace/para/chatglm-6b \
     --data instinwild_ch --lora_target_modules query_key_value \
     --per_gpu_train_batch_size 1  --epochs 1 \
     --report_to wandb

GPU usage during training:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 31%   65C    P2   305W / 350W |  21916MiB / 24576MiB |     78%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Issue 2:

According to the README, int8 quantization must not be used when training GLM, but the finetune code has no check to skip this step, which leads to OOM:

[screenshot: the 8-bit quantization line in the finetune code]

This line can be commented out; with it commented out, training no longer OOMs at this point.
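Rather than commenting the line out, the script could guard the int8 path on model type. A hedged sketch of the check Issue 2 asks for (the function name and flag names here are illustrative, not the repo's actual code):

```python
# Hypothetical guard: only enable 8-bit loading for model types that
# support int8 training. Per the README, ChatGLM does not, so skip it.
def should_load_in_8bit(model_type: str, load_8bit_requested: bool) -> bool:
    if model_type == "chatglm":
        return False
    return load_8bit_requested

# Illustrative usage inside the model-loading path:
# model = AutoModel.from_pretrained(
#     args.model_name_or_path,
#     load_in_8bit=should_load_in_8bit(args.model_type, True),
#     trust_remote_code=True,
# )
```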

@ForgetThatNight

chatglm is only 6B, yet I get OutOfMemoryError even with 32 GB. Strange; I haven't found the cause.
