Fine-tuning becomes extremely slow after enabling fp8 #355
I modified finetune_moss.py as follows:
accelerator = Accelerator(mixed_precision='fp8')
The environment is NVIDIA's container nvcr.io/nvidia/pytorch:23.06-py3:
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch
Because the compute card does not have enough VRAM, DeepSpeed offloads to the CPU. I modified sft.yaml as follows:
command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: null
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp8
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false
After switching fine-tuning to fp8, training got slower. What could be the reason?
The DeepSpeed v0.9.5 release notes mention:
FP8 unittest for H100 by @jomayeri in microsoft/DeepSpeed#3731
Could the slowdown be because, after DeepSpeed offloads to the CPU, the CPU does not support fp8? My CPU is an Intel® Xeon® w9-3495X Processor.
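For reference, fp8 (the E4M3 variant) packs a sign bit, 4 exponent bits, and 3 mantissa bits into a single byte; CPUs such as the Xeon w9-3495X have no native arithmetic for this format, so any offloaded optimizer work runs in fp32 regardless of the `mixed_precision` setting. A pure-Python decoder sketch of the layout (illustration only, following the OCP FP8 E4M3 encoding with exponent bias 7; NaN handling omitted):

```python
def decode_e4m3(byte: int) -> float:
    """Decode one FP8 E4M3 byte: 1 sign, 4 exponent, 3 mantissa bits."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    mant = byte & 0x07
    if exp == 0:  # subnormal: no implicit leading 1, fixed exponent -6
        return sign * (mant / 8) * 2 ** (-6)
    return sign * (1 + mant / 8) * 2 ** (exp - 7)

print(decode_e4m3(0x38))  # exponent 7, mantissa 0 -> 1.0
```

The narrow range and 3-bit mantissa are why fp8 only pays off when the hardware (Hopper tensor cores via Transformer Engine) executes it natively; emulating it elsewhere adds conversion overhead instead of saving time.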