Hi @haotian-liu!
Interesting work on LLaVA!
Issue:
I am trying to finetune LLaVA on 8 x H100 GPUs.
When I use DeepSpeed ZeRO Stage 3, the model appears to be replicated on every GPU instead of being sharded, and I run into OOM errors while finetuning. I am using a context length of 2048 and a ViT at 336 resolution.
Could you please suggest what I might be doing wrong here?
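For what it's worth, I would expect a check along these lines to show each rank holding only a shard of the parameters under ZeRO-3 (a rough sketch, not part of the training script; it assumes DeepSpeed's usual ds_numel / ds_tensor attributes on partitioned parameters), but the per-GPU memory suggests every rank is holding the full model:

```python
# Rough sketch, not part of the training script: assumes DeepSpeed ZeRO-3
# attaches its usual ds_numel / ds_tensor attributes to partitioned parameters.
import torch

def report_sharding(model):
    # Elements physically held on this rank (the local shard for ZeRO-3 params).
    local = sum(
        p.ds_tensor.numel() if hasattr(p, "ds_tensor") else p.numel()
        for p in model.parameters()
    )
    # Logical (unsharded) model size.
    full = sum(getattr(p, "ds_numel", p.numel()) for p in model.parameters())
    print(f"this rank holds {local:,} of {full:,} parameters "
          f"({100.0 * local / max(full, 1):.1f}%)")
    print(f"CUDA memory allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
```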
Command:
When I run the script on a single GPU with
CUDA_VISIBLE_DEVICES=0 bash ./scripts/sample_stage3.sh
the memory usage before training looks like this: [screenshot of nvidia-smi]
However, when I launch with DeepSpeed ZeRO Stage 3, the GPU usage before training is: [screenshot of nvidia-smi]
The model then runs OOM. Could you please suggest which flag we might need to change?
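For reference, a minimal ZeRO-3 section along these lines is what I would expect to shard the parameters, gradients, and optimizer states (sketched from the standard DeepSpeed config keys; the exact values in my script may differ):

```python
# Sketch of the ZeRO Stage 3 config I would expect to shard the model
# (standard DeepSpeed config keys; the exact values in my script may differ).
import json

ds_config = {
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {
        "stage": 3,  # partition params + grads + optimizer states across ranks
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_max_live_parameters": 1_000_000_000,
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}

with open("zero3.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

If this matches what the script is already passing, I am not sure which flag is missing.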