DoRA training in distributed setting #1731

Open
BenjaminBossan (Member) opened this issue May 14, 2024 · 0 comments
System Info

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction
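
No script was attached to the original report. The following is a minimal sketch of the kind of setup that triggers the failure (a 4-bit quantized base model with DoRA adapters, launched on multiple GPUs via accelerate with FSDP or via DeepSpeed ZeRO-3). The model name, rank, and target modules are illustrative assumptions, not the reporter's actual configuration.

# Hedged reproduction sketch (assumed setup, not the reporter's script).
# Launch with multiple GPUs, e.g. via `accelerate launch` with an FSDP config
# or via the DeepSpeed ZeRO-3 launcher.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative model choice
    quantization_config=bnb_config,
)
lora_config = LoraConfig(
    r=16,  # illustrative hyperparameters
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_dora=True,  # DoRA is the relevant switch here
)
model = get_peft_model(model, lora_config)
# A standard training loop over this model works on a single GPU, but the
# forward pass fails with the shape errors shown below once the weights are
# sharded by DeepSpeed ZeRO-3 or FSDP.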

Expected behavior

As reported by @winglian, DoRA training currently fails when the model is trained in a distributed setting with sharded weights.

With DeepSpeed ZeRO-3:

  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 516, in forward                                                                                                                          
    output = torch.nn.functional.linear(A, F.dequantize_4bit(B, quant_state).to(A.dtype).t(), bias)                                                                                                                                                   
    RuntimeErroroutput = torch.nn.functional.linear(A, F.dequantize_4bit(B, quant_state).to(A.dtype).t(), bias):                                                                                                                                      
mat1 and mat2 shapes cannot be multiplied (512x4096 and 1x4194304)                                                                                                                                                                                    
RuntimeError: mat1 and mat2 shapes cannot be multiplied (512x4096 and 1x4194304)  
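
The ZeRO-3 failure happens inside the bitsandbytes 4-bit forward: the activations have the expected shape (512, 4096), but the dequantized base weight comes back with shape 1x4194304 instead of a 2-D (out_features, in_features) matrix. A rough illustration of the mismatch; the expected out_features value is an assumption based on the 4194304 (= 1024 * 4096) elements in the error message:

# Sketch of the shape mismatch reported above; the "expected" weight shape
# is an assumption for illustration.
import torch

A = torch.randn(512, 4096)         # hidden states, per the error message
W_ok = torch.randn(1024, 4096)     # assumed full 2-D weight (out_features=1024, in_features=4096)
W_flat = torch.randn(1, 4194304)   # what the layer actually sees under ZeRO-3, per the error message

torch.nn.functional.linear(A, W_ok)           # works: (512, 4096) @ (1024, 4096).T -> (512, 1024)
# torch.nn.functional.linear(A, W_flat.t())   # raises: mat1 and mat2 shapes cannot be multiplied (512x4096 and 1x4194304)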

With FSDP:

  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 436, in forward
    output = module._old_forward(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 436, in forward
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/peft/tuners/lora/bnb.py", line 476, in forward
    key_states = self.k_proj(hidden_states)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    key_states = self.k_proj(hidden_states)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    output = self._apply_dora(x, lora_A, lora_B, scaling, active_adapter)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 226, in _apply_dora
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/peft/tuners/lora/bnb.py", line 476, in forward
    lora_weight = lora_B.weight @ lora_A.weight
RuntimeError: inconsistent tensor size, expected tensor [820] and src [3277] to have the same number of elements, but got 820 and 3277 elements respectively
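
The FSDP failure is at the DoRA-specific step that materializes the combined LoRA weight. If lora_A.weight and lora_B.weight are 1-D flattened shards of different lengths at that point (as FSDP parameters can be when accessed directly, outside of a gathered/summon_full_params context), the @ between two 1-D tensors dispatches to a dot product, which produces exactly this "inconsistent tensor size" error. A toy illustration under that assumption:

# Sketch of the suspected FSDP failure mode; the shard sizes are taken from
# the error message above and are otherwise arbitrary.
import torch

# With full (unsharded) 2-D weights, the DoRA step is a plain matmul:
lora_A = torch.randn(16, 4096)   # assumed rank r=16
lora_B = torch.randn(4096, 16)
lora_weight = lora_B @ lora_A    # (4096, 4096), works as intended

# Under FSDP, a rank may only hold 1-D flattened shards of the parameters,
# and a 1-D @ 1-D with unequal lengths falls back to torch.dot:
shard_B = torch.randn(820)
shard_A = torch.randn(3277)
# shard_B @ shard_A  # raises: inconsistent tensor size, expected tensor [820] and src [3277] ...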