
'LlamaModel' object has no attribute '_gradient_checkpointing_func'. #30544

Closed
foreverpiano opened this issue Apr 29, 2024 · 6 comments


foreverpiano commented Apr 29, 2024

System Info

[rank3]: File "/home/dhl/miniconda3/envs/light/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1060, in forward
[rank3]: layer_outputs = self._gradient_checkpointing_func(
[rank3]: File "/home/dhl/miniconda3/envs/light/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
[rank3]: raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank3]: AttributeError: 'LlamaModel' object has no attribute '_gradient_checkpointing_func'. Did you mean: 'gradient_checkpointing'?

https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L1007
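
For reference, the code path that touches the attribute looks roughly like this (a paraphrased sketch of the guard in LlamaModel.forward, not the verbatim transformers source; the decoder-layer argument list is abbreviated):

    if self.gradient_checkpointing and self.training:
        # Only reached when the flag is set during training; plain inference
        # takes the else branch, so a missing attribute goes unnoticed there.
        layer_outputs = self._gradient_checkpointing_func(
            decoder_layer.__call__,
            hidden_states,
            attention_mask,
            # ... remaining positional args elided ...
        )
    else:
        layer_outputs = decoder_layer(hidden_states, attention_mask=attention_mask)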

Who can help?

@ArthurZucker and @younesbelkada @Narsil

@foreverpiano (Author)

accelerate==0.29.3
certifi==2024.2.2
charset-normalizer==3.3.2
einops==0.8.0
exceptiongroup==1.2.1
filelock==3.13.4
flash-attn==2.5.8
fsspec==2024.3.1
huggingface-hub==0.22.2
idna==3.7
iniconfig==2.0.0
Jinja2==3.1.3
MarkupSafe==2.1.5
mpmath==1.3.0
networkx==3.3
ninja==1.11.1.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.1.105
packaging==24.0
pillow==10.3.0
pip==23.3.1
pluggy==1.5.0
psutil==5.9.8
pytest==8.2.0
PyYAML==6.0.1
regex==2024.4.16
requests==2.31.0
safetensors==0.4.3
setuptools==68.2.2
sympy==1.12
tokenizers==0.15.2
tomli==2.0.1
torch==2.3.0
torchaudio==2.3.0
torchvision==0.18.0
tqdm==4.66.2
transformers==4.37.2
triton==2.3.0
typing_extensions==4.11.0
urllib3==2.2.1
wheel==0.41.2

@younesbelkada (Contributor)

Hi @foreverpiano,
Can you share more details about the script you are running?


foreverpiano commented Apr 30, 2024

When I try to use LlamaModel:

        data = {"input_ids": torch.randint(0, 1000, (1, length,), device="cuda"),
                "labels": torch.randint(0, 1000, (1, length,), device="cuda"), 
                "attention_mask": torch.ones(1, length, device="cuda")}
        for i in tqdm(range(20)):
            if i > 10:
                time_s = time.time()
            outputs = model(**data, use_cache=True) ----> fail

Log:

[rank3]:   File "/home/dhl/LongChat-dev/longchat/train/fine_tune/train_no_trainer.py", line 166, in train
[rank3]:     outputs = model(**data, use_cache=True)
[rank3]:   File "/home/dhl/miniconda3/envs/light/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/home/dhl/miniconda3/envs/light/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank3]:     return forward_call(*args, **kwargs)
[rank3]:   File "/home/dhl/miniconda3/envs/light/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 857, in forward
[rank3]:     output = self._fsdp_wrapped_module(*args, **kwargs)
[rank3]:   File "/home/dhl/miniconda3/envs/light/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/home/dhl/miniconda3/envs/light/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank3]:     return forward_call(*args, **kwargs)
[rank3]:   File "/home/dhl/miniconda3/envs/light/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1183, in forward
[rank3]:     outputs = self.model(
[rank3]:   File "/home/dhl/miniconda3/envs/light/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/home/dhl/miniconda3/envs/light/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank3]:     return forward_call(*args, **kwargs)
[rank3]:   File "/home/dhl/miniconda3/envs/light/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1060, in forward
[rank3]:     layer_outputs = self._gradient_checkpointing_func(
[rank3]:   File "/home/dhl/miniconda3/envs/light/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
[rank3]:     raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank3]: AttributeError: 'LlamaModel' object has no attribute '_gradient_checkpointing_func'. Did you mean: 'gradient_checkpointing'?

For the model definition:

    config = transformers.AutoConfig.from_pretrained(model_args.model_name_or_path)
    model = transformers.AutoModelForCausalLM.from_config(config).to(device="cuda", dtype=torch.float16)
    model.model.gradient_checkpointing = True
    model_parameters = filter(lambda p: p.requires_grad, model.parameters())

    model = FSDP(
        model,
        auto_wrap_policy=auto_wrap_policy,
        sharding_strategy=ShardingStrategy.FULL_SHARD,
        # sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
    )

@foreverpiano (Author)

@younesbelkada

@younesbelkada (Contributor)

Thanks @foreverpiano
Can you try:

    config = transformers.AutoConfig.from_pretrained(model_args.model_name_or_path)
    model = transformers.AutoModelForCausalLM.from_config(config).to(device="cuda", dtype=torch.float16)
-   model.model.gradient_checkpointing = True
+   model.gradient_checkpointing_enable()
    model_parameters = filter(lambda p: p.requires_grad, model.parameters())

    model = FSDP(
        model,
        auto_wrap_policy=auto_wrap_policy,
        sharding_strategy=ShardingStrategy.FULL_SHARD,
        # sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
    )
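
For what it's worth, gradient_checkpointing_enable() does more than flip the boolean: in recent transformers versions it also binds the actual checkpointing function onto every submodule that supports it, which is exactly what the manual flag assignment skips. A simplified sketch of the idea (an assumption about the internals, not the exact library source; enable_checkpointing is a hypothetical helper):

    import functools

    import torch

    def enable_checkpointing(model):
        # Bind torch's checkpoint function onto each supporting submodule;
        # _gradient_checkpointing_func is the attribute the traceback
        # reports as missing.
        func = functools.partial(torch.utils.checkpoint.checkpoint, use_reentrant=False)
        for module in model.modules():
            if hasattr(module, "gradient_checkpointing"):
                module.gradient_checkpointing = True
                module._gradient_checkpointing_func = func

On transformers 4.37.x you can also pass the checkpoint kwargs through the public API, e.g. model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False}).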

@foreverpiano (Author)

@younesbelkada Thanks for your timely reply. It works for me.
