'LlamaModel' object has no attribute '_gradient_checkpointing_func'. #30544
Comments
Hi @foreverpiano

When I try to use LlamaModel:

    data = {"input_ids": torch.randint(0, 1000, (1, length), device="cuda"),
            "labels": torch.randint(0, 1000, (1, length), device="cuda"),
            "attention_mask": torch.ones(1, length, device="cuda")}
    for i in tqdm(range(20)):
        if i > 10:
            time_s = time.time()
        outputs = model(**data, use_cache=True)  # ----> fail log

For the model definition:

    config = transformers.AutoConfig.from_pretrained(model_args.model_name_or_path)
    model = transformers.AutoModelForCausalLM.from_config(config).to(device="cuda", dtype=torch.float16)
    model.model.gradient_checkpointing = True
    model_parameters = filter(lambda p: p.requires_grad, model.parameters())
    model = FSDP(
        model,
        auto_wrap_policy=auto_wrap_policy,
        sharding_strategy=ShardingStrategy.FULL_SHARD,
        # sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
    )
Thanks @foreverpiano. Enable gradient checkpointing through the public API instead of setting the flag on the inner module directly:

      config = transformers.AutoConfig.from_pretrained(model_args.model_name_or_path)
      model = transformers.AutoModelForCausalLM.from_config(config).to(device="cuda", dtype=torch.float16)
    - model.model.gradient_checkpointing = True
    + model.gradient_checkpointing_enable()
      model_parameters = filter(lambda p: p.requires_grad, model.parameters())
      model = FSDP(
          model,
          auto_wrap_policy=auto_wrap_policy,
          sharding_strategy=ShardingStrategy.FULL_SHARD,
          # sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
      )
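For reference, here is a self-contained sketch of the corrected setup. It uses a tiny, illustrative LlamaConfig so it runs on CPU without downloading weights; the config values and sequence length are not from the original script. The point is that gradient_checkpointing_enable() attaches the internal _gradient_checkpointing_func that LlamaModel.forward() calls, which assigning model.model.gradient_checkpointing = True alone does not do:

    import torch
    import transformers

    # Tiny illustrative config so the sketch runs without pretrained weights.
    config = transformers.LlamaConfig(
        hidden_size=64, intermediate_size=128, num_hidden_layers=2,
        num_attention_heads=4, num_key_value_heads=4, vocab_size=1000,
    )
    model = transformers.AutoModelForCausalLM.from_config(config)

    # Public API: sets the gradient_checkpointing flag AND registers
    # _gradient_checkpointing_func on the inner LlamaModel.
    model.gradient_checkpointing_enable()
    model.train()

    length = 16
    data = {
        "input_ids": torch.randint(0, 1000, (1, length)),
        "labels": torch.randint(0, 1000, (1, length)),
        "attention_mask": torch.ones(1, length, dtype=torch.long),
    }
    outputs = model(**data)   # no AttributeError; use_cache is turned off automatically while checkpointing
    outputs.loss.backward()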
@younesbelkada Thanks for your timely reply. It works for me.
System Info

    [rank3]:   File "/home/dhl/miniconda3/envs/light/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1060, in forward
    [rank3]:     layer_outputs = self._gradient_checkpointing_func(
    [rank3]:   File "/home/dhl/miniconda3/envs/light/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
    [rank3]:     raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
    [rank3]: AttributeError: 'LlamaModel' object has no attribute '_gradient_checkpointing_func'. Did you mean: 'gradient_checkpointing'?

https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L1007
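Note: the attribute in this traceback is only attached by gradient_checkpointing_enable(); toggling the boolean flag on the inner model leaves it unset. A quick way to confirm this, using a small illustrative config (values are not from the reported setup):

    import transformers

    config = transformers.LlamaConfig(hidden_size=64, intermediate_size=128,
                                      num_hidden_layers=2, num_attention_heads=4,
                                      num_key_value_heads=4, vocab_size=1000)
    model = transformers.AutoModelForCausalLM.from_config(config)

    model.model.gradient_checkpointing = True   # flag only, as in the failing script
    print(hasattr(model.model, "_gradient_checkpointing_func"))  # False -> AttributeError in forward

    model.gradient_checkpointing_enable()       # public API, as in the suggested fix above
    print(hasattr(model.model, "_gradient_checkpointing_func"))  # True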
Who can help?
@ArthurZucker and @younesbelkada @Narsil