Qwen1.5-7B LoRA fine-tuning error: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn #112

Open
feifeifei-hue opened this issue May 13, 2024 · 2 comments


@feifeifei-hue

Hello, my code looks almost identical to yours, but I don't know why it raises this error. Could you please give me some guidance? Thank you!
The code is as follows:

from datasets import Dataset
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, DataCollatorForSeq2Seq, TrainingArguments, Trainer, GenerationConfig
from peft import LoraConfig, TaskType, get_peft_model

df = pd.read_json('test.json', encoding='utf-8')
ds = Dataset.from_pandas(df)

# print(ds[:3])

def process_func(example):
    # The tokenizer may split a single Chinese character into multiple tokens, so allow a larger max length to keep the data intact
    MAX_LENGTH = 384
    input_ids, attention_mask, labels = [], [], []
    # add_special_tokens=False: do not prepend special tokens
    instruction = tokenizer(f"<|im_start|>system\nYou are a friendly and helpful assistant, please strictly follow the prompt I give you to generate content.<|im_end|>\n<|im_start|>user\n{example['instruction'] + example['input']}<|im_end|>\n<|im_start|>assistant\n", add_special_tokens=False)  
    response = tokenizer(f"{example['output']}", add_special_tokens=False)
    input_ids = instruction["input_ids"] + response["input_ids"] + [tokenizer.pad_token_id]
    # we also want to attend to the EOS/pad token, so append a 1 to the attention mask
    attention_mask = instruction["attention_mask"] + response["attention_mask"] + [1]  
    labels = [-100] * len(instruction["input_ids"]) + response["input_ids"] + [tokenizer.pad_token_id]  
    if len(input_ids) > MAX_LENGTH:  # truncate if too long
        input_ids = input_ids[:MAX_LENGTH]
        attention_mask = attention_mask[:MAX_LENGTH]
        labels = labels[:MAX_LENGTH]

    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels
    }

# Load the tokenizer before mapping, since process_func uses it
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat", use_fast=False, trust_remote_code=True)

tokenized_id = ds.map(process_func, remove_columns=ds.column_names)
# print(tokenized_id)

# print(tokenizer.decode(tokenized_id[0]['input_ids']))

# print(tokenizer.decode(list(filter(lambda x: x != -100, tokenized_id[1]["labels"]))))

model = AutoModelForCausalLM.from_pretrained(
    'Qwen/Qwen1.5-7B-Chat', 
    device_map="auto",
    torch_dtype=torch.bfloat16
)

# LoRA
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM, 
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    inference_mode=False,  # training mode
    r=8,  # LoRA rank
    lora_alpha=32,  # LoRA alpha; see the LoRA paper for how the scaling works
    lora_dropout=0.1  # dropout rate
)

model = get_peft_model(model, config)

print(config)

print(model.print_trainable_parameters())

args = TrainingArguments(
    output_dir="fine_tuning_output",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    logging_steps=10,
    num_train_epochs=3,
    save_steps=100,
    learning_rate=1e-4,
    save_on_each_node=True,
    gradient_checkpointing=True
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_id,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)

trainer.train()
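
For reference, this RuntimeError often appears when gradient_checkpointing=True is combined with a PEFT-wrapped model whose base weights are frozen, because the checkpointed inputs carry no grad_fn. Below is a minimal sketch of the common workaround; enable_input_require_grads() is a real transformers API, but treating this as the cause of the error in this thread is an assumption, not something the thread confirms.

# Sketch of a common fix (assumption: the error comes from gradient checkpointing
# on a frozen base model; this is not confirmed in the thread).
model = AutoModelForCausalLM.from_pretrained(
    'Qwen/Qwen1.5-7B-Chat',
    device_map="auto",
    torch_dtype=torch.bfloat16
)
model.enable_input_require_grads()     # make checkpointed inputs require grad
model = get_peft_model(model, config)  # wrap with LoRA after enabling input grads
# Alternative: pass gradient_checkpointing_kwargs={"use_reentrant": False}
# to TrainingArguments (transformers >= 4.35) instead.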
@KMnO4-zx
Contributor

(screenshot attached)

@feifeifei-hue
Author

Hello, thank you for the reply. I have had this set to True the whole time.

I eventually solved it by changing print(model.print_trainable_parameters()) to model.print_trainable_parameters(), i.e. removing the outer print... but I don't understand why that fixes it. Could you explain?
