
When fine-tuning with p-tuning/train.sh, is it possible to freeze all parameters of the prefix_encoder layer and fine-tune the parameters of the model's other blocks? #1444

Open
huilong-chen opened this issue Jan 9, 2024 · 0 comments
Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I added the following code to p-tuning/modeling_chatglm.py, starting at line 850:
```python
if self.pre_seq_len is not None:
    for param in self.parameters():
        param.requires_grad = False
    self.prefix_tokens = torch.arange(self.pre_seq_len).long()
    self.prefix_encoder = PrefixEncoder(config)
    self.dropout = torch.nn.Dropout(0.1)

    for k, v in self.prefix_encoder.named_parameters():
        v.requires_grad = False
    for k, v in self.layers[0].named_parameters():
        v.requires_grad = True
```
Continuing fine-tuning with this change leads to exploding gradients, and the loss becomes NaN. (The learning rate was changed to 1e-4, the value used for full fine-tuning.)
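
For reference, here is a minimal sketch (not part of the original report) of how one might confirm which parameters actually remain trainable after applying the snippet above; it assumes `model` is the ChatGLM model instance loaded by the p-tuning script.

```python
# Minimal sketch (assumption: `model` is the loaded ChatGLM model with the
# snippet above already applied). It only inspects requires_grad flags.
def summarize_trainable(model):
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    print(f"trainable params: {trainable:,} | frozen params: {frozen:,}")
    # List the trainable tensors to confirm only layers[0] is unfrozen.
    for name, p in model.named_parameters():
        if p.requires_grad:
            print("trainable:", name)

summarize_trainable(model)
```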

Expected Behavior

No response

Steps To Reproduce

Add the following code to p-tuning/modeling_chatglm.py, starting at line 850:
```python
for k, v in self.prefix_encoder.named_parameters():
    v.requires_grad = False
for k, v in self.layers[0].named_parameters():
    v.requires_grad = True
```

Environment

- OS: 
- Python:
- Transformers: 
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response
