
Freeze manually #2

Open
G-JWLee opened this issue Mar 2, 2023 · 4 comments
G-JWLee commented Mar 2, 2023

Hi, thank you for your great work.

I want to use yours for my experiment.

I understand that get_lora_params() loads only the LoRA parameters into the optimizer, but if the rest of the model can still compute gradients, won't it keep doing so?

Would freezing the model be enough to use minlora without get_lora_params()?

Also, when merging the LoRA weights into the model so that I can add another LoRA module, do I have to set lora_A and lora_B to requires_grad=False before merging?

Thank you.

cccntu (Owner) commented Mar 2, 2023

Hi, thanks!

> I understand that get_lora_params() loads only the LoRA parameters into the optimizer, but if the rest of the model can still compute gradients, won't it keep doing so?
> Would freezing the model be enough to use minlora without get_lora_params()?

Probably yes, but you need to make sure you don't accidentally freeze the LoRA parameters.

> Also, when merging the LoRA weights into the model so that I can add another LoRA module, do I have to set lora_A and lora_B to requires_grad=False before merging?

Probably not. After merging, lora_A and lora_B will no longer exist.
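
For concreteness, here is a minimal sketch of that pattern. It assumes minlora exposes add_lora, get_lora_params, and merge_lora as in its README; the toy model and optimizer settings are placeholders.

```python
import torch
from minlora import add_lora, get_lora_params, merge_lora  # assumed minlora API

# Toy backbone; stands in for the real model.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)

add_lora(model)  # inject the lora_A / lora_B parametrizations

# Freeze everything, then re-enable gradients only on the LoRA parameters,
# so the backbone neither accumulates gradients nor gets updated.
for p in model.parameters():
    p.requires_grad_(False)
lora_params = list(get_lora_params(model))
for p in lora_params:
    p.requires_grad_(True)

optimizer = torch.optim.AdamW(lora_params, lr=1e-3)

# ... training loop ...

# Merging folds lora_A / lora_B into the base weights and removes them,
# so there is nothing left to set requires_grad=False on afterwards.
merge_lora(model)
```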

G-JWLee commented Mar 2, 2023

Thank you for your kind reply.

However, the example in https://github.com/cccntu/LoRAnanoGPT/blob/master/train.py (line 236) wraps the model in DDP without the find_unused_parameters=True argument.
In my own experiments with DDP in a different setting, the backbone parameters have requires_grad=False, and I get an error because those parameters are not used in the gradient computation unless I specify find_unused_parameters=True.
Is there something I missed? I believe this library should work with DDP.

Thank you!

cccntu (Owner) commented Mar 3, 2023

Honestly, I don't know. Can you solve it by simply adding find_unused_parameters=True?

I've only used it on one GPU.

Or does using get_lora_params solve this issue?
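
For reference, a sketch of the suggested workaround. model and local_rank are assumed to come from the surrounding training script (e.g. LoRAnanoGPT's train.py); this is not a complete DDP setup.

```python
from torch.nn.parallel import DistributedDataParallel as DDP

# With a frozen backbone, some registered parameters never receive gradients;
# find_unused_parameters=True tells DDP to tolerate that instead of erroring.
model = DDP(
    model.to(local_rank),       # `model` and `local_rank` come from the training script
    device_ids=[local_rank],
    find_unused_parameters=True,
)
```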

@justindachille

It looks like this method is correct in the sense that it only updates the parameters you pass to the optimizer, but Torch will still compute gradients for all weights, since requires_grad is still True, according to this thread:

https://discuss.pytorch.org/t/passing-a-subset-of-the-parameters-to-an-optimizer-equivalent-to-setting-requires-grad-of-subset-only-to-true/42866/2
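
A small, self-contained illustration of that point (not from the thread): the optimizer only sees the LoRA tensors, yet autograd still populates .grad on the base weight until requires_grad is turned off.

```python
import torch

# The optimizer only receives the LoRA tensors, but autograd still computes
# gradients for the base layer because its requires_grad is still True.
base = torch.nn.Linear(8, 8)
lora_A = torch.nn.Parameter(torch.zeros(8, 2))
lora_B = torch.nn.Parameter(torch.randn(2, 8))

opt = torch.optim.SGD([lora_A, lora_B], lr=1e-2)  # base.weight is NOT in the optimizer

x = torch.randn(4, 8)
loss = (base(x) + x @ lora_A @ lora_B).sum()
loss.backward()
opt.step()

print(base.weight.grad is not None)  # True: the gradient was still computed

# Freezing is what actually skips that computation on later backward passes.
base.weight.requires_grad_(False)
base.bias.requires_grad_(False)
```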
