[BUG] LISA: same loss regardless of lisa_activated_layers #726
Comments
Thanks for your interest in LMFlow! You may change the first …
@research4pan why would that make a huge difference, since it's only a difference during the first step?
That's a very good question. We conjectured that's because the experiments in the paper are conducted differently, by running …
@geronimi73, we just did some tests; in your script, without …
I tried this and the loss curve was still exactly the same for me.
This works: I finally have a distinct loss curve after removing any call to freeze from init.
It's the optimizer, I think. When you hit the first training step, the Trainer builds its optimizer and only registers parameters that require grad at that moment, so layers activated later are never added. This explains why the suggested change of removing the freeze call from init works: every parameter is registered with the optimizer from the start.
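To make the failure mode concrete, here is a minimal, self-contained demonstration (the model and names are illustrative, not from LMFlow): AdamW only ever sees the parameters it was handed at construction, so anything frozen at creation time is excluded for good.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))

# Freeze layer 1 up front, as a freeze-all-in-init scheme would.
for p in model[1].parameters():
    p.requires_grad = False

# Build the optimizer from the parameters that are trainable *at creation time*.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

# Later, LISA-style switching "activates" layer 1 again ...
for p in model[1].parameters():
    p.requires_grad = True

# ... but the optimizer still tracks only layer 0's parameters.
tracked = sum(len(g["params"]) for g in optimizer.param_groups)
print(f"{tracked} tensors tracked by the optimizer, "
      f"{len(list(model.parameters()))} in the model")
```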
Hi @geronimi73, I think you are right. I also think the main reason is the optimizer. Directly using …
I am wondering if we could update …

UPDATE: I checked, and it failed. I hope to find other solutions, especially how to pass the trainer to the callback, like Lightning-AI/pytorch-lightning#3095 (comment).
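For reference, the workaround in that Lightning comment carries over to the HF Trainer: since TrainerCallback hooks never receive the Trainer itself, one can assign a back-reference after construction. A sketch under that assumption (LisaCallback and its attributes are illustrative, not an existing API):

```python
import torch.nn as nn
from transformers import Trainer, TrainerCallback, TrainingArguments

class LisaCallback(TrainerCallback):
    """Hypothetical callback that needs a handle on the Trainer."""

    def __init__(self):
        self.trainer = None  # assigned once the Trainer exists

    def on_step_begin(self, args, state, control, **kwargs):
        # With the back-reference in place, the optimizer is reachable
        # as self.trainer.optimizer.
        pass

callback = LisaCallback()
trainer = Trainer(
    model=nn.Linear(4, 2),  # stand-in model for the sketch
    args=TrainingArguments(output_dir="out"),
    callbacks=[callback],
)
callback.trainer = trainer  # manual back-reference, as in the Lightning thread
```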
Hello,
I do not know if this could mess up the trainer in some mysterious ways, but it seems to work for me. |
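The code itself is not shown here, but from the description the approach presumably has roughly this shape (a hedged sketch: interval_steps, the switch_active_layers helper, and the trainer back-reference are my assumptions, not the actual code from this comment):

```python
from transformers import TrainerCallback

class RebuildOptimizerCallback(TrainerCallback):
    """Hypothetical: rebuild the Trainer's optimizer after each layer switch."""

    def __init__(self, trainer, interval_steps):
        self.trainer = trainer
        self.interval_steps = interval_steps

    def on_step_begin(self, args, state, control, **kwargs):
        if state.global_step % self.interval_steps == 0:
            switch_active_layers(self.trainer.model)  # assumed helper
            # Dropping the old optimizer makes create_optimizer() build a
            # fresh one from the parameters that require grad *now*; the
            # cost is discarding all accumulated Adam state.
            self.trainer.optimizer = None
            self.trainer.create_optimizer()
```

One thing to watch: the LR scheduler created at startup still holds a reference to the old optimizer object, which may be one of the "mysterious ways" this could misbehave.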
@BrunoBelucci yes, that would be a solution, but I'm not sure if trashing the optimizer state every few steps is a good idea. Probably not.
This answer is the same as what I suggested in #726 (comment). Maybe that is a solution, but it needs the pytorch-lightning package installed. It needs further testing.
Thanks for the fruitful discussion! We discovered a different way to let the trainer recreate its optimizer (i.e. …
Are you sure about this? It seems …
I came up with the code below.

```python
def on_train_epoch_start(self, trainer: "L.Trainer", pl_module: "pl.LightningModule"):
    if trainer.current_epoch % self.epoch_interval == 0:
        self.switch_active_layers()
        pl_module.optimizer_fn = torch.optim.Adam
        trainer.strategy.setup_optimizers(trainer)
```
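For this hook to have any effect, configure_optimizers on the LightningModule presumably reads optimizer_fn back. A minimal reconstruction of such a module, assuming the names from the snippet above (this is my sketch, not code from the thread):

```python
import lightning as L
import torch
import torch.nn.functional as F

class LitModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(4, 1)
        self.optimizer_fn = torch.optim.AdamW  # swapped out by the callback

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.mse_loss(self.net(x), y)

    def configure_optimizers(self):
        # trainer.strategy.setup_optimizers(trainer) re-invokes this, so a
        # fresh optimizer is built over the currently trainable parameters.
        return self.optimizer_fn(
            [p for p in self.parameters() if p.requires_grad], lr=1e-3
        )
```

Note that rebuilding this way also starts from an empty Adam state, which is exactly the trade-off raised above.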
Yes. In our upcoming implementation in the next version, we overwrite the function as well by inheriting the …
I tried it like this:

but when I tried training, the loss never decreased. Another idea is to keep the same optimizer but reset its internal state. This is similar to the ReLoRA technique: https://github.com/OpenAccess-AI-Collective/axolotl/pull/1414/files
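A minimal sketch of that second idea, keeping the optimizer object but wiping its per-parameter state, loosely in the spirit of the linked ReLoRA change (the function name is mine):

```python
from collections import defaultdict
import torch

def reset_optimizer_state(optimizer: torch.optim.Optimizer) -> None:
    # Drops every per-parameter buffer (Adam's exp_avg / exp_avg_sq and
    # step counters) while keeping param groups and hyperparameters intact.
    # torch optimizers lazily re-populate state from a defaultdict(dict),
    # so replacing it is a valid full reset.
    optimizer.state = defaultdict(dict)

# Intended usage: call right after switching the active layers.
```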
Describe the bug
I think there might be something wrong with the current LISA implementation. There is no difference in training loss, no matter how many layers are active.
Not using LMFlow, but the HF Trainer with DynamicLayerActivationCallback from https://github.com/OptimalScale/LMFlow/blob/main/src/lmflow/pipeline/finetuner.py

To Reproduce
Model
llama2-7b
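For context, the switching logic in that callback boils down to: freeze every decoder layer, then randomly unfreeze lisa_activated_layers of them every interval_steps steps. A condensed paraphrase from my reading of finetuner.py follows (attribute paths assume a Llama-style model; this is a sketch, not verbatim LMFlow code):

```python
import numpy as np
from transformers import TrainerCallback

class DynamicLayerActivationCallback(TrainerCallback):
    def __init__(self, n_layers, interval_steps, model):
        self.n_layers = n_layers            # lisa_activated_layers
        self.interval_steps = interval_steps
        self.layers = model.model.layers    # Llama-style decoder stack
        self.switch_active_layers()         # the init-time freeze discussed above

    def freeze_all_layers(self):
        for layer in self.layers:
            for p in layer.parameters():
                p.requires_grad = False

    def switch_active_layers(self):
        self.freeze_all_layers()
        active = np.random.choice(len(self.layers), self.n_layers, replace=False)
        for i in active:
            for p in self.layers[i].parameters():
                p.requires_grad = True

    def on_step_begin(self, args, state, control, **kwargs):
        if state.global_step % self.interval_steps == 0:
            self.switch_active_layers()
```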
Expected behavior
The loss curve should change with lisa_activated_layers. For comparison, lisa_activated_layers==32 and a full finetune (without LISA) do produce different loss curves; they diverge after a few steps.

Screenshots
Setup
2x 3090