
Speed up benchmarks by disabling find_unused_parameters #1434

Open · 5 tasks

guarin (Contributor) commented Nov 24, 2023

We use the ddp_find_unused_parameters_true strategy when running benchmarks:

strategy="ddp_find_unused_parameters_true",

This flag can slow down training considerably. We enabled it because some models have parameters that are not used during all training steps, for example, DINO freezes the projection head during the first epoch. But in principle we should be able to disable the flag for most models.
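For context, a minimal sketch of how the strategy string is passed to the Lightning Trainer (device counts here are illustrative):

```python
from pytorch_lightning import Trainer

# Current setup: DDP with unused-parameter detection, which adds an
# extra traversal of the autograd graph after every backward pass.
trainer = Trainer(
    accelerator="gpu",
    devices=2,
    strategy="ddp_find_unused_parameters_true",
)

# Proposed setup for models whose parameters all receive gradients on
# every step: plain DDP, which skips the unused-parameter search.
trainer = Trainer(
    accelerator="gpu",
    devices=2,
    strategy="ddp",
)
```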

One special case is models with frozen backbones (EMA backbones) where the backbone parameters remain frozen during all training steps. For those models it should be possible to disable the flag, but only if we disable gradients in the model __init__ method (according to this issue: Lightning-AI/pytorch-lightning#17212). Currently we use torch.no_grad() to disable gradients; disabling them with module.requires_grad_(False) instead should allow us to disable the flag.
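A minimal sketch of the difference, assuming a simplified momentum model (module names are illustrative). DDP decides which parameters it expects gradients for when it wraps the model, so freezing in __init__ works while torch.no_grad() at forward time does not change DDP's bookkeeping:

```python
import copy
from torch import nn

class MomentumModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(32, 32)
        # EMA copy of the backbone; it is never updated by the optimizer.
        self.backbone_momentum = copy.deepcopy(self.backbone)
        # Disabling gradients here, in __init__, lets DDP exclude these
        # parameters when it wraps the model, so find_unused_parameters
        # can stay off.
        self.backbone_momentum.requires_grad_(False)

    def forward(self, x):
        # No torch.no_grad() needed for the momentum branch: its
        # parameters already have requires_grad=False.
        return self.backbone(x), self.backbone_momentum(x)
```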

For some models it should also be possible to set static_graph=True (https://lightning.ai/docs/pytorch/latest/advanced/ddp_optimizations.html#ddp-static-graph) for further speedups.
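A sketch of how this could be configured, using Lightning's DDPStrategy (extra keyword arguments are forwarded to torch's DistributedDataParallel):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DDPStrategy

# static_graph=True tells DDP that the set of used/unused parameters
# and the graph structure do not change across iterations, which
# unlocks additional optimizations on top of disabling the search.
trainer = Trainer(
    accelerator="gpu",
    devices=2,
    strategy=DDPStrategy(static_graph=True),
)
```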

Todo

  • Set ddp_find_unused_parameters_false in benchmarks/imagenet/resnet50/main.py and check which models work with it
  • Check which models we can easily fix to support disabling the flag
  • For models that do not support disabling the flag, we can add a "strategy" entry to the METHODS dict at the top of main.py and use it to set the Trainer argument (see the sketch after this list)
  • Check if we get a speedup
  • Check if we can set static_graph=True for some models
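For the per-method strategy entry, a hypothetical sketch of what the lookup could look like (the dict shape and helper below are assumptions, not the actual contents of benchmarks/imagenet/resnet50/main.py):

```python
from typing import Any, Dict

# Hypothetical shape of the METHODS dict; the real dict also maps
# method names to model classes and transforms.
METHODS: Dict[str, Dict[str, Any]] = {
    "simclr": {"strategy": "ddp"},  # all parameters used every step
    # DINO freezes the projection head during the first epoch, so keep
    # unused-parameter detection enabled for it.
    "dino": {"strategy": "ddp_find_unused_parameters_true"},
}

def get_strategy(method: str) -> str:
    """Return the DDP strategy for a method, defaulting to plain DDP."""
    return METHODS[method].get("strategy", "ddp")

# Usage when building the Trainer:
# trainer = Trainer(strategy=get_strategy(args.method), ...)
```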