
Speed up benchmarks by disabling find_unused_parameters #1434

Open · 5 tasks

guarin (Contributor) commented Nov 24, 2023

We use the ddp_find_unused_parameters_true strategy when running benchmarks:

strategy="ddp_find_unused_parameters_true",

This flag can slow down training considerably. We enabled it because some models have parameters that are not used during all training steps, for example, DINO freezes the projection head during the first epoch. But in principle we should be able to disable the flag for most models.
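For context, a minimal sketch of how the strategy string is passed to the Lightning Trainer (device counts here are illustrative):

```python
from pytorch_lightning import Trainer

# Current setup: DDP with unused-parameter detection, which adds an
# extra traversal of the autograd graph after every backward pass.
trainer = Trainer(
    accelerator="gpu",
    devices=2,
    strategy="ddp_find_unused_parameters_true",
)

# Proposed setup for models whose parameters all receive gradients on
# every step: plain DDP, which skips the unused-parameter search.
trainer = Trainer(
    accelerator="gpu",
    devices=2,
    strategy="ddp",
)
```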

One special case is models with frozen backbones (EMA backbones) where the backbone parameters remain frozen during all training steps. For those models it should be possible to disable the flag, but only if we disable gradients in the model __init__ method (according to this issue: Lightning-AI/pytorch-lightning#17212). Currently we use torch.no_grad() to disable gradients; disabling them with module.requires_grad_(False) instead should allow us to disable the flag.
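A minimal sketch of the difference, assuming a simplified momentum model (module names are illustrative). DDP decides which parameters it expects gradients for when it wraps the model, so freezing in __init__ works while torch.no_grad() at forward time does not change DDP's bookkeeping:

```python
import copy
from torch import nn

class MomentumModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(32, 32)
        # EMA copy of the backbone; it is never updated by the optimizer.
        self.backbone_momentum = copy.deepcopy(self.backbone)
        # Disabling gradients here, in __init__, lets DDP exclude these
        # parameters when it wraps the model, so find_unused_parameters
        # can stay off.
        self.backbone_momentum.requires_grad_(False)

    def forward(self, x):
        # No torch.no_grad() needed for the momentum branch: its
        # parameters already have requires_grad=False.
        return self.backbone(x), self.backbone_momentum(x)
```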

For some models it should also be possible to set static_graph=True (https://lightning.ai/docs/pytorch/latest/advanced/ddp_optimizations.html#ddp-static-graph) for further speedups.
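A sketch of how this could be configured, using Lightning's DDPStrategy (extra keyword arguments are forwarded to torch's DistributedDataParallel):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DDPStrategy

# static_graph=True tells DDP that the set of used/unused parameters
# and the graph structure do not change across iterations, which
# unlocks additional optimizations on top of disabling the search.
trainer = Trainer(
    accelerator="gpu",
    devices=2,
    strategy=DDPStrategy(static_graph=True),
)
```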

Todo

  • Set ddp_find_unused_parameters_false in benchmarks/imagenet/resnet50/main.py and check which models work with it
  • Check which models we can easily fix to support disabling the flag
  • For models that do not support disabling the flag, we can add a "strategy" entry to the METHODS dict at the top of main.py and use it to set the Trainer argument (see the sketch after this list)
  • Check if we get a speedup
  • Check if we can set static_graph=True for some models
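For the per-method strategy entry, a hypothetical sketch of what the lookup could look like (the dict shape and helper below are assumptions, not the actual contents of benchmarks/imagenet/resnet50/main.py):

```python
from typing import Any, Dict

# Hypothetical shape of the METHODS dict; the real dict also maps
# method names to model classes and transforms.
METHODS: Dict[str, Dict[str, Any]] = {
    "simclr": {"strategy": "ddp"},  # all parameters used every step
    # DINO freezes the projection head during the first epoch, so keep
    # unused-parameter detection enabled for it.
    "dino": {"strategy": "ddp_find_unused_parameters_true"},
}

def get_strategy(method: str) -> str:
    """Return the DDP strategy for a method, defaulting to plain DDP."""
    return METHODS[method].get("strategy", "ddp")

# Usage when building the Trainer:
# trainer = Trainer(strategy=get_strategy(args.method), ...)
```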