This repository has been archived by the owner on Nov 3, 2022. It is now read-only.

Add LearningRateMultiplier wrapper for optimizers #396

Open
wants to merge 3 commits into
base: master

Conversation

stante

@stante commented Jan 7, 2019

Summary

Optimizers have a single, model-global learning rate. This PR adds a wrapper that can be used with existing optimizers to specify different learning rates for individual layers in a network. The per-layer learning rate is given as a factor that is multiplied with the learning rate of the wrapped optimizer. The wrapper can be used in the following way:

multipliers = {'dense_1': 0.5, 'dense_2': 0.4}
opt = LearningRateMultiplier(SGD, lr_multipliers=multipliers, lr=0.001, momentum=0.9)

The example wraps SGD and specifies lr and momentum for it. The layer whose name contains the string 'dense_1' gets a multiplier of 0.5, and the layer whose name contains 'dense_2' gets a multiplier of 0.4.

Different multipliers for kernel and bias can be specified with:

multipliers = {'dense_1/kernel': 0.5, 'dense_1/bias': 0.1}
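
For context, a minimal end-to-end sketch of how the wrapper would be used; the model definition here is only illustrative and not part of the PR:

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras_contrib.optimizers import LearningRateMultiplier

# Toy model whose layer names match the multiplier keys above.
model = Sequential([
    Dense(64, activation='relu', input_shape=(10,), name='dense_1'),
    Dense(1, activation='sigmoid', name='dense_2'),
])

# dense_1 trains at 0.5 * lr, dense_2 at 0.4 * lr; weights without a
# matching key use the plain lr of the wrapped SGD.
multipliers = {'dense_1': 0.5, 'dense_2': 0.4}
opt = LearningRateMultiplier(SGD, lr_multipliers=multipliers,
                             lr=0.001, momentum=0.9)
model.compile(optimizer=opt, loss='binary_crossentropy')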

Related Issues

There are issues regarding this topic in Keras: keras-team/keras#11934, keras-team/keras#7912, and partially keras-team/keras#5920.

@gabrieldemarmiesse
Contributor

It seems there are some PEP 8 errors, and the code isn't compatible with Python 2 because of super(). super() takes two arguments in Python 2, usually the class and self.
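
For reference, a generic illustration of the two forms (not the PR's code):

class Parent(object):
    def __init__(self):
        self.ready = True

class Child(Parent):
    def __init__(self):
        # super().__init__() works on Python 3 only; the explicit
        # two-argument form below works on both Python 2 and Python 3.
        super(Child, self).__init__()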

@gabrieldemarmiesse
Contributor

You can find out more about the errors by looking at the Travis logs.


@gabrieldemarmiesse left a comment


Thanks a lot for working on this. Many people asked for this feature, so it's very welcome. Since your optimizer is quite special (an optimizer inside an optimizer), we'll make sure to minimize the amount of hackiness so that it works in as many cases as possible. See my comments. If you have any questions or problems, feel free to ask for help.

learning rate of the optimizer.

Note: This is a wrapper and does not implement any
optimization algorithm.

What about two examples?

  • One where you manually specify the learning rate multipliers using strings as keys, e.g. {'conv_1/kernel':0.5, 'conv_1/bias':0.1}
  • One where you set the learning rates programmatically by iterating through the layers of the model (for big models this is useful). I suppose this should be possible with a for loop, using layer.name as the key of the dictionary; a sketch follows below.
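
A rough sketch of the second, programmatic variant, assuming a plain Keras model object and the wrapper from this PR are in scope:

from keras.optimizers import SGD
from keras_contrib.optimizers import LearningRateMultiplier

# Build the multiplier dictionary from the model's layers, e.g. giving
# every layer except the last two a small factor.
multipliers = {layer.name: 0.1 for layer in model.layers[:-2]}

opt = LearningRateMultiplier(SGD, lr_multipliers=multipliers,
                             lr=0.001, momentum=0.9)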

# Arguments
optimizer: An optimizer class to be wrapped.
lr_multipliers: Dictionary of the per layer factors. For
example `optimizer={'conv_1/kernel':0.5, 'conv_1/bias':0.1}`.

Typo: the keyword is lr_multipliers.

optimization algorithm.

# Arguments
optimizer: An optimizer class to be wrapped.

I think optimizer should be an optimizer instance, not an optimizer class. Let's minimize the hackiness.
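
With that change, the call site would pass a configured optimizer instance instead of the class, roughly like this (illustrative only; this is not the PR's current API):

from keras.optimizers import SGD
from keras_contrib.optimizers import LearningRateMultiplier

opt = LearningRateMultiplier(SGD(lr=0.001, momentum=0.9),
                             lr_multipliers={'dense_1': 0.5, 'dense_2': 0.4})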

class.
"""
def __init__(self, optimizer, lr_multipliers=None, **kwargs):
self._class = optimizer

I don't think underscores are needed.

optimizers._test_optimizer(opt1, target=0.95)

mult = {'dense': 10}
opt2 = LearningRateMultiplier(SGD, lr_multipliers=mult,

Can you make a second function test_lr_multiplier_layerwise for this?

mult = {'dense': 10}
opt2 = LearningRateMultiplier(SGD, lr_multipliers=mult,
lr=0.001, momentum=0.9, nesterov=True)
optimizers._test_optimizer(opt2, target=0.95)

We'll also need a third test, test_lr_multiplier_weightwise, where you use the format {'layer_name/weight_name': lr} to ensure that all configurations work.

And a fourth test with a more complex optimizer (Adam would be a good fit).
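
For concreteness, the suggested tests could look roughly like this, reusing the optimizers._test_optimizer helper already used above (bodies and targets are placeholders):

def test_lr_multiplier_weightwise():
    # Separate multipliers for the kernel and the bias of the same layer.
    mult = {'dense/kernel': 10, 'dense/bias': 5}
    opt = LearningRateMultiplier(SGD, lr_multipliers=mult,
                                 lr=0.001, momentum=0.9, nesterov=True)
    optimizers._test_optimizer(opt, target=0.95)


def test_lr_multiplier_adam():
    # Same wrapper around a more complex optimizer.
    mult = {'dense': 10}
    opt = LearningRateMultiplier(Adam, lr_multipliers=mult, lr=0.001)
    optimizers._test_optimizer(opt, target=0.95)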

from keras_contrib.tests import optimizers
from keras_contrib.optimizers import LearningRateMultiplier
from keras.optimizers import SGD, Adam
from keras.callbacks import LearningRateScheduler

Unused import

if name.startswith('_'):
super(LearningRateMultiplier, self).__setattr__(name, value)
else:
self._optimizer.__setattr__(name, value)

I don't think __setattr__ and __getattr__ are needed. By calling the right super() functions at the right places, everything should work. Ask me if you have any issues while removing them.

You'll likely need an lr attribute that mirrors self.optimizer.lr, since many callbacks expect an lr attribute.
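
One way to expose lr without __getattr__/__setattr__ is a property that delegates to the wrapped optimizer, roughly like this (a sketch of one option, not the reviewer's exact suggestion; it assumes __init__ stores the wrapped instance as self.optimizer):

from keras.optimizers import Optimizer

class LearningRateMultiplier(Optimizer):
    # __init__ (not shown) stores the wrapped instance as self.optimizer.

    @property
    def lr(self):
        # Callbacks such as LearningRateScheduler read and write opt.lr,
        # so forward the attribute to the wrapped optimizer.
        return self.optimizer.lr

    @lr.setter
    def lr(self, value):
        self.optimizer.lr = value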

self._class = optimizer
self._optimizer = optimizer(**kwargs)
self._lr_multipliers = lr_multipliers or {}


You should call super() at the end of the __init__ function. You can take a look at the source code of the Keras optimizers to see what happens.
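
Applied to this __init__, and assuming the instance-based API suggested above, that would look roughly like:

def __init__(self, optimizer, lr_multipliers=None, **kwargs):
    self.optimizer = optimizer                 # the wrapped optimizer instance
    self.lr_multipliers = lr_multipliers or {}
    # Base-class __init__ called last, as suggested; in Keras it handles
    # common options such as clipnorm and clipvalue.
    super(LearningRateMultiplier, self).__init__(**kwargs)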

return updates

def get_config(self):
config = {'optimizer': self._class,

Since optimizer will be an instance of the Optimizer class, you should use the function serialize_keras_object, which will serialize the optimizer for you.
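
A sketch of what that could look like, assuming serialize_keras_object is imported from keras.utils.generic_utils (as in Keras 2.x) and the wrapped instance is stored as self.optimizer:

# module-level import
from keras.utils.generic_utils import serialize_keras_object

def get_config(self):
    config = {'optimizer': serialize_keras_object(self.optimizer),
              'lr_multipliers': self.lr_multipliers}
    base_config = super(LearningRateMultiplier, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))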

@Dicksonchin93

Will there be updates on this? If not, can I make a new PR that adds this class to keras-contrib? @gabrieldemarmiesse @stante. It would enable DiscriminativeLearningRate in general, not just the learning rate multiplier.

I propose three settings: automatic learning rate decay (cosine) from the base learning rate of the wrapped optimizer by layer, automatic learning rate decay (cosine) from the base learning rate of the wrapped optimizer by convolutional blocks/groups, and this learning rate multiplier.
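
A purely illustrative sketch of the first setting (per-layer cosine decay); the helper name and the exact shape of the schedule are hypothetical:

import math

def cosine_layer_multipliers(model):
    # Multipliers follow a cosine curve over layer depth, from 0.0 for the
    # first layer up to 1.0 for the last layer.
    n = len(model.layers)
    return {layer.name: 0.5 * (1 - math.cos(math.pi * i / max(n - 1, 1)))
            for i, layer in enumerate(model.layers)}

opt = LearningRateMultiplier(SGD, lr_multipliers=cosine_layer_multipliers(model),
                             lr=0.001, momentum=0.9)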

@gabrieldemarmiesse
Contributor

Keras-contrib is currently deprecated. Please redirect the PRs to tensorflow/addons. It would be really nice if you could add that @Dicksonchin93, a lot of people are asking for this feature :)

@Dicksonchin93

@gabrieldemarmiesse is there a reason why we shouldn't add this into keras directly?

@gabrieldemarmiesse
Contributor

gabrieldemarmiesse commented Jan 9, 2020 via email
