This repository has been archived by the owner on Nov 3, 2022. It is now read-only.

Swish activation doesn't save the weight if beta is not trainable #482

Closed
aghoshpub opened this issue Mar 21, 2019 · 6 comments · May be fixed by #483

Comments

@aghoshpub

It would be very useful to always save the beta value in the weights file, even if beta is not trainable. It is useful when converting to a lightweight inference package such as LWTNN (https://github.com/lwtnn/lwtnn).

I already have an implementation of the swish activation that preserves all the features of the current one but also saves the non-trainable beta in the weights file, and I would like to create a pull request.

@gabrieldemarmiesse
Contributor

Could you expand on this?

It is useful when converting to a lightweight inference package such as LWTNN

@aghoshpub
Author

Could you expand on this?

It is useful when converting to a lightweight inference package such as LWTNN

Indeed. To apply neural networks trained in the latest DNN libraries (such as Keras) to a massive C++ framework developed over several years, it's often useful to use a lightweight inference package (LWTNN). It is routinely used by certain experiments at CERN (European Organization for Nuclear Research), but the interest in such a package extends beyond physics. It's lighter than adding an entire TensorFlow dependency to your production framework and avoids the problems of harmonising multithreading and so on.

The idea in LWTNN is simply to pick up the weights of a model trained in Keras from the weights .h5 file and use them for inference in C++. So it is useful if the weight of a swish function is always stored in the weights file when a model is saved, whether or not the beta parameter is trainable.
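For illustration, a minimal sketch of that workflow: reading the saved weights directly from the HDF5 file with h5py. The file name "model_weights.h5" is only a placeholder for whatever model.save_weights() produced.

import h5py

# Sketch: list every weight array stored in a Keras weights file.
with h5py.File("model_weights.h5", "r") as f:
    def show(name, obj):
        # Datasets are the actual weight arrays; a saved swish beta would
        # appear here alongside the kernels and biases.
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape)
    f.visititems(show)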

For this reason I suggest adding the weight with the trainable argument, something like:

self.beta = self.add_weight(shape=[1], name='beta',
                            initializer=self.beta_initializer,
                            trainable=trainable)
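For concreteness, a fuller sketch of what such a layer could look like. This is only an illustrative assumption (the class and argument names, such as Swish and trainable_beta, are placeholders, and trainable_beta is used to avoid clashing with the base Layer's own trainable flag), not the current keras-contrib implementation:

from keras import backend as K
from keras import initializers
from keras.layers import Layer

class Swish(Layer):
    """Sketch of a swish activation layer whose beta is always a weight."""

    def __init__(self, beta=1.0, trainable_beta=False, **kwargs):
        super(Swish, self).__init__(**kwargs)
        self.beta_initializer = initializers.Constant(beta)
        self.trainable_beta = trainable_beta

    def build(self, input_shape):
        # beta is registered through add_weight, so save_weights() stores it
        # even when it is frozen (trainable=False).
        self.beta = self.add_weight(shape=[1], name='beta',
                                    initializer=self.beta_initializer,
                                    trainable=self.trainable_beta)
        super(Swish, self).build(input_shape)

    def call(self, inputs):
        return inputs * K.sigmoid(self.beta * inputs)

    def compute_output_shape(self, input_shape):
        return input_shape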

But I am open to any other solution, as long as Keras always saves the beta in the weights file. What do you think?

@gabrieldemarmiesse
Contributor

How does it work, for example, with the LeakyReLU layer in Keras? Do you have the same problem?

@aghoshpub
Author

For LeakyReLU, since the alpha is not trainable, we pick it up from the architecture file, and we can do the same with the swish activation the way it is currently implemented. But in my personal opinion it would be better to save it as a weight, since it could be a trainable parameter. If you disagree, feel free to close the issue; we can modify LWTNN to pick it up from the architecture file.
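For illustration, this is what picking it up from the architecture file refers to: the fixed alpha is part of the layer config rather than of the weights, so it ends up in the JSON produced by model.to_json(). A minimal check:

from keras.layers import LeakyReLU

layer = LeakyReLU(alpha=0.3)
# The config (and hence the architecture JSON) contains 'alpha': 0.3 among
# its keys, while nothing is written for this layer in the weights file.
print(layer.get_config())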

@aghoshpub
Author

aghoshpub commented Mar 22, 2019

On further consideration, I am not convinced it's worth changing the implementation of swish here, as long as any future version of the implementation continues to save the beta parameter in the config.
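For reference, saving beta in the config would mean something along these lines in the layer's get_config(), so that the value travels with the architecture JSON. This builds on the hypothetical Swish sketch above and is only an assumption about how it could be written:

    def get_config(self):
        # Sketch: expose beta in the layer config so that model.to_json()
        # carries the value even when it is not stored as a weight.
        config = {'beta': float(K.get_value(self.beta)[0])}
        base_config = super(Swish, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))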

@gabrieldemarmiesse
Contributor

We don't plan on changing this, so no problem here. Also, on a side note, we can't change code just for the sake of facilitating implementation in specific projects. For a change to take place, a majority of users must benefit from it.
If this is causing major inconvenience for many users, then feel free to reopen an issue about it.
