This repository has been archived by the owner on Nov 3, 2022. It is now read-only.

Swish activation doesn't save the weight if beta is not trainable #482

Closed
aghoshpub opened this issue Mar 21, 2019 · 6 comments · May be fixed by #483

Comments

@aghoshpub

It would be very useful to always save the beta value in the weights file, even if beta is not trainable. It is useful when converting to a lightweight inference package such as LWTNN (https://github.com/lwtnn/lwtnn).

I already have an implementation of the swish activation that preserves all the features of the current one but also saves the non-trainable beta in the weights file, and I would like to create a pull request.

@gabrieldemarmiesse
Contributor

Could you expand on this?

It is useful when converting to a lightweight inference package such as LWTNN

@aghoshpub
Author

Could you expand on this?

It is useful when converting to a lightweight inference package such as LWTNN

Indeed. To apply neural networks trained in the latest DNN libraries (such as Keras) to a massive C++ framework developed over several years, it's often useful to use a lightweight inference package (LWTNN). It is routinely used by certain experiments at CERN (European Organization for Nuclear Research), but the interest in such a package extends beyond physics. It's lighter than adding an entire TensorFlow dependency to your production framework and avoids the problems of harmonising multithreading and so on.

The idea in LWTNN is simply to pick up the weights of a model trained in Keras from the weights .h5 file and use them for inference in C++. So it is useful if the weight of a swish function is always stored in the weights file when a model is saved, whether or not the beta parameter is trainable.
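For illustration, a minimal sketch of that workflow: reading the saved weights directly from the HDF5 file with h5py. The file name "model_weights.h5" is only a placeholder for whatever model.save_weights() produced.

import h5py

# Sketch: list every weight array stored in a Keras weights file.
with h5py.File("model_weights.h5", "r") as f:
    def show(name, obj):
        # Datasets are the actual weight arrays; a saved swish beta would
        # appear here alongside the kernels and biases.
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape)
    f.visititems(show)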

For this reason I suggest adding the weight with the trainable argument, something like:

self.beta = self.add_weight(shape=[1], name='beta',
                            initializer=self.beta_initializer,
                            trainable=trainable)
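For concreteness, a fuller sketch of what such a layer could look like. This is only an illustrative assumption (the class and argument names, such as Swish and trainable_beta, are placeholders, and trainable_beta is used to avoid clashing with the base Layer's own trainable flag), not the current keras-contrib implementation:

from keras import backend as K
from keras import initializers
from keras.layers import Layer

class Swish(Layer):
    """Sketch of a swish activation layer whose beta is always a weight."""

    def __init__(self, beta=1.0, trainable_beta=False, **kwargs):
        super(Swish, self).__init__(**kwargs)
        self.beta_initializer = initializers.Constant(beta)
        self.trainable_beta = trainable_beta

    def build(self, input_shape):
        # beta is registered through add_weight, so save_weights() stores it
        # even when it is frozen (trainable=False).
        self.beta = self.add_weight(shape=[1], name='beta',
                                    initializer=self.beta_initializer,
                                    trainable=self.trainable_beta)
        super(Swish, self).build(input_shape)

    def call(self, inputs):
        return inputs * K.sigmoid(self.beta * inputs)

    def compute_output_shape(self, input_shape):
        return input_shape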

But I am open to any other solution, as long as Keras always saves the beta in the weights file. What do you think?

@gabrieldemarmiesse
Contributor

How does it work, for example, with the LeakyReLU layer in Keras? Do you have the same problem?

@aghoshpub
Author

For LeakyReLU, since the alpha is not trainable, we pick it up from the architecture file, and we can do the same with the swish activation the way it is currently implemented. But in my personal opinion it would be better to save it as a weight, since it could be a trainable parameter. If you disagree, feel free to close the issue; we can modify LWTNN to pick it up from the architecture file.
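For illustration, this is what picking it up from the architecture file refers to: the fixed alpha is part of the layer config rather than of the weights, so it ends up in the JSON produced by model.to_json(). A minimal check:

from keras.layers import LeakyReLU

layer = LeakyReLU(alpha=0.3)
# The config (and hence the architecture JSON) contains 'alpha': 0.3 among
# its keys, while nothing is written for this layer in the weights file.
print(layer.get_config())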

@aghoshpub
Author

aghoshpub commented Mar 22, 2019

On further consideration, I am not convinced it's worth changing the implementation of swish here, as long as any future version of the implementation continues to save the beta parameter in the config.
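For reference, saving beta in the config would mean something along these lines in the layer's get_config(), so that the value travels with the architecture JSON. This builds on the hypothetical Swish sketch above and is only an assumption about how it could be written:

    def get_config(self):
        # Sketch: expose beta in the layer config so that model.to_json()
        # carries the value even when it is not stored as a weight.
        config = {'beta': float(K.get_value(self.beta)[0])}
        base_config = super(Swish, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))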

@gabrieldemarmiesse
Contributor

We don't plan on changing this, so no problem here. Also, on a side note, we can't change code just for the sake of facilitating implementation in specific projects. For a change to take place, a majority of users must benefit from it.
If this is causing major inconvenience for many users, then feel free to reopen an issue about it.
