Weight-Normalization

This is a complete PyTorch implementation of the article Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks (https://arxiv.org/pdf/1602.07868.pdf).

Version: PyTorch 2.1.1+cu118

The following methods are implemented and can be run jointly or separately (a minimal sketch of the core weight-normalization reparameterization follows the list):

   - No normalization (default reference model)
   - Weight normalization with initialization
   - Weight normalization without initialization
   - Weight normalization with mean-only batch normalization
   - Weight normalization with mean-only batch normalization (with initialization)
   - Batch normalization (without initialization)
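
All weight-normalized layers share the reparameterization w = g * v / ||v||, with the norm taken per output unit. The sketch below shows one way to write this for a 2D convolution; it is illustrative only (the class name WeightNormConv2d is not the repository's actual layer code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightNormConv2d(nn.Module):
    """Sketch of the reparameterization w = g * v / ||v|| (norm per output channel)."""
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        super().__init__()
        self.stride, self.padding = stride, padding
        self.v = nn.Parameter(0.05 * torch.randn(out_channels, in_channels, kernel_size, kernel_size))
        self.g = nn.Parameter(torch.ones(out_channels))
        self.b = nn.Parameter(torch.zeros(out_channels))

    def forward(self, x):
        # norm of v over every dimension except the output-channel dimension
        v_norm = self.v.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
        w = self.g.view(-1, 1, 1, 1) * self.v / v_norm
        return F.conv2d(x, w, self.b, stride=self.stride, padding=self.padding)
```

PyTorch also ships this reparameterization as torch.nn.utils.weight_norm (and, in newer releases, torch.nn.utils.parametrizations.weight_norm), which can be applied to an existing nn.Conv2d or nn.Linear.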

Results

Training curves: train_plot.png

Test curves: test_plot.png

The "no normalization" model, trained with learning rate 0.003 got unstable behaviours during training. For this the learning rate was reduced to 0.0003 to train "no normalization" model.

The weight normalization methods without initialization converged the fastest, along with batch normalization. However, the best results were achieved with weight normalization combined with mean-only batch normalization and with initialization, as in the article. Overall, the models without initialization trained faster, but the same models with initialization reached better accuracy. (A sketch of the data-dependent initialization follows.)
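
The data-dependent initialization follows the article's scheme: v is drawn from a small Gaussian, then g and b are set from the mean and standard deviation of the pre-activations on the first minibatch so that they start out roughly standardized. A minimal sketch, assuming a layer like the WeightNormConv2d above with g initialized to ones and b to zeros (the helper name data_dependent_init is illustrative, not the repository's API):

```python
import torch

@torch.no_grad()
def data_dependent_init(layer, x, eps=1e-8):
    # With g = 1 and b = 0 the layer output is t = v * x / ||v||.
    t = layer(x)
    dims = [d for d in range(t.dim()) if d != 1]   # all dims except channels
    mean, std = t.mean(dim=dims), t.std(dim=dims) + eps
    # Setting g = 1/std and b = -mean/std makes the initial output (t - mean) / std.
    layer.g.data.copy_(1.0 / std)
    layer.b.data.copy_(-mean / std)
```

In the runs labelled "with initialization", this is done once on the first batch before training starts.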

  • Training with weight normalization alone, or with weight normalization combined with mean-only batch normalization, was stable across different learning rates.
  • Less overfitting also occurred during training with weight normalization, or with weight normalization combined with mean-only batch normalization.

Note: Mean-only batch normalization without weight normalization is omitted from the results, since it became unstable at learning rate 0.003, just like the reference model. In that experiment, mean-only batch normalization (without weight normalization) was applied to every layer with trainable parameters (Linear, Conv2d, NINLayer), wrapping them like MeanOnlyBatchNormLayer(...nn.Conv2d()...). A sketch of such a layer follows.
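
Mean-only batch normalization subtracts the per-feature minibatch mean from a layer's output but, unlike full batch normalization, does not divide by the standard deviation; a running mean is kept for evaluation. A minimal sketch of such a wrapper (illustrative only; the repository's MeanOnlyBatchNormLayer may differ):

```python
import torch
import torch.nn as nn

class MeanOnlyBatchNorm(nn.Module):
    """Subtract the per-channel batch mean of the wrapped layer's output,
    keeping a running mean for evaluation. The wrapped layer is expected to
    have bias=False, since the bias is added after the mean subtraction."""
    def __init__(self, layer, num_features, momentum=0.1):
        super().__init__()
        self.layer = layer
        self.momentum = momentum
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        t = self.layer(x)
        dims = [d for d in range(t.dim()) if d != 1]  # all dims except channels
        if self.training:
            mean = t.mean(dim=dims)
            self.running_mean.mul_(1 - self.momentum).add_(self.momentum * mean.detach())
        else:
            mean = self.running_mean
        shape = [1, -1] + [1] * (t.dim() - 2)
        return t - mean.view(shape) + self.bias.view(shape)
```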

Model

The model is identical to the article's model, except that ZCA whitening was not applied; only a default transform at the beginning rescales the image pixel range to [-1, 1].
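
With torchvision, such a rescaling to [-1, 1] can be written as follows (the repository's exact transform may differ):

```python
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),                                   # [0, 255] -> [0, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # [0, 1]   -> [-1, 1]
])
```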

Training

Training was kept deterministic: the same examples were generated in the same order in every run, and when initialization was used, the initial batch of examples was also deterministic at initialization time. Each model was trained for 100 epochs in Google Colab with the command below:

!CUBLAS_WORKSPACE_CONFIG=:4096:8 python train.py
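
The CUBLAS_WORKSPACE_CONFIG=:4096:8 environment variable is what PyTorch requires for deterministic cuBLAS kernels when deterministic algorithms are enabled. A minimal sketch of the kind of seeding and determinism setup this implies (train.py's actual setup may differ):

```python
import random
import numpy as np
import torch

def seed_everything(seed: int = 0):
    # Seed all random number generators used during training.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Force deterministic kernels; this is what makes the
    # CUBLAS_WORKSPACE_CONFIG setting necessary on CUDA.
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False
```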

References:


Gaussian noise layer taken from:

NINLayer rewritten from (lasagne):
