PyTorch Implementation of SimSiam

The main idea behind this paper is that Siamese networks can learn meaningful representations even without any of the following:

  • (i) negative sample pairs
  • (ii) large batches
  • (iii) momentum encoders

This paper empirically demonstrates that the Siamese architecture is a strong inductive prior that complements the prior imposed by data augmentation (spatial invariance).

Architecture

Two augmented views of one image are processed by the same encoder network f (a backbone plus a projection MLP). Then a prediction MLP h is applied on one side, and a stop-gradient operation is applied on the other side. The model maximizes the similarity between both sides.

[Figure: SimSiam architecture]
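
The forward pass can be sketched roughly as below. This is a minimal illustration, assuming a torchvision ResNet-18 backbone; the layer sizes (`dim`, `pred_dim`) are assumptions for illustration and may differ from what this repo actually uses.

```python
import torch.nn as nn
from torchvision.models import resnet18


class SimSiam(nn.Module):
    """Sketch of SimSiam: encoder f (backbone + projection MLP) shared by both
    views, plus a prediction MLP h whose output is compared to the other view."""

    def __init__(self, dim=2048, pred_dim=512):
        super().__init__()
        backbone = resnet18()
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()  # keep only the convolutional trunk
        self.backbone = backbone

        # projection MLP (part of the encoder f)
        self.projector = nn.Sequential(
            nn.Linear(feat_dim, dim), nn.BatchNorm1d(dim), nn.ReLU(inplace=True),
            nn.Linear(dim, dim), nn.BatchNorm1d(dim),
        )

        # prediction MLP h, applied to one branch at a time
        self.predictor = nn.Sequential(
            nn.Linear(dim, pred_dim), nn.BatchNorm1d(pred_dim), nn.ReLU(inplace=True),
            nn.Linear(pred_dim, dim),
        )

    def forward(self, x1, x2):
        # encoder f on both augmented views
        z1 = self.projector(self.backbone(x1))
        z2 = self.projector(self.backbone(x2))
        # predictor h on each branch; the targets get a stop-gradient
        p1, p2 = self.predictor(z1), self.predictor(z2)
        return p1, p2, z1.detach(), z2.detach()
```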

Loss

Negative cosine similarity between the two views obtained from the same image is used as the loss:

loss = -F.cosine_similarity(p, z.detach(), dim=-1).mean()
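
In the paper this loss is symmetrized over the two branches, each time with a stop-gradient on the target projection. A minimal sketch of that symmetric form, assuming the `p1, p2, z1, z2` outputs of the model sketch above (the function names here are assumptions, not necessarily the ones used in this repo):

```python
import torch.nn.functional as F


def negative_cosine(p, z):
    # z is detached, so it acts as a constant target (stop-gradient)
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()


def simsiam_loss(p1, p2, z1, z2):
    # symmetrized loss: each predictor output is pulled towards the
    # stop-gradient projection of the other view
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)
```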

Batch Size

SimSiam doesn't require large batch sizes. However, in my experiments, larger batch sizes converged faster than smaller ones. I used batch sizes of 128, 256, and 512. All of them reached roughly the same accuracy, but 512 converged faster than 128.

Optimizer

SGD with momentum = 0.95 and weight_decay=0.0005
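
A minimal sketch of this setup; the base learning rate and the cosine-decay horizon are assumptions here and may differ from the repo's training script:

```python
import torch

model = SimSiam()  # from the sketch above

# momentum and weight decay as stated above; lr=0.03 is an assumed base value
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.03, momentum=0.95, weight_decay=0.0005
)

# cosine learning-rate decay over the run (T_max of 150 epochs is assumed)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=150)
```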

Performance

Here's the training loss on CIFAR-10:

| Color  | Batch size | Loss   | Epochs |
|--------|------------|--------|--------|
| green  | 512        | -0.906 | 50     |
| pink   | 256        | -0.901 | 50     |
| orange | 512        | -0.919 | 150    |

[Figure: training loss curves]

LR Scheduling

[Figure: learning-rate schedule]

TODO:

  • Design script for training downstream tasks.
  • Benchmark on STL-10

Reference: Exploring Simple Siamese Representation Learning, https://arxiv.org/abs/2011.10566