AK_SSL: A Self-Supervised Learning Library

📒 Table of Contents

📍 Overview
✍️ Self Supervised Learning
🔎 Supported Methods
📦 Installation
💡 Tutorial
📊 Benchmarks
📜 References Used
💯 License
🤝 Collaborators

📍 Overview

Welcome to the Self-Supervised Learning Library! This repository hosts a collection of tools and implementations for self-supervised learning. Self-supervised learning is a powerful paradigm that leverages unlabeled data to pre-train models, which can then be fine-tuned on specific tasks with smaller labeled datasets. This library aims to provide researchers and practitioners with a comprehensive set of tools to experiment with, learn, and apply self-supervised learning techniques effectively. This project was our assignment during the summer apprenticeship in the newly established Intelligent and Learning System (ILS) laboratory at the University of Isfahan.

✍️ Self Supervised Learning

Self-supervised learning is a subfield of machine learning where models are trained to predict certain aspects of the input data without relying on manual labeling. This approach has gained significant attention due to its ability to leverage large amounts of unlabeled data, which is often easier to obtain than fully annotated datasets. This library provides implementations of various self-supervised techniques, allowing you to experiment with and apply these methods in your own projects.

🔎 Supported Methods

BarlowTwins

Barlow Twins is a self-supervised learning method that aims to learn embeddings invariant to distortions of the input sample. It achieves this by applying two distinct sets of augmentations to the same input sample, resulting in two distorted views of the same image. The objective function measures the cross-correlation matrix between the outputs of two identical networks fed with these distorted sample versions, striving to make it as close to the identity matrix as possible. This causes the embedding vectors of the distorted sample versions to become similar while minimizing redundancy among the components of these vectors. Barlow Twins particularly benefits from utilizing high-dimensional output vectors.

Details of this method

Loss	Transformation	Transformation Prime	Projection Head	Paper	Original Code
BarlowTwins Loss	SimCLR Transformation	SimCLR Transformation	BarlowTwins Projection Head	Link	Link

BarlowTwins Loss is inspired by HSIC loss.
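
The objective described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code used inside AK_SSL: the function name `barlow_twins_loss` and the default value of `lambda_param` are chosen here for illustration, and the inputs are assumed to be two `(batch, dim)` arrays of embeddings from the two distorted views.

```python
import numpy as np


def barlow_twins_loss(z_a, z_b, lambda_param=5e-3, eps=1e-12):
    """Barlow Twins objective for two (batch, dim) embedding arrays.

    Hypothetical helper for illustration; the default lambda_param is an assumption.
    """
    n = z_a.shape[0]
    # Standardize every embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (dim, dim).
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    on_diag = np.sum((diag - 1.0) ** 2)          # invariance: diagonal close to 1
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)  # redundancy reduction: rest close to 0
    return float(on_diag + lambda_param * off_diag)
```

The first term pulls the diagonal of the cross-correlation matrix toward 1, and the second, weighted by `lambda_param`, pushes the off-diagonal entries toward 0, which together drive the matrix toward the identity.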

BYOL

BYOL (Bootstrap Your Own Latent) is a self-supervised learning approach that, like other methods, aims to learn a representation that can be used for downstream tasks. It employs two neural networks for learning: the online and target networks. The online network is trained to predict the target network's representation of the same image under a different augmented view. Simultaneously, the target network is updated with a slow-moving average of the online network's parameters. While earlier state-of-the-art methods relied on negative pairs, BYOL achieved a new state of the art without them. It directly maximizes the similarity between the representations of the same image from different augmented views (the positive pair).

Details of this method

Loss	Transformation	Transformation Prime	Projection Head	Prediction Head	Paper	Original Code
BYOL Loss	SimCLR Transformation	SimCLR Transformation	BarlowTwins Projection Head	BarlowTwins Prediction Head	Link	Link
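
The two mechanisms described above, the slow-moving average that updates the target network and the similarity objective on a positive pair, can be sketched as follows. This is an illustrative NumPy sketch, not AK_SSL's implementation: the helper names `ema_update` and `byol_loss` are hypothetical, parameters are represented as a plain dictionary of arrays, and the decay value is only an example.

```python
import numpy as np


def ema_update(target_params, online_params, moving_average_decay=0.99):
    """Move each target parameter a small step toward its online counterpart."""
    return {
        name: moving_average_decay * target_params[name]
        + (1.0 - moving_average_decay) * online_params[name]
        for name in target_params
    }


def byol_loss(prediction, target, eps=1e-12):
    """Mean of 2 - 2 * cosine similarity between two (batch, dim) arrays."""
    p = prediction / (np.linalg.norm(prediction, axis=1, keepdims=True) + eps)
    t = target / (np.linalg.norm(target, axis=1, keepdims=True) + eps)
    return float(np.mean(2.0 - 2.0 * np.sum(p * t, axis=1)))
```

Minimizing this loss is equivalent to maximizing the cosine similarity between the online prediction and the target projection, so no negative pairs are involved.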

DINO

DINO (self-distillation with no labels) is a self-supervised learning method that directly predicts the output of a teacher network—constructed with a momentum encoder—by utilizing a standard cross-entropy loss. It is an innovative self-supervised learning algorithm developed by Facebook AI. Through the utilization of self-supervised learning with Transformers, DINO paves the way for creating machines that can comprehend images and videos at a much deeper level.

Details of this method

Loss	Transformation Global 1	Transformation Global 2	Transformation Local	Projection Head	Paper	Original Code
DINO Loss	SimCLR Transformation	SimCLR Transformation	SimCLR Transformation	DINO Projection Head	Link	Link

MoCos

MoCo, short for Momentum Contrast, is a self-supervised learning algorithm that employs a contrastive loss. MoCo v2 is an enhanced iteration of the original algorithm. Motivated by the findings outlined in the SimCLR paper, the authors introduced several modifications to MoCo v1, which included replacing the 1-layer fully connected head with a 2-layer MLP head featuring ReLU activation for the unsupervised training stage. Additionally, they incorporated blur augmentation and adopted a cosine learning rate schedule. These adjustments enabled MoCo to outperform the state-of-the-art SimCLR, even when utilizing a smaller batch size and fewer epochs.

MoCo v3, introduced in the paper "An Empirical Study of Training Self-Supervised Vision Transformers," represents another advancement in self-supervised learning. It builds upon the foundation of MoCo v1 / MoCo v2 and addresses the instability issue observed when employing ViT for self-supervised learning.

In contrast to MoCo v2, MoCo v3 adopts a different approach where the keys naturally coexist within the same batch. The memory queue (memory bank) is discarded, resulting in a setting similar to that of SimCLR. The encoder f_q comprises a backbone (e.g., ResNet, ViT), a projection head, and an additional prediction head.

Details of this method

Method	Loss	Transformation	Transformation Prime	Projection Head	Prediction Head	Paper	Original Code
MoCo v2	InfoNCE	SimCLR Transformation	None	SimCLR Projection Head	None	Link	Link
MoCo v3	InfoNCE	SimCLR Transformation	SimCLR Transformation	SimCLR Projection Head	BYOL Prediction Head	Link	Link
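
The InfoNCE loss listed in the table can be sketched as below for the MoCo v2 setting, where each query has one positive key and the negatives come from a queue of earlier embeddings. This is a minimal NumPy illustration: `info_nce_loss` and `l2_normalize` are hypothetical helper names, the temperature default is an assumed example value, and all embeddings are assumed to be L2-normalized.

```python
import numpy as np


def l2_normalize(x):
    """Scale every row of x to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def info_nce_loss(query, key, queue, temperature=0.2):
    """InfoNCE with one positive key per query and a queue of negatives.

    query, key: (batch, dim) L2-normalized embeddings of two views.
    queue:      (K, dim) L2-normalized embeddings from earlier batches.
    """
    pos = np.sum(query * key, axis=1, keepdims=True)  # (batch, 1)
    neg = query @ queue.T                             # (batch, K)
    logits = np.concatenate([pos, neg], axis=1) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    # The positive always sits in column 0, so the loss is cross-entropy with label 0.
    log_prob_pos = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return float(-np.mean(log_prob_pos))
```

For MoCo v3, which drops the queue, the negatives would instead be the keys of the other samples in the same batch.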

SimCLRs

SimCLR (Simple Framework for Contrastive Learning of Representations) is a self-supervised technique used to learn image representations. The fundamental building blocks of contrastive self-supervised methods, such as SimCLR, are image transformations. Each image is transformed into multiple new images through randomly applied augmentations. The goal of the self-supervised model is to identify images that originate from the same original source among a set of negative examples. SimCLR operates on the principle of maximizing the similarity between positive pairs of augmented images while minimizing the similarity with negative pairs. Training starts with data augmentation: SimCLR employs strong data augmentation techniques to generate multiple augmented versions of each input image.

Details of this method

Loss	Transformation	Projection Head	Paper	Original Code
NT_Xent	SimCLR Transformation	SimCLR Projection Head	Link	Link
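
To make the positive/negative pairing concrete, here is a small NumPy sketch of the NT-Xent loss over a batch of positive pairs. It is an illustration under assumptions rather than the library's code: the function name `nt_xent_loss` and the default temperature are chosen for this example, and `z_i[k]` and `z_j[k]` are assumed to be projections of two augmentations of the same image.

```python
import numpy as np


def nt_xent_loss(z_i, z_j, temperature=0.5):
    """NT-Xent over a batch of positive pairs (z_i[k], z_j[k])."""
    n = z_i.shape[0]
    z = np.concatenate([z_i, z_j], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature     # (2n, 2n) scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)  # a sample is never compared with itself
    # Row k is paired with row k + n, and row k + n with row k.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    row_max = sim.max(axis=1, keepdims=True)
    log_denominator = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob = sim[np.arange(2 * n), positives] - log_denominator
    return float(-np.mean(log_prob))
```

Every other sample in the concatenated batch acts as a negative, which is why SimCLR tends to benefit from larger batches.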

SimSiam

SimSiam is a self-supervised representation learning model that was proposed by Facebook AI Research (FAIR). It is a simple Siamese network designed to learn meaningful representations without requiring negative sample pairs, large batches, or momentum encoders.

Details of this method

Loss	Transformation	Projection Head	Prediction Head	Paper	Original Code
Negative Cosine Similarity	SimCLR Transformation	SimSiam Projection Head	SimSiam Prediction Head	Link	Link
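
The negative cosine similarity loss in the table can be sketched as follows. This is a hedged NumPy illustration with hypothetical helper names; NumPy has no automatic differentiation, so the stop-gradient that SimSiam applies to the projections is only indicated in a comment.

```python
import numpy as np


def negative_cosine_similarity(p, z, eps=1e-8):
    """Mean negative cosine similarity between (batch, dim) arrays p and z."""
    p_norm = np.maximum(np.linalg.norm(p, axis=1), eps)
    z_norm = np.maximum(np.linalg.norm(z, axis=1), eps)
    return float(-np.mean(np.sum(p * z, axis=1) / (p_norm * z_norm)))


def simsiam_loss(p1, z2, p2, z1):
    """Symmetrized loss over both views.

    p1, p2 are prediction-head outputs; z1, z2 are projection-head outputs,
    which would be treated as constants (stop-gradient) during training.
    """
    return 0.5 * negative_cosine_similarity(p1, z2) + 0.5 * negative_cosine_similarity(p2, z1)
```

The loss reaches its minimum of -1 when each prediction points in the same direction as the projection of the other view.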

SwAV

SwAV, or Swapping Assignments Between Views, is a self-supervised learning approach that takes advantage of contrastive methods without requiring pairwise comparisons to be computed. Specifically, it simultaneously clusters the data while enforcing consistency between cluster assignments produced for different augmentations (or views) of the same image, instead of comparing features directly as in contrastive learning. Simply put, SwAV uses a swapped prediction mechanism in which the cluster assignment of one view is predicted from the representation of another view.

Details of this method

Loss	Transformation Global	Transformation Local	Projection Head	Paper	Original Code
SwAV Loss	SimCLR Transformation	SimCLR Transformation	SwAV Projection Head	Link	Link
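
The cluster assignments mentioned above are commonly computed with a few Sinkhorn-Knopp iterations, which turn prototype scores into soft assignments that are balanced across prototypes. The sketch below is a NumPy illustration under assumptions: the function name is hypothetical, the default values are examples, and `epsilon` is used here as the entropy-regularization strength, which may not match how AK_SSL interprets its own `epsilon` argument.

```python
import numpy as np


def sinkhorn_knopp(scores, epsilon=0.05, sinkhorn_iterations=3):
    """Turn (batch, num_prototypes) scores into soft, balanced assignments."""
    # Subtracting the global maximum keeps the exponentials in a safe range.
    q = np.exp((scores - scores.max()) / epsilon).T  # (num_prototypes, batch)
    q /= q.sum()
    num_prototypes, batch_size = q.shape
    for _ in range(sinkhorn_iterations):
        # Give every prototype the same total mass ...
        q /= q.sum(axis=1, keepdims=True)
        q /= num_prototypes
        # ... then give every sample the same total mass.
        q /= q.sum(axis=0, keepdims=True)
        q /= batch_size
    return (q * batch_size).T  # (batch, num_prototypes); each row sums to 1
```

In the swapped prediction setup, the assignment computed for one view serves as the target for the prototype scores of the other view.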

📦 Installation

You can install AK_SSL and its dependencies from PyPI with:

pip install AK-SSL

We strongly recommend that you install AK_SSL in a dedicated virtualenv to avoid conflicts with your system packages.

💡 Tutorial

Using AK_SSL, you have the flexibility to leverage the most recent self-supervised learning techniques seamlessly, harnessing the complete capabilities of PyTorch. You can explore diverse backbones, models, and optimizers while benefiting from a user-friendly framework that has been purposefully crafted for ease of use.

You can easily import the Trainer class from the AK_SSL library and start using it right away.

from AK_SSL import Trainer

Initializing the Trainer

Now, let's initialize the self-supervised trainer with our chosen method, backbone, dataset, and other configurations.

trainer = Trainer(
    method="barlowtwins",           # training method as string (BarlowTwins, BYOL, DINO, MoCov2, MoCov3, SimCLR, SimSiam, SwAV)
    backbone=backbone,              # backbone architecture as torch.nn.Module
    feature_size=feature_size,      # size of the extracted features as integer
    dataset=train_dataset,          # training dataset as torch.utils.data.Dataset
    image_size=32,                  # dataset image size as integer
    save_dir="./save_for_report/",  # directory to save training checkpoints and Tensorboard logs as string
    checkpoint_interval=50,         # interval (in epochs) for saving checkpoints as integer
    reload_checkpoint=False,        # reload a previously saved checkpoint as boolean
    verbose=True,                   # enable verbose output for training progress as a boolean
    **kwargs                        # other arguments 
)

Note: The use of **kwargs can differ between methods, depending on the specific method, loss function, transformation, and other factors. If you are utilizing any of the objectives listed below, you must provide their arguments during the initialization of the Trainer class.

SimCLR Transformation

  color_jitter_strength     # a float to set the strength of the color jitter
  use_blur                  # a boolean to specify whether to apply blur augmentation
  mean                      # a float to specify the mean values for each channel
  std                       # a float to specify the standard deviation values for each channel

BarlowTwins

Method

  projection_dim          # an integer to specify dimensionality of the projection head
  hidden_dim              # an integer to specify dimensionality of the hidden layers in the neural network
  moving_average_decay    # a float to specify decay rate for moving averages during training

Loss

  lambda_param            # a float to control the balance between the main loss and the orthogonality loss

DINO

Method

  projection_dim          # an integer to specify dimensionality of the projection head
  hidden_dim              # an integer to specify dimensionality of the hidden layers in the projection head neural network
  bottleneck_dim          # an integer to specify dimensionality of the bottleneck layer in the student network
  temp_student            # a float to specify temperature parameter for the student's logits
  temp_teacher            # a float to specify temperature parameter for the teacher's logits
  norm_last_layer         # a boolean to specify whether to normalize the last layer of the network
  momentum_teacher        # a float to control momentum coefficient for updating the teacher network
  num_crops               # an integer that determines the number of augmentations applied to each input image
  use_bn_in_head          # a boolean to specify whether to use batch normalization in the projection head

Loss

  center_momentum        # a float to control momentum coefficient for updating the center of cluster assignments

MoCo v2

Method

  projection_dim          # an integer to specify dimensionality of the projection head
  K                       # an integer to specify number of negative samples per positive sample in the contrastive loss
  m                       # a float to control momentum coefficient for updating the moving-average encoder

Loss

  temperature             # a float to control the temperature for the contrastive loss function

MoCo v3

Method

  projection_dim          # an integer to specify dimensionality of the projection head
  hidden_dim              # an integer to specify dimensionality of the hidden layers in the projection head neural network
  moving_average_decay    # a float to specify decay rate for moving averages during training

Loss

  temperature             # a float to control the temperature for the contrastive loss function

SimCLR

Method

  projection_dim          # an integer to specify dimensionality of the projection head
  projection_num_layers   # an integer to specify the number of layers in the projection head (1: SimCLR v1, 2: SimCLR v2)
  projection_batch_norm   # a boolean to indicate whether to use batch normalization in the projection head

Loss

  temperature             # a float to control the temperature for the contrastive loss function

SimSiam

Method

  projection_dim          # an integer to specify dimensionality of the projection head

Loss

  eps                     # a float to control the stability of the loss function

SwAV

Method

  projection_dim          # an integer to specify dimensionality of the projection head
  hidden_dim              # an integer to specify dimensionality of the hidden layers in the projection head neural network
  epsilon                 # a float to control numerical stability in the algorithm
  sinkhorn_iterations     # an integer to specify the number of iterations in the Sinkhorn-Knopp algorithm
  num_prototypes          # an integer to specify the number of prototypes or clusters for contrastive learning
  queue_length            # an integer to specify the length of the queue for maintaining negative samples
  use_the_queue           # a boolean to indicate whether to use the queue for negative samples
  num_crops               # an integer that determines the number of augmentations applied to each input image

Loss

  temperature             # a float to control the temperature for the contrastive loss function
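
As an illustration of how these method-specific arguments might be collected and forwarded, the snippet below gathers a set of SimCLR options in a dictionary. The argument names come from the lists above, but every value is a hypothetical example, and the lowercase method string in the commented call is an assumption that mirrors the "barlowtwins" example shown earlier.

```python
# Hypothetical hyperparameter values, chosen only for illustration.
simclr_kwargs = {
    # SimCLR Transformation
    "color_jitter_strength": 0.5,
    "use_blur": False,
    # SimCLR method
    "projection_dim": 128,
    "projection_num_layers": 2,
    "projection_batch_norm": True,
    # NT_Xent loss
    "temperature": 0.5,
}

# The dictionary would then be unpacked next to the required arguments, e.g.:
# trainer = Trainer(method="simclr", backbone=backbone, feature_size=feature_size,
#                   dataset=train_dataset, image_size=32, **simclr_kwargs)
```

Keeping the options in one dictionary makes it easy to log or reuse the same configuration across runs.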

Training the Self-Supervised Model

Then, we'll train the self-supervised model using the specified parameters.

trainer.train(
    batch_size=256,          # the number of training examples used in each iteration as integer
    start_epoch=1,           # the starting epoch for training as integer (if reload_checkpoint is True, training resumes from the latest checkpoint epoch)
    epochs=100,              # the total number of training epochs as integer
    optimizer="Adam",        # the optimization algorithm used for training as string (Adam, SGD, or AdamW)
    weight_decay=1e-6,       # a regularization term to prevent overfitting by penalizing large weights as float
    learning_rate=1e-3,      # the learning rate for the optimizer as float
)

Evaluating the Self-Supervised Model

The evaluation step measures how well the pre-trained model performs on a labeled dataset, using either linear evaluation or fine-tuning.

trainer.evaluate(
    train_dataset=train_dataset,      # to specify the training dataset as torch.utils.data.Dataset
    test_dataset=test_dataset,        # to specify the testing dataset as torch.utils.data.Dataset
    eval_method="linear",             # the evaluation method to use as string (linear or finetune)
    top_k=1,                          # the number of top-k predictions to consider during evaluation as integer
    epochs=100,                       # the number of evaluation epochs as integer
    optimizer='Adam',                 # the optimization algorithm used during evaluation as string (Adam, SGD, or AdamW)
    weight_decay=1e-6,                # a regularization term applied during evaluation to prevent overfitting as float
    learning_rate=1e-3,               # the learning rate for the optimizer during evaluation as float
    batch_size=256,                   # the batch size used for evaluation as integer
    fine_tuning_data_proportion=1,    # the proportion of training data to use during evaluation as float in the range (0.0, 1.0]
)
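
To clarify what the top_k argument measures, here is a small NumPy sketch of top-k accuracy. The function name `top_k_accuracy` is hypothetical and this is not necessarily how AK_SSL computes the metric internally; it assumes `logits` has shape `(num_samples, num_classes)` and `labels` holds integer class indices.

```python
import numpy as np


def top_k_accuracy(logits, labels, top_k=1):
    """Fraction of samples whose true label is among the top_k highest scores."""
    # Indices of the top_k largest logits per row (their internal order is irrelevant).
    top = np.argpartition(logits, -top_k, axis=1)[:, -top_k:]
    hits = (top == labels[:, None]).any(axis=1)
    return float(hits.mean())
```

With top_k=1 this reduces to ordinary classification accuracy, which is the quantity reported as Top1 in the benchmark table.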

Get the Self-Supervised Model backbone

In case you want to use the pre-trained network in your own downstream task, you need to define a downstream task model. This model should include the self-supervised model backbone as one of its components. Here's an example of how to define a simple downstream model class:

  import torch.nn as nn


  class DownstreamNet(nn.Module):
      def __init__(self, backbone, **kwargs):
          super().__init__()
          # Pre-trained self-supervised backbone used as the feature extractor
          self.backbone = backbone

          # You can define your downstream task model here

      def forward(self, x):
          x = self.backbone(x)
          # ... apply your downstream layers here
          return x


  downstream_model = DownstreamNet(trainer.get_backbone())

Loading Self-Supervised Model Checkpoint

To load a previous checkpoint into the network, you can do as below.

path = 'YOUR CHECKPOINT PATH'
trainer.load_checkpoint(path)

Saving Self-Supervised Model backbone

To save model backbone, you can do as below.

trainer.save_backbone()

That's it! You've successfully trained and evaluated a self-supervised model using the AK_SSL Python library. You can further customize and experiment with different self-supervised methods, backbones, and hyperparameters to suit your specific tasks. You can find the description of the Trainer class and its methods using Python's built-in help function.

📊 Benchmarks

We trained the models and obtained results on the CIFAR10 dataset, with plans to expand our experimentation to other datasets. Please note that hyperparameters were not optimized for maximum accuracy.

Method	Backbone	Batch Size	Epoch	Optimizer	Learning Rate	Weight Decay	Linear Top1	Fine-tune Top1	Download Backbone	Download Full Checkpoint
BarlowTwins	Resnet18	256	800	Adam	1e-3	1e-6	70.92%	79.50%	Link	Link
BYOL	Resnet18	256	800	Adam	1e-3	1e-6	71.06%	71.04%
DINO	Resnet18	256	800	Adam	1e-3	1e-6	9.91%	9.76%
MoCo v2	Resnet18	256	800	Adam	1e-3	1e-6	70.08%	78.71%	Link	Link
MoCo v3	Resnet18	256	800	Adam	1e-3	1e-6	59.98%	74.20%	Link	Link
SimCLR v1	Resnet18	256	800	Adam	1e-3	1e-6	73.09%	72.75%	Link	Link
SimCLR v2	Resnet18	256	800	Adam	1e-3	1e-6	73.07%	81.52%
SimSiam	Resnet18	256	800	Adam	1e-3	1e-6	19.77%	70.77%	Link	Link
SwAV	Resnet18	256	800	Adam	1e-3	1e-6	33.36%	74.14%

📜 References Used

In the development of this project, we have drawn inspiration and utilized code, libraries, and resources from various sources. We would like to acknowledge and express our gratitude to the following references and their respective authors:

Lightly Library
PYSSL Library
SimCLR Implementation
All original codes of supported methods

These references have played a crucial role in enhancing the functionality and quality of our project. We extend our thanks to the authors and contributors of these resources for their valuable work.

💯 License

This project is licensed under the MIT License.

🤝 Collaborators

By:

Thanks to Dr. Peyman Adibi and Dr. Hossein Karshenas, for their invaluable guidance and support throughout this project.
