
UniGrad

By Chenxin Tao, Honghui Wang, Xizhou Zhu, Jiahua Dong, Shiji Song, Gao Huang, Jifeng Dai

This is the official implementation of the CVPR 2022 paper Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework.

News

  • 2022.6.28: See Siamese Image Modeling for an application of UniGrad.
  • 2022.6.28: Release code for UniGrad and gradient implementations for other methods.

Introduction

We propose a unified framework for different self-supervised learning methods from the perspective of gradient analysis. It turns out that previous methods share a similar gradient structure (a positive gradient plus a negative gradient) as well as similar performance. Based on this analysis, we propose UniGrad, a concise and effective method for self-supervised learning.

[Figure: UniGrad-intro — overview of the unified gradient framework]

| Method | Loss formula |
| --- | --- |
| Contrastive learning | $L=\mathbb{E}_{u_1, u_2}\bigg[-\log\frac{\exp{(\cos(u_1^o, u_2^t)/\tau)}}{\sum_{v^t}\exp{(\cos(u_1^o, v^t)/\tau)}}\bigg]$ |
| Asymmetric networks | $L = \mathbb{E}_{u_1,u_2}\bigg[-\cos(h(u_1^o), u_2^t)\bigg]$ |
| Feature decorrelation | Barlow Twins: $L = \sum_{i}(W_{ii}-1)^2+\lambda\sum_{i}\sum_{j\ne i}W_{ij}^2$ <br> VICReg: $L = \frac{1}{N}\sum_{v_1^o, v_2^s}\Vert v_1^o-v_2^s\Vert_2^2 + \frac{\lambda_1}{c}\sum_{i,j\ne i}W_{ij}'^2 + \frac{\lambda_2}{c}\sum_{i}\max(0, \gamma - \mathrm{std}(v_1^o)_i)$ |
| UniGrad | $L=\mathbb{E}_{u_1, u_2}\bigg[-\cos(u_1^o, u_2^m) + \frac{\lambda}{2} \sum_{v^o} \rho_v\cos^2(u_1^o, v^o)\bigg]$ |
| Method | Gradient formula |
| --- | --- |
| Contrastive learning | $\frac{\partial L}{\partial u_1^o} = -u_2^m + \lambda \sum_v s_v v$, where $s_v$ is the softmax result over similarities |
| Asymmetric networks | $\frac{\partial L}{\partial u_1^o} = -u_2^m + \lambda \sum_v \rho_v (u_1^T v) v$ |
| Feature decorrelation | $\frac{\partial L}{\partial u_1^o} = -u_2^m + \lambda \sum_v \frac{u_2^T v_2}{N} v_1$ |
| UniGrad | $\frac{\partial L}{\partial u_1^o} = -u_2^m + \lambda F u_1^o$, where $F$ is the moving average of the correlation matrix |
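
To make the correspondence concrete, below is a minimal PyTorch sketch of the UniGrad objective. The function name, the hyperparameter values `rho` and `lam`, and the batch-level EMA update of $F$ are illustrative assumptions, not the repository's actual implementation:

```python
import torch
import torch.nn.functional as F

def unigrad_loss(u1, u2, corr, rho=0.99, lam=0.02):
    """A minimal UniGrad loss sketch (hyperparameters are assumptions).

    u1:   online-branch features u^o, shape (N, D)
    u2:   momentum-branch features u^m, shape (N, D)
    corr: running correlation matrix F, shape (D, D)
    """
    u1 = F.normalize(u1, dim=1)  # after this, cosine similarity == dot product
    u2 = F.normalize(u2, dim=1)

    # Update the moving-average correlation matrix F from the current batch;
    # no gradient flows through it (an assumption of this sketch).
    with torch.no_grad():
        batch_corr = u1.t() @ u1 / u1.shape[0]
        corr.mul_(rho).add_(batch_corr, alpha=1.0 - rho)

    # Positive term: -cos(u1, u2^m) pulls the two views together.
    pos = -(u1 * u2.detach()).sum(dim=1).mean()

    # Negative term: (lam / 2) * u1^T F u1. Since F is symmetric and held
    # constant, its gradient w.r.t. u1 is lam * F @ u1, matching the
    # unified gradient -u2^m + lambda * F * u1 in the table above.
    neg = 0.5 * lam * ((u1 @ corr) * u1).sum(dim=1).mean()

    return pos + neg
```

In this sketch, `corr` would be initialized once (e.g. as a zero $(D, D)$ buffer) and carried across iterations; because the negative term is quadratic in $u_1^o$, autograd recovers the tabulated gradient $-u_2^m + \lambda F u_1^o$ with respect to the normalized features.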

Main Results and Pretrained models

| Loss | Model | Epochs | Ori. Top-1 acc. | Grad. impl. (w/o momentum enc.) | Grad. impl. (w/ momentum enc.) | Config | Pretrained model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **Contrastive Learning** | | | | | | | |
| MoCo | R50 | 100 | 67.4 | 67.6 | 70.0 | - | - |
| SimCLR | R50 | 100 | 62.7 | 67.6 | 70.0 | config | - |
| **Asymmetric Networks** | | | | | | | |
| SimSiam | R50 | 100 | 68.1 | 67.9 | 70.2 | - | - |
| BYOL | R50 | 100 | 66.5 | 67.9 | 70.2 | config | model |
| **Feature Decorrelation** | | | | | | | |
| VICReg | R50 | 100 | 68.6 | 67.6 | 69.8 | - | - |
| Barlow Twins | R50 | 100 | 68.7 | 67.6 | 70.0 | config | model |
| **UniGrad** | | | | | | | |
| UniGrad | R50 | 100 | - | - | 70.3 | config | model |
| UniGrad+cutmix | R50 | 800 | - | - | 74.9 | - | - |

Note:

(1) The numbers for the original versions come from the original papers; all gradient versions are implemented by us.

(2) For MoCo, SimCLR and BYOL, our results improve upon the original performance by a large margin. The improvement mainly comes from a stronger projector (3 layers with a hidden dimension of 2048). Similarly, the original Barlow Twins and VICReg achieve better results because they use a 3-layer MLP with a hidden dimension of 8192 as the projector.
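
For reference, a 3-layer projector of this kind might look like the sketch below; the output dimension and the exact placement of BatchNorm/ReLU are assumptions, not necessarily this repository's configuration:

```python
import torch.nn as nn

def build_projector(in_dim=2048, hidden_dim=2048, out_dim=256):
    # Hypothetical 3-layer projection MLP: in_dim matches the ResNet-50
    # backbone output, hidden_dim follows the note above, and out_dim is
    # an illustrative choice.
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim, bias=False),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, hidden_dim, bias=False),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, out_dim),
    )
```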

Usage

Preparation

Pretraining only requires installing PyTorch and downloading the ImageNet dataset; detailed instructions can be found here.

To run the linear evaluation, apex should also be installed, since the LARS optimizer is used.

Pretraining

To run pretraining, fill in the dataset path in the config and run the following command:

./run_pretrain.sh {CONFIG_NAME} {NUM_GPUS}

For example, the command for pretraining with UniGrad on 8 GPUs is as follows:

./run_pretrain.sh ./configs/unigrad.py 8

With 8 GPUs, a 100-epoch experiment usually takes around 38 hours to finish, as also reported in the paper.

Linear evaluation

To conduct linear evaluation, run the following command:

./run_linear.sh {PRETRAINED_CKPT} {DATA_PATH} {NUM_GPUS}

Note that for the 100-epoch pretrained model, the base learning rate is set to 0.1, following the evaluation setting of SimSiam. With our provided pretrained model, this command should yield ~70% top-1 accuracy.

Todo

  • Release the pretrained model without momentum encoder
  • Release the 800-epoch UniGrad pretrained model
  • Support the ViT architecture

License

This project is released under the Apache 2.0 license.

Citing UniGrad

If you find UniGrad useful in your research, please consider citing:

@inproceedings{tao2022exploring,
  title={Exploring the equivalence of siamese self-supervised learning via a unified gradient framework},
  author={Tao, Chenxin and Wang, Honghui and Zhu, Xizhou and Dong, Jiahua and Song, Shiji and Huang, Gao and Dai, Jifeng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={14431--14440},
  year={2022}
}