
visresearch/patchmix


Inter-Instance Similarity Modeling for Contrastive Learning

1. Introduction

This is the official implementation of the paper "Inter-Instance Similarity Modeling for Contrastive Learning".

[Figure: PatchMix framework]

PatchMix is a novel image mixing strategy that mixes multiple images at the patch level. Each mixed image contains many local components drawn from several images, which efficiently simulates the rich similarities among natural images in an unsupervised manner. To model these inter-instance similarities, the ViT model is optimized with three kinds of contrasts: mixed images to original images, mixed images to mixed images, and original images to original images. Experimental results demonstrate that the proposed method significantly outperforms the previous state of the art on both ImageNet-1K and the CIFAR datasets, e.g., a 3.0% improvement in linear probing accuracy on ImageNet-1K and an 8.7% improvement in kNN accuracy on CIFAR100.
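The patch-level mixing and the soft-target contrast described above can be illustrated with a minimal NumPy sketch. It is not the repository's implementation: the function names `patch_mix` and `soft_contrastive_loss`, the rule for choosing source images (mixed image i draws each patch from one of images i to i + num_mix - 1 of the batch), the use of patch shares as soft targets, and the temperature of 0.2 are all illustrative assumptions. The mixed-to-mixed contrast would need its own target definition and is not shown.

```python
import numpy as np


def patch_mix(images, patch_size, num_mix, rng=None):
    """Hypothetical sketch: mix each image with up to num_mix - 1 others at patch level.

    images: array of shape (B, C, H, W) with H and W divisible by patch_size.
    For every patch position of mixed image i, the source image is drawn from
    {i, i+1, ..., i+num_mix-1} (indices modulo B). Returns the mixed batch and
    a (B, B) matrix whose row i holds the fraction of patches that mixed
    image i took from each original image.
    """
    rng = np.random.default_rng() if rng is None else rng
    b, c, h, w = images.shape
    p = patch_size
    gh, gw = h // p, w // p
    # Source image index for every (mixed image, patch row, patch column).
    src = (np.arange(b)[:, None, None] + rng.integers(0, num_mix, size=(b, gh, gw))) % b
    # Rearrange to (B, gh, gw, C, p, p) so whole patches can be gathered by index.
    patches = images.reshape(b, c, gh, p, gw, p).transpose(0, 2, 4, 1, 3, 5)
    mixed = patches[src, np.arange(gh)[None, :, None], np.arange(gw)[None, None, :]]
    # Undo the rearrangement back to (B, C, H, W).
    mixed = mixed.transpose(0, 3, 1, 4, 2, 5).reshape(b, c, h, w)
    # Soft targets: share of patches each mixed image took from each original image.
    targets = np.zeros((b, b))
    rows = np.repeat(np.arange(b), gh * gw)
    np.add.at(targets, (rows, src.reshape(-1)), 1.0 / (gh * gw))
    return mixed, targets


def soft_contrastive_loss(query, key, targets, temperature=0.2):
    """Cross-entropy between softmax(cosine similarity / T) and soft targets.

    With targets equal to the identity matrix this is the standard InfoNCE loss.
    """
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    k = key / np.linalg.norm(key, axis=1, keepdims=True)
    logits = q @ k.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(targets * log_prob).sum(axis=1).mean())
```

With `targets = np.eye(B)` the loss reduces to plain instance discrimination, which corresponds to an original-to-original contrast; passing the matrix returned by `patch_mix` instead scores a mixed image as partially similar to each of its source images, in proportion to the patches it borrowed.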

Requirements

conda create -n patchmix python=3.8
conda activate patchmix
pip install -r requirements.txt

Datasets

Set the dataset root paths in the *.py configuration files under ./config/. The CIFAR10 and CIFAR100 datasets are provided by torchvision, and their root path is set to /path/to/dataset. The root path of ImageNet-1K (ILSVRC2012) is set to /path/to/ILSVRC2012.
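The CIFAR datasets can be fetched by torchvision itself, e.g. `torchvision.datasets.CIFAR10(root="/path/to/dataset", train=True, download=True)`. For ImageNet-1K, a small check like the one below can catch a wrong root path before a long run. It assumes, without confirmation from this README, that /path/to/ILSVRC2012 follows the torchvision `ImageFolder` layout with `train/` and `val/` class sub-directories; the helper name `count_imagefolder_classes` is hypothetical.

```python
from pathlib import Path


def count_imagefolder_classes(root, splits=("train", "val")):
    """Count class sub-directories for each split of an ImageFolder-style dataset.

    Assumed layout (hypothetical, not confirmed by this README):
        root/<split>/<class_name>/<image files>
    Raises FileNotFoundError if a split directory is missing.
    """
    root = Path(root)
    counts = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each sub-directory of a split is treated as one class.
        counts[split] = sum(1 for entry in split_dir.iterdir() if entry.is_dir())
    return counts
```

For a complete ILSVRC2012 copy in that layout, `count_imagefolder_classes("/path/to/ILSVRC2012")` should report 1000 classes per split, since ImageNet-1K has 1000 categories.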

Self-Supervised Pretraining

ViT-Small with 2-node (8-GPU) training

Set the hyperparameters, dataset, and GPU IDs in ./config/pretrain/vit_small_pretrain.py and run the following command:

python main_pretrain.py --arch vit-small

kNN Evaluation

Set the hyperparameters, dataset, and GPU IDs in ./config/knn/knn.py and run the following command:

python main_knn.py --arch vit-small --pretrained-weights /path/to/pretrained-weights.pth
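kNN evaluation classifies each test image by the labels of its nearest training images in the frozen feature space. A common variant, used for example in DINO's evaluation code, weights each neighbour's vote by exp(similarity / T). The NumPy sketch below implements that variant; whether main_knn.py uses the same weighting, and which k and T it uses, is not stated here, so the defaults k=20 and T=0.07 are assumptions and the function name is hypothetical.

```python
import numpy as np


def weighted_knn_predict(train_feats, train_labels, test_feats, num_classes,
                         k=20, temperature=0.07):
    """Temperature-weighted kNN classification on L2-normalised features."""
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T  # (N_test, N_train) cosine similarities
    k = min(k, train.shape[0])
    # Indices of the k most similar training samples per test sample (unordered).
    topk = np.argpartition(-sims, kth=k - 1, axis=1)[:, :k]
    topk_sims = np.take_along_axis(sims, topk, axis=1)
    weights = np.exp(topk_sims / temperature)
    # Accumulate weighted votes per class.
    votes = np.zeros((test.shape[0], num_classes))
    rows = np.repeat(np.arange(test.shape[0]), k)
    np.add.at(votes, (rows, train_labels[topk].reshape(-1)), weights.reshape(-1))
    return votes.argmax(axis=1)
```

kNN accuracy is then the fraction of test samples whose prediction equals the true label, e.g. `(pred == test_labels).mean()`. Because no parameters are trained, this metric reflects the feature space directly.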

Linear Evaluation

Set hyperparameters, dataset and GPU IDs in ./config/linear/vit_small_linear.py and run the following command:

python main_linear.py --arch vit-small --pretrained-weights /path/to/pretrained-weights.pth
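Linear probing freezes the pretrained backbone and trains only a linear classifier on its features. As a minimal sketch of that idea, the NumPy code below fits a softmax (multinomial logistic) regression with full-batch gradient descent on precomputed features. The function names and the learning-rate and epoch defaults are illustrative assumptions; main_linear.py may use a different optimizer, schedule, and feature normalization.

```python
import numpy as np


def train_linear_probe(feats, labels, num_classes, lr=0.1, epochs=300, weight_decay=0.0):
    """Fit a linear softmax classifier on frozen features with gradient descent."""
    n, d = feats.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = feats @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (feats.T @ grad + weight_decay * w)
        b -= lr * grad.sum(axis=0)
    return w, b


def predict_linear(feats, w, b):
    """Return the class with the highest linear score for each feature vector."""
    return (feats @ w + b).argmax(axis=1)
```

Only `w` and `b` are learned; the features stay fixed, so linear probing accuracy measures how linearly separable the pretrained representation already is.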

Fine-tuning Evaluation

Set the hyperparameters, dataset, and GPU IDs in ./config/finetuning/vit_small_finetuning.py and run the following command:

python main_finetune.py --arch vit-small --pretrained-weights /path/to/pretrained-weights.pth
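The kNN, linear, and fine-tuning scripts all take --pretrained-weights. If you instead load such a checkpoint into your own ViT, the keys often need cleaning first: checkpoints saved under DistributedDataParallel prefix their keys with `module.`, and a pretraining projection head is usually discarded. The helper below is a hypothetical sketch; the prefixes `backbone.` and `head.` are guesses about the checkpoint layout, not facts about this repository's files.

```python
def prepare_backbone_state_dict(state_dict,
                                strip_prefixes=("module.", "backbone."),
                                drop_prefixes=("head.",)):
    """Hypothetical helper: strip wrapper prefixes and drop head weights.

    Prefixes in strip_prefixes are removed in order, so a key such as
    "module.backbone.blocks.0.norm1.weight" becomes "blocks.0.norm1.weight".
    Keys that then start with any prefix in drop_prefixes are discarded.
    """
    cleaned = {}
    for key, value in state_dict.items():
        for prefix in strip_prefixes:
            if key.startswith(prefix):
                key = key[len(prefix):]
        if key.startswith(drop_prefixes):
            continue
        cleaned[key] = value
    return cleaned
```

With PyTorch, the result could be passed to `model.load_state_dict(cleaned, strict=False)`, whose return value lists the missing and unexpected keys, so you can confirm that the backbone weights actually matched.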

Main Results and Model Weights

If you don't have a Microsoft Office account, you can download the trained model weights from this link.

If you have a Microsoft Office account, you can download the trained model weights from the links in the following tables.

ImageNet-1K

| Arch | Batch Size | Pretraining Epochs | Fine-tuning Accuracy | Linear Probing Accuracy | kNN Accuracy |
| --- | --- | --- | --- | --- | --- |
| ViT-S/16 | 1024 | 300 | 82.8% (link) | 77.4% (link) | 73.3% (link) |
| ViT-B/16 | 1024 | 300 | 84.1% (link) | 80.2% (link) | 76.2% (link) |

CIFAR10

| Arch | Batch Size | Pretraining Epochs | Fine-tuning Accuracy | Linear Probing Accuracy | kNN Accuracy |
| --- | --- | --- | --- | --- | --- |
| ViT-T/2 | 512 | 800 | 97.5% (link) | 94.4% (link) | 92.9% (link) |
| ViT-S/2 | 512 | 800 | 98.1% (link) | 96.0% (link) | 94.6% (link) |
| ViT-B/2 | 512 | 800 | 98.3% (link) | 96.6% (link) | 95.8% (link) |

CIFAR100

| Arch | Batch Size | Pretraining Epochs | Fine-tuning Accuracy | Linear Probing Accuracy | kNN Accuracy |
| --- | --- | --- | --- | --- | --- |
| ViT-T/2 | 512 | 800 | 84.9% (link) | 74.7% (link) | 68.8% (link) |
| ViT-S/2 | 512 | 800 | 86.0% (link) | 78.7% (link) | 75.4% (link) |
| ViT-B/2 | 512 | 800 | 86.0% (link) | 79.7% (link) | 75.7% (link) |

Visualization of Inter-Instance Similarities

[Figure: visualization of inter-instance similarities between a query sample and key samples]

The query sample and the key sample with ID 4 belong to the same category, while the key samples with IDs 3 and 5 come from categories similar to that of the query sample.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.