(1) Unofficial implementation of the Vision Transformer and its variants: ViT-Lite, Compact Vision Transformer (CVT) and Compact Convolutional Transformer (CCT):
- Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv preprint arXiv:2010.11929, 2020.
- Hassani A, Walton S, Shah N, et al. Escaping the big data paradigm with compact transformers[J]. arXiv preprint arXiv:2104.05704, 2021.
For more information, please refer to my blog post: Event Horizon: Vision Transformers.
(2) Unofficial implementation of Swin Transformer.
For more information, please refer to my blog post: Event Horizon: Swin Transformer Reproduction.
For the Swin Transformer, the guide for running the code is omitted, since the procedure closely resembles that of CCT.
Run the following to check and install the dependencies:
python3 -m pip install -r requirements.txt
| torch   | torchvision | numpy    | matplotlib | tensorboard | timm    |
|---------|-------------|----------|------------|-------------|---------|
| >=1.7.0 | >=0.8.1     | >=1.18.5 | >=3.3.1    | >=2.4.0     | >=0.4.5 |
In addition, cuDNN is recommended, since cuDNN acceleration is enabled in the main executable `train.py`. CUDA is also required; you probably won't want to fry eggs with your CPU for several days.
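For reference, enabling cuDNN autotuning in PyTorch typically looks like the snippet below (a minimal sketch; the exact lines in `train.py` may differ):

```python
import torch

# Let cuDNN benchmark and pick the fastest convolution algorithms
# for the current input sizes (beneficial when input shapes are fixed).
torch.backends.cudnn.benchmark = True

# Train on the GPU; fall back to CPU only if CUDA is unavailable.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```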
.
├── logs/                    --- TensorBoard log storage (required)
├── model/                   --- folder from and to which models are loaded & stored (required)
├── check_points/            --- checkpoint folder (required)
├── train.sh                 --- script for training
├── train.py                 --- Python main module
├── plot.sh                  --- quick TensorBoard initialization script
├── test                     --- quick testing script
├── ...
├── swin/
│   ├── swinLayer.py         --- Swin Transformer class definition
│   └── wniMSA.py            --- window-based and shifted-window-based multi-head attention block definition
└── py/
    ├── CCT.py               --- Compact Convolutional Transformer class
    ├── LECosineAnnealing.py --- LECAWS learning-rate schedule, see my blog for more info
    ├── LabelSmoothing.py    --- label-smoothing cross-entropy loss
    ├── StochasticDepth.py   --- adopted from timm
    ├── TEncoder.py          --- Transformer encoder
    ├── ViTLite.py           --- ViT-Lite implementation
    ├── configs.py           --- Mixup configurations
    ├── train_utils.py       --- training utility functions
    └── SeqPool.py           --- sequence pooling layer (see the sketch below)
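To illustrate what the sequence pooling layer does, here is a minimal sketch of the idea from the CCT paper: the class token is replaced by an attention-weighted average over the encoder's output tokens. The actual contents of `py/SeqPool.py` may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeqPool(nn.Module):
    """Sequence pooling (minimal sketch of the CCT idea).

    Reduces the encoder output sequence to a single vector by an
    attention-weighted average over the tokens, instead of a class token.
    """

    def __init__(self, embed_dim: int):
        super().__init__()
        # One scalar attention logit per token.
        self.attention = nn.Linear(embed_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, embed_dim)
        weights = F.softmax(self.attention(x), dim=1)   # (batch, num_tokens, 1)
        pooled = torch.sum(weights * x, dim=1)          # (batch, embed_dim)
        return pooled

# Usage: pooled = SeqPool(embed_dim=256)(encoder_output)
```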
Since `argparse` is used, some arguments must be supplied explicitly, or they fall back to their defaults. Some of the most commonly used arguments are provided in `train.sh`; a rough sketch of such a script is shown after the commands below. To run the code, therefore, run:
sudo chmod +x ./train.sh
./train.sh
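For a rough idea, `train.sh` is typically just a thin wrapper around `train.py`. The flag names below are purely illustrative and must be matched against the argparse definitions in `train.py`:

```sh
#!/bin/bash
# Illustrative only: these argument names are hypothetical and may not match train.py.
python3 train.py \
    --epochs 300 \
    --batch-size 128 \
    --lr 5e-4
```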
Make sure the `check_points`, `model` and `logs` directories exist before training.
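If any of them are missing, they can be created beforehand, e.g.:

```sh
mkdir -p check_points model logs
```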
The dataset loading path is specified in `py/train_utils.py`; the default CIFAR-10 dataset path is `../dataset/`. If you have not downloaded the CIFAR-10 dataset, the function `getCIFAR10Dataset` will download it when the root is set to `../dataset` and that directory is empty. Once the dataset is ready, everything should be good to go.
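Under the hood this is standard `torchvision` behavior; the sketch below shows what such a loader might look like (the actual `getCIFAR10Dataset` in `py/train_utils.py` may use different transforms):

```python
import torchvision
import torchvision.transforms as T

def get_cifar10(root: str = "../dataset"):
    """Minimal sketch: download CIFAR-10 to `root` if it is not already there."""
    transform = T.Compose([
        T.ToTensor(),
        # Commonly used CIFAR-10 channel statistics.
        T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    ])
    train_set = torchvision.datasets.CIFAR10(root=root, train=True,
                                             download=True, transform=transform)
    test_set = torchvision.datasets.CIFAR10(root=root, train=False,
                                            download=True, transform=transform)
    return train_set, test_set
```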
The tested configuration is CCT-7/3x1 (7 transformer encoder layers, a single 3x3 convolution in the tokenizer). The official counterpart reports 94.78% accuracy after 200 epochs. My implementation reaches roughly 100% training accuracy and 94.5% test accuracy after 300 epochs, without the Mixup or CutMix augmentation used in SHI-Labs/Compact-Transformers.
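For reference, Mixup/CutMix can be set up through `timm` roughly as below. This is a sketch only; the actual settings live in `py/configs.py` and, as noted below, differ from the official ones:

```python
from timm.data import Mixup

# Illustrative hyperparameters; the values used in configs.py may differ.
mixup_fn = Mixup(
    mixup_alpha=0.8,     # Beta distribution parameter for Mixup
    cutmix_alpha=1.0,    # Beta distribution parameter for CutMix
    prob=1.0,            # probability of applying Mixup/CutMix per batch
    switch_prob=0.5,     # probability of switching to CutMix when both are enabled
    label_smoothing=0.1,
    num_classes=10,      # CIFAR-10
)

# In the training loop: images, targets = mixup_fn(images, targets)
```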
The following image shows the final version with Mixup and CutMix (the configuration is quite different from the official implementation). I did not train this from scratch:
No results are available for the Swin Transformer on ImageNet. The Swin Transformer is being trained on imagenette2-320, and that training has not finished yet.