
AutoFocusFormer


This software project accompanies the research paper, AutoFocusFormer: Image Segmentation off the Grid (CVPR 2023).

Chen Ziwen, Kaushik Patnaik, Shuangfei Zhai, Alvin Wan, Zhile Ren, Alex Schwing, Alex Colburn, Li Fuxin

arXiv | video narration | AFF-Classification (this repo) | AFF-Segmentation

Introduction

AutoFocusFormer (AFF) is the first adaptive-downsampling network capable of dense prediction tasks such as semantic/instance segmentation.

AFF abandons the traditional grid structure of image feature maps and automatically learns to retain the pixels most important to the task goal.
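
This token-retention idea can be illustrated with a toy top-k selection over per-token importance scores (a minimal sketch only; the actual model learns its downsampling end-to-end, and the function below is a hypothetical illustration, not this repo's API):

```python
def keep_top_fraction(tokens, scores, frac=0.25):
    """Keep the fraction of tokens with the highest importance scores.

    tokens: list of per-pixel feature vectors; scores: per-token importance.
    Toy illustration of adaptive downsampling, not the paper's learned
    merging procedure.
    """
    k = max(1, int(len(tokens) * frac))
    # Rank tokens by importance, keep the top k, then restore spatial order.
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[:k])
    return [tokens[i] for i in kept]

tokens = [[0.1], [0.9], [0.4], [0.7]]
scores = [0.05, 0.95, 0.30, 0.80]
print(keep_top_fraction(tokens, scores, frac=0.5))  # -> [[0.9], [0.7]]
```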


AFF consists of a local-attention transformer backbone and a task-specific head. The backbone consists of four stages, each stage containing three modules: balanced clustering, local-attention transformer blocks, and adaptive downsampling.
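
Structurally, the stage pipeline can be pictured as follows (pure-Python pseudocode; the three helpers are placeholders standing in for the real learned modules, and none of these names exist in the repo):

```python
def run_backbone(tokens, num_stages=4):
    """Structural sketch of the AFF backbone: each stage applies balanced
    clustering, local-attention transformer blocks, then adaptive
    downsampling. The helpers below are illustrative placeholders."""
    balanced_clustering = lambda t: t      # group tokens into equal-size clusters
    local_attention_blocks = lambda t: t   # attend within cluster neighborhoods
    adaptive_downsampling = lambda t: t[: max(1, len(t) // 4)]  # keep ~1/4 of tokens

    for _ in range(num_stages):
        tokens = balanced_clustering(tokens)
        tokens = local_attention_blocks(tokens)
        tokens = adaptive_downsampling(tokens)
    return tokens
```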


AFF delivers significant FLOPs savings (see our models with a 1/5 downsampling rate) and significantly improves recognition of small objects.

Notably, AFF-Small achieves 44.0 instance segmentation AP and 66.9 panoptic segmentation PQ on Cityscapes val with a backbone of only 42.6M parameters, on par with Swin-Large, a backbone with 197M parameters (a 78% saving!).
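
The quoted saving is just the relative difference in backbone size:

```python
# Backbone parameter counts quoted above (from the Cityscapes comparison).
aff_small_params = 42.6e6   # AFF-Small
swin_large_params = 197e6   # Swin-Large

saving = 1 - aff_small_params / swin_large_params
print(f"parameter saving: {saving:.0%}")  # parameter saving: 78%
```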



Main Results on ImageNet with Pretrained Models

| name | pretrain | resolution | acc@1 | acc@5 | #params | FLOPs | FPS | 1K model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AFF-Mini | ImageNet-1K | 224x224 | 78.2 | 93.6 | 6.75M | 1.08G | 1337 | Apple ML |
| AFF-Mini-1/5 | ImageNet-1K | 224x224 | 77.5 | 93.3 | 6.75M | 0.72G | 1678 | Apple ML |
| AFF-Tiny | ImageNet-1K | 224x224 | 83.0 | 96.3 | 27M | 4G | 528 | Apple ML |
| AFF-Tiny-1/5 | ImageNet-1K | 224x224 | 82.4 | 95.9 | 27M | 2.74G | 682 | Apple ML |
| AFF-Small | ImageNet-1K | 224x224 | 83.5 | 96.6 | 42.6M | 8.16G | 321 | Apple ML |
| AFF-Small-1/5 | ImageNet-1K | 224x224 | 83.4 | 96.5 | 42.6M | 5.69G | 424 | Apple ML |

FPS is measured on a single V100 GPU.

We train with a total batch size of 4096.

| name | pretrain | resolution | acc@1 | acc@5 | #params | FLOPs | 22K model | 1K model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AFF-Base | ImageNet-22K | 384x384 | 86.2 | 98.0 | 75.34M | 42.54G | Apple ML | Apple ML |

Getting Started

Clone this repo

```bash
git clone git@github.com:apple/ml-autofocusformer.git
cd ml-autofocusformer
```

Pre-trained checkpoints can be downloaded through the links in the tables above.

Create environment and install requirements

```bash
sh create_env.sh
```

See further documentation inside the script file.

Our experiments are run with CUDA==11.6 and PyTorch==1.12.

Prepare data

We use the standard ImageNet dataset, which can be downloaded from http://image-net.org/.

For a standard folder dataset, move the validation images into labeled sub-folders. The file structure should look like:

```
$ tree imagenet
imagenet/
├── training
│   ├── class1
│   │   ├── img1.jpeg
│   │   ├── img2.jpeg
│   │   └── ...
│   ├── class2
│   │   ├── img3.jpeg
│   │   └── ...
│   └── ...
└── validation
    ├── class1
    │   ├── img4.jpeg
    │   ├── img5.jpeg
    │   └── ...
    ├── class2
    │   ├── img6.jpeg
    │   └── ...
    └── ...
```
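
Before training, it can be handy to sanity-check this layout. A small standard-library helper (the function name is our own, not part of this repo) might look like:

```python
from pathlib import Path

def check_imagenet_layout(root):
    """Return a {split: num_classes} mapping for a folder-style ImageNet
    tree (training/ and validation/ with one sub-folder per class).

    Raises FileNotFoundError if either split directory is missing.
    """
    root = Path(root)
    counts = {}
    for split in ("training", "validation"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each immediate sub-directory is one class.
        counts[split] = sum(1 for p in split_dir.iterdir() if p.is_dir())
    return counts
```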

Train and evaluate

Modify the arguments in the script run_aff.sh (e.g., the path to the dataset) and run

```bash
sh run_aff.sh
```

for training or evaluation.

Run `python main.py -h` to see the full documentation of the arguments.

One can also directly modify the config files in `configs/`.

Citing AutoFocusFormer

```bibtex
@inproceedings{autofocusformer,
    title = {AutoFocusFormer: Image Segmentation off the Grid},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    author = {Ziwen, Chen and Patnaik, Kaushik and Zhai, Shuangfei and Wan, Alvin and Ren, Zhile and Schwing, Alex and Colburn, Alex and Fuxin, Li},
    year = {2023},
}
```

About

This is an official implementation for "AutoFocusFormer: Image Segmentation off the Grid".
