imTED: Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

This repository contains the code for our ICCV 2023 paper, Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection. A blog post in Chinese is available here.

The code is based on mmdetection. Please refer to get_started.md and MMDET_README.md to set up the environment and prepare the data.

Config Files, Performance, and Trained Weights

We provide 9 configuration files in the configs directory.

Config File	Backbone	Epochs	Box AP	Mask AP	Download
imted_faster_rcnn_vit_small_3x_coco	ViT-S	36	48.2	-	model
imted_faster_rcnn_vit_base_3x_coco	ViT-B	36	52.9	-	model
imted_faster_rcnn_vit_large_3x_coco	ViT-L	36	55.4	-	model
imted_mask_rcnn_vit_small_3x_coco	ViT-S	36	48.7	42.7	model
imted_mask_rcnn_vit_base_3x_coco	ViT-B	36	53.3	46.4	model
imted_mask_rcnn_vit_large_3x_coco	ViT-L	36	55.5	48.1	model
imted_faster_rcnn_vit_base_2x_base_training_coco	ViT-B	24	50.6	-	model
imted_faster_rcnn_vit_base_2x_finetuning_10shot_coco	ViT-B	108	23.0	-	model
imted_faster_rcnn_vit_base_2x_finetuning_30shot_coco	ViT-B	108	30.4	-	model
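The config names above follow a regular pattern: detector, backbone size, schedule, and an optional few-shot variant. The sketch below is a hypothetical helper (`parse_config_name`, with a regex written only for the nine names in this table) that splits a name into those parts. It relies on mmdetection's naming convention that a "1x" schedule is 12 epochs. The "nominal" epoch count it returns is derived from the name alone. The few-shot finetuning rows list 108 epochs despite their "2x" suffix, so treat the table, not the name, as authoritative.

```python
import re
from dataclasses import dataclass
from typing import Optional

# mmdetection convention: a "1x" schedule corresponds to 12 training epochs.
EPOCHS_PER_1X = 12

BACKBONES = {"small": "ViT-S", "base": "ViT-B", "large": "ViT-L"}

# Hypothetical pattern covering only the config names listed in the table.
CONFIG_RE = re.compile(
    r"^imted_(?P<detector>faster_rcnn|mask_rcnn)_vit_(?P<size>small|base|large)"
    r"_(?P<mult>\d+)x(?:_(?P<variant>.+))?_coco$"
)


@dataclass
class ConfigInfo:
    detector: str            # "faster_rcnn" or "mask_rcnn"
    backbone: str            # "ViT-S", "ViT-B" or "ViT-L"
    schedule: str            # schedule suffix from the name, e.g. "3x"
    nominal_epochs: int      # suffix multiplier * 12 (name-derived only)
    variant: Optional[str]   # e.g. "base_training", "finetuning_10shot", or None


def parse_config_name(name: str) -> ConfigInfo:
    """Split an imTED config name into its components."""
    m = CONFIG_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised config name: {name}")
    mult = int(m.group("mult"))
    return ConfigInfo(
        detector=m.group("detector"),
        backbone=BACKBONES[m.group("size")],
        schedule=f"{mult}x",
        nominal_epochs=mult * EPOCHS_PER_1X,
        variant=m.group("variant"),
    )
```

For example, `parse_config_name("imted_mask_rcnn_vit_large_3x_coco")` yields a Mask R-CNN, ViT-L, 3x (36-epoch) config with no variant.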

MAE Pre-training

The pre-trained models are trained with the official MAE code. For ViT-S, we use a 4-layer decoder with dimension 256 and pre-train for 800 epochs. For ViT-B, we use an 8-layer decoder with dimension 512 and pre-train for 1600 epochs; the pre-trained weights can be downloaded from the official MAE release. For ViT-L, we use the same 8-layer, 512-dimensional decoder and pre-train for 1600 epochs; the pre-trained weights are likewise available from the official MAE release.
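imTED migrates the pre-trained decoder as well as the encoder. A checkpoint whose decoder does not match the settings above (or that contains no decoder at all) is therefore a likely source of loading errors. The sketch below is a minimal way to check this before training. It assumes the official MAE parameter names: `decoder_embed.weight` is a linear layer of shape `[decoder_dim, encoder_dim]`, and the decoder blocks are `decoder_blocks.<i>.*`. The helper names are hypothetical, and the function works on a plain name-to-shape mapping so it does not need PyTorch.

```python
import re
from typing import Dict, Tuple

# Decoder (depth, dim) per backbone, as stated in the paragraph above.
EXPECTED_DECODER = {
    "ViT-S": (4, 256),
    "ViT-B": (8, 512),
    "ViT-L": (8, 512),
}

BLOCK_RE = re.compile(r"^decoder_blocks\.(\d+)\.")


def infer_decoder_setting(shapes: Dict[str, Tuple[int, ...]]) -> Tuple[int, int]:
    """Return (depth, dim) of an MAE decoder from parameter names and shapes.

    Assumes official MAE naming: 'decoder_embed.weight' with shape
    [decoder_dim, encoder_dim] and blocks named 'decoder_blocks.<i>.*'.
    """
    if "decoder_embed.weight" not in shapes:
        raise ValueError("no MAE decoder weights found (encoder-only checkpoint?)")
    block_ids = set()
    for name in shapes:
        m = BLOCK_RE.match(name)
        if m:
            block_ids.add(int(m.group(1)))
    depth = max(block_ids) + 1 if block_ids else 0
    dim = shapes["decoder_embed.weight"][0]  # output features of decoder_embed
    return depth, dim


def matches_expected(backbone: str, shapes: Dict[str, Tuple[int, ...]]) -> bool:
    """True if the checkpoint's decoder matches the documented setting."""
    return infer_decoder_setting(shapes) == EXPECTED_DECODER[backbone]
```

With PyTorch installed, the shape mapping could be built as `{k: tuple(v.shape) for k, v in torch.load(path, map_location="cpu")["model"].items()}`. This assumes the checkpoint stores its state dict under a `"model"` key, as MAE's own training checkpoints do.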

Last Step of Preparation

For all experiments, remember to modify the path of the pre-trained weights in the configuration files, e.g., configs/imted/imted_faster_rcnn_vit_small_3x_coco.py.
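If you keep several copies of the configs (for example, one per machine), the path edit can be scripted. The sketch below is a minimal example. It assumes the weight path is written as a quoted string assigned to a key named `pretrained` (e.g. `pretrained='...'`). That key name is an assumption, so check the actual config file and pass a different `key` if it differs. The helper `set_pretrained_path` is hypothetical. It rewrites every matching assignment and fails loudly if none is found.

```python
import re


def set_pretrained_path(config_text: str, new_path: str, key: str = "pretrained") -> str:
    """Replace the quoted value assigned to `key` in mmdetection config source.

    Matches forms like  pretrained='old.pth'  or  pretrained = "old.pth".
    `key` defaults to 'pretrained', which is an assumption about the config.
    """
    pattern = re.compile(r"(\b" + re.escape(key) + r"\s*=\s*)(['\"])(.*?)\2")

    def _sub(m: "re.Match[str]") -> str:
        # Keep the original "key =" prefix and quote style; swap only the path.
        # A function replacement avoids backslash escaping issues in new_path.
        return f"{m.group(1)}{m.group(2)}{new_path}{m.group(2)}"

    new_text, count = pattern.subn(_sub, config_text)
    if count == 0:
        raise ValueError(f"no quoted string assigned to {key!r} found")
    return new_text
```

A possible usage: `p = pathlib.Path("configs/imted/imted_faster_rcnn_vit_small_3x_coco.py"); p.write_text(set_pretrained_path(p.read_text(), "/path/to/mae_weights.pth"))`.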

For few-shot experiments, please refer to FsDet for data preparation. Remember to modify the JSON annotation paths in the configuration files, e.g., configs/imted/few_shot/imted_faster_rcnn_vit_base_2x_base_training_coco.py. The JSON files used for few-shot training and evaluation can also be downloaded from here.
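Before pointing the configs at the few-shot JSON files, it can help to sanity-check them. The sketch below assumes the files use the standard COCO annotation layout (top-level `images`, `annotations`, and `categories` lists, with a `category_id` on each annotation). It reports how many annotations each category has. For a k-shot training split you would expect small per-category counts, roughly on the order of k. `summarize_coco_json` is a hypothetical helper, not part of this repository.

```python
import json
from collections import Counter
from typing import Any, Dict


def summarize_coco_json(path: str) -> Dict[str, Any]:
    """Count images, annotations and per-category annotations in a COCO-style JSON."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    images = data.get("images", [])
    annotations = data.get("annotations", [])
    categories = data.get("categories", [])
    # Number of annotated instances for each category id.
    per_category = Counter(ann["category_id"] for ann in annotations)
    return {
        "num_images": len(images),
        "num_annotations": len(annotations),
        "num_categories": len(categories),
        "annotations_per_category": dict(per_category),
    }
```

A category with zero or unexpectedly many annotations usually means a wrong file or an incorrect path in the config.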

Evaluating with 1 GPU

tools/dist_test.sh "path/to/config/file.py" "path/to/trained/weights.pth" 1 --eval bbox
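The command above evaluates box AP only. Mask R-CNN configs also produce mask AP. The sketch below assembles the evaluation command and adds `segm` for Mask R-CNN configs. It assumes the repository's `tools/test.py` follows mmdetection 2.x, where `--eval` accepts several metrics (e.g. `--eval bbox segm`) and `dist_test.sh` forwards arguments after the GPU count to `test.py`. `build_test_command` is a hypothetical helper.

```python
from typing import List


def build_test_command(config: str, checkpoint: str, num_gpus: int = 1) -> List[str]:
    """Assemble a dist_test.sh call; Mask R-CNN configs also evaluate masks."""
    # Assumption: config file names containing "mask_rcnn" produce instance masks.
    metrics = ["bbox", "segm"] if "mask_rcnn" in config else ["bbox"]
    return ["tools/dist_test.sh", config, checkpoint, str(num_gpus), "--eval", *metrics]
```

The resulting list can be run with `subprocess.run(cmd, check=True)` or printed for copy-pasting with `shlex.join(cmd)`.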

Training with 8 GPUs

tools/dist_train.sh "path/to/config/file.py" 8
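If you train with a number of GPUs other than 8, mmdetection's documentation recommends scaling the learning rate linearly with the total batch size (the Linear Scaling Rule). The sketch below computes the scaled value and passes it on the command line. The base learning rate and images per GPU are not stated in this README; read them from the config file you use. Passing the value as `--cfg-options optimizer.lr=...` assumes that `tools/train.py` follows mmdetection 2.x and that the config uses the standard `optimizer.lr` key. Both helper names are hypothetical.

```python
from typing import List, Optional


def scaled_lr(base_lr: float, base_gpus: int, base_imgs_per_gpu: int,
              gpus: int, imgs_per_gpu: int) -> float:
    """Linear Scaling Rule: the learning rate is proportional to the total batch size."""
    base_batch = base_gpus * base_imgs_per_gpu
    new_batch = gpus * imgs_per_gpu
    return base_lr * new_batch / base_batch


def build_train_command(config: str, num_gpus: int = 8,
                        lr: Optional[float] = None) -> List[str]:
    """Assemble a dist_train.sh call; trailing args are forwarded to tools/train.py."""
    cmd = ["tools/dist_train.sh", config, str(num_gpus)]
    if lr is not None:
        # Assumes mmdetection 2.x --cfg-options and an `optimizer.lr` config key.
        cmd += ["--cfg-options", f"optimizer.lr={lr:g}"]
    return cmd
```

For instance, halving the GPU count at the same images per GPU halves the total batch size, so the helper halves the learning rate.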

Few-shot Training with 8 GPUs

Base Training

tools/dist_train.sh configs/imted/few_shot/imted_faster_rcnn_vit_base_2x_base_training_coco.py 8

Finetuning

Replace the checkpoint path in the finetuning config with your own checkpoint from base training, or simply use our provided checkpoint here.

tools/dist_train.sh configs/imted/few_shot/imted_faster_rcnn_vit_base_2x_finetuning_30shot_coco.py 8
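The two few-shot stages can be chained as shown in the sketch below. It runs base training (optionally skipped when you use the provided base checkpoint) and then finetunes the 10-shot or 30-shot config listed in the table. Instead of editing the finetuning config by hand, it overrides the checkpoint with `--cfg-options load_from=...`. This assumes the finetuning configs load the base model through mmdetection's standard `load_from` key. If they use a different key, edit the config file as described above instead. `few_shot_commands` is a hypothetical helper. The base-checkpoint path must be the one your base training actually produced.

```python
from typing import List

BASE_CFG = "configs/imted/few_shot/imted_faster_rcnn_vit_base_2x_base_training_coco.py"
FT_CFG_TEMPLATE = (
    "configs/imted/few_shot/imted_faster_rcnn_vit_base_2x_finetuning_{shots}shot_coco.py"
)


def few_shot_commands(shots: int, base_checkpoint: str, num_gpus: int = 8,
                      run_base_training: bool = True) -> List[List[str]]:
    """Return the ordered dist_train.sh calls for base training + finetuning."""
    if shots not in (10, 30):
        raise ValueError("finetuning configs are provided for 10-shot and 30-shot only")
    commands = []
    if run_base_training:
        # Stage 1: base training (skip when using a provided base checkpoint).
        commands.append(["tools/dist_train.sh", BASE_CFG, str(num_gpus)])
    # Stage 2: finetuning initialised from the base checkpoint.
    # Assumption: the finetuning config reads the checkpoint from `load_from`.
    commands.append([
        "tools/dist_train.sh",
        FT_CFG_TEMPLATE.format(shots=shots),
        str(num_gpus),
        "--cfg-options",
        f"load_from={base_checkpoint}",
    ])
    return commands
```

Each returned command can be run in order, e.g. `for cmd in few_shot_commands(30, ckpt): subprocess.run(cmd, check=True)`.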

Acknowledgement

This project is based on MAE, mmdetection, and timm. Thanks for their wonderful work.

Some works based on imTED

Citation

If you find imTED useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.

@inproceedings{liu2023integrally,
  title={Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection},
  author={Liu, Feng and Zhang, Xiaosong and Peng, Zhiliang and Guo, Zonghao and Wan, Fang and Ji, Xiangyang and Ye, Qixiang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={6825--6834},
  year={2023}
}
