
Applying MogaNet to Object Detection

This repo is a PyTorch implementation of applying MogaNet to object detection and instance segmentation with Mask R-CNN and RetinaNet on COCO. The code is based on MMDetection. For more details, see Efficient Multi-order Gated Aggregation Network (ICLR 2024).

Note

Please note that we simply follow the hyper-parameters of PVT and ConvNeXt, which may not be optimal for MogaNet. Feel free to tune them for better performance.

Environment Setup

Install MMDetection from source code, or follow the steps below. This experiment requires MMDetection>=2.19.0; we reproduced the results with MMDetection v2.26.0 and PyTorch 1.10.

pip install openmim
mim install mmcv-full
pip install mmdet
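
To match our reproduction environment, a pinned install along these lines should work; the exact mmcv-full pin is an assumption, as any release compatible with MMDetection v2.26.0 will do:

pip install torch==1.10.0 torchvision==0.11.1  # the PyTorch version we reproduced results with
pip install openmim
mim install mmcv-full==1.7.0  # assumption: any mmcv-full release compatible with mmdet v2.26.0 works
pip install mmdet==2.26.0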

Install Apex (optional) for mixed-precision training with PyTorch<=1.6.0:

git clone https://github.com/NVIDIA/apex
cd apex
python setup.py install --cpp_ext --cuda_ext --user

By default, we run experiments with fp32 or fp16 (Apex). If you would like to disable Apex, change the runner type to EpochBasedRunner and comment out the following code block in the configuration files:

fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
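
For reference, this is a minimal sketch of what those two entries look like with Apex disabled; it mirrors standard MMDetection 2.x defaults rather than code from this repo:

runner = dict(type='EpochBasedRunner', max_epochs=12)  # standard epoch-based runner; max_epochs follows the 1x schedule
optimizer_config = dict(grad_clip=None)  # default MMDetection hook in place of DistOptimizerHook; fp16 block removed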

Note: Since the MogaNet backbone code for detection, segmentation, and pose estimation lives in a single file, it also works with MMSegmentation and MMPose through @BACKBONES.register_module(). Install MMSegmentation or MMPose if you want to use those frameworks.
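
Because of that registration, an MMSegmentation or MMPose config can reference the backbone by its type name. The sketch below is hypothetical: the type string, arch, and out_indices keywords are assumptions, so check the backbone file in this repo for the actual signature:

model = dict(
    backbone=dict(
        type='MogaNet',            # assumption: the name registered via @BACKBONES.register_module()
        arch='small',              # assumption: variant selector (xt/t/s/b/l)
        out_indices=(0, 1, 2, 3),  # multi-scale feature maps for the neck/decoder head
        init_cfg=dict(type='Pretrained', checkpoint='/path/to/imagenet_pretrained.pth'),
    ),
)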

Data preparation

Download COCO2017 and prepare COCO experiments according to the guidelines in MMDetection.
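
After downloading and unzipping, the standard layout that MMDetection expects is the following; place or symlink the data under the repo's data/ folder:

data/coco/
├── annotations/
│   ├── instances_train2017.json
│   └── instances_val2017.json
├── train2017/    # training images
└── val2017/      # validation images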

(back to top)

Results and models on COCO

Notes: All the models can also be downloaded from Baidu Cloud (z8mf) at MogaNet/COCO_Detection. We perform object detection experiments with RetinaNet under the 1x training setting, and detection plus instance segmentation experiments with Mask R-CNN and Cascade Mask R-CNN under the 1x or MS 3x (multi-scale) training settings. The params (M) and FLOPs (G) are measured by get_flops at a 1280 $\times$ 800 input resolution.

python get_flops.py /path/to/config --shape 1280 800

MogaNet + RetinaNet

| Method | Backbone | Pretrain | Params | FLOPs | Lr schd | box mAP | Config | Download |
|---|---|---|---|---|---|---|---|---|
| RetinaNet | MogaNet-XT | ImageNet-1K | 12.1M | 167.2G | 1x | 39.7 | config | log / model |
| RetinaNet | MogaNet-T | ImageNet-1K | 14.4M | 173.4G | 1x | 41.4 | config | log / model |
| RetinaNet | MogaNet-S | ImageNet-1K | 35.1M | 253.0G | 1x | 45.8 | config | log / model |
| RetinaNet | MogaNet-B | ImageNet-1K | 53.5M | 354.5G | 1x | 47.7 | config | log / model |
| RetinaNet | MogaNet-L | ImageNet-1K | 92.4M | 476.8G | 1x | 48.7 | config | log / model |

MogaNet + Mask R-CNN

| Method | Backbone | Pretrain | Params | FLOPs | Lr schd | box mAP | mask mAP | Config | Download |
|---|---|---|---|---|---|---|---|---|---|
| Mask R-CNN | MogaNet-XT | ImageNet-1K | 22.8M | 185.4G | 1x | 40.7 | 37.6 | config | log / model |
| Mask R-CNN | MogaNet-T | ImageNet-1K | 25.0M | 191.7G | 1x | 42.6 | 39.1 | config | log / model |
| Mask R-CNN | MogaNet-S | ImageNet-1K | 45.0M | 271.6G | 1x | 46.6 | 42.2 | config | log / model |
| Mask R-CNN | MogaNet-B | ImageNet-1K | 63.4M | 373.1G | 1x | 49.0 | 43.8 | config | log / model |
| Mask R-CNN | MogaNet-L | ImageNet-1K | 102.1M | 495.3G | 1x | 49.4 | 44.2 | config | log / model |
| Mask R-CNN | MogaNet-T | ImageNet-1K | 25.0M | 191.7G | MS 3x | 45.3 | 40.7 | config | log / model |
| Mask R-CNN | MogaNet-S | ImageNet-1K | 45.0M | 271.6G | MS 3x | 48.5 | 43.1 | config | log / model |
| Mask R-CNN | MogaNet-B | ImageNet-1K | 63.4M | 373.1G | MS 3x | 50.3 | 44.4 | config | log / model |
| Mask R-CNN | MogaNet-L | ImageNet-1K | 102.1M | 495.3G | MS 3x | 50.6 | 44.6 | config | log / model |

MogaNet + Cascade Mask R-CNN

| Method | Backbone | Pretrain | Params | FLOPs | Lr schd | box mAP | mask mAP | Config | Download |
|---|---|---|---|---|---|---|---|---|---|
| Cascade Mask R-CNN | MogaNet-S | ImageNet-1K | 77.9M | 405.4G | MS 3x | 51.4 | 44.9 | config | log / model |
| Cascade Mask R-CNN | MogaNet-S | ImageNet-1K | 82.8M | 750.2G | GIOU+MS 3x | 51.7 | 45.1 | config | log / model |
| Cascade Mask R-CNN | MogaNet-B | ImageNet-1K | 101.2M | 851.6G | GIOU+MS 3x | 52.6 | 46.0 | config | log / model |
| Cascade Mask R-CNN | MogaNet-L | ImageNet-1K | 139.9M | 973.8G | GIOU+MS 3x | 53.3 | 46.1 | config | - |

Demo

We provide demos following MMDetection. Use inference_demo or run the following script:

cd demo
python image_demo.py demo.png ../configs/moganet/mask_rcnn_moganet_small_fpn_1x_coco.py ../../work_dirs/checkpoints/mask_rcnn_moganet_small_fpn_1x_coco.pth --out-file pred.png
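
The same demo can also be run through MMDetection's Python API (init_detector and inference_detector are standard mmdet 2.x calls; the config and checkpoint paths are the ones from the command above):

from mmdet.apis import inference_detector, init_detector

config = '../configs/moganet/mask_rcnn_moganet_small_fpn_1x_coco.py'
checkpoint = '../../work_dirs/checkpoints/mask_rcnn_moganet_small_fpn_1x_coco.pth'

# build the detector from the config and load the trained weights
model = init_detector(config, checkpoint, device='cuda:0')

# run inference on one image and save the visualized prediction
result = inference_detector(model, 'demo.png')
model.show_result('demo.png', result, out_file='pred.png')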

Training

We train the model on a single node with 8 GPUs (a total batch size of 16) by default. Start training with a config as follows:

PORT=29001 bash dist_train.sh /path/to/config 8
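
For example, to train the Mask R-CNN MogaNet-S 1x model (the config path follows the demo above):

PORT=29001 bash dist_train.sh configs/moganet/mask_rcnn_moganet_small_fpn_1x_coco.py 8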

Evaluation

To evaluate the trained model on a single node with 8 GPUs, run:

bash dist_test.sh /path/to/config /path/to/checkpoint 8 --out results.pkl --eval bbox # or `bbox segm`
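
For instance, to evaluate the Mask R-CNN MogaNet-S 1x checkpoint on both box and mask AP:

bash dist_test.sh configs/moganet/mask_rcnn_moganet_small_fpn_1x_coco.py /path/to/checkpoint 8 --out results.pkl --eval bbox segm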

Citation

If you find this repository helpful, please consider citing:

@inproceedings{iclr2024MogaNet,
  title={Efficient Multi-order Gated Aggregation Network},
  author={Siyuan Li and Zedong Wang and Zicheng Liu and Cheng Tan and Haitao Lin and Di Wu and Zhiyuan Chen and Jiangbin Zheng and Stan Z. Li},
  booktitle={International Conference on Learning Representations},
  year={2024}
}

Acknowledgment

Our implementation is mainly based on the following codebases. We gratefully thank the authors for their wonderful work.

(back to top)