
Balanced Classification: A Unified Framework for Long-Tailed Object Detection

arXiv preprint

This repo is the official implementation of the paper Balanced Classification: A Unified Framework for Long-Tailed Object Detection (accepted by IEEE Transactions on Multimedia).

News

2023-08-19: We uploaded visualizations of different methods to this repo!

2023-08-15: We updated the download URLs of the LVIS dataset annotations (see issue #1); the old links had expired.

2023-08-14: Our paper was featured by 极市平台 (CVMart)!

2023-08-09: Our paper was covered and reviewed by CVHub!

2023-08-03: Our paper was accepted by IEEE Transactions on Multimedia (TMM) and will be published!

TODO

  • Integrate other SOTA methods into this repo
  • Release a pretrained Faster R-CNN detector with a Swin Transformer backbone

Introduction

Conventional detectors suffer from performance degradation when dealing with long-tailed data due to a classification bias towards the majority head categories. In this paper, we contend that the learning bias originates from two factors: 1) the unequal competition arising from the imbalanced distribution of foreground categories, and 2) the lack of sample diversity in tail categories. To tackle these issues, we introduce a unified framework called BAlanced CLassification (BACL), which enables adaptive rectification of category distribution disparities and dynamic intensification of sample diversities in a synchronized manner. Specifically, a novel foreground classification balance loss (FCBL) is developed to ameliorate the domination of head categories and shift attention to difficult-to-differentiate categories by introducing pairwise class-aware margins and auto-adjusted weight terms, respectively. This loss prevents the over-suppression of tail categories by dominant head categories in the context of unequal competition. Moreover, we propose a dynamic feature hallucination module (FHM), which expands the representation of tail categories in the feature space by synthesizing hallucinated samples to introduce additional data variances. In this divide-and-conquer approach, BACL sets the new state-of-the-art on the challenging LVIS benchmark with a decoupled training pipeline, surpassing vanilla Faster R-CNN with ResNet-50-FPN by 5.8% AP and 16.1% AP for overall and tail categories. Extensive experiments demonstrate that BACL consistently achieves performance improvements across various datasets with different backbones and architectures.
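As a rough illustration only (this is the generic shape of a margin-based, reweighted softmax loss, not the exact formulation from the paper), FCBL can be thought of as

$$\mathcal{L}_{\mathrm{FCBL}}(z, c) = -\log \frac{e^{z_c}}{e^{z_c} + \sum_{j \neq c} w_{cj}\, e^{z_j + m_{cj}}}$$

where z are the foreground classification logits, c is the ground-truth category, m_{cj} is a pairwise class-aware margin that relaxes the suppression of category c by category j, and w_{cj} is an auto-adjusted weight that shifts attention to difficult-to-differentiate categories; see the paper for the precise definitions of m and w.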

Framework

Requirements

1. Environment:

We tested with the following settings:

  • python 3.8
  • cuda 11.0
  • pytorch 1.7.0
  • torchvision 0.8.1
  • mmcv 1.2.7

Use MMDetection with Docker

We provide a Dockerfile to build an image. Ensure that you are using Docker version >= 19.03.

# build an image with PyTorch 1.7.0, CUDA 11.0
# If you want to use another version, just modify the Dockerfile
docker build -t mmdetection docker/

Run it with:

docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmdetection/data mmdetection

where {DATA_DIR} is the host path to your dataset directory (the data folder created in the next step).

2. Data:

a. For dataset images:

# Make sure you are in dir BACL

mkdir data
cd data
mkdir lvis_v0.5
mkdir lvis_v1
  • If you already have the COCO2017 dataset, great: simply link the train2017 and val2017 folders under lvis_v0.5 and lvis_v1 (see the sketch after this list).
  • If you do not have the COCO2017 dataset, please download the COCO train set and COCO val set, unzip the files, and move them under lvis_v0.5 and lvis_v1.
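
A minimal sketch of both options, assuming an existing COCO2017 copy at the hypothetical path /path/to/coco2017 (the zip URLs are the official COCO download links):

# Option 1: symlink an existing COCO2017 copy (avoids duplicating ~20 GB of images)
ln -s /path/to/coco2017/train2017 data/lvis_v0.5/train2017
ln -s /path/to/coco2017/val2017 data/lvis_v0.5/val2017
ln -s /path/to/coco2017/train2017 data/lvis_v1/train2017
ln -s /path/to/coco2017/val2017 data/lvis_v1/val2017

# Option 2: download and unzip the images, then link (or copy) them into lvis_v1 as above
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
unzip train2017.zip -d data/lvis_v0.5
unzip val2017.zip -d data/lvis_v0.5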

b. For dataset annotations:

Download the LVIS annotation files (the updated download links are listed in issue #1) and place them under an annotations folder inside lvis_v0.5 and lvis_v1 respectively.

After all these operations, the folder data should look like this:

    data
    ├── lvis_v0.5
    │   ├── annotations
    │   │   ├── lvis_v0.5_train.json
    │   │   ├── lvis_v0.5_val.json
    │   ├── train2017
    │   │   ├── 000000100582.jpg
    │   │   ├── 000000102411.jpg
    │   │   ├── ......
    │   └── val2017
    │       ├── 000000062808.jpg
    │       ├── 000000119038.jpg
    │       ├── ......
    ├── lvis_v1
    │   ├── annotations
    │   │   ├── lvis_v1_train.json
    │   │   ├── lvis_v1_val.json
    │   ├── train2017
    │   │   ├── 000000100582.jpg
    │   │   ├── 000000102411.jpg
    │   │   ├── ......
    │   └── val2017
    │       ├── 000000062808.jpg
    │       ├── 000000119038.jpg
    │       ├── ......

Training

Use the following commands to train a model for lvis_v0.5.

# use decoupled training pipeline:

# 1. representation learning stage of BACL
./tools/dist_train.sh configs/bacl/bacl_representation_faster_rcnn_r50_fpn_1x_lvis_v0.5.py 8

# 2. classifier learning stage of BACL
./tools/dist_train.sh configs/bacl/bacl_classifier_faster_rcnn_r50_fpn_mstrain_1x_lvis_v0.5.py 8

Use the following commands to train a model for lvis_v1.

# use decoupled training pipeline:

# 1. representation learning stage of BACL
./tools/dist_train.sh configs/bacl/bacl_representation_faster_rcnn_r50_fpn_1x_lvis_v1.py 8

# 2. classifier learning stage of BACL
./tools/dist_train.sh configs/bacl/bacl_classifier_faster_rcnn_r50_fpn_mstrain_1x_lvis_v1.py 8

Important: The default learning rate in the config files is for 8 GPUs and 2 img/gpu (batch size = 8*2 = 16). According to the Linear Scaling Rule, you need to set the learning rate proportional to the batch size if you use a different number of GPUs or images per GPU, e.g., lr=0.01 for 4 GPUs * 2 img/gpu and lr=0.08 for 16 GPUs * 4 img/gpu. (Cited from MMDetection.)
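
For example, to train the representation stage on 4 GPUs with the correspondingly halved learning rate, you can either edit optimizer.lr in the config file or override it on the command line; the sketch below assumes a recent MMDetection train.py, where the override flag is --cfg-options (older versions expose the same mechanism as --options):

# batch size 4*2 = 8, so scale the default lr 0.02 down to 0.01
./tools/dist_train.sh configs/bacl/bacl_representation_faster_rcnn_r50_fpn_1x_lvis_v0.5.py 4 \
 --cfg-options optimizer.lr=0.01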

Testing

Use the following commands to test a trained model.

./tools/dist_test.sh \
 ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
  • $RESULT_FILE: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
  • $EVAL_METRICS: Items to be evaluated on the results. bbox for bounding box evaluation only. bbox segm for bounding box and mask evaluation.

For example (assuming you have finished training the BACL models):

  • To evaluate the trained BACL model with Faster R-CNN R50-FPN for object detection:
./tools/dist_test.sh configs/bacl/bacl_classifier_faster_rcnn_r50_fpn_mstrain_1x_lvis_v0.5.py \
./work_dirs/bacl_classifier_faster_rcnn_r50_fpn_mstrain_1x_lvis_v0.5/epoch_12.pth 8 \
--eval bbox
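
  • To additionally save the raw detection results for later analysis, pass --out as documented above (results.pkl is just an example filename):
./tools/dist_test.sh configs/bacl/bacl_classifier_faster_rcnn_r50_fpn_mstrain_1x_lvis_v0.5.py \
./work_dirs/bacl_classifier_faster_rcnn_r50_fpn_mstrain_1x_lvis_v0.5/epoch_12.pth 8 \
--out results.pkl --eval bbox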

Results and models

For your convenience, we provide the following trained models. All models are trained with 16 images in a mini-batch.

| Method   | Backbone | Dataset   | box AP | Model          |
|----------|----------|-----------|--------|----------------|
| baseline | R50_FPN  | LVIS v0.5 | 22.0   | config / model |
| BACL     | R50_FPN  | LVIS v0.5 | 27.8   | config / model |
| baseline | R50_FPN  | LVIS v1   | 19.3   | config / model |
| BACL     | R50_FPN  | LVIS v1   | 26.1   | config / model |
| baseline | R101_FPN | LVIS v0.5 | 23.3   | config / model |
| BACL     | R101_FPN | LVIS v0.5 | 29.4   | config / model |
| baseline | R101_FPN | LVIS v1   | 20.9   | config / model |
| BACL     | R101_FPN | LVIS v1   | 27.8   | config / model |

[0] All results are obtained with a single model and without any test-time data augmentation such as multi-scale testing or flipping.
[1] Refer to the config files in configs/bacl/ for more details.

Visualization

(Figure: qualitative comparisons of visualizations from different methods.)

Citation

If you find this work useful in your research, please consider citing our paper:

@misc{qi2023balanced,
      title={Balanced Classification: A Unified Framework for Long-Tailed Object Detection}, 
      author={Tianhao Qi and Hongtao Xie and Pandeng Li and Jiannan Ge and Yongdong Zhang},
      year={2023},
      eprint={2308.02213},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Credit

Thanks to the MMDetection team for the wonderful open-source project!
