
VLPD

Official Code of CVPR'23 Paper "VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision"

[Paper] [arXiv] [Supp. Materials] [Video] [Poster]

[Overall pipeline: docs/pipeline.jpg]

Introduction

[Motivation: docs/motivaiton.jpg]

In this paper, we propose a novel approach for context-aware Pedestrian Detection via Vision-Language semantic self-supervision (VLPD), which explicitly models semantic contexts without any extra annotations:

  • Firstly, we propose a self-supervised Vision-Language Semantic (VLS) segmentation method, which learns both fully supervised pedestrian detection and contextual segmentation via explicit labels of semantic classes self-generated by vision-language models.
  • Furthermore, we propose a self-supervised Prototypical Semantic Contrastive (PSC) learning method to better discriminate pedestrians from other classes, based on the more explicit semantic contexts obtained from VLS.
  • Extensive experiments on the popular CityPersons and Caltech benchmarks show that our proposed VLPD achieves superior performance over previous state-of-the-art methods, particularly under challenging circumstances such as small scale and heavy occlusion.

Below are visualizations of detection results on the Caltech dataset, comparing the baseline CSP (top) and our proposed VLPD (bottom). Green boxes are correct detections, red boxes are wrong detections, and dashed blue boxes are missed detections.

[Detection visualizations: docs/bbox.jpg]

Contents

  1. Introduction
  2. Environment
  3. Dataset
  4. Training
  5. Evaluation

STEP 1: Environment

  1. Software and Hardware requirements:

    • Install conda (Miniconda3 is recommended) to create the environment.
    • MATLAB R2021a (or above) is recommended for evaluation on Caltech.
    • At least one NVIDIA GPU supporting CUDA 11 is required (two GPUs with more than 24 GB of memory each are recommended for training).
  2. Create conda environment:

    • Modify the "name" item of "environment.yaml" to name the environment, and change the "prefix" item to your conda path.
    • Run "conda env create -f environment.yaml" to create a new environment.
    • If some install errors occur due to invalid CUDA versions of PyTorch-related packages by PIP, try install the *.whl files manually from official site.
  3. Compile the NMS for the current Python version:

    • Change the working directory to the folder with "cd utils/".
    • Under the created conda environment, run "make" to compile.
  4. Install the PIP package of CLIP:

    • Please refer to the official repository openai/CLIP and install it via "pip3" of the created conda environment.
    • Download the ResNet50-CLIP weight: RN50.pt.
    • Keep "clip_weight" item in "config.py" consistent to where this weight is saved.

STEP 2: Dataset

  1. For CityPersons:

    • Following the instructions of the official repository cvgroup-njust/CityPersons, download the images (leftImg8bit_trainvaltest.zip, 11 GB) of the CityScapes dataset and the annotation files (*.mat).

    • If the data path is "/root/data", place images and annotations in the following path format (taking the validation set as an example; a shell sketch follows this list):

      /root/data/cityperson/images/val/frankfurt/frankfurt_000000_000294_leftImg8bit.png
      
      /root/data/cityperson/annotations/anno_val.mat
      
  2. For Caltech:

    • Follow the instructions of the baseline method liuwei16/CSP to prepare the cached files and images of the Caltech dataset (42782 images for training, 4024 for testing, 2.2 GB in total).
    • If the data path is "/root/data/", place images and annotations in the following path format:
      /root/data/caltech/images/train_3/images/
      /root/data/caltech/images/test/images/
      
      /root/data/cache/caltech/train_gt
      /root/data/cache/caltech/train_nogt
      /root/data/cache/caltech/test
      
  3. If the data path has been customized, change the "root_path" item in "config.py" or "config_caltech.py" accordingly.

  4. Due to copyright issues, we cannot release copies of the generated data that we used.
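
For reference, the shell sketch below arranges the CityPersons files into the layout described above. Only the target layout comes from this README; the working directory, the internal structure of the downloaded archive, the "train" sub-folder mirroring "val", and any annotation file name other than "anno_val.mat" are assumptions.

    # Assumed to run from the directory that holds the downloaded files.
    mkdir -p /root/data/cityperson/images/train /root/data/cityperson/images/val
    mkdir -p /root/data/cityperson/annotations

    # Unpack the CityScapes images; the archive is assumed to contain
    # leftImg8bit/{train,val}/<city>/ sub-folders (e.g. frankfurt/).
    unzip leftImg8bit_trainvaltest.zip -d /tmp/cityscapes
    cp -r /tmp/cityscapes/leftImg8bit/train/* /root/data/cityperson/images/train/
    cp -r /tmp/cityscapes/leftImg8bit/val/*   /root/data/cityperson/images/val/

    # Place the annotation files next to the images ("anno_train.mat" is an
    # assumed counterpart of "anno_val.mat").
    cp anno_train.mat anno_val.mat /root/data/cityperson/annotations/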

STEP 3: Training

  1. For CityPersons:

    • Under the conda environment you created, run "dist_train.sh" to start training (see the launch sketch after this list).
    • Training hyper-parameters are set in "config.py".
  2. For Caltech:

    • Under the conda environment you created, run "dist_train_caltech.sh" to start training.
    • Training hyper-parameters are set in "config_caltech.py".
  • NOTE: For more details about configs not covered in "config.py", please refer to Others.md.
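
A minimal launch sketch, assuming the commands are run from the repository root inside the created conda environment; selecting GPUs with the standard CUDA_VISIBLE_DEVICES variable is an assumption about your setup, not a requirement of the scripts.

    # CityPersons: hyper-parameters are read from config.py.
    CUDA_VISIBLE_DEVICES=0,1 bash dist_train.sh

    # Caltech: hyper-parameters are read from config_caltech.py.
    CUDA_VISIBLE_DEVICES=0,1 bash dist_train_caltech.sh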

STEP 4: Evaluation

  1. For CityPersons:

    • The Miss-Rate is printed in the training log at each evaluation epoch, from "val_begin" to "val_end" in "config.py", during training.
    • If you need to evaluate single checkpoints or get results on other subsets, please refer to Evaluation.md.
  2. For Caltech:

    • Detection result files are generated in the MODEL_DIR directory, from epoch "val_begin" to "val_end" in "config_caltech.py", during training.
    • To get the Miss-Rate of these generated files, please refer to Evaluation.md.
  • Checkpoints of our VLPD are available for download in Evaluation.md.
  • NOTE: Adjusting "val_begin" will affect reproducibility; please refer to Others.md.

Citation

If you find our research helpful or build upon it, please consider citing:

@InProceedings{Liu_2023_CVPR,
    author    = {Liu, Mengyin and Jiang, Jie and Zhu, Chao and Yin, Xu-Cheng},
    title     = {VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {6662-6671}
}

For more of our publications, please refer to Mengyin Liu's Academic Page and the Google Scholar profiles of Prof. Zhu and Prof. Yin.

Thanks

  1. Official code of the baseline CSP: liuwei16/CSP
  2. Unofficial PyTorch implementation of CSP: ligang-cs/CSP-Pedestrian-detection
  3. Official code of OpenAI-CLIP: openai/CLIP