[CVPR 2023] Code for "Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations"

Ziyan Yang, Kushal Kafle, Franck Dernoncourt, Vicente Ordonez, CVPR 2023

If you have any questions, please email [email protected]

✨ We have made a demo for this work! Feel free to try it!

Abstract

We propose a margin-based loss for vision-language model pretraining that encourages gradient-based explanations that are consistent with region-level annotations. We refer to this objective as Attention Mask Consistency (AMC) and demonstrate that it produces superior visual grounding performance compared to models that rely instead on region-level annotations for explicitly training an object detector such as Faster R-CNN. AMC works by encouraging gradient-based explanation masks that focus their attention scores mostly within annotated regions of interest for images that contain such annotations. Particularly, a model trained with AMC on top of standard vision-language modeling objectives obtains a state-of-the-art accuracy of 86.59% in the Flickr30k visual grounding benchmark, an absolute improvement of 5.48% when compared to the best previous model. Our approach also performs exceedingly well on established benchmarks for referring expression comprehension and offers the added benefit by design of gradient-based explanations that better align with human annotations.

Requirements

  • Python 3.8
  • PyTorch 1.8.0+cu111
  • transformers==4.8.1
  • NumPy, scikit-image, opencv-python, Pillow, matplotlib, timm
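
A minimal environment setup sketch is shown below. The PyTorch wheel index URL and the torchvision version are assumptions for CUDA 11.1 builds; adjust them to your own CUDA setup.

# Create an environment and install the pinned dependencies listed above
conda create -n amc python=3.8 -y && conda activate amc
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install transformers==4.8.1 numpy scikit-image opencv-python pillow matplotlib timm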

Data

  • Visual Genome (VG) images: Please download VG images first.
  • Annotations: Please download our pre-processed text annotations for VG images. You may need to modify the image path in each sample so the images can be loaded from your local copy (see the layout sketch below).
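
The exact directory layout is up to you; the sketch below is only an illustration (the folder and archive names are assumptions), the one requirement being that the image path stored in each annotation sample resolves to an actual VG image file.

# Hypothetical layout; folder and archive names are illustrative, not required by the code
mkdir -p data/VG_images data/annotations
unzip images.zip  -d data/VG_images    # Visual Genome images, part 1 (VG_100K)
unzip images2.zip -d data/VG_images    # Visual Genome images, part 2 (VG_100K_2)
# Place the downloaded annotation files under data/annotations and update the
# image path in each sample to point at data/VG_images (or wherever the images live).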

Train

After downloading the pre-trained ALBEF-14M model, you can run the following command to train the model:

# Train the model using bounding box annotations from VG
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --use_env Pretrain.py --config configs/Pretrain.yaml --output_dir ALBEF_Grounding --checkpoint ALBEF.pth 
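
If you have fewer than 8 GPUs, a scaled-down sketch of the same command is below; keep --nproc_per_node equal to the number of visible devices (you may also need to lower the per-GPU batch size in configs/Pretrain.yaml, whose exact key is not shown here).

# Example: train on 2 GPUs instead of 8
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --use_env Pretrain.py --config configs/Pretrain.yaml --output_dir ALBEF_Grounding --checkpoint ALBEF.pth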

Evaluation

To evaluate on Flickr30k, please follow info-ground to process the data.

You can run the following commands to evaluate on the RefCOCO+, RefCLEF, and Flickr30k datasets using all the checkpoints in your ALBEF_Grounding folder:

CUDA_VISIBLE_DEVICES=1 python grounding_eval_singlegpu.py --checkpoint ALBEF_Grounding --output_dir ALBEF_Grounding/refcoco_results --config configs/Grounding_refcoco.yaml

CUDA_VISIBLE_DEVICES=1 python grounding_eval_singlegpu_refclef.py --checkpoint ALBEF_Grounding --output_dir ALBEF_Grounding/refclef_results --config configs/Grounding_refclef.yaml

CUDA_VISIBLE_DEVICES=1 python grounding_eval_singlegpu_flickr.py --checkpoint ALBEF_Grounding --output_dir ALBEF_Grounding/flickr_results --config configs/Grounding_flickr.yaml

You can also download these checkpoints and place them in the corresponding folders to reproduce our results:

CUDA_VISIBLE_DEVICES=1 python grounding_eval_singlegpu.py --checkpoint best_refcoco.pth --output_dir best_refcoco_results --config configs/Grounding_refcoco.yaml

CUDA_VISIBLE_DEVICES=1 python grounding_eval_singlegpu_refclef.py --checkpoint best_refclef.pth --output_dir best_refclef_results --config configs/Grounding_refclef.yaml

CUDA_VISIBLE_DEVICES=1 python grounding_eval_singlegpu_flickr.py --checkpoint best_flickr.pth --output_dir best_flickr_results --config configs/Grounding_flickr.yaml

Citing

If you find our paper/code useful, please consider citing:

@inproceedings{yang2023improving,
  title={Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations},
  author={Yang, Ziyan and Kafle, Kushal and Dernoncourt, Franck and Ordonez, Vicente},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={19165--19174},
  year={2023}
}

Acknowledgement

The implementation of AMC builds on the code from ALBEF. We thank the authors for open-sourcing their work and making it available to the community.
