VinVL VisualBackbone

The original VinVL visual backbone with simplified APIs to extract features, boxes, and object detections in a few lines of code. This repo is based on microsoft/scene_graph_benchmark; please refer to that repo for further information about the benchmark.

Installation

Create your virtual environment and install the following dependencies according to your system specs:

PyTorch 1.7
torchvision

Then run:

# clone this repo
git clone git@github.com:michelecafagna26/vinvl-visualbackbone.git
cd vinvl-visualbackbone

# good practice
pip install --upgrade pip

# install the requirements
pip install -r requirements.txt

cd scene_graph_benchmark

# install Scene Graph Detection with the VisualBackbone APIs
python setup.py build develop

You can check the original INSTALL.md for alternative installation options.
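Because the build targets a specific PyTorch release, a quick check before running setup.py can save a failed build. The sketch below is hypothetical (the helper names major_minor and installed_major_minor are not part of this repo); it only reads installed package metadata through the standard library and compares torch against the 1.7 series listed above.

```python
"""Hypothetical environment check: compares installed torch against 1.7."""
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def major_minor(ver: str) -> Tuple[int, int]:
    """Parse a version string such as '1.7.1+cu110' into (1, 7)."""
    m = re.match(r"(\d+)\.(\d+)", ver)  # ignore patch, pre-release and +local tags
    if m is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return int(m.group(1)), int(m.group(2))


def installed_major_minor(pkg: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) of an installed distribution, or None if absent."""
    try:
        return major_minor(version(pkg))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    torch_ver = installed_major_minor("torch")
    if torch_ver is None:
        print("torch is not installed")
    elif torch_ver != (1, 7):
        print(f"warning: found torch {torch_ver[0]}.{torch_ver[1]}, this README targets 1.7")
    else:
        print("torch 1.7 found")
    print("torchvision (major, minor):", installed_major_minor("torchvision"))
```

The check is deliberately a warning rather than a hard failure, since other versions may or may not build depending on your CUDA setup.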

Model download

Download the model before running your code.

mkdir -p scene_graph_benchmark/models/
cd scene_graph_benchmark/models/

# download from the huggingface model hub
git lfs install # if not installed
git clone https://huggingface.co/michelecafagna26/vinvl_vg_x152c4
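If git-lfs was not set up before cloning, the large weight files are left behind as small text pointer stubs, and loading them fails later with confusing errors. A minimal sketch for spotting such stubs, assuming the standard Git LFS pointer header and the clone location used above (is_lfs_pointer and find_unfetched are hypothetical helper names):

```python
"""Hypothetical check that the model files were actually fetched by git-lfs."""
from pathlib import Path
from typing import List

# Every Git LFS pointer file begins with this version line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if `path` is a Git LFS pointer stub rather than the real content."""
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_unfetched(model_dir: str) -> List[Path]:
    """List files under model_dir (skipping .git/) that are still LFS pointers."""
    root = Path(model_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"model directory not found: {root}")
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and ".git" not in p.parts and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    # Path assumed from the download commands above; adjust to your layout.
    stubs = find_unfetched("scene_graph_benchmark/models/vinvl_vg_x152c4")
    if stubs:
        print("not fetched yet:", *stubs, sep="\n  ")
    else:
        print("all model files look fetched")
```

If any stubs are reported, running git lfs pull inside the cloned model directory should download the actual files.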

Quick start: feature extraction

from scene_graph_benchmark.wrappers import VinVLVisualBackbone

img_file = "scene_graph_benchmark/demo/woman_fish.jpg"

detector = VinVLVisualBackbone()

dets = detector(img_file)
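To run the detector over a whole folder rather than a single image, a loop like the following should work. It is a sketch that assumes, as in the example above, that the detector is called with one image path and returns a dict-like result; extract_dir and IMAGE_EXTS are hypothetical names, not part of the wrapper API.

```python
"""Hypothetical batch helper around a single-image detector callable."""
from pathlib import Path
from typing import Any, Callable, Dict, Iterable

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def extract_dir(detector: Callable[[str], Any], image_dir: str,
                exts: Iterable[str] = IMAGE_EXTS) -> Dict[str, Any]:
    """Run `detector` on every image file in image_dir; map file name -> detections."""
    wanted = {e.lower() for e in exts}
    results: Dict[str, Any] = {}
    for path in sorted(Path(image_dir).iterdir()):
        # Match extensions case-insensitively so that .JPG files are included.
        if path.is_file() and path.suffix.lower() in wanted:
            results[path.name] = detector(str(path))
    return results


if __name__ == "__main__":
    from scene_graph_benchmark.wrappers import VinVLVisualBackbone

    detector = VinVLVisualBackbone()
    all_dets = extract_dir(detector, "scene_graph_benchmark/demo")
    for name, dets in all_dets.items():
        print(name, sorted(dets))  # assumes dets is dict-like (keys listed below)
```

Taking the detector as an argument keeps the model loaded once and makes the loop easy to test with a stub.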

dets contains the following keys: ["boxes", "classes", "scores", "features", "spatial_features"]. You can obtain the full VinVL visual features by concatenating "features" and "spatial_features":

import numpy as np

v_feats = np.concatenate((dets['features'], dets['spatial_features']), axis=1)
# v_feats.shape = (num_boxes, 2054)
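A common follow-up is to drop low-confidence detections before building the features. The sketch below assumes each value in dets holds one entry per box and converts cleanly with np.asarray; vinvl_features and min_score are hypothetical names. The code does not depend on how the 2054 dimensions split between "features" and "spatial_features"; it only checks that all arrays agree on the number of boxes before filtering and concatenating.

```python
"""Hypothetical helper: score-filter detections, then build VinVL features."""
from typing import Any, List, Mapping, Tuple

import numpy as np


def vinvl_features(dets: Mapping[str, Any],
                   min_score: float = 0.0) -> Tuple[np.ndarray, List[Any]]:
    """Return (v_feats, classes) for boxes whose score is >= min_score."""
    scores = np.asarray(dets["scores"], dtype=np.float32)
    feats = np.asarray(dets["features"], dtype=np.float32)
    spatial = np.asarray(dets["spatial_features"], dtype=np.float32)

    n = scores.shape[0]
    if feats.shape[0] != n or spatial.shape[0] != n:
        raise ValueError("scores, features and spatial_features disagree on box count")

    keep = scores >= min_score  # boolean mask, one entry per box
    v_feats = np.concatenate((feats[keep], spatial[keep]), axis=1)
    classes = [c for c, k in zip(dets["classes"], keep) if k]
    return v_feats, classes
```

With the default min_score of 0.0 every box is kept, so the result matches the plain concatenation above (cast to float32).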

Demo

Coming Soon!

Citations

Please consider citing the original project and the VinVL paper:

@misc{han2021image,
  title={Image Scene Graph Generation (SGG) Benchmark},
  author={Xiaotian Han and Jianwei Yang and Houdong Hu and Lei Zhang and Jianfeng Gao and Pengchuan Zhang},
  year={2021},
  eprint={2107.12604},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@inproceedings{zhang2021vinvl,
  title={Vinvl: Revisiting visual representations in vision-language models},
  author={Zhang, Pengchuan and Li, Xiujun and Hu, Xiaowei and Yang, Jianwei and Zhang, Lei and Wang, Lijuan and Choi, Yejin and Gao, Jianfeng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5579--5588},
  year={2021}
}
