Zhihao Yuan, Jinke Ren, Chun-Mei Feng, Hengshuang Zhao, Shuguang Cui, Zhen Li
Comparative overview of two 3DVG approaches. (a) Supervised 3DVG takes 3D scans and text queries as input, guided by object-text pair annotations. (b) Zero-shot 3DVG localizes target objects through a programmatic representation generated by LLMs, i.e., target category, anchor category, and relation grounding, highlighting its strength in decoding spatial relations and object identifiers within a scene. For example, the location of the keyboard (outlined in green) can be retrieved from the distance between the keyboard and the door (outlined in blue).
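The relation-grounding step in the caption above can be sketched in a few lines. The function below is a hypothetical illustration (not the repo's actual module): given the 3D centers of candidate targets (e.g., keyboards) and anchors (e.g., the door), it selects the target closest to any anchor.

```python
import numpy as np

def ground_closest(targets, anchors):
    """Pick the target instance nearest to any anchor instance.

    targets: (N, 3) array of candidate object centers.
    anchors: (M, 3) array of anchor object centers.
    Returns the index of the selected target.
    Hypothetical helper illustrating a CLOSEST relation; not the repo's API.
    """
    targets = np.asarray(targets, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    # Pairwise distances between every target and every anchor center.
    d = np.linalg.norm(targets[:, None, :] - anchors[None, :, :], axis=-1)
    # For each target, take its distance to the nearest anchor,
    # then choose the target with the smallest such distance.
    return int(d.min(axis=1).argmin())

# Example: two keyboards, one door; keyboard 1 sits next to the door.
keyboards = [[0.0, 0.0, 0.0], [4.8, 1.0, 0.0]]
door = [[5.0, 1.0, 0.0]]
print(ground_closest(keyboards, door))  # → 1
```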
Download our preprocessed 3D features from here and place them under the data/scannet folder.
Run the following command:
python visprog_nr3d.py
Uncomment this line to use the BLIP-2 model for the LOC module. You can download our preprocessed images from here and set image_path to your download location.
You can also process the features by yourself.
First, install the dependencies:
cd ./models/pointnext/PointNeXt
bash install.sh
Prepare ScanNet 2D data following OpenScene and 3D data following vil3dref.
Then, run the following scripts:
python preprocess/process_feat_3d.py
python preprocess/process_feat_2d.py
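After extracting per-point features, a common way to obtain one feature vector per object is to mean-pool point features over each object's mask. The snippet below is a toy sketch of that aggregation under assumed array shapes; the repo's scripts use PointNeXt encodings, so the names here are illustrative only.

```python
import numpy as np

def pool_object_feature(point_feats, mask):
    """Mean-pool per-point features over an object's boolean mask,
    yielding a single per-object feature vector.

    point_feats: (P, C) array of per-point features.
    mask: (P,) boolean array marking the object's points.
    Illustrative aggregation, not the repo's exact pipeline.
    """
    return point_feats[mask].mean(axis=0)

feats = np.arange(12, dtype=float).reshape(4, 3)  # 4 points, 3-dim features
mask = np.array([True, True, False, False])       # first two points belong to the object
print(pool_object_feature(feats, mask))  # → [1.5 2.5 3.5]
```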
You can refer to preprocess/process_mask3d.ipynb for processing 3D instance segments.
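Instance segmentation outputs are typically a per-point instance-id array; grouping scene points by id yields the per-object point sets that downstream modules consume. The helper below is a hypothetical sketch of that grouping (negative ids treated as unlabeled), not the notebook's actual code.

```python
import numpy as np

def split_instances(points, instance_ids):
    """Group scene points by instance id.

    points: (P, 3) array of point coordinates.
    instance_ids: (P,) integer array; ids < 0 are treated as unlabeled.
    Returns {instance_id: (K, 3) points array}.
    Hypothetical illustration of per-instance grouping.
    """
    out = {}
    for inst in np.unique(instance_ids):
        if inst < 0:
            continue  # skip unlabeled points
        out[int(inst)] = points[instance_ids == inst]
    return out

pts = np.array([[0, 0, 0], [1, 0, 0], [5, 5, 5]], dtype=float)
ids = np.array([0, 0, 1])
segs = split_instances(pts, ids)
print(sorted(segs), [len(segs[k]) for k in sorted(segs)])  # → [0, 1] [2, 1]
```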