[WACV2024] SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data

Ziyan Yang, Kushal Kafle, Zhe Lin, Scott Cohen, Zhihong Ding, Vicente Ordonez

If you have any questions, you can email [email protected]


We propose Subject-Conditional Relation Detection (SCoRD), where, conditioned on an input subject, the goal is to predict all of its relations to other objects in a scene along with their locations. Based on the Open Images dataset, we propose a challenging OIv6-SCoRD benchmark in which the training and testing splits have a distribution shift in the occurrence statistics of <subject, relation, object> triplets. To solve this problem, we propose an auto-regressive model that, given a subject, predicts its relations, objects, and object locations by casting this output as a sequence of tokens. First, we show that previous scene-graph prediction methods fail to produce as exhaustive an enumeration of relation-object pairs when conditioned on a subject on this benchmark. In particular, we obtain a recall@3 of 83.8% for our relation-object predictions, compared to the 49.75% obtained by a recent scene graph detector. Then, we show improved generalization on both relation-object and object-box predictions by leveraging, during training, relation-object pairs that are obtained automatically from textual captions and have no object-box annotations. In particular, for <subject, relation, object> triplets with no object locations available during training, we obtain a recall@3 of 33.80% for relation-object pairs and 26.75% for their box locations.
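
To make the "sequence of tokens" idea concrete, here is a minimal sketch of how one <relation, object, object location> target could be flattened into tokens. Everything specific in it is an assumption for illustration and is not taken from this repository: the helper names `box_to_tokens` and `serialize_target`, the `<pos_i>` token format, and the choice of 512 position bins. The actual tokenization used by the model may differ.

```python
from typing import List, Tuple

NUM_BINS = 512  # assumed number of position bins (illustrative, not from the repo)


def box_to_tokens(box: Tuple[float, float, float, float],
                  width: int, height: int,
                  num_bins: int = NUM_BINS) -> List[str]:
    """Quantize a pixel-space (x1, y1, x2, y2) box into four position tokens."""
    x1, y1, x2, y2 = box

    def quantize(value: float, size: int) -> int:
        # Map a coordinate in [0, size] to a bin index in [0, num_bins - 1].
        idx = int(value / size * num_bins)
        return min(max(idx, 0), num_bins - 1)

    return [
        f"<pos_{quantize(x1, width)}>",
        f"<pos_{quantize(y1, height)}>",
        f"<pos_{quantize(x2, width)}>",
        f"<pos_{quantize(y2, height)}>",
    ]


def serialize_target(relation: str, obj: str,
                     box: Tuple[float, float, float, float],
                     width: int, height: int) -> List[str]:
    """Flatten one (relation, object, box) target into a token sequence."""
    return [relation, obj] + box_to_tokens(box, width, height)


tokens = serialize_target("holds", "guitar", (0, 0, 320, 240), 640, 480)
print(tokens)
```

An auto-regressive decoder would then be trained to emit such a sequence conditioned on the image and the subject. For caption-derived pairs without box annotations, the sequence would stop after the object token.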


Please follow ALBEF to install the required packages.


Download the training and testing splits here. To download images:


Download the checkpoint for the removing 50% experiment here.


First, run this command to generate <relation, object, object location> triples:

# --start and --end give the index range of the target checkpoint(s) in the checkpoint folder. If the folder contains only one checkpoint, use --start 0 and --end 1.
# --chunk_size sets how many batches of evaluation samples are processed.
CUDA_VISIBLE_DEVICES=0 python --root your_checkpoint_folder --start 0 --end 1 --chunk 0 --num_seq 3 --num_beams 5 --chunk_size 100 --round 2

Then, run this command to get evaluation results:

python --results_folder your_checkpoint_folder/oidv6_results/  --report_unseen True --topk 3
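
For readers unfamiliar with the metric, the following is a rough sketch of how recall@k over relation-object pairs can be computed. It is an illustration under assumptions, not the repository's evaluation script: the function name `recall_at_k` and the dictionary layout are hypothetical, and the real evaluation may apply additional criteria (for example, box overlap for the location metric).

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (relation, object)


def recall_at_k(predictions: Dict[str, List[Pair]],
                ground_truth: Dict[str, Set[Pair]],
                k: int = 3) -> float:
    """Fraction of ground-truth pairs found among the top-k predictions.

    predictions: subject key -> ranked list of predicted (relation, object) pairs.
    ground_truth: subject key -> set of annotated (relation, object) pairs.
    """
    hits = 0
    total = 0
    for key, gt_pairs in ground_truth.items():
        top_k = set(predictions.get(key, [])[:k])
        hits += len(gt_pairs & top_k)
        total += len(gt_pairs)
    return hits / total if total else 0.0


# Hypothetical example: one subject with two annotated pairs.
preds = {"img1/subject0": [("holds", "guitar"), ("wears", "hat"),
                           ("on", "chair"), ("at", "table")]}
gt = {"img1/subject0": {("holds", "guitar"), ("at", "table")}}
print(recall_at_k(preds, gt, k=3))
```

In this toy example, only one of the two annotated pairs appears in the top three predictions, so recall@3 is lower than recall@4.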


First, download the pre-trained checkpoint from PEVL. Then run:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=12888 --use_env --config configs/relation_grounding.yaml --output_dir your_checkpoint_folder --checkpoint pevl_pretrain.pth


We would like to thank ALBEF and PEVL. Their released codebases helped a lot in this project.


If you find this work interesting, please consider citing it:

  title={SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data},
  author={Yang, Ziyan and Kafle, Kushal and Lin, Zhe and Cohen, Scott and Ding, Zhihong and Ordonez, Vicente},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},

