QD-DETR : Query-Dependent Video Representation for Moment Retrieval and Highlight Detection (CVPR 2023 Paper)

by WonJun Moon*1, SangEek Hyun*1, SangUk Park2, Dongchan Park2, Jae-Pil Heo1

1 Sungkyunkwan University, 2 Pyler, * Equal Contribution

[Arxiv] [Paper] [Project Page] [Video]


Updates & News

  • The Charades-STA experiments listed with C3D features were actually conducted with I3D features, and the benchmarking tables report I3D results. The features are provided here, taken from the VSLNet GitHub. Sorry for the inconvenience.
  • Our new paper on moment retrieval and highlight detection is now available at [CG-DETR arxiv] (Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding). Code will soon be available at [CG-DETR Github].

Prerequisites

0. Clone this repo

1. Prepare datasets

(2023/11/21) For updated instructions on preparing the datasets, please refer to CG-DETR.

QVHighlights : Download the official feature files for the QVHighlights dataset from Moment-DETR.

Download moment_detr_features.tar.gz (8GB) and extract it under the '../features' directory. You can change the data directory by modifying 'feat_root' in the shell scripts under 'qd_detr/scripts/'.

tar -xf path/to/moment_detr_features.tar.gz

TVSum : Download feature files for TVSum dataset from UMT.

Download TVSum (69.1MB) and either extract it under the '../features/tvsum/' directory or change 'feat_root' in the TVSum shell scripts under 'qd_detr/scripts/tvsum/'.
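Since no explicit command is listed for this step, a sketch of the extraction is shown below. The archive name (TVSUM_TAR) is a placeholder for whatever file the UMT link serves, and the target directory mirrors the default 'feat_root'; adjust both to match your setup.

```shell
# Sketch only: TVSUM_TAR is a placeholder for the archive downloaded from UMT.
TVSUM_TAR=${TVSUM_TAR:-path/to/tvsum_features.tar.gz}
FEAT_ROOT=${FEAT_ROOT:-../features/tvsum}
mkdir -p "$FEAT_ROOT"
if [ -f "$TVSUM_TAR" ]; then
  tar -xf "$TVSUM_TAR" -C "$FEAT_ROOT"
else
  echo "Download the TVSum archive first, then set TVSUM_TAR to its path." >&2
fi
```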

2. Install dependencies. Python version 3.7 is required.

pip install -r requirements.txt

For anaconda setup, please refer to the official Moment-DETR github.

QVHighlights

Training

Training with video only and with video + audio can be executed with the scripts below:

bash qd_detr/scripts/train.sh --seed 2018
bash qd_detr/scripts/train_audio.sh --seed 2018

To compute the standard deviations reported in the paper, we ran with five different seeds: 0, 1, 2, 3, and 2018 (2018 is the seed used in Moment-DETR). The best validation accuracy is achieved at the last epoch.
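The five-seed sweep can be scripted in one loop. The sketch below is a dry run that only echoes each command; drop the `echo` to actually launch the training jobs.

```shell
# Dry run of the five-seed sweep; remove `echo` to launch the real jobs.
for seed in 0 1 2 3 2018; do
  echo bash qd_detr/scripts/train.sh --seed "$seed"
done
```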

Inference, Evaluation, and Codalab Submission for QVHighlights

Once the model is trained, hl_val_submission.jsonl and hl_test_submission.jsonl can be generated by running inference.sh:

bash qd_detr/scripts/inference.sh results/{direc}/model_best.ckpt 'val'
bash qd_detr/scripts/inference.sh results/{direc}/model_best.ckpt 'test'

where {direc} is the directory of the saved checkpoint. For more details on submission, check standalone_eval/README.md.
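Before uploading to Codalab, a quick sanity check on the generated files can catch a failed inference run. The results directory below is a placeholder for your own {direc}; the file names come from inference.sh above.

```shell
# Placeholder path: substitute your own results/{direc}.
for f in results/demo/hl_val_submission.jsonl results/demo/hl_test_submission.jsonl; do
  if [ -s "$f" ]; then
    echo "$f: $(wc -l < "$f") prediction lines"
  else
    echo "$f is missing or empty" >&2
  fi
done
```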

Pretraining and Finetuning

Pretraining with ASR captions is also available. To launch pretraining, run:

bash qd_detr/scripts/pretrain.sh 

This will pretrain the QD-DETR model on the ASR captions for 100 epochs; the pretrained checkpoints and other experiment log files will be written into results. With the pretrained checkpoint PRETRAIN_CHECKPOINT_PATH, finetuning can then be launched as:

bash qd_detr/scripts/train.sh  --resume ${PRETRAIN_CHECKPOINT_PATH}

Note that this finetuning process is the same as standard training except that it initializes weights from a pretrained checkpoint.
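Putting the two stages together, a minimal pipeline sketch looks like the following. It is a dry run via `echo`, and the checkpoint path is hypothetical; the real path is whatever pretrain.sh writes under results.

```shell
# Hypothetical checkpoint path; pretrain.sh writes the real one under results/.
PRETRAIN_CHECKPOINT_PATH=results/pretrain_demo/model_best.ckpt
echo bash qd_detr/scripts/pretrain.sh
echo bash qd_detr/scripts/train.sh --resume "$PRETRAIN_CHECKPOINT_PATH"
```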

TVSum

Training with video only and with video + audio can be executed with the scripts below:

bash qd_detr/scripts/tvsum/train_tvsum.sh 
bash qd_detr/scripts/tvsum/train_tvsum_audio.sh 

Best results are stored in 'results_[domain_name]/best_metric.jsonl'.
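To gather the per-domain results in one place, a small loop over the result directories (assuming the 'results_[domain_name]' layout above) could look like this:

```shell
# Prints the last recorded best metric for every domain directory found.
for f in results_*/best_metric.jsonl; do
  [ -e "$f" ] || continue
  printf '%s: ' "$f"
  tail -n 1 "$f"
done
```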

Others

  • Pretraining with ASR captions
  • Running predictions on customized datasets

are also available, as our code builds on the official Moment-DETR implementation. For instructions, check their GitHub.

QVHighlights pretrained checkpoints

Method (Modality)     | Model file
QD-DETR (Video+Audio) | Checkpoint link
QD-DETR (Video only)  | Checkpoint link

Cite QD-DETR (Query-Dependent Video Representation for Moment Retrieval and Highlight Detection)

If you find this repository useful, please use the following entry for citation.

@inproceedings{moon2023query,
  title={Query-dependent video representation for moment retrieval and highlight detection},
  author={Moon, WonJun and Hyun, Sangeek and Park, SangUk and Park, Dongchan and Heo, Jae-Pil},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={23023--23033},
  year={2023}
}

Contributors and Contact

If there are any questions, feel free to contact the authors: WonJun Moon ([email protected]), Sangeek Hyun ([email protected]).

LICENSE

The annotation files and many parts of the implementation are borrowed from Moment-DETR. Following it, our code is also released under the MIT license.
