awesome grounding: A curated list of research papers in visual grounding
Updated Apr 9, 2023
[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
Official TensorFlow Implementation of the AAAI-2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction"
Source code for "Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction"
[CVPR2022] Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding
Official pytorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
Official implementation for the paper "Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos"
[CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606)
Implementation of the paper "Not All Frames Are Equal: Weakly-Supervised Video Grounding with Contextual Similarity and Visual Clustering Losses"
Tensorflow Reproduction of the EMNLP-2018 paper "Temporally Grounding Natural Sentence in Video"
"Video Moment Retrieval from Text Queries via Single Frame Annotation" in SIGIR 2022.
Official pytorch implementation of "Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos"
Awesome papers & datasets specifically focused on long-term videos.
Code for the paper Multimodal Dialogue State Tracking (NAACL22)
[arXiv 23] Pytorch code for "Overcoming Weak Visual-Textual Alignment for Video Moment Retrieval"
[EMNLP 2022] Pytorch code for "Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval"
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos
Paper list on Video Moment Retrieval (VMR), also known as Natural Language Video Localization (NLVL) or Temporal Sentence Grounding in Videos (TSGV)