[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
Video Foundation Models & Data for Multimodal Understanding
Video embeddings for retrieval with natural language queries
[NeurIPS 2021] Moment-DETR code and QVHighlights dataset
Authors' official PyTorch implementation of "ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning" [ICCV 2019]
[ECCV 2020] PyTorch code for XML on TVRetrieval dataset - TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
Authors' official TensorFlow implementation of "Near-Duplicate Video Retrieval with Deep Metric Learning" [ICCVW 2017]
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
W2VV++: A fully deep learning solution for ad-hoc video search
Official PyTorch repository for "QD-DETR: Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023 Paper)
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Authors' official PyTorch implementation of "DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval" [IJCV 2022]
[WACV'22] Code repository for the paper "Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting", https://arxiv.org/abs/2106.10137.
Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)
A PyTorch implementation of VIOLET
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)
Undergraduate Dissertation: Content-based video retrieval prototype for movies written in Python using OpenCV.
TransVCL: Attention-enhanced Video Copy Localization Network with Flexible Supervision [AAAI 2023 Oral]
[arXiv22] Disentangled Representation Learning for Text-Video Retrieval