Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
-
Updated
Dec 12, 2023 - Python
Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
The official implementation for the ICCV 2023 paper "Grounded Image Text Matching with Mismatched Relation Reasoning".
Code for ECIR 2023 paper "Dialogue-to-Video Retrieval"
Explore the rich flavors of Indian desserts with TunedLlavaDelights. Utilizing the in Llava fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition
Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation (Published in IEEE TMM 2023)
Socratic models for multimodal reasoning & image captioning
Add a description, image, and links to the vision-language-learning topic page so that developers can more easily learn about it.
To associate your repository with the vision-language-learning topic, visit your repo's landing page and select "manage topics."