The official implementation for the ICCV 2023 paper "Grounded Image Text Matching with Mismatched Relation Reasoning".
Unofficial implementation of "Sigmoid Loss for Language Image Pre-Training"
An open-source API built on FastAPI for visual question answering
Related papers about Referring Image Segmentation (RIS)
[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation
The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"
[CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Vision-Controllable Natural Language Generation
My solutions to CS231N CNN assignments
PyTorch code for the Findings of NAACL 2022 paper "Probing the Role of Positional Information in Vision-Language Models"
Arabic WordNet matches for synsets in ImageNet
Source code and documentation for the LREC-COLING'24 paper "Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies"
An end-to-end masked contrastive video-and-language pre-training framework
A comprehensive hub for updates on generative AI research, including interviews, notebooks, and additional resources.
Under review. [IROS 2024] PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
Counting dataset for Vision & Language models. Introduced in the paper "Seeing Past Words: Testing the Cross-Modal Capabilities of Pretrained V&L Models". https://arxiv.org/abs/2012.12352
VinVL+L: Enriching Visual Representation with Location Context in Visual Question Answering (VQA)
[INLG2023] The High-Level (HL) dataset is a Vision and Language (V&L) resource aligning object-centric descriptions from COCO with high-level descriptions crowdsourced along 3 axes: scene, action, rationale.
An end-to-end vision and language model incorporating explicit knowledge graphs and OOD-detection.
[Frontiers in AI Journal] Implementation of the paper "Interpreting Vision and Language Generative Models with Semantic Visual Priors"