The official implementation for the ICCV 2023 paper "Grounded Image Text Matching with Mismatched Relation Reasoning".
Unofficial implementation of "Sigmoid Loss for Language Image Pre-Training"
An open-source API built on FastAPI for visual question answering
Related papers about Referring Image Segmentation (RIS)
[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation
The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"
[CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Vision-Controllable Natural Language Generation
My solutions to CS231N CNN assignments
PyTorch code for the Findings of NAACL 2022 paper "Probing the Role of Positional Information in Vision-Language Models"
Arabic WordNet matches for synsets in ImageNet
Source code and documentation for the LREC-COLING'24 paper "Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies"
An end-to-end masked contrastive video-and-language pre-training framework
A comprehensive hub for updates on generative AI research, including interviews, notebooks, and additional resources.
Under review. [IROS 2024] PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
Counting dataset for Vision & Language models. Introduced in the paper "Seeing Past Words: Testing the Cross-Modal Capabilities of Pretrained V&L Models". https://arxiv.org/abs/2012.12352
VinVL+L: Enriching Visual Representation with Location Context in Visual Question Answering (VQA)
[INLG2023] The High-Level (HL) dataset is a Vision and Language (V&L) resource aligning object-centric descriptions from COCO with high-level descriptions crowdsourced along 3 axes: scene, action, rationale.
An end-to-end vision and language model incorporating explicit knowledge graphs and OOD-detection.
[Frontiers in AI Journal] Implementation of the paper "Interpreting Vision and Language Generative Models with Semantic Visual Priors"