The good practices in this VQA system, such as POS-tag attention, structured triplet learning, and triplet attention, are general and can be inserted into almost any vision-and-language task
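Of the practices named in the entry above, structured triplet learning is the most self-contained to sketch. Below is a minimal, hypothetical PyTorch illustration (not the repository's actual code): it treats the fused question+image embedding as the anchor of a standard triplet margin loss, pulled toward the correct answer's embedding and pushed away from a wrong one.

```python
# Minimal sketch of triplet learning for VQA -- illustrative only, not the
# repository's implementation. The encoders producing these embeddings are
# assumed to exist elsewhere.
import torch
import torch.nn.functional as F

def vqa_triplet_loss(anchor, pos, neg, margin=0.2):
    """Pull the correct answer's embedding toward the fused question+image
    embedding (anchor) and push a sampled wrong answer's embedding away,
    up to `margin`."""
    return F.triplet_margin_loss(anchor, pos, neg, margin=margin)

# Stand-in embeddings for a batch of 32 examples with 512-d features.
anchor = torch.randn(32, 512, requires_grad=True)  # fused question+image
pos = torch.randn(32, 512)                         # correct answers
neg = torch.randn(32, 512)                         # sampled wrong answers

loss = vqa_triplet_loss(anchor, pos, neg)
loss.backward()  # gradients flow to the anchor (and encoders in practice)
```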
My solutions to CS231N CNN assignments
A semantic parser trained using only videos instead of labeled logical forms
Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19
Code for 'Chasing Ghosts: Instruction Following as Bayesian State Tracking' published at NeurIPS 2019
Repository to generate CLEVR-Dialog: A diagnostic dataset for Visual Dialog
Visual Storytelling with Cross-Modal Rules
Dataset API for "PhraseCut: Language-based Image Segmentation in the Wild"
VisionDetect lets you track user face gestures such as blinking and smiling
[CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606)
PyTorch implementation of the paper "All For One: Multi-modal Multi-Task Learning"
PyTorch code for "Learning to Generate Grounded Visual Captions without Localization Supervision"
TensorFlow implementation of the CVPR 2020 paper "Image Search with Text Feedback by Visiolinguistic Attention Learning"
Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" (NeurIPS 2019)
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": LXMERT adversarial training part
Code for ACMMM'20 ✨"Answer-Driven Visual State Estimator for Goal-Oriented Visual Dialogue"
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part
Code for the ACL paper "No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling"
CIZSL++: Creativity Inspired Generative Zero-Shot Learning. T-PAMI under review.
Understanding Synonymous Referring Expressions via Contrastive Features