LAVIS - A One-stop Library for Language-Vision Intelligence
Updated
May 19, 2024 - Jupyter Notebook
Compose multimodal datasets 🎹
[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
PyTorch implementation of a Multimodal Fusion Transformer for remote sensing image classification.
Data and code of the Findings of EMNLP'23 paper MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields
Code and data to evaluate LLMs on the ENEM, the main standardized Brazilian university admission exam.
This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in our recently accepted survey: https://dl.acm.org/doi/abs/10.1145/3617833 .
Official Git repository for "Hakimov, S., and Schlangen, D., (2023). Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks. Findings of the Association for Computational Linguistics (ACL 2023 Findings)"
Create a large, well-managed, and clean dataset for the task of composing music for video soundtracks.
Collects a multimodal dataset of Wikipedia articles and their images
Experiments on classifying multimodal data.
[Paperlist] Awesome paper list of multimodal dialog, including methods, datasets and metrics
AI-multimodal: modeling a new text-video retrieval framework
This repository is built in association with our position paper "Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers". As part of this release, we share information about recent multimodal datasets that are available for research purposes. We found that although 100+ multimodal language resources are available…
Image Recommendation for Wikipedia Articles
Pre-Processing of Annotated Music Video Corpora (COGNIMUSE and DEAP)
A dataset of 500,000 multimodal short videos with baseline models (TensorFlow 2.0).
Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19