An open source implementation of CLIP.
Chinese version of CLIP, enabling Chinese cross-modal retrieval and representation generation.
🥂 Gracefully solve hCaptcha challenges with an embedded MoE (ONNX) solution.
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
A curated list of Visual Question Answering (VQA, covering image and video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
A concise but complete implementation of CLIP with various experimental improvements from recent papers
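The CLIP implementations listed here train image and text encoders with a symmetric contrastive (InfoNCE) objective over matched image–text pairs. A minimal NumPy sketch of that loss, for illustration only (the function name, batch layout, and temperature value are assumptions, not taken from any listed repository):

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched image/text embeddings."""
    # L2-normalize so the dot product is cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    # Pairwise similarity logits; matching pairs lie on the diagonal.
    logits = image_emb @ text_emb.T / temperature
    n = logits.shape[0]
    labels = np.arange(n)

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), labels].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

When the two embedding batches are aligned row-for-row, the loss is small; shuffling one side breaks the diagonal correspondence and drives the loss up, which is the signal the encoders are trained on.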
[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation
Build high-performance AI models with modular building blocks
[CVPR 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
CVPR 2023-2024 Papers: dive into advanced research presented at the leading computer vision conference and keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ Support visual intelligence development!
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
PyTorch version of the HyperDenseNet deep neural network for multi-modal image segmentation
A Python tool to perform deep learning experiments on multimodal remote sensing data.
[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
Code repository for Rakuten Data Challenge: Multimodal Product Classification and Retrieval.
SAM-SLR-v2 is an improved version of SAM-SLR for sign language recognition.