Official code for the paper "Mantis: Multi-Image Instruction Tuning"
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Personal project: MPP-Qwen14B (Multimodal Pipeline Parallel-Qwen14B). Don't let poverty limit your imagination! Train your own 14B LLaVA-like MLLM on an RTX 3090/4090 with 24 GB of VRAM.
Datasets, case studies and benchmarks for extracting structured information from PDFs, HTML files or images, created by the Parsee.ai team. Datasets also on Hugging Face: https://huggingface.co/parsee-ai
Grounded Multimodal Large Language Model with Localized Visual Tokenization
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.
Composition of Multimodal Language Models From Scratch
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
Unified Multi-modal Image Aesthetic Assessment (IAA) Baseline and Benchmark
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLMs). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundation models, and more. Stay updated with the latest advancements.