GitHub - X-PLUG/mPLUG-DocOwl: mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

The Powerful Multi-modal LLM Family
for OCR-free Document Understanding

Alibaba Group

📢 News

🔥🔥🔥 [2024.5.08] We have released the training code of DocOwl1.5 supported by DeepSpeed. You can now finetune a stronger model based on DocOwl1.5!
🔥🔥🔥 [2024.4.26] We release the arxiv paper of TinyChart, a SOTA 3B Multimodal LLM for Chart Understanding with Program-of-Thoughts ability (ChartQA: 83.6 > Gemini-Ultra 80.8 > GPT4V 78.5). The demo of TinyChart is available on HuggingFace 🤗. The code, models, and data are released in TinyChart.
🔥🔥🔥 [2024.4.3] We build demos of DocOwl1.5 on both ModelScope and HuggingFace 🤗, supported by DocOwl1.5-Omni. The source code for launching a local demo is also released in DocOwl1.5.
🔥🔥 [2024.3.28] We release the training data (DocStruct4M, DocDownstream-1.0, DocReason25K), code, and models (DocOwl1.5-stage1, DocOwl1.5, DocOwl1.5-Chat, DocOwl1.5-Omni) of mPLUG-DocOwl 1.5 on both HuggingFace 🤗 and ModelScope.
🔥 [2024.3.20] We release the arxiv paper of mPLUG-DocOwl 1.5, a SOTA 8B Multimodal LLM on OCR-free Document Understanding (DocVQA 82.2, InfoVQA 50.7, ChartQA 70.2, TextVQA 68.6).
[2024.01.13] Our Scientific Diagram Analysis dataset M-Paper is now available on both HuggingFace 🤗 and ModelScope, containing 447k high-resolution diagram images and corresponding paragraph analysis.
[2023.10.13] The training data and models of mPLUG-DocOwl/UReader have been open-sourced.
[2023.10.10] Our paper UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model has been accepted by EMNLP 2023.

[2023.07.10] The demo of mPLUG-DocOwl on ModelScope is available.
[2023.07.07] We release the technical report and evaluation set of mPLUG-DocOwl.

🤖 Models

mPLUG-DocOwl1.5 (Arxiv 2024) - mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
TinyChart (Arxiv 2024) - TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning
mPLUG-PaperOwl (Arxiv 2023) - mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model
UReader (EMNLP 2023) - UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
mPLUG-DocOwl (Arxiv 2023) - mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

📺 Online Demo

Note: The HuggingFace demo is less stable than the ModelScope demo because GPUs in HuggingFace ZeroGPU Spaces are dynamically assigned.

Folders and files

DocOwl
DocOwl1.5
PaperOwl
TinyChart
UReader
assets
.gitignore
LICENSE
README.md
