MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning [NAACL 2024]
This is the PyTorch implementation of the paper MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning. The paper is available at https://arxiv.org/abs/2311.10774.
If you have any questions about this work, please email Fuxiao Liu at [email protected].
- We introduce a large-scale MultiModal Chart Instruction (MMC-Instruction) dataset supporting diverse tasks and chart types.
- Leveraging this data, we develop the Multi-Modal Chart Assistant (MMCA), an LMM that achieves state-of-the-art performance on existing chart QA benchmarks.
- We also propose a Multi-Modal Chart Benchmark (MMC-Benchmark), a comprehensive human-annotated benchmark with nine distinct tasks evaluating reasoning capabilities over charts. Extensive experiments on MMC-Benchmark reveal the limitations of existing LMMs in correctly interpreting charts, even for the most recent GPT-4V model.
- [03/13]🔥 Our paper "MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning" is accepted to NAACL 2024.
- [02/26]🔥 Our paper "HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models" is accepted to CVPR 2024.
- [01/15]🔥 Our paper "Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning" is accepted by ICLR 2024.
```bash
# Images
gdown https://drive.google.com/file/d/1pDGOkE8AfVTCdy9-5yTltvGlYzgw0wyk
# Text
gdown https://drive.google.com/file/d/1l7Ft4xIl9fvSuxNQ-QPo0hLP7jNx6Mnt
```
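After downloading, you can sanity-check the text annotations. The sketch below is hypothetical: the file name is a placeholder for whatever gdown saves, and the JSON Lines format (one example per line) is an assumption, so adjust it to the actual schema:

```python
# Hypothetical sketch: peek at the downloaded text annotations.
# 'mmc_instruction_text.jsonl' is a placeholder file name, and the JSONL
# format is an assumption -- adapt to whatever the download actually contains.
import json

with open('mmc_instruction_text.jsonl') as f:
    example = json.loads(f.readline())
print(sorted(example.keys()))
```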
We select the chart summarization data from the following sources: Statista, PlotQA, VisText, ChartInfo, and Unichart. Please see our paper for details.
```bash
# Images
gdown https://drive.google.com/file/d/1e1mx_nb5PWjPkuIsJkY8B4xSET9DOWTa
# Text
gdown https://drive.google.com/file/d/18SJ13V4qEt1ixOQPbRmEnZKQrjS5v14T
```
```bash
# Part 1
# Images
gdown https://drive.google.com/file/d/1Y17wNYdBlPxhB5KKiux2BD8C2FlA5MC9
# Text
gdown https://drive.google.com/file/d/1tUtntLRgsBJ9v5NcdTMvVI32ruLHAyFe

# Part 2
# Images
gdown https://drive.google.com/uc?id=1Dey-undzW2Nl21CYLFSkP_Y4RrfRJkYd
# Text
gdown https://drive.google.com/uc?id=13j2U-ectsYGR92r6J5hPdhT8T5ezItHF

# Part 3
# Images
gdown https://drive.google.com/file/d/1W8sQ6fLkOm8bZo_SrRIiIGELS90aNKq7
# Text
gdown https://drive.google.com/file/d/1o3Edkf6bdyZloe_FSth8wtNmkFf_Bda_
```
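The data above ships in three parts; if you want a single list of examples, one way to merge the text files is sketched below. This is not from the repo: the part file names are placeholders and the JSONL format is an assumption.

```python
# Hypothetical sketch: concatenate the three downloaded text parts.
# File names are placeholders; the JSONL format is an assumption.
import json

examples = []
for part in ['text_part1.jsonl', 'text_part2.jsonl', 'text_part3.jsonl']:
    with open(part) as f:
        examples.extend(json.loads(line) for line in f)
print(f'{len(examples)} examples loaded')
```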
```bash
# Images
gdown https://drive.google.com/file/d/1QKQOMIH9Wd3EQEYr3IV2u9qQXmSZy49G
# Text
gdown https://drive.google.com/file/d/1PI1EdgPk-gBi29adk25cjGtwskyBjcN9
```
```bash
gdown https://drive.google.com/file/d/19CA-AFKshOVEOabK3-pkkZfjbtaJFP7Y
gdown https://drive.google.com/file/d/1HOVhPuFJ0roaHt-6AFyYX2E5MxKjoFug
```
1. Install the environment according to mplug-owl.
We fine-tuned mplug-owl on 8 V100 GPUs (see the quick GPU check below). If you run into any issues when running on V100s, feel free to let me know!
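Before moving on, it may save time to confirm your GPU setup, since V100 GPUs do not natively support bfloat16 (on V100, fp16 is the safe choice and the `--bf16` flag in the demo command below should be omitted). A quick check, not part of the repo:

```python
# Quick GPU sanity check (not part of the repo). V100s lack native bfloat16,
# so bf16 should report False there and fp16 (torch.half) is the safe choice.
import torch

assert torch.cuda.is_available(), 'No CUDA device visible'
print('Device         :', torch.cuda.get_device_name(0))
print('GPU count      :', torch.cuda.device_count())
print('bf16 supported :', torch.cuda.is_bf16_supported())
```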
2. Download the Checkpoint
3. Edit the Code
In mplug-owl/serve/model_worker.py, edit the following code and set lora_path to the path of the LoRA model weights.
```python
# Load the base mPLUG-Owl processor, tokenizer, and model.
self.image_processor = MplugOwlImageProcessor.from_pretrained(base_model)
self.tokenizer = AutoTokenizer.from_pretrained(base_model)
self.processor = MplugOwlProcessor(self.image_processor, self.tokenizer)
self.model = MplugOwlForConditionalGeneration.from_pretrained(
    base_model,
    load_in_8bit=load_in_8bit,
    torch_dtype=torch.bfloat16 if bf16 else torch.half,
    device_map="auto",
)
self.tokenizer = self.processor.tokenizer

# Wrap the language model's q_proj/v_proj layers with rank-8 LoRA adapters,
# then load the fine-tuned weights.
peft_config = LoraConfig(
    target_modules=r'.*language_model.*\.(q_proj|v_proj)',
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
)
self.model = get_peft_model(self.model, peft_config)
lora_path = 'Your lora model path'  # <-- enter the path of the LoRA model weights here
prefix_state_dict = torch.load(lora_path, map_location='cpu')
self.model.load_state_dict(prefix_state_dict)
```
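Before launching, you can verify that the downloaded checkpoint actually contains LoRA adapter tensors for the q_proj/v_proj modules targeted above. A minimal standalone sketch (not from the repo; lora_path is the same placeholder as in the snippet):

```python
# Sketch: inspect the downloaded LoRA checkpoint (the path is a placeholder).
import torch

lora_path = 'Your lora model path'
state_dict = torch.load(lora_path, map_location='cpu')
lora_keys = [k for k in state_dict if 'lora' in k.lower()]
print(f'{len(state_dict)} tensors total, {len(lora_keys)} LoRA tensors, e.g.:')
for key in lora_keys[:5]:
    print(' ', key, tuple(state_dict[key].shape))
```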
4. Local Demo
When you launch the demo on a local machine, you might find there is no space for the text input. This is caused by a version conflict between Python and Gradio. The simplest solution is to run conda activate LRV first (see the version check below).
```bash
python -m serve.web_server --base-model 'the mplug-owl checkpoint directory' --bf16
```
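If the text box is still missing, printing the installed versions helps confirm whether you are hitting the Python/Gradio conflict mentioned above and whether you are inside the intended conda environment (a small diagnostic sketch, not part of the repo):

```python
# Diagnostic sketch: print the Python and Gradio versions in the active
# environment to check for the version conflict described above.
import sys
import gradio

print('Python :', sys.version.split()[0])
print('Gradio :', gradio.__version__)
```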
```bibtex
@article{liu2023mmc,
  title={MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning},
  author={Liu, Fuxiao and Wang, Xiaoyang and Yao, Wenlin and Chen, Jianshu and Song, Kaiqiang and Cho, Sangwoo and Yacoob, Yaser and Yu, Dong},
  journal={arXiv preprint arXiv:2311.10774},
  year={2023}
}
```
We developed this repository for RESEARCH purposes, so it may only be used for personal, research, or other non-commercial purposes.