
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4

Lai Wei, Zihao Jiang, Weiran Huang, Lichao Sun.

Shanghai Jiao Tong University & Lehigh University


Introduction

  • We introduce InstructionGPT-4, which is fine-tuned on a small dataset of only 200 examples, approximately 6% of the instruction-following data used to align MiniGPT-4.
  • We first propose several metrics to assess the quality of multimodal instruction data. Based on these metrics, we present an effective and trainable data selector that automatically identifies and filters out low-quality vision-language data. With this selector, InstructionGPT-4 outperforms the original MiniGPT-4 on various evaluations.
  • Our findings demonstrate that a small amount of high-quality instruction-tuning data is sufficient to enable multimodal large language models to generate better outputs.

Getting Started

Prepare MiniGPT-4 Environment and Dataset

git clone https://github.com/Vision-CAIR/MiniGPT-4.git

Follow this doc to prepare the environment and download the cc_sbu_align dataset.

You also need to replace '/path/to' with your own path.

Compute Indicators

1. CLIP Score, Reward Score, Length Score, GPT Score

Fill in your OpenAI API key here.

Generate GPT Score:

cd cc_sbu_align_test
python generate_gpt_score.py
python convert_score.py
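
For orientation, here is a minimal sketch of the kind of Alpagasus-style rating call that generate_gpt_score.py performs. The prompt wording, model name, and 0-5 scale are assumptions for illustration, not necessarily what the script uses:

from openai import OpenAI

# Hedged sketch: ask GPT to rate an (instruction, answer) pair, in the spirit
# of Alpagasus. Prompt, model, and scale are assumptions, not this repo's own.
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def gpt_score(instruction: str, answer: str) -> float:
    prompt = (
        "Rate the quality of the following answer to the instruction on a "
        "scale from 0 to 5. Reply with a single number.\n\n"
        f"Instruction: {instruction}\nAnswer: {answer}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return float(response.choices[0].message.content.strip())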

Generate CLIP Score, Reward Score and Length Score:

python full_score.py
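
The CLIP score measures image-caption agreement as the cosine similarity between CLIP image and text embeddings, and the length score can be as simple as the caption length. A minimal sketch of the CLIP part, assuming an off-the-shelf checkpoint that full_score.py may not actually use:

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hedged sketch: CLIP score as cosine similarity between image and text
# embeddings. The checkpoint choice is an assumption.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image_path: str, caption: str) -> float:
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    return torch.nn.functional.cosine_similarity(img, txt).item()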

2. Multimodal Features

cd selector
python utils_image.py
python utils_text.py
python get_all_image_features.py
python get_all_text_features.py
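
These scripts cache per-example image and text features that the data selector consumes later. A rough sketch of the idea using a CLIP encoder; the encoder choice and output file names are assumptions:

import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hedged sketch: cache image/text features for every example so the selector
# can train on them later. Encoder and file names are assumptions.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image_feats, text_feats = [], []
for image_path, caption in dataset:  # dataset: your iterable of (path, caption) pairs
    inputs = processor(text=[caption],
                       images=Image.open(image_path).convert("RGB"),
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        image_feats.append(model.get_image_features(pixel_values=inputs["pixel_values"]))
        text_feats.append(model.get_text_features(input_ids=inputs["input_ids"],
                                                  attention_mask=inputs["attention_mask"]))

np.save("image_features.npy", torch.cat(image_feats).numpy())
np.save("text_features.npy", torch.cat(text_feats).numpy())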

Compute Genuine Quality Labels

1. Split into 30 subsets

cd cluster/kmeans++
python kmeans_pp.py
python average.py
python image_show.py
python split.py
python image2cap.py
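
A minimal sketch of the split using scikit-learn's k-means++ on the cached features; concatenating image and text features into one vector is an assumption about the exact recipe:

import numpy as np
from sklearn.cluster import KMeans

# Hedged sketch: partition the dataset into 30 subsets with k-means++ over
# the cached multimodal features. Feature concatenation is an assumption.
image_feats = np.load("image_features.npy")
text_feats = np.load("text_features.npy")
features = np.concatenate([image_feats, text_feats], axis=1)

kmeans = KMeans(n_clusters=30, init="k-means++", n_init=10, random_state=0)
labels = kmeans.fit_predict(features)
subsets = [np.where(labels == k)[0] for k in range(30)]  # example indices per subset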

2. Evaluate to get the labels

Fine-tune a new MiniGPT-4 model on each subset and follow this doc for evaluation. We use GQA, IconQA, OKVQA, and ScienceQA as the validation sets.
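
Each subset's genuine quality label can then be derived from its validation results, for example by averaging the four benchmark scores; the unweighted mean is an assumption about the aggregation used here:

# Hedged sketch: collapse per-benchmark results into one quality label per
# subset. The unweighted mean is an assumption.
def quality_label(benchmark_scores: dict) -> float:
    # benchmark_scores maps benchmark name ("GQA", "IconQA", "OKVQA",
    # "ScienceQA") to the fine-tuned model's score on that benchmark
    return sum(benchmark_scores.values()) / len(benchmark_scores)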

Data Selector

1. Training

cd selector
bash train.sh
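
train.sh fits the data selector to predict the genuine quality labels from the per-example indicators and multimodal features. As a stand-in, here is a small PyTorch regressor; the actual selector architecture in this repository may differ, and the input width is an assumption:

import torch
import torch.nn as nn

# Hedged sketch: regress from indicator scores + multimodal features to the
# quality label. The MLP stand-in and input width are assumptions.
FEATURE_DIM = 1028  # e.g. 512-d image + 512-d text features + 4 scores

selector = nn.Sequential(
    nn.Linear(FEATURE_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, 1),
)
optimizer = torch.optim.Adam(selector.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
    # x: (batch, FEATURE_DIM) inputs; y: (batch,) quality labels
    optimizer.zero_grad()
    loss = loss_fn(selector(x).squeeze(-1), y)
    loss.backward()
    optimizer.step()
    return loss.item()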

2. Testing

Conduct clustering to ensure the diversity of the selected data.

cd cluster/spectral
python spectral_clustering.py
python image_show.py
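
A minimal sketch of the diversity clustering with scikit-learn; the number of clusters and the affinity are assumptions:

import numpy as np
from sklearn.cluster import SpectralClustering

# Hedged sketch: spectral clustering over the candidate features so the final
# selection can draw from diverse regions. n_clusters/affinity are assumptions.
features = np.load("features.npy")  # hypothetical cached feature file
clustering = SpectralClustering(n_clusters=10, affinity="nearest_neighbors",
                                random_state=0)
cluster_ids = clustering.fit_predict(features)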

Select the final dataset for InstructionGPT-4

cd selector
bash eval_clus.sh

200 multimodal instruction examples are selected in this step.
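
Conceptually, this step ranks candidates by the selector's predicted quality and draws from each cluster until 200 examples are chosen. A sketch of that idea; the even per-cluster allocation is an assumption about what eval_clus.sh actually does:

import numpy as np

# Hedged sketch: take the top-scoring examples per cluster until 200 are
# selected. Even allocation across clusters is an assumption.
def select_final(scores: np.ndarray, cluster_ids: np.ndarray,
                 total: int = 200) -> list:
    clusters = np.unique(cluster_ids)
    per_cluster = total // len(clusters)
    chosen = []
    for c in clusters:
        idx = np.where(cluster_ids == c)[0]
        best = idx[np.argsort(scores[idx])[::-1][:per_cluster]]
        chosen.extend(int(i) for i in best)
    # top up from the global ranking if rounding left the list short
    chosen_set = set(chosen)
    for i in np.argsort(scores)[::-1]:
        if len(chosen) == total:
            break
        if int(i) not in chosen_set:
            chosen.append(int(i))
            chosen_set.add(int(i))
    return chosen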

InstructionGPT-4 is then fine-tuned on these 200 selected examples, starting from the pre-trained MiniGPT-4.

Acknowledgement

  • MiniGPT-4: The model architecture of InstructionGPT-4 follows MiniGPT-4.
  • LIMA: Our inspiration comes from LIMA: Less Is More for Alignment.
  • Alpagasus: We follow Alpagasus to design the GPT Score.

If you're using InstructionGPT-4 in your research or applications, please cite using this BibTeX:

@article{wei2023instructiongpt,
  title={InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4},
  author={Wei, Lai and Jiang, Zihao and Huang, Weiran and Sun, Lichao},
  journal={arXiv preprint arXiv:2308.12067},
  year={2023}
}
