
CLUE Classification


Here is a short summary of our solution on the CLUE classification benchmark. We submitted two results to the benchmark: a single model and a model ensemble. The single-model results are based on the cluecorpussmall_roberta_wwm_large_seq512_model.bin pre-trained weights, while the ensemble results are based on an ensemble of a large number of models. This section mainly focuses on the single model; more details of the ensemble are discussed here.

AFQMC

We first do multi-task learning, selecting LCQMC and XNLI as auxiliary tasks:

python3 finetune/run_classifier_mt.py --pretrained_model_path models/cluecorpussmall_roberta_wwm_large_seq512_model.bin \
                                      --vocab_path models/google_zh_vocab.txt \
                                      --config_path models/bert/large_config.json \
                                      --dataset_path_list datasets/afqmc/ datasets/lcqmc/ datasets/xnli/ \
                                      --output_model_path models/afqmc_multitask_classifier_model.bin \
                                      --epochs_num 1 --batch_size 64
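Note that run_classifier_mt.py takes dataset directories via --dataset_path_list rather than individual train/dev files. As an assumption (please verify against your local copies), each listed directory is expected to contain train.tsv and dev.tsv in the same TSV format as the single-task setting; a tiny Python check:

import os

# Assumed layout: every directory passed to --dataset_path_list holds
# train.tsv and dev.tsv, formatted like the single-task datasets.
dataset_dirs = ["datasets/afqmc/", "datasets/lcqmc/", "datasets/xnli/"]
for d in dataset_dirs:
    for name in ("train.tsv", "dev.tsv"):
        path = os.path.join(d, name)
        print(path, "found" if os.path.exists(path) else "MISSING")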

Then we load afqmc_multitask_classifier_model.bin and fine-tune it on AFQMC:

python3 finetune/run_classifier.py --pretrained_model_path models/afqmc_multitask_classifier_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/bert/large_config.json \
                                   --train_path datasets/afqmc/train.tsv \
                                   --dev_path datasets/afqmc/dev.tsv \
                                   --output_model_path models/afqmc_classifier_model.bin \
                                   --epochs_num 3 --batch_size 32

Then we do inference with afqmc_classifier_model.bin:

python3 inference/run_classifier_infer.py --load_model_path models/afqmc_classifier_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/bert/large_config.json \
                                          --test_path datasets/afqmc/test_nolabel.tsv \
                                          --prediction_path datasets/afqmc/prediction.tsv \
                                          --seq_length 128 --labels_num 2

CMNLI

We first do multi-task learning, selecting XNLI as the auxiliary task:

python3 finetune/run_classifier_mt.py --pretrained_model_path models/cluecorpussmall_roberta_wwm_large_seq512_model.bin \
                                      --vocab_path models/google_zh_vocab.txt \
                                      --config_path models/bert/large_config.json \
                                      --dataset_path_list datasets/cmnli/ datasets/xnli/ \
                                      --output_model_path models/cmnli_multitask_classifier_model.bin \
                                      --epochs_num 1 --batch_size 64

Then we load cmnli_multitask_classifier_model.bin and fine-tune it on CMNLI:

python3 finetune/run_classifier.py --pretrained_model_path models/cmnli_multitask_classifier_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/bert/large_config.json \
                                   --train_path datasets/cmnli/train.tsv \
                                   --dev_path datasets/cmnli/dev.tsv \
                                   --output_model_path models/cmnli_classifier_model.bin \
                                   --epochs_num 1 --batch_size 64

Then we do inference with cmnli_classifier_model.bin:

python3 inference/run_classifier_infer.py --load_model_path models/cmnli_classifier_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/bert/large_config.json \
                                          --test_path datasets/cmnli/test_nolabel.tsv \
                                          --prediction_path datasets/cmnli/prediction.tsv \
                                          --seq_length 128 --labels_num 3

IFLYTEK

The example of fine-tuning and doing inference on the IFLYTEK dataset:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_roberta_wwm_large_seq512_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/bert/large_config.json \
                                   --train_path datasets/iflytek/train.tsv \
                                   --dev_path datasets/iflytek/dev.tsv \
                                   --output_model_path models/iflytek_classifier_model.bin \
                                   --epochs_num 3 --batch_size 32 --seq_length 256

python3 inference/run_classifier_infer.py --load_model_path models/iflytek_classifier_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/bert/large_config.json \
                                          --test_path datasets/iflytek/test_nolabel.tsv \
                                          --prediction_path datasets/iflytek/prediction.tsv \
                                          --seq_length 256 --labels_num 119

CSL

The Chinese Scientific Literature (CSL) task is to tell whether the given keywords are the real keywords of a paper. The key to achieving competitive results on CSL is to use a special symbol to separate the keywords: the pseudo keywords in the CSL dataset are usually short, and an explicit separator tells the model the length of each keyword. A preprocessing sketch is given below, followed by the example of fine-tuning and doing inference on the CSL dataset.
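This is a minimal sketch of the keyword preprocessing, not the exact script we used. The raw file name train_raw.tsv, its column order (label, space-separated keywords, abstract), the output header, and the separator "_" are all assumptions; adjust them to the actual CSL files and to a symbol that exists in the vocabulary.

# Hypothetical preprocessing: join CSL keywords with a special symbol so that
# keyword boundaries (and hence their lengths) are explicit to the model.
SEP = "_"  # assumed separator; choose a symbol that appears in the vocabulary

with open("datasets/csl/train_raw.tsv", encoding="utf-8") as fin, \
     open("datasets/csl/train.tsv", "w", encoding="utf-8") as fout:
    fout.write("label\ttext_a\ttext_b\n")  # assumed header for run_classifier.py
    for line in fin:
        label, keywords, abstract = line.rstrip("\n").split("\t")
        joined = SEP.join(keywords.split())  # "kw1 kw2 kw3" -> "kw1_kw2_kw3"
        fout.write(f"{label}\t{joined}\t{abstract}\n")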

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_roberta_wwm_large_seq512_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/bert/large_config.json \
                                   --train_path datasets/csl/train.tsv \
                                   --dev_path datasets/csl/dev.tsv \
                                   --output_model_path models/csl_classifier_model.bin \
                                   --epochs_num 3 --batch_size 32 --seq_length 384

python3 inference/run_classifier_infer.py --load_model_path models/csl_classifier_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/bert/large_config.json \
                                          --test_path datasets/csl/test_nolabel.tsv \
                                          --prediction_path datasets/csl/prediction.tsv \
                                          --seq_length 384 --labels_num 2

CLUEWSC2020

The example of fine-tuning and doing inference on the CLUEWSC2020 dataset:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_roberta_wwm_large_seq512_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/bert/large_config.json \
                                   --train_path datasets/cluewsc2020/train.tsv \
                                   --dev_path datasets/cluewsc2020/dev.tsv \
                                   --output_model_path models/cluewsc2020_classifier_model.bin \
                                   --learning_rate 5e-6 --epochs_num 20 --batch_size 8

python3 inference/run_classifier_infer.py --load_model_path models/cluewsc2020_classifier_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/bert/large_config.json \
                                          --test_path datasets/cluewsc2020/test_nolabel.tsv \
                                          --prediction_path datasets/cluewsc2020/prediction.tsv \
                                          --seq_length 128 --labels_num 2

A useful trick for CLUEWSC2020 is to use the training set of WSC (the former version of CLUEWSC2020) as additional training samples, as sketched below.
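A minimal sketch of that trick, assuming the old WSC training set has been downloaded to the hypothetical path datasets/wsc/train.tsv and uses the same columns and header as the CLUEWSC2020 file:

# Hypothetical sketch: append the WSC training samples to the CLUEWSC2020
# training set before fine-tuning. Paths and identical file layout are assumed.
with open("datasets/cluewsc2020/train.tsv", "a", encoding="utf-8") as fout, \
     open("datasets/wsc/train.tsv", encoding="utf-8") as fin:
    next(fin)  # skip the header row of the WSC file
    for line in fin:
        fout.write(line)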

TNEWS

The example of fine-tuning and doing inference on the TNEWS dataset:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_roberta_wwm_large_seq512_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/bert/large_config.json \
                                   --train_path datasets/tnews/train.tsv \
                                   --dev_path datasets/tnews/dev.tsv \
                                   --output_model_path models/tnews_classifier_model.bin \
                                   --epochs_num 3 --batch_size 32

python3 inference/run_classifier_infer.py --load_model_path models/tnews_classifier_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/bert/large_config.json \
                                          --test_path datasets/tnews/test_nolabel.tsv \
                                          --prediction_path datasets/tnews/prediction.tsv \
                                          --seq_length 128 --labels_num 15

OCNLI

We first do multi-task learning, selecting XNLI and CMNLI as auxiliary tasks:

python3 finetune/run_classifier_mt.py --pretrained_model_path models/cluecorpussmall_roberta_wwm_large_seq512_model.bin \
                                      --vocab_path models/google_zh_vocab.txt \
                                      --config_path models/bert/large_config.json \
                                      --dataset_path_list datasets/ocnli/ datasets/cmnli/ datasets/xnli/ \
                                      --output_model_path models/ocnli_multitask_classifier_model.bin \
                                      --epochs_num 1 --batch_size 64

Then we load ocnli_multitask_classifier_model.bin and fine-tune it on OCNLI:

python3 finetune/run_classifier.py --pretrained_model_path models/ocnli_multitask_classifier_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/bert/large_config.json \
                                   --train_path datasets/ocnli/train.tsv \
                                   --dev_path datasets/ocnli/dev.tsv \
                                   --output_model_path models/ocnli_classifier_model.bin \
                                   --epochs_num 1 --batch_size 64

Then we do inference with ocnli_classifier_model.bin:

python3 inference/run_classifier_infer.py --load_model_path models/ocnli_classifier_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/bert/large_config.json \
                                          --test_path datasets/ocnli/test_nolabel.tsv \
                                          --prediction_path datasets/ocnli/prediction.tsv \
                                          --seq_length 128 --labels_num 3