Investigating the Impact of Multi-Task Learning strategies on Selective Prediction

Clone the repo

git clone https://github.com/Arutselvan/selective_prediction_mtl

Change the current directory to the cloned repository

cd selective_prediction_mtl

The datasets for both train and eval are present in the dataset folder.

To convert the dataset into QA format [Question, Context, Answer]:

python code/squad_converter.py

The training files, the combined dataset (main.json), and the evaluation samples are generated and stored in the dataset folder.
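The README does not describe how squad_converter.py maps each dataset to [Question, Context, Answer]. Below is a minimal sketch of one plausible mapping for a multiple-choice item. It assumes the answer choices are joined into the context and that the records use the flat `{"data": [...]}` JSON layout with `id`, `context`, `question` and `answers` fields read by the Hugging Face `run_qa.py` example script. The helper name `mc_to_squad`, the separator and the sample item are hypothetical.

```python
import json

def mc_to_squad(example_id, question, choices, label, sep=" | "):
    """Hypothetical helper: turn one multiple-choice item into an extractive QA record.

    The answer choices are joined into a single context string, so the correct
    choice becomes a span that an extractive QA model can predict.
    """
    context = sep.join(choices)
    answer_text = choices[label]
    # Character offset of the correct choice: lengths of earlier choices plus separators.
    answer_start = sum(len(choice) + len(sep) for choice in choices[:label])
    return {
        "id": example_id,
        "context": context,
        "question": question,
        "answers": {"text": [answer_text], "answer_start": [answer_start]},
    }

# Made-up example item, for illustration only.
record = mc_to_squad("csqa_0", "Which color is the sky on a clear day?", ["green", "blue", "red"], 1)
print(json.dumps({"data": [record]}, indent=2))
```

Computing the offset from the choice lengths, rather than searching the context, avoids pointing at an earlier choice that happens to contain the same text.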

To train with heterogeneous sampling and evaluate on the eval files of all datasets, run the following command from the root directory of the repository:

python code/run_qa.py --model_name_or_path bert-base-cased --do_train --train_file main.json --validation_files "['snli_squad_eval.json', 'swag_squad_eval.json', 'csqa_squad_eval.json', 'anli_squad_eval.json', 'siqa_squad_eval.json']"  --max_seq_length 256 --output_dir ./output-hetero --overwrite_output_dir --num_train_epochs 5 --evaluation_strategy epoch --per_device_train_batch_size 16 --per_device_eval_batch_size 32

To train with homogeneous sampling and evaluate on the eval files of all datasets, run the following command from the root directory of the repository:

python code/run_qa.py --model_name_or_path bert-base-cased --do_train --train_file main.json --validation_files "['snli_squad_eval.json', 'swag_squad_eval.json', 'csqa_squad_eval.json', 'anli_squad_eval.json', 'siqa_squad_eval.json']"  --max_seq_length 256 --output_dir ./output-homo --overwrite_output_dir --num_train_epochs 5 --evaluation_strategy epoch --per_device_train_batch_size 16 --per_device_eval_batch_size 32 --sampling Homogenous
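The README does not define the two sampling strategies. In multi-task learning the terms usually mean that heterogeneous batches mix examples from all tasks, while each homogeneous batch contains examples from a single task. Below is a minimal sketch of both. It assumes each example id has the form `<dataset>_<index>` (as in the `csqa_0` key of the prediction file). The function names are hypothetical, and `run_qa.py` may implement sampling differently.

```python
import random
from collections import defaultdict

def task_of(example):
    # Assumption: ids look like "<dataset>_<index>", e.g. "csqa_0".
    return example["id"].rsplit("_", 1)[0]

def heterogeneous_batches(examples, batch_size, seed=0):
    """Shuffle all tasks together, so one batch can mix several tasks."""
    rng = random.Random(seed)
    pool = list(examples)
    rng.shuffle(pool)
    return [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]

def homogeneous_batches(examples, batch_size, seed=0):
    """Batch each task separately, so every batch comes from a single task."""
    rng = random.Random(seed)
    by_task = defaultdict(list)
    for example in examples:
        by_task[task_of(example)].append(example)
    batches = []
    for task_examples in by_task.values():
        rng.shuffle(task_examples)
        batches.extend(task_examples[i:i + batch_size]
                       for i in range(0, len(task_examples), batch_size))
    # Shuffle the batch order so tasks are interleaved across training steps.
    rng.shuffle(batches)
    return batches

# Toy data: 10 examples from each of three datasets.
examples = [{"id": f"{task}_{i}"} for task in ("snli", "swag", "csqa") for i in range(10)]
hetero = heterogeneous_batches(examples, 4)
homo = homogeneous_batches(examples, 4)
print([sorted({task_of(e) for e in batch}) for batch in homo])
```

With 10 examples per task and a batch size of 4, homogeneous sampling yields 3 batches per task (the last one smaller), while heterogeneous sampling splits all 30 examples into 8 mixed batches.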

Note: The evaluation accuracy metrics will be empty or zero, because the prediction format was changed for the purpose of selective prediction.

The predictions will be of this format:

{
    "csqa_0": {
        "prediction": "fail to work",
        "maxProb": "0.9998784"
    }
}
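Selective prediction lets a model abstain when it is not confident enough. Below is a minimal sketch of that decision, assuming `maxProb` serves as the confidence score and a fixed threshold decides whether to answer. The function name, the threshold value and the second entry (`csqa_1`) are made up for illustration. Because `maxProb` is stored as a string, it is converted with `float()`.

```python
import json

def selective_answers(predictions, threshold):
    """Return the prediction when maxProb >= threshold, otherwise None (abstain)."""
    decisions = {}
    for example_id, entry in predictions.items():
        confidence = float(entry["maxProb"])  # maxProb is stored as a string
        decisions[example_id] = entry["prediction"] if confidence >= threshold else None
    return decisions

# "csqa_1" is a made-up entry added for illustration.
predictions = json.loads("""
{
    "csqa_0": {"prediction": "fail to work", "maxProb": "0.9998784"},
    "csqa_1": {"prediction": "go to sleep", "maxProb": "0.41"}
}
""")
decisions = selective_answers(predictions, threshold=0.5)
print(decisions)
```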

To create files containing the exact-match result and the maximum probability for the predictions of both the homogeneous and heterogeneous models, run:

python code/evaluate.py

This creates another .json file whose entries have the following format:

{
    "expected_prediction": "Levin was very successful at running the store",
    "prediction": "Levin was very successful at running the store",
    "correct": true,
    "maxProb": "0.99999547"
}
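Below is a minimal sketch of how such records could be built from the prediction file. It assumes the gold answers are available as a dict from example id to answer text, and that exact match means the two strings are equal after trimming whitespace. The official SQuAD metric also lowercases and strips punctuation and articles; whether `evaluate.py` does this is not stated. The function name `build_records`, the example ids and the second entry are hypothetical.

```python
def build_records(predictions, gold_answers):
    """Pair each prediction with its gold answer and flag exact matches."""
    records = {}
    for example_id, entry in predictions.items():
        expected = gold_answers[example_id]
        records[example_id] = {
            "expected_prediction": expected,
            "prediction": entry["prediction"],
            # Assumed exact-match rule: identical strings after trimming whitespace.
            "correct": entry["prediction"].strip() == expected.strip(),
            "maxProb": entry["maxProb"],
        }
    return records

# Hypothetical ids and data, for illustration only.
predictions = {
    "example_0": {"prediction": "Levin was very successful at running the store",
                  "maxProb": "0.99999547"},
    "example_1": {"prediction": "fail to work", "maxProb": "0.62"},
}
gold_answers = {
    "example_0": "Levin was very successful at running the store",
    "example_1": "break down",
}
records = build_records(predictions, gold_answers)
print(records["example_0"])
```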

Note: For the above command to execute correctly, all the previous steps must be followed exactly, because the folder paths are hardcoded.

To view plots of selective prediction metrics for all datasets (for both homogeneous and heterogeneous sampling), run:

python code/evaluate_sp.py

This creates plot files comparing several selective prediction metrics of the homogeneous and heterogeneous models for each dataset. The plots are displayed one dataset at a time; close the current plot to view the next one. Note: The text in the generated graphs may look cluttered on some devices. Maximize the plot window to view it clearly.
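The README does not list which metrics `evaluate_sp.py` plots. As an illustration of a metric commonly used in selective prediction, below is a minimal sketch of the risk-coverage curve and the area under it (AURC), computed from records in the format produced by `evaluate.py`. Examples are ranked by `maxProb`, and at coverage k/n the risk is the error rate among the k most confident predictions. The function names and the toy records are hypothetical. Ties in `maxProb` keep their input order because Python's sort is stable.

```python
def risk_coverage_curve(records):
    """Return (coverages, risks) when answering only the k most confident examples."""
    ranked = sorted(records, key=lambda r: float(r["maxProb"]), reverse=True)
    n = len(ranked)
    coverages, risks = [], []
    errors = 0
    for k, record in enumerate(ranked, start=1):
        if not record["correct"]:
            errors += 1
        coverages.append(k / n)   # fraction of examples answered
        risks.append(errors / k)  # error rate among the answered examples
    return coverages, risks

def area_under_risk_coverage(records):
    """AURC approximated as the mean risk over the coverage levels k/n."""
    _, risks = risk_coverage_curve(records)
    return sum(risks) / len(risks)

# Toy records in the evaluate.py output format (made-up values).
records = [
    {"correct": True, "maxProb": "0.9"},
    {"correct": False, "maxProb": "0.7"},
    {"correct": True, "maxProb": "0.8"},
    {"correct": True, "maxProb": "0.6"},
]
coverages, risks = risk_coverage_curve(records)
aurc = area_under_risk_coverage(records)
print(coverages, risks, aurc)
```

A lower AURC means the model's confidence ranks its errors lower, which is what makes abstaining on low-confidence examples useful.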

Example plot:

