Contents

This repository contains the implementation and results of the Fusion-in-Decoder (FiD) algorithm for open-domain question answering.

We demonstrate that running inference with UPR re-ranked passages and a pre-trained FiD checkpoint improves answer generation performance.
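The re-rank-then-read idea can be illustrated with a minimal sketch. It assumes each retrieved passage already has a re-ranker score where higher means more relevant; the function name `rerank_top_k`, the example scores, and the choice of `k` are hypothetical and are not taken from this repository's code.

```python
def rerank_top_k(passages, scores, k):
    """Reorder retrieved passages by re-ranker score and keep the best k.

    Hypothetical helper: `scores[i]` is assumed to be the re-ranker's
    relevance score for `passages[i]`, with higher meaning more relevant.
    """
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    # Sort indices by descending score; ties keep the retriever's order.
    order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [passages[i] for i in order[:k]]


# The reader would then consume the re-ranked passages instead of the
# original retriever ordering.
reader_input = rerank_top_k(["p0", "p1", "p2"], [-2.5, -0.7, -1.1], k=2)
```

The reader checkpoint itself is unchanged in this setup; only the order and selection of passages passed to it differ.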

Downloading Data and Checkpoints

We provide pretrained checkpoints and datasets on Dropbox for training models for dense retrieval and open-domain QA tasks. This data can be downloaded here:

Required data files to be downloaded

The pre-tokenized evidence file(s) can be created with this command:

python tools/create_evidence_indexed_dataset.py \
    --input wikipedia-split/psgs_w100.tsv \
    --tsv-keys text title \
    --tokenizer-type BertWordPieceLowerCase \
    --vocab-file bert-vocab/bert-large-uncased-vocab.txt \
    --output-prefix wikipedia-evidence \
    --workers 25
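Before running the pre-tokenization step, it can help to confirm that the input TSV exposes the columns named by `--tsv-keys`. The sketch below assumes the file is tab-separated with a header row that includes `text` and `title` columns; `iter_passages` is a hypothetical helper and is not part of this repository.

```python
import csv
import io


def iter_passages(tsv_file, keys=("text", "title")):
    """Yield one tuple per row containing the requested columns.

    Assumes a tab-separated file whose first line is a header row.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    missing = [k for k in keys if k not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns in TSV header: {missing}")
    for row in reader:
        yield tuple(row[k] for k in keys)


# Tiny in-memory sample standing in for the real evidence file.
sample = 'id\ttext\ttitle\n1\t"Aaron is a prophet."\tAaron\n'
rows = list(iter_passages(io.StringIO(sample)))
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `iter_passages`.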

T5 checkpoints for training FiD

Finetuned FiD checkpoints on individual datasets

  • Please download the FiD model for each dataset using the links provided in the tables below.

Dataset-specific files

  • Train, dev, and test datasets along with retrieved passages and UPR re-ranked passages can be downloaded as described in the README.md of the landing page.

Usage

We have provided a demo script for training a FiD model for open-domain QA tasks in the examples directory.

Please make sure to change the data, config, and checkpoint paths in this script.

To train a model or run inference with a pre-trained model, review the options in the script and run it as follows:

bash examples/fid_common.sh

The default settings in this script are useful for doing inference with pre-trained FiD checkpoint(s).

To train FiD models, please set the paths of the VALID_DATA and TEST_DATA in the above script accordingly.

Pre-trained FiD Checkpoints

SQuAD-Open

| Retriever | Reader Config | Dev EM | Test EM | Checkpoint |
|---|---|---|---|---|
| MSS | base | 36.2 | 39.6 | link |
| MSS + UPR | base | 43.7 | 50.1 | |
| DPR | base | 48.8 | 45.8 | link |
| DPR + UPR | base | 51.5 | 54.0 | |
| MSS-DPR | base | 50.1 | 52.2 | link |
| MSS-DPR + UPR | base | 51.9 | 55.6 | |
| MSS-DPR | large | 51.9 | 54.4 | link |
| MSS-DPR + UPR | large | 53.1 | 58.1 | |

TriviaQA

| Retriever | Reader Config | Dev EM | Test EM | Checkpoint |
|---|---|---|---|---|
| MSS | base | 60.9 | 60.3 | link |
| MSS + UPR | base | 68.5 | 68.9 | |
| DPR | base | 67.9 | 68.5 | link |
| DPR + UPR | base | 70.1 | 71.2 | |
| MSS-DPR | base | 69.9 | 70.2 | link |
| MSS-DPR + UPR | base | 71.5 | 71.8 | |
| MSS-DPR | large | 71.5 | 71.6 | link |
| MSS-DPR + UPR | large | 72.7 | 73.2 | |

Natural Questions

| Retriever | Reader Config | Dev EM | Test EM | Checkpoint |
|---|---|---|---|---|
| MSS | base | 43.7 | 44.5 | link |
| MSS + UPR | base | 45.8 | 47.3 | |
| DPR | base | 49.4 | 50.8 | link |
| DPR + UPR | base | 49.8 | 51.3 | |
| MSS-DPR | base | 49.7 | 50.8 | link |
| MSS-DPR + UPR | base | 49.9 | 51.5 | |
| MSS-DPR | large | 51.8 | 53.6 | link |
| MSS-DPR + UPR | large | 51.5 | 54.5 | |
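
The Dev EM and Test EM columns in the tables above report exact match scores. This README does not spell out the matching rules, so the following is only a sketch of the commonly used SQuAD-style convention (lowercasing, then stripping punctuation, English articles, and extra whitespace); the evaluation code in this repository may differ in details. The function names are illustrative.

```python
import re
import string


def normalize_answer(text):
    """Lowercase, then drop punctuation, English articles, and extra spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, references):
    """True if the prediction matches any reference after normalization."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)


def em_score(predictions, references_list):
    """Percentage of predictions that exactly match one of their references."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references_list))
    return 100.0 * hits / len(predictions)


# One of the two predictions matches its reference set.
score = em_score(["The Eiffel Tower!", "london"], [["eiffel tower"], ["Berlin"]])
```

Because a question can have several acceptable answers, each prediction is compared against a list of references and counts as correct if it matches any one of them.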