Simplifying evaluation process #52

Open
xhluca opened this issue Jul 11, 2022 · 1 comment

xhluca commented Jul 11, 2022

Right now, it's possible to train DPR with a single command, via the tevatron.driver.train module. Evaluation, however, requires a more complex series of commands (including a lower-level shell for loop), e.g. for DPR on NQ:

mkdir $ENCODE_DIR
for s in $(seq -f "%02g" 0 19)
do
python -m tevatron.driver.encode \
  --output_dir=temp \
  --model_name_or_path model_nq \
  --fp16 \
  --per_device_eval_batch_size 156 \
  --dataset_name Tevatron/wikipedia-nq-corpus \
  --encoded_save_path corpus_emb.$s.pkl \
  --encode_num_shard 20 \
  --encode_shard_index $s
done

python -m tevatron.driver.encode \
  --output_dir=temp \
  --model_name_or_path model_nq \
  --fp16 \
  --per_device_eval_batch_size 156 \
  --dataset_name Tevatron/wikipedia-nq/test \
  --encoded_save_path query_emb.pkl \
  --encode_is_qry

python -m tevatron.faiss_retriever \
  --query_reps query_emb.pkl \
  --passage_reps 'corpus_emb.*.pkl' \
  --depth 100 \
  --batch_size -1 \
  --save_text \
  --save_ranking_to run.nq.test.txt

python -m tevatron.utils.format.convert_result_to_trec \
  --input run.nq.test.txt \
  --output run.nq.test.trec

pip install pyserini

python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics dpr-nq-test \
  --index wikipedia-dpr \
  --input run.nq.test.trec \
  --output run.nq.test.json

python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval run.nq.test.json \
  --topk 20 100

I think it would be nicer if all this could be reduced to 1 or 2 commands:

pip install pyserini

python -m tevatron.driver.evaluate \
    --output_dir "temp" \
    --model_name_or_path "model_nq" \
    ...
    --query_dataset "Tevatron/wikipedia-nq/" \
    --passage_dataset "Tevatron/wikipedia-nq/test" \
    --save_ranking_to "nq_results/test/" \
    --encode_method "faiss" \
    --save_format "trec" "pyserini_dpr"  # save in both .trec and .json

python -m pyserini.eval.evaluate_dpr_retrieval \
    --retrieval "nq_results/test/run.json" \
    --topk 20 100
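
To make the idea a bit more concrete, here is a rough sketch of how such a wrapper might chain the existing commands without changing them. Everything here is an assumption for illustration: build_eval_commands and run_evaluation are hypothetical helpers, not part of tevatron, and the output file names (run.txt, run.trec) are placeholders. Only the module names and flags are taken from the commands quoted above. The pyserini conversion step is left out.

```python
import subprocess
import sys
from pathlib import Path


def build_eval_commands(model, corpus_dataset, query_dataset, out_dir,
                        num_shards=20, batch_size=156, depth=100):
    """Hypothetical helper: build the encode, retrieve and convert commands
    as argument lists, reusing only the flags shown in the issue."""
    out = Path(out_dir)
    module = [sys.executable, "-m"]
    # Flags shared by both encoding steps.
    encode = module + [
        "tevatron.driver.encode",
        "--output_dir=temp",
        "--model_name_or_path", model,
        "--fp16",
        "--per_device_eval_batch_size", str(batch_size),
    ]
    commands = []
    # One corpus-encoding command per shard (replaces the shell for loop).
    for shard in range(num_shards):
        s = f"{shard:02d}"
        commands.append(encode + [
            "--dataset_name", corpus_dataset,
            "--encoded_save_path", str(out / f"corpus_emb.{s}.pkl"),
            "--encode_num_shard", str(num_shards),
            "--encode_shard_index", s,
        ])
    # Query encoding.
    commands.append(encode + [
        "--dataset_name", query_dataset,
        "--encoded_save_path", str(out / "query_emb.pkl"),
        "--encode_is_qry",
    ])
    # Retrieval; the glob pattern is passed unexpanded, as in the quoted shell form.
    commands.append(module + [
        "tevatron.faiss_retriever",
        "--query_reps", str(out / "query_emb.pkl"),
        "--passage_reps", str(out / "corpus_emb.*.pkl"),
        "--depth", str(depth),
        "--batch_size", "-1",
        "--save_text",
        "--save_ranking_to", str(out / "run.txt"),
    ])
    # Conversion of the ranking file to TREC format.
    commands.append(module + [
        "tevatron.utils.format.convert_result_to_trec",
        "--input", str(out / "run.txt"),
        "--output", str(out / "run.trec"),
    ])
    return commands


def run_evaluation(commands):
    """Run the commands in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Building the commands separately from running them would make it easy to print them for a dry run, or to run the shard-encoding commands on different GPUs.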

Note the use of a separate tevatron.driver.evaluate module: this keeps driver.encode at a lower level and backward compatible, while evaluate would serve higher-level usage such as reproducing results. Moreover, tevatron.driver.evaluate could throw an error if pyserini is not available, e.g.:

ImportError: could not import pyserini, a library needed to save as format "pyserini_dpr". Please install with `pip install pyserini`
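
A check like this could be done up front, before any encoding starts, so the run fails early rather than after hours of GPU time. The sketch below is only an illustration: FORMAT_REQUIREMENTS and check_save_formats are hypothetical names, and the mapping assumes that "pyserini_dpr" is the only save format needing an optional package.

```python
import importlib.util

# Hypothetical mapping from a save format to the optional package it needs.
FORMAT_REQUIREMENTS = {"pyserini_dpr": "pyserini"}


def check_save_formats(save_formats, requirements=FORMAT_REQUIREMENTS):
    """Raise ImportError if a requested save format needs a missing package."""
    for fmt in save_formats:
        package = requirements.get(fmt)
        # find_spec returns None when a top-level package is not installed.
        if package is not None and importlib.util.find_spec(package) is None:
            raise ImportError(
                f'could not import {package}, a library needed to save as '
                f'format "{fmt}". Please install with `pip install {package}`'
            )
```

Formats with no entry in the mapping (such as "trec" here) pass through without any import check.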

@MXueguang (Contributor)

Hi @xhluca,
Thanks for the suggestion.
I guess one reason we keep the encoding process separate is to keep it flexible with respect to tasks (e.g. NQ/MSMARCO) and GPU/RAM resources.
I agree that the DPR evaluation process could be simpler; maybe we can have a simpler DPR evaluation in pyserini.
I'll take a look.

Xueguang
