FlexNeuART (flex-noo-art)

Flexible classic and NeurAl Retrieval Toolkit, or FlexNeuART for short (intended pronunciation: flex-noo-art), is a substantially reworked knn4qa package. An overview can be found in our EMNLP 2020 OSS workshop paper: Leonid Boytsov and Eric Nyberg. "Flexible retrieval with NMSLIB and FlexNeuART" (2020).

In Aug-Dec 2020, we used this framework to produce the best traditional and/or neural runs in the MS MARCO Document ranking task. In fact, our best traditional (non-neural) run slightly outperformed a couple of neural submissions. Please see our write-up for details: Boytsov, Leonid. "Traditional IR rivals neural models on the MS MARCO Document Ranking Leaderboard." arXiv preprint arXiv:2012.08020 (2020).

In 2021, after being outstripped by a number of participants, we again advanced to a good position with the help of newly implemented models for ranking long documents. Please see our write-up for details: Boytsov, L., Lin, T., Gao, F., Zhao, Y., Huang, J., & Nyberg, E. (2022). "Understanding Performance of Long-Document Ranking Models through Comprehensive Evaluation and Leaderboarding." At the time of writing (October 2022), we have competitive submissions on both MS MARCO leaderboards.

Regrettably, due to administrative and licensing/patenting issues (a patent has been filed), the neural Model 1 code cannot be released. This model (together with its non-contextualized variant) is described and evaluated in our ECIR 2021 paper: Boytsov, Leonid, and Zico Kolter. "Exploring Classic and Neural Lexical Translation Models for Information Retrieval: Interpretability, Effectiveness, and Efficiency Benefits." ECIR 2021.

In terms of pure effectiveness on long documents, other models (CEDR and PARADE) seem to perform equally well (or somewhat better); they are available in our codebase. We are not aware of any patents restricting the use of the traditional (non-neural) Model 1.

Objectives

Develop & maintain a (relatively) light-weight modular middleware useful primarily for:

  • Research
  • Education
  • Evaluation & leaderboarding

Main features

  • Dense, sparse, or dense-sparse retrieval using Lucene and NMSLIB (dense embeddings can be created using any Sentence BERT model).
  • Multi-field multi-level forward indices (+parent-child field relations) that can store parsed and "raw" text input as well as sparse and dense vectors.
  • Forward indices can be created in append-only mode, which requires much less RAM.
  • Pluggable generic rankers (via a server)
  • SOTA neural models (PARADE, BERT FirstP/MaxP/Sum, Longformer, ColBERT for re-ranking, dot-product Sentence BERT models) and non-neural models (multi-field BM25, IBM Model 1).
  • Multi-GPU training and inference with out-of-the-box support for ensembling
  • Basic experimentation framework (+LETOR)
  • Python API to use retrievers and rankers as well as to access indexed data (see the sketch after this list).
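
The sketch below illustrates the Python retrieval API mentioned in the last bullet. The function names (configure_classpath, create_featextr_resource_manager, create_cand_provider, run_text_query), the provider-type string, the collection path, and the result attributes are assumptions based on the project's demo notebooks, not a verbatim copy of the documented API; please consult the repository documentation and notebooks for the authoritative interface.

    # A minimal sketch of candidate retrieval via the Python API. Names and
    # signatures are assumptions and may differ from the current API.
    from flexneuart import configure_classpath
    configure_classpath()  # make the Java retrieval back-end visible to Python

    from flexneuart.retrieval import create_featextr_resource_manager
    from flexneuart.retrieval.cand_provider import create_cand_provider, run_text_query

    # Resource manager rooted at a previously indexed collection (path is hypothetical).
    resource_manager = create_featextr_resource_manager(resource_root_dir='collections/wikipedia_dpr')

    # Lucene-based candidate provider over the collection's index sub-directory.
    cand_provider = create_cand_provider(resource_manager, 'lucene', 'lucene_index')

    # Retrieve the top-20 candidates for a text query and print IDs with scores.
    num_found, candidates = run_text_query(cand_provider, 20, 'who invented the telephone?')
    for entry in candidates:
        print(entry.doc_id, entry.score)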

Documentation

We support a number of neural BERT-based ranking models as well as strong traditional ranking models, including IBM Model 1 (a description of the non-neural rankers is to follow).

The framework supports data in a generic JSONL format; we provide conversion (and, in some cases, download) scripts for a number of collections.
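
To make the JSONL input concrete, the hedged sketch below writes two document entries, one self-contained JSON object per line. The field names (DOCNO, title, text, text_raw) and the output file name are assumptions about a typical schema; the conversion scripts define the exact fields for each collection.

    # Illustrative only: field names and the file name are assumptions about a
    # typical schema; the per-collection conversion scripts define the real one.
    import json

    docs = [
        {'DOCNO': 'doc0', 'title': 'Telephone',
         'text': 'bell invent telephon',                 # parsed/lemmatized field
         'text_raw': 'Bell invented the telephone.'},    # "raw" text field
        {'DOCNO': 'doc1', 'title': 'Radio',
         'text': 'marconi pioneer radio transmiss',
         'text_raw': 'Marconi pioneered radio transmission.'},
    ]

    # One JSON object per line (JSONL), which is what the indexing pipeline consumes.
    with open('AnswerFields.jsonl', 'w') as out_f:
        for doc in docs:
            out_f.write(json.dumps(doc) + '\n')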

Acknowledgements

For neural network training, FlexNeuART incorporates a substantially reworked variant of CEDR (MacAvaney et al., 2019).