TF4ces Search Engine

An experiment-driven search engine project that indexes documents and retrieves the best matches for a query using an ensemble of models.

Architecture Diagram

System Design : Search Engine

System Design : Ensemble Model

Retrieval Models

Filter Models
- BM25
- TF-IDF

Voter Models
- MPNET
- RoBERTa
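
This README does not spell out how the filter and voter stages are combined, so the following is only a minimal sketch of one plausible reading: a BM25 filter narrows the collection to a few candidates, and the voters then re-rank those candidates by their mean score. The function names (bm25_scores, filter_then_vote, jaccard_voter), the parameters k1, b and top_k, and the sample documents are all assumptions made for illustration. The Jaccard voter is a dependency-free stand-in for embedding similarity from MPNET or RoBERTa, not the project's actual voter.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def filter_then_vote(query, docs, voters, top_k=2):
    """Keep the top_k BM25 candidates, then rank them by mean voter score."""
    bm25 = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: bm25[i], reverse=True)[:top_k]

    def mean_vote(i):
        return sum(v(query, docs[i]) for v in voters) / len(voters)

    return sorted(candidates, key=mean_vote, reverse=True)


def jaccard_voter(query, doc):
    """Hypothetical stand-in voter: token overlap instead of embedding similarity."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)


docs = [
    "bm25 ranks documents by term frequency",
    "cats sleep most of the day",
    "dense embeddings rank documents by meaning",
]
# Indices of the surviving documents, best first.
ranking = filter_then_vote("rank documents", docs, voters=[jaccard_voter], top_k=2)
```

In a real setup each voter would presumably wrap a sentence-embedding model and return a query-document similarity; any callable taking (query, doc) and returning a float fits the voters list in this sketch.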

Project Plan

Future work

- Fine-tune ColBERT
- Implement clustering of documents

How to run Project

Note: The project was tested on Linux and macOS. Windows has dependency issues; see Troubleshooting.

Clone repository

$ git clone https://github.com/TF4ces/TF4ces-search-engine.git

Set up environment

$ python3 -m venv venv
$ source venv/bin/activate                # Linux/macOS
$ .\venv\Scripts\activate                 # Windows
$ pip install -r src/requirements.txt

Download the pre-generated embeddings from GDrive into this path: ./dataset/embeddings_test

Note: To generate the embeddings from scratch, run the ./tests/test_evaluate_model.py script twice, once with MODEL set to all-mpnet-base-v2 and once with MODEL set to all-roberta-large-v1.

WARNING: Use a GPU machine; generation is expected to take 1 hr.

Run TF4ces Search Engine (install Jupyter with $ pip install jupyter notebook, then start it with $ jupyter notebook)
1. Run the evaluation pipeline from the ./tests/notebooks/TF4ces_Search_Eval.ipynb notebook.
2. Run the prediction demo pipeline from the ./tests/notebooks/TF4ces_Search_Demo.ipynb notebook.

Troubleshooting :

On Windows, reading data with ir-datasets==0.4.1 is known to cause issues.

doc.iter may throw a decoding error while reading a TSV file. To fix this, change the encoding in the dependency's source files as described in this issue.

Issue: allenai/ir_datasets#208 (comment)
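
The linked issue remains the reference for the actual fix. The sketch below only illustrates the general failure mode, assuming the cause is that a UTF-8 TSV file gets opened with the platform default encoding (often cp1252 on Windows) instead of an explicit UTF-8 encoding. The file name, sample row, and read_tsv helper are hypothetical and are not part of ir-datasets.

```python
import csv
import os
import tempfile

# A tiny TSV with non-ASCII quotes, written as UTF-8 (hypothetical sample data).
path = os.path.join(tempfile.mkdtemp(), "docs.tsv")
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write("d1\tA \u201cquoted\u201d passage\n")


def read_tsv(path, encoding):
    """Read all rows of a TSV file using an explicit encoding."""
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f, delimiter="\t"))


# Explicit UTF-8 decodes the file the same way on every platform.
rows = read_tsv(path, "utf-8")

# cp1252 has no mapping for byte 0x9D, which occurs in the UTF-8 form of
# the closing quote, so decoding with it raises an error.
try:
    read_tsv(path, "cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True
```

Passing encoding="utf-8" explicitly wherever the file is opened avoids depending on the operating system's default.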
