IR-SMART contains the code generated for a university project.
Given a query formatted in natural language, the code predicts the expected answer type from a set of candidate entities in the target ontology. In this project, the target ontology is taken from the DBpedia 2016 dump.
The project makes extensive use of the following tools and libraries:
To get a local copy up and running, follow these simple steps. It is assumed that you have Jupyter Notebook available, and it is recommended to use a Conda distribution (Anaconda/Miniconda).
Install the necessary Python libraries (if Conda is not used):
pip install --upgrade elasticsearch gensim numpy scipy scikit-learn
Other dependencies may exist, but in our setup they were installed through the Conda distribution.
Due to the overall size of the datasets, these have to be downloaded separately:
- DBpedia long_abstract_en.ttl
- DBpedia instance_types_en.ttl
- SeMantic AnsweR Type dataset
- GloVe Wikipedia 2014 + Gigaword 5 pretrained embeddings
Once all the files have been downloaded, extract them and place them so that the directory structure is as follows (the files highlighted with ## are the ones you need to download and place yourself):
📦IR-SMART
┣ 📂datasets
┃ ┣ 📂DBpedia
┃ ┃ ┣ 📜instance_types_en.ttl ##
┃ ┃ ┣ 📜long_abstracts_en.ttl ##
┃ ┃ ┣ 📜smarttask_dbpedia_test_questions.json ##
┃ ┃ ┗ 📜smarttask_dbpedia_train.json ##
┃ ┣ 📂gensim
┃ ┃ ┗ 📜...
┃ ┗ 📂glove
┃ ┣ 📜glove.6B.100d.txt ##
┃ ┣ 📜glove.6B.200d.txt ##
┃ ┣ 📜glove.6B.300d.txt ##
┃ ┗ 📜glove.6B.50d.txt ##
┣ 📂results
┃ ┣ 📜advanced.csv
┃ ┣ 📜advanced_word2vec.csv
┃ ┣ 📜baseline.csv
┃ ┗ 📜test_type_predictions.csv
┣ 📜.gitignore
┣ 📜baseline_variable_test.ipynb
┣ 📜evaluation.ipynb
┣ 📜indexer.ipynb
┣ 📜indexer_compact.ipynb
┣ 📜LICENSE
┣ 📜README.md
┗ 📜trial_and_error.ipynb
The necessary code to execute is located in `indexer_compact.ipynb` and `evaluation.ipynb`.
The other notebooks contain an alternative, larger index (`indexer.ipynb`), tests of how varying parameter values affected the score (`baseline_variable_test.ipynb`), and a failed early attempt to make the ES indexing more efficient by first loading all data files into memory and then initializing the ES indexing (`trial_and_error.ipynb`; not recommended to run).
- Execute all cells within `indexer_compact.ipynb`; this will generate the Elasticsearch index necessary for all subsequent steps.
  - PS: Ensure that Elasticsearch is running, either as a systemd process (Linux) or via the bat file (Windows).
  - PS: You will have to uncomment the function call `createTheIndex()` in cell 5 to generate the index, and `indexData(10000)` near the bottom of the file.
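The preprocessing behind the indexing step can be sketched as follows: each line of a DBpedia `.ttl` file is an RDF triple, from which the entity URI and the English literal are extracted and turned into Elasticsearch bulk actions. This is an illustrative sketch only, not the notebooks' code; the function names and the regex-based parsing are assumptions (a robust solution would use a proper RDF library such as rdflib), and the resulting generator would typically be fed to `elasticsearch.helpers.bulk`.

```python
import re

def parse_abstract_line(line):
    """Extract (entity URI, English literal) from one DBpedia .ttl triple.

    Hypothetical minimal parser for illustration; comments, blank lines,
    and non-English literals yield None.
    """
    match = re.match(r'<([^>]+)>\s+<[^>]+>\s+"(.*)"@en\s*\.\s*$', line)
    if match is None:
        return None
    return match.group(1), match.group(2)

def to_bulk_actions(lines, index_name="dbpedia"):
    """Turn parsed triples into Elasticsearch bulk-index actions."""
    for line in lines:
        parsed = parse_abstract_line(line)
        if parsed:
            uri, abstract = parsed
            yield {"_index": index_name, "_id": uri, "abstract": abstract}
```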
- Execute all cells within `evaluation.ipynb`; this will perform the evaluation using both the baseline and the advanced implementation.
  - PS: Uncomment the `convertGlovetoGensim()` function call in cell 5; this is necessary to allow Gensim to parse the GloVe embedding file.
The achieved accuracy scores have been summarized in the table below:
Method | Accuracy | NDCG@5 | NDCG@10 |
---|---|---|---|
Strict Baseline | 0.492 | 0.237 | 0.323 |
Lenient Baseline | 0.492 | 0.312 | 0.414 |
Strict Word2Vec | 0.522 | 0.280 | 0.367 |
Lenient Word2Vec | 0.522 | 0.364 | 0.455 |
Strict LTR (pointwise) | 0.776 | 0.731 | 0.754 |
Lenient LTR (pointwise) | 0.776 | 0.753 | 0.780 |
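The NDCG@5 and NDCG@10 columns follow the standard definition of the metric. As a generic sketch (not the notebooks' evaluation code), where `rels` is the list of graded relevance values of the predicted types in ranked order:

```python
import math

def dcg_at_k(rels, k):
    """Discounted cumulative gain over the top-k ranked relevance grades."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    """DCG normalized by the DCG of the ideal (descending-sorted) ranking."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0
```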
Distributed under the GPL-3.0 License. See `LICENSE` for more information.
- e-mail: [email protected]
- GitHub: @BerntA
- e-mail: [email protected]
- GitHub: @Chrystallic