GitHub - JuiP/Information-Retrieval

Search Engine for Movies

Domain specific Information Retrieval System

Problem Statement:

The task is to build a search engine that serves a particular domain. The IR model is fed documents containing information about the chosen domain, then processes the data and builds indexes. Once indexing is done, the user enters a query and the system returns the top 10 relevant documents.
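The indexing step described above can be sketched as a small inverted index. This is a minimal illustration, not the code in `preprocess.py`: the function names, the simple tokenizer (standing in for the NLTK preprocessing), and the toy corpus are all assumptions made for the example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (simplified stand-in for NLTK)."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def build_inverted_index(documents):
    """Map each term to a dict of {doc_id: number of occurrences}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def candidate_documents(index, query):
    """Return the ids of documents containing at least one query term."""
    matches = set()
    for term in tokenize(query):
        matches.update(index.get(term, {}))
    return matches


# Hypothetical toy corpus, not taken from the dataset.
documents = {
    0: "A detective hunts a serial killer in the city.",
    1: "Two friends travel across the country by car.",
    2: "A killer shark terrorizes a beach town.",
}
index = build_inverted_index(documents)
matches = candidate_documents(index, "killer shark")
```

Looking up terms in the index avoids scanning every plot for every query; only the candidate documents then need to be scored and ranked.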

About the project

Dataset used - Kaggle-movie-plots

See the file Design Architecture.pdf. It covers the concepts used and the modified implementation of TF-IDF ranking.
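For orientation, here is a sketch of plain TF-IDF ranking with a top-k cutoff. It is the textbook variant (log-scaled term frequency times inverse document frequency), not the modified scheme described in the design document, and the function name `tfidf_rank` and the toy corpus are assumptions for illustration only.

```python
import heapq
import math
from collections import Counter


def tfidf_rank(documents, query, k=10):
    """Score documents with plain TF-IDF and return the top k as (doc_id, score) pairs."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in documents.items()}
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term in tf:
                # Log-scaled term frequency times inverse document frequency.
                score += (1 + math.log(tf[term])) * math.log(n_docs / df[term])
        if score > 0:
            scores[doc_id] = score

    # heapq.nlargest avoids sorting the full score table when k is small.
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical toy corpus, not taken from the dataset.
documents = {
    0: "detective hunts killer in the city",
    1: "friends travel across the country",
    2: "killer shark terrorizes the beach",
}
results = tfidf_rank(documents, "killer shark")
```

In this toy corpus, document 2 ranks above document 0 because it matches both query terms and "shark" is rarer than "killer". A term that appears in every document, such as "the" here, gets an IDF of zero and contributes nothing to the score.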

Project By:

Kriti Jethlia: Email- [email protected]
Jui Pradhan: Email- [email protected]
Anusha Agarwal: Email- [email protected]

How to run the code

Clone the repository: [email protected]:JuiP/Information-Retrieval.git
cd Information-Retrieval

Run the files in this order:

       python3 preprocess.py
       python3 tfidf.py
       python3 server.py

In your browser, go to http://0.0.0.0:3000/
Type your query in the search bar and wait for the relevant documents to be returned :)
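The run order matters presumably because each script relies on files written by the one before it. `pickle` is listed among the dependencies and the repository contains `.obj` files such as `inverted_index.obj`, so a hand-off along the following lines is a reasonable guess. The helper names and the sample data are hypothetical and not taken from the project's code.

```python
import os
import pickle
import tempfile


def save_object(obj, path):
    """Serialize a Python object to disk with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_object(path):
    """Load a pickled Python object from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical stand-in for what an indexing stage might produce.
inverted_index = {"killer": {0: 1, 2: 1}, "shark": {2: 1}}

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "inverted_index.obj")
    save_object(inverted_index, path)  # end of one stage
    restored = load_object(path)       # start of the next stage
```

Persisting the index this way means the server can load it at startup instead of rebuilding it for every query. Only unpickle files you created yourself, since loading an untrusted pickle can execute arbitrary code.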

Dependencies/modules used

time
nltk
pandas
pickle
numpy
heapq
flask
os

