Information Retrieval Engine, Information Retrieval 2016. University of Aveiro

ruipoliveira/IR-engine

Information Retrieval Engine

Requirements

Execution

java -jar IR-maven.jar <path to corpus folder> <path to stopword file> <max memory>

--

Task 1

Modelling: define the classes and their main methods. a) Keep modularity and flexibility in mind. b) Describe your classes, main methods, and data flow in the report.

Task 2

Implement a simple corpus reader, tokenizer, and Boolean indexer. a) Develop your own tokenizer from scratch. Integrate the Porter stemmer (http://snowball.tartarus.org/) and a stopword filter in your code. b) Index a small corpus (to be defined later) and submit a text file with the resulting index, following the scheme: term,document frequency,list of documents
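The tokenizer, stopword filter, and Boolean index described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name `BooleanIndexSketch`, the tokenization rule (lowercase, split on non-alphanumeric characters), and the comma-separated document list are all assumptions. The stemmer is passed in as a plain function so that the Snowball Porter stemmer can be plugged in; the example uses the identity function as a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical sketch: names and output details are assumptions, not the project's API.
class BooleanIndexSketch {
    // term -> sorted set of IDs of the documents that contain it
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();
    private final Set<String> stopwords;
    private final UnaryOperator<String> stemmer;

    BooleanIndexSketch(Set<String> stopwords, UnaryOperator<String> stemmer) {
        this.stopwords = stopwords;
        this.stemmer = stemmer;
    }

    // Assumed tokenization rule: lowercase, then split on anything that is not a-z or 0-9.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // Drops stopwords, stems the remaining tokens, and records the document ID per term.
    void addDocument(int docId, String text) {
        for (String token : tokenize(text)) {
            if (stopwords.contains(token)) {
                continue;
            }
            postings.computeIfAbsent(stemmer.apply(token), k -> new TreeSet<>()).add(docId);
        }
    }

    // One line per term: term,document frequency,list of documents
    List<String> toLines() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, TreeSet<Integer>> e : postings.entrySet()) {
            String docs = e.getValue().stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining(","));
            lines.add(e.getKey() + "," + e.getValue().size() + "," + docs);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Identity function stands in for the Porter stemmer (placeholder).
        BooleanIndexSketch index =
                new BooleanIndexSketch(Set.of("the", "a", "of"), UnaryOperator.identity());
        index.addDocument(1, "The cat sat.");
        index.addDocument(2, "A cat of the night");
        index.toLines().forEach(System.out::println);
    }
}
```

Keeping the postings in a `TreeMap` of `TreeSet`s means the output file is sorted by term and by document ID without a separate sorting pass, which is convenient for a small corpus.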

Task 3

Implement an indexer based on the vector-space model, using the tf-idf weighting scheme and the lnc.ltc strategy, as described in the slides. a) Write your index to disk so that the searcher module can load it efficiently. b) Index the corpus (to be defined later).
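In SMART notation, lnc.ltc means documents are weighted with logarithmic term frequency, no idf, and cosine normalization, while queries use logarithmic term frequency, idf, and cosine normalization. The sketch below shows the document side plus the idf value the indexer would store for query time. The class and method names are hypothetical, and base-10 logarithms are an assumption (the slides referenced above may use a different base).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of lnc document weighting; not the project's actual code.
class LncWeightingSketch {

    // lnc: w = 1 + log10(tf), no idf, followed by cosine (L2) normalization.
    static Map<String, Double> lncWeights(List<String> terms) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : terms) {
            tf.merge(t, 1, Integer::sum);
        }
        Map<String, Double> weights = new HashMap<>();
        double sumSquares = 0.0;
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double w = 1.0 + Math.log10(e.getValue());
            weights.put(e.getKey(), w);
            sumSquares += w * w;
        }
        double norm = Math.sqrt(sumSquares);
        if (norm > 0) {
            // Divide every weight by the vector length so the document has unit norm.
            weights.replaceAll((term, w) -> w / norm);
        }
        return weights;
    }

    // idf = log10(N / df); stored with each term so the searcher can build ltc query weights.
    static double idf(int totalDocs, int docFreq) {
        return Math.log10((double) totalDocs / docFreq);
    }

    public static void main(String[] args) {
        // Terms are assumed to be already tokenized, filtered, and stemmed.
        System.out.println(lncWeights(List.of("cat", "cat", "dog")));
        System.out.println(idf(1000, 10));
    }
}
```

Because the document vectors are normalized at indexing time, the searcher only needs a dot product per query, which is one way to satisfy the requirement that the index can be loaded and used efficiently.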

Task 4

Implement a ranked retrieval method. a) Load the index from disk.
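A ranked searcher over such an index might look like the sketch below. It assumes a hypothetical on-disk line format, `term;idf;docId:weight,docId:weight`, where each weight is an already normalized lnc document weight; the project's real index format is not specified here. Query terms receive ltc weights, and documents are scored by the dot product of the query and document vectors.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the line format and all names are assumptions.
class RankedSearchSketch {
    final Map<String, Double> idf = new HashMap<>();
    // term -> (docId -> normalized lnc weight)
    final Map<String, Map<Integer, Double>> postings = new HashMap<>();

    // Parses lines such as "cat;0.5;1:0.8,2:0.6" (for example from Files.readAllLines).
    static RankedSearchSketch fromLines(List<String> lines) {
        RankedSearchSketch index = new RankedSearchSketch();
        for (String line : lines) {
            String[] parts = line.split(";");
            index.idf.put(parts[0], Double.parseDouble(parts[1]));
            Map<Integer, Double> docs = new HashMap<>();
            for (String posting : parts[2].split(",")) {
                String[] pair = posting.split(":");
                docs.put(Integer.parseInt(pair[0]), Double.parseDouble(pair[1]));
            }
            index.postings.put(parts[0], docs);
        }
        return index;
    }

    // Returns (docId, score) pairs, best score first. Query terms are assumed
    // to be tokenized and stemmed the same way as the indexed documents.
    List<Map.Entry<Integer, Double>> search(List<String> queryTerms) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : queryTerms) {
            if (idf.containsKey(t)) {
                tf.merge(t, 1, Integer::sum); // terms missing from the index are ignored
            }
        }
        // ltc: (1 + log10(tf)) * idf, then cosine normalization.
        Map<String, Double> queryWeights = new HashMap<>();
        double sumSquares = 0.0;
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double w = (1.0 + Math.log10(e.getValue())) * idf.get(e.getKey());
            queryWeights.put(e.getKey(), w);
            sumSquares += w * w;
        }
        double norm = Math.sqrt(sumSquares);
        Map<Integer, Double> scores = new HashMap<>();
        if (norm > 0) {
            for (Map.Entry<String, Double> q : queryWeights.entrySet()) {
                double qw = q.getValue() / norm;
                for (Map.Entry<Integer, Double> p : postings.get(q.getKey()).entrySet()) {
                    scores.merge(p.getKey(), qw * p.getValue(), Double::sum);
                }
            }
        }
        List<Map.Entry<Integer, Double>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return ranked;
    }

    public static void main(String[] args) {
        RankedSearchSketch index =
                fromLines(List.of("cat;0.5;1:0.8,2:0.6", "dog;1.0;2:0.8"));
        for (Map.Entry<Integer, Double> hit : index.search(List.of("cat", "dog"))) {
            System.out.println("doc " + hit.getKey() + " score " + hit.getValue());
        }
    }
}
```

Scoring term by term over the postings touches only documents that share at least one term with the query, so documents with no overlap never appear in the ranking.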

Authors