RAKE Java Search Engine

A Java 15 implementation of a miniature search engine built on jSoup and the Rapid Automatic Keyword Extraction (RAKE) algorithm, as described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In M. W. Berry & J. Kogan (Eds.), Text Mining: Applications and Theory. John Wiley & Sons.

The RAKE implementation is based on the Python implementation at https://github.com/aneesha/RAKE

The HTMLParser crawls web pages and stores the keywords found on each page, sorted by RAKE score.
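As a rough illustration of the scoring and ordering mentioned above, here is a minimal sketch of RAKE as described by Rose et al.: candidate phrases are split at stop words and punctuation, each word is scored as degree divided by frequency, and a phrase scores the sum of its word scores. The class name `RakeSketch`, its method names, and the tiny stop-word set are assumptions made for this example and are not taken from this repository.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the repository's implementation.
class RakeSketch {

    // Splits text into candidate phrases: runs of consecutive non-stop-words
    // that are not interrupted by punctuation.
    static List<List<String>> candidatePhrases(String text, Set<String> stopWords) {
        List<List<String>> phrases = new ArrayList<>();
        for (String fragment : text.toLowerCase().split("[.,;:!?()\\[\\]\"\\n]+")) {
            List<String> current = new ArrayList<>();
            for (String word : fragment.trim().split("\\s+")) {
                if (word.isEmpty() || stopWords.contains(word)) {
                    // A stop word closes the phrase collected so far.
                    if (!current.isEmpty()) phrases.add(current);
                    current = new ArrayList<>();
                } else {
                    current.add(word);
                }
            }
            if (!current.isEmpty()) phrases.add(current);
        }
        return phrases;
    }

    // Returns phrase -> RAKE score, iterating from highest to lowest score.
    static Map<String, Double> score(String text, Set<String> stopWords) {
        List<List<String>> phrases = candidatePhrases(text, stopWords);
        Map<String, Integer> freq = new HashMap<>();
        Map<String, Integer> degree = new HashMap<>();
        for (List<String> phrase : phrases) {
            for (String word : phrase) {
                freq.merge(word, 1, Integer::sum);
                // Degree counts co-occurrences within the phrase, the word itself included.
                degree.merge(word, phrase.size(), Integer::sum);
            }
        }
        Map<String, Double> phraseScores = new HashMap<>();
        for (List<String> phrase : phrases) {
            double total = 0;
            for (String word : phrase) {
                total += (double) degree.get(word) / freq.get(word);
            }
            phraseScores.put(String.join(" ", phrase), total);
        }
        List<Map.Entry<String, Double>> entries = new ArrayList<>(phraseScores.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        Map<String, Double> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : entries) sorted.put(e.getKey(), e.getValue());
        return sorted;
    }

    public static void main(String[] args) {
        String text = "Keyword extraction is fast. Automatic keyword extraction of documents";
        score(text, Set.of("is", "of"))
                .forEach((phrase, s) -> System.out.println(s + "  " + phrase));
    }
}
```

With the two-word stop list used in `main`, "automatic keyword extraction" scores 8.0 and "keyword extraction" scores 5.0, because longer phrases built from frequently co-occurring words accumulate higher degree. A real stop-word list would be much larger.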

The test code below shows how the HTMLParser can be used.

    // The parser has three configurable settings, as can be seen in HTMLParser: a URL keyphrase
    // that keeps the crawl from spiraling out of control, a tree depth, and a debug flag.
    // The debug flag prints every URL the parser failed to connect to.
    public void traversalTest() throws IOException {
        String root = "https://css.csail.mit.edu/";
        HTMLParser parser = new HTMLParser(root, 2);

        // Print the number of pages reached, then each URL with its stored keywords.
        System.out.println(parser.urlMap.size());
        for (String link : parser.urlMap.keySet())
            System.out.println("URL:" + link + "\n" + parser.urlMap.get(link));
    }
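To show how a URL keyphrase and a depth limit can bound a crawl, here is a minimal sketch that walks an in-memory link graph instead of fetching pages over the network. The class `CrawlSketch`, the `crawl` method, its parameters, and the example URLs are all hypothetical. The actual HTMLParser may interpret its depth setting differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the repository's HTMLParser.
class CrawlSketch {

    // Breadth-first traversal from root. The links map stands in for fetching a
    // page and extracting its outgoing links. A link is followed only if it
    // contains urlKeyphrase and lies at most maxDepth hops from the root.
    static Set<String> crawl(Map<String, List<String>> links, String root,
                             String urlKeyphrase, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        Deque<Integer> depths = new ArrayDeque<>();
        visited.add(root);
        frontier.add(root);
        depths.add(0);
        while (!frontier.isEmpty()) {
            String url = frontier.poll();
            int depth = depths.poll();
            if (depth == maxDepth) continue; // depth limit reached: do not expand further
            for (String next : links.getOrDefault(url, List.of())) {
                // Set.add returns false for URLs that were already visited.
                if (next.contains(urlKeyphrase) && visited.add(next)) {
                    frontier.add(next);
                    depths.add(depth + 1);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "https://a.example/", List.of("https://a.example/x", "https://b.other/y"),
                "https://a.example/x", List.of("https://a.example/z"),
                "https://a.example/z", List.of("https://a.example/w"));
        crawl(links, "https://a.example/", "a.example", 2).forEach(System.out::println);
    }
}
```

In this example the off-site link is skipped by the keyphrase filter, and the page three hops away is never reached because of the depth limit.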

The SearchEngine comes from the final project of the McGill course COMP 250: Introduction to Computer Science. Small modifications were made so that it works with the new parser.
