Wikipedia Crawler

This crawler starts from the Wikipedia homepage, follows all the links it finds, and saves the results in a RethinkDB database. It then counts how many times each word is repeated.
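The crawl described above can be sketched as a breadth-first traversal. This is a minimal sketch, not the project's actual code. The names crawl, LinkExtractor and fetch_url, the start URL, the same-host restriction and the max_pages limit are all assumptions for illustration. Writing pages to RethinkDB is left out so the example runs with the Python standard library alone.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url: str) -> str:
    """Download one page (hypothetical helper; Wikipedia expects a descriptive User-Agent)."""
    request = Request(url, headers={"User-Agent": "wikipedia-crawler-example/0.1"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 50) -> Dict[str, str]:
    """Breadth-first crawl from start_url, staying on the same host.

    Returns a mapping of URL -> HTML for every page that was fetched successfully.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages: Dict[str, str] = {}

    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html

        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links against the current page and drop #fragments
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

A real run might look like pages = crawl("https://en.wikipedia.org/", fetch_url, max_pages=20). Passing the download function in as fetch keeps the traversal logic testable with a plain dictionary of fake pages instead of the live site.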

First start the database server, then run the code:

rethinkdb
python main.py --db 
python main.py --website

--db counts the word repeats using the pages already stored in the database, while --website first crawls the site, writes the pages to the database, and then calculates the word count.
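The two modes can be sketched as a small argparse command line. Again, this is an illustrative sketch under assumptions: build_parser, count_words and main are hypothetical names, the two flags are made mutually exclusive, and the database work is passed in as the callables crawl_and_store and load_texts rather than guessing how the real project talks to RethinkDB. Words are counted case-insensitively with collections.Counter.

```python
import argparse
import re
from collections import Counter
from typing import Callable, Iterable, List, Optional


def count_words(texts: Iterable[str]) -> Counter:
    """Count how often each word appears across all texts, ignoring case."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


def build_parser() -> argparse.ArgumentParser:
    """Exactly one of --db or --website must be given."""
    parser = argparse.ArgumentParser(description="Wikipedia crawler and word counter")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--db", action="store_true",
                      help="count word repeats from pages already in the database")
    mode.add_argument("--website", action="store_true",
                      help="crawl, write pages to the database, then count word repeats")
    return parser


def main(argv: Optional[List[str]],
         crawl_and_store: Callable[[], None],
         load_texts: Callable[[], Iterable[str]],
         top: int = 10) -> Counter:
    """Run one mode; crawl_and_store and load_texts are hypothetical hooks."""
    args = build_parser().parse_args(argv)
    if args.website:
        crawl_and_store()  # --website refreshes the database before counting
    counts = count_words(load_texts())  # both modes count from the stored pages
    for word, n in counts.most_common(top):
        print(f"{word}: {n}")
    return counts
```

Injecting the two callables keeps the flag handling independent of the storage layer; in a full program they would wrap the RethinkDB writes and reads.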

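You need the RethinkDB server installed separately (the rethinkdb command above starts it). The Python packages listed in requirements.txt can be installed with: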

pip install -r requirements.txt

Repository files:
.gitignore
README.md
databaseStuff.py
general.py
main.py
requirements.txt
singlePageCrawler.py
