wikipedia-crawler

Here are 11 public repositories matching this topic...

Sarthakjain1206 / Intelligent_Document_Finder

Document Search Engine Tool

search-engine scrapy-spider indexer scrapy text-summarization search-algorithm webcrawler latent-dirichlet-allocation bm25 spellchecker document-similarity wikipedia-search wikipedia-crawler ranking-algorithm document-summarization reverse-index

Updated Dec 8, 2022
Python

lehinevych / MediaWikiAPI

Python wrapper for the MediaWiki API to access and parse data from Wikipedia

wikipedia python3 wikipedia-api mediawiki-api wikipedia-crawler wikipedia-scraper wikipedia-sc

Updated May 6, 2024
Python

nazaninsbr / Wikipedia-Crawler

A crawler for Wikipedia (for now, English pages only)

python crawler wikipedia python-crawler wikipedia-crawler

Updated Aug 7, 2018
Python

A search engine that takes keyword queries as input and returns a ranked list of relevant results. It scrapes a few thousand pages starting from one of the seed Wiki pages and uses Elasticsearch as its full-text search engine.

react search-engine elasticsearch information-retrieval material-ui web-scraping marvel elasticsearch-client scrapy-crawler nodejs-server wikipedia-crawler lavenshtein okapi-bm25 dirichlet-smoothing marvel-wiki

Updated Jan 13, 2023
JavaScript

TimurKasatkin / IR_system

IR system part of the semester project for the Innopolis IR 2016 course

cli search-engine crawler information-retrieval scala sbt vector-space-model tfidf ranked-fulltext-searches wikipedia-crawler

Updated Nov 30, 2016
Scala

Relex12 / Wikipedia-Translate-Crawler

A Wikipedia crawler that finds the worst-translated page around an English starting page by following hypertext links

bash translation wikipedia web-crawler wikipedia-crawler

Updated Aug 21, 2022
Shell

WillCaton2350 / Wikipedia-WebCrawler

Wikipedia web crawler written in Python with Scrapy. The ETL process extracts specific data from multiple Wikipedia pages and links using Scrapy and organizes it into a structured format using Scrapy items. The extracted data is saved in JSON format for further analysis and integration into MySQL Workbench.

mysql python json web-crawler scrapy-spider scrapy-crawler python-crawler wikipedia-crawler