# wikipedia-crawler

Here are 11 public repositories matching this topic...

Relex12 / Wikipedia-Translate-Crawler

A Wikipedia crawler that finds the worst-translated page reachable from an English starting page by following hypertext links

bash translation wikipedia web-crawler wikipedia-crawler

Updated Aug 21, 2022
Shell

mayankkumar2 / wikipedia-index-scraper

The program can map out the shortest path between two Wikipedia pages.

wikipedia wikipedia-crawler wikipedia-scraper wikipedia-entries

Updated May 24, 2020
Go
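
For readers wondering how a shortest path between two pages can be found, a common approach is breadth-first search over the link graph. The sketch below is an illustration only, not this repository's code: `shortest_path` and the in-memory `links` dictionary (page title to outgoing link titles) are hypothetical, and a real crawler would fetch links on demand.

```python
from collections import deque


def shortest_path(links, start, goal):
    """Return the shortest list of page titles from start to goal, or None.

    `links` is a hypothetical dict mapping a page title to the titles it links to.
    """
    if start == goal:
        return [start]
    parents = {start: None}  # page -> page it was first reached from
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt in parents:
                continue  # already reached by a path that is no longer
            parents[nxt] = page
            if nxt == goal:
                # Walk the parent pointers back to the start.
                path = [nxt]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None  # goal is not reachable from start


# Toy link graph with made-up page names.
links = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}
print(shortest_path(links, "A", "F"))
```

Because breadth-first search visits pages in order of distance from the start, the first time the goal is reached the path found has the fewest possible links.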

adidottxt / wikipedia-crawler

Python web crawler that tests the theory that repeatedly clicking the first link on a wiki page eventually leads, for ~97% of pages, to the wiki page for knowledge 📡

udacity beautifulsoup python-web-crawler wikipedia-crawler beautifulsoup4

Updated Feb 8, 2018
Python
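
The first-link experiment described above can be sketched without any network access. This is a minimal illustration under stated assumptions, not the repository's implementation: `follow_first_links` and the `first_link` mapping (page title to the title of its first link) are hypothetical, and a real crawler would extract the first link from each page's HTML.

```python
def follow_first_links(first_link, start, target, max_steps=100):
    """Follow first links from start; return (chain, reached_target).

    `first_link` is a hypothetical dict mapping a page title to the title
    of the first link on that page.
    """
    chain = [start]
    seen = {start}
    while chain[-1] != target and len(chain) <= max_steps:
        nxt = first_link.get(chain[-1])
        if nxt is None or nxt in seen:
            # Dead end (no link) or a loop: the target will never be reached.
            return chain, False
        chain.append(nxt)
        seen.add(nxt)
    return chain, chain[-1] == target


# Toy data with made-up page names.
first_link = {
    "Crawler": "Software",
    "Software": "Information",
    "Information": "Knowledge",
    "Loop A": "Loop B",
    "Loop B": "Loop A",
}
print(follow_first_links(first_link, "Crawler", "Knowledge"))
print(follow_first_links(first_link, "Loop A", "Knowledge"))
```

Tracking visited pages matters here: without the `seen` set, a pair of pages whose first links point at each other would keep the crawler going until `max_steps` ran out.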

WillCaton2350 / Wikipedia-WebCrawler

Wikipedia web crawler written in Python with Scrapy. The ETL process extracts specific data from multiple Wikipedia pages and links using Scrapy, then organizes it into a structured format using Scrapy items. The extracted data is saved as JSON for further analysis and integration into MySQL Workbench.

mysql python json web-crawler scrapy-spider scrapy-crawler python-crawler wikipedia-crawler

Updated Sep 12, 2023
Python

ambirpatel / Wikipedia-crawler

Web scraping is a data scraping technique used for extracting data from websites.

wikipedia-crawler webcrawling

Updated Apr 25, 2024
Jupyter Notebook

jamesponddotco / wikiextract

[READ-ONLY] A word extractor for Wikipedia articles.

go crawler wikipedia crawling diceware wikipedia-crawler word-extraction

Updated Apr 8, 2024
Go

Smile040501 / Search-Engine

A search engine that takes keyword queries as input and returns a ranked list of relevant results. It scrapes a few thousand pages starting from one of the seed Wiki pages and uses Elasticsearch as its full-text search engine.

react search-engine elasticsearch information-retrieval material-ui web-scraping marvel elasticsearch-client scrapy-crawler nodejs-server wikipedia-crawler lavenshtein okapi-bm25 dirichlet-smoothing marvel-wiki

Updated Jan 13, 2023
JavaScript

TimurKasatkin / IR_system

The IR system part of a semester project for the Innopolis IR 2016 course

cli search-engine crawler information-retrieval scala sbt vector-space-model tfidf ranked-fulltext-searches wikipedia-crawler

Updated Nov 30, 2016
Scala

nazaninsbr / Wikipedia-Crawler

A crawler for Wikipedia (for now, English pages only)

python crawler wikipedia python-crawler wikipedia-crawler

Updated Aug 7, 2018
Python

lehinevych / MediaWikiAPI

Python wrapper for the MediaWiki API to access and parse data from Wikipedia

wikipedia python3 wikipedia-api mediawiki-api wikipedia-crawler wikipedia-scraper wikipedia-sc

Updated May 21, 2024
Python

Sarthakjain1206 / Intelligent_Document_Finder

Document Search Engine Tool

search-engine scrapy-spider indexer scrapy text-summarization search-algorithm webcrawler latent-dirichlet-allocation bm25 spellchecker document-similarity wikipedia-search wikipedia-crawler ranking-algorithm document-summarization reverse-index

Updated Dec 8, 2022
Python
