This repository has been archived by the owner on Aug 29, 2020. It is now read-only.

youtux/wikidump

wikidump

Framework for the extraction of features from Wikipedia XML dumps.

Installation

This project has been tested with Python 3.5.0, but should also work with Python 3.4.3.

You need to install dependencies first, as usual.

pip install -r requirements.txt

Usage

You need to download Wikipedia dumps first:

./download.sh

Then run the extractor:

python -m wikidump FILE [FILE ...] OUTPUT_DIR

It will take some time, but RAM usage will stay low, I promise.
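
The extractor's internals are not shown in this README. As an illustration only, here is a minimal sketch of one common way to keep memory low on large XML dumps: streaming the file with the standard library's `xml.etree.ElementTree.iterparse` and discarding each page once it has been handled. The names `local_name` and `iter_page_titles` are hypothetical and not part of wikidump, and the in-memory sample stands in for a real dump file.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit('}', 1)[-1]


def iter_page_titles(source):
    """Yield the title of each <page> element, one page at a time.

    Hypothetical helper: a sketch of streaming parsing, not wikidump's code.
    """
    context = ET.iterparse(source, events=('start', 'end'))
    _, root = next(context)  # the first event is the start of the root element
    title = None
    for event, elem in context:
        if event != 'end':
            continue
        name = local_name(elem.tag)
        if name == 'title':
            title = elem.text
        elif name == 'page':
            yield title
            title = None
            root.clear()  # detach finished pages so they can be freed


# Tiny in-memory stand-in for a dump file (assumed structure).
sample = (
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    b'<page><title>A</title></page>'
    b'<page><title>B</title></page>'
    b'</mediawiki>'
)

for page_title in iter_page_titles(io.BytesIO(sample)):
    print(page_title)
```

Because pages are yielded one by one and removed from the tree afterwards, peak memory depends on the largest single page instead of the size of the whole dump.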
