A corpus builder for evaluation of plagiarism detection tools
Updated Dec 12, 2016 - PHP
A text corpus management system for the German linguistics department of the University of Basel.
App and scripts that work with the corpus builder CorpusCook to update a corpus with corrections for wrong predictions
Extract text from Vikidia/Wikipedia articles [fr]
Corpus Development Software for Machine Translation
Builds Wikipedia corpora in I5 (a TEI-based format)
Crawl Ask.fm QA lists and create corpus for ML.
The canonical resources to build the backend for a corpus/repository management framework for Crow, the Corpus and Repository of Writing
The user interface for the Corpus & Repository of Writing, built in Angular
Ebook Corpus - A parser and extractor for electronic books
Katya, or The Liberated Corpus: a text corpus that allows you to request and scrape any web resource!
Article title, authors, date and body extraction dataset.
Collector and speech cutter for librivox audiobooks
Praaline is an open-source system to manage, annotate, visualise and analyse spoken language corpora
Crawler for linguistic corpora
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments