corpus-builder

Here are 17 public repositories matching this topic...

FerreroJeremy / Plagiarized-Corpus-Generator

A corpus builder for evaluation of plagiarism detection tools

plagiarism corpus-generator corpus-builder

Updated Dec 12, 2016
PHP

sorinmarti / fruechtekorb

A text corpus management system for the German linguistics department of the University of Basel.

corpus linguistics corpus-linguistics corpus-builder

Updated Apr 15, 2020
PHP

c0ntradicti0n / CorpusCookApp

App and scripts that work with the corpus builder CorpusCook to keep a corpus updated with corrections of wrong predictions

amp python3 twisted corpus-linguistics nlp-machine-learning corpus-builder kivy-application

Updated Mar 20, 2020
Python

CristinaGHolgado / vikitext

Extract text from Vikidia/Wikipedia articles [fr]

corpus readability corpus-builder wikipedia-scraper text-simplification french-nlp vikidia

Updated Jul 20, 2021
Python

binayachaudari / Corpus-Development-Software

Corpus Development Software for Machine Translation

machine-learning machine-translation corpus-builder

Updated Apr 23, 2024
JavaScript

IDS-Mannheim / Wikipedia-Corpus-Builder

Builds Wikipedia corpora in I5 (a TEI-based format)

wikipedia xml tei corpus-builder wikipedia-corpus

Updated Jun 21, 2022
Java

tubone24 / askfm-qa-crawler

Crawl Ask.fm QA lists and create corpus for ML.

crawler selenium chromedriver corpus-builder askfm

Updated Dec 15, 2023
Python

writecrow / crow_backend

The canonical resources to build the backend for a corpus/repository management framework for Crow, the Corpus and Repository of Writing

api natural-language-processing backend corpus corpus-linguistics corpus-generator corpus-builder

Updated Apr 23, 2024
PHP

writecrow / crow_frontend

The user interface for the Corpus & Repository of Writing, built in Angular

natural-language-processing angular corpus corpora corpus-linguistics corpus-builder

Updated Apr 10, 2024
TypeScript

dohliam / ebook-corpus

Ebook Corpus - A parser and extractor for electronic books

corpus mobi epub ebooks corpus-linguistics fb2 corpus-builder ebook-parsing

Updated Aug 6, 2019
Ruby

thecsw / katya-dev

Katya, or The Liberated Corpus: a text corpus that allows you to request and scrape any web resource!

corpus russian tagger corpus-linguistics corpus-generator corpus-builder text-corpus russian-literature corpus-processing corpus-analysis

Updated Mar 14, 2024
Go

AndyTheFactory / article-extraction-dataset

Article title, authors, date and body extraction dataset.

text-mining news html-to-markdown scraping corpus news-aggregator text-extraction dataset web-scraping readability datasets scraping-websites html2text news-crawler corpus-builder corpus-tools article-extractor text-cleaning text-preprocessing

Updated Mar 26, 2024
HTML

carlfm01 / librivox-tools

Collector and speech cutter for librivox audiobooks

data-collector speech-to-text corpus-builder corpus-tools librivox

Updated Dec 8, 2022
C#

uma-pi1 / OPIEC-pipeline

praaline / Praaline

Praaline is an open-source system to manage, annotate, visualise and analyse spoken language corpora

annotations corpus visualisation linguistics corpus-linguistics speech-processing corpus-builder corpus-tools speech-analysis spoken-language-processing

Updated Sep 21, 2022
C

google / corpuscrawler

Crawler for linguistic corpora

crawling linguistics corpus-linguistics corpus-builder minority-language

Updated Dec 5, 2023
Python

adbar / trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Updated Jun 13, 2024
Python
