Web-based database for sign language lexicons and corpora. Fork of NGT-signbank (https://github.com/Signbank/Global-signbank).
Updated Jun 13, 2024 - Python
An advanced, extensible web front-end for the Manatee-open corpus search engine
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
🛠 Tools to create, edit and export texts and annotations
OpusFilter - Parallel corpus processing toolkit
A parser for annotated MuseScore 3 files.
Bitextor generates translation memories from multilingual websites
Analyzes binary executables and can generate a test corpus for defined instruction paths, for each discovered function, or to reach every basic block detected in the non-library/shared-object parts of the binary's text section.
MFTE (Multi-Feature Tagger of English) Python is a Python version of Le Foll's MFTE, which was written in Perl. It is extended to include semantic tags from Biber (2006) and Biber et al. (1999), as well as other specific tags.
CorpusQnATool: a tool that uses a corpus and ChatGPT to answer input queries.
General Missives in Text-Fabric
Article title, authors, date and body extraction dataset.
Scripts for building a geo-located web corpus using Common Crawl data
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Measure the similarity of text corpora for 74 languages
Yet another search platform for linguistic corpora.
Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.