Resources for conservation, development, and documentation of low resource (human) languages.
This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.
This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages.
This is a repository for NaijaSenti, a Lacuna-funded project for the development of a sentiment corpus for four Nigerian languages: Igbo, Hausa, Yoruba, and Pidgin.
Exploring the Limits of Low-Resource Neural Machine Translation
CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates
Open-source benchmark datasets and pretrained transformer models in the Filipino language.
GlotLID: Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
This is a repository for the IGBONLP Project.
[EACL 2021] - Unsupervised Abstractive Summarization of Bengali Text Documents.
Improving Zero-shot Translation of Low-resource Languages [IWSLT 2017]
Code + data for the EMNLP'20 publication "Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages"
Speech synthesis (TTS) in low-resource languages by training from scratch with FastPitch and fine-tuning with HiFi-GAN
[AAAI 2021] - Simple or Complex? Learning to Predict Readability of Bengali Texts.
Chatbot solution for resource-poor languages. Contains code and data for the journal article "Focused domain contextual AI chatbot framework for resource poor languages".
Repository for multilingual speech data resources for native languages of Zambia.
Graph Convolutional Network for Swahili News Classification: https://arxiv.org/abs/2103.09325
Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks https://arxiv.org/abs/1910.11933 or https://ieeexplore.ieee.org/document/9053264
My thesis on "Open Source Code and Low Resource Languages" for an MSc in Language Science and Technology at Saarland University