RichardLitt / low-resource-languages

Resources for conservation, development, and documentation of low resource (human) languages.

nlp list natural-language-processing awesome natural-language language-learning awesome-list language-resources endangered-languages human-language language-documentation resourced-languages minority-language low-resource-languages lrls

Updated May 9, 2024
TeX

csebuetnlp / banglanmt

This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.

machine-translation neural-machine-translation parallel-corpus parallel-corpora bangla-nlp low-resource-languages bangla-machine-translation bangla-dataset-machine-translation emnlp-2020 low-resource-nlp low-resource-machine-translation

Updated Jan 30, 2023
Python

csebuetnlp / xl-sum

This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.

multilingual machine-learning deep-learning dataset text-summarization abstractive-text-summarization abstractive-summarization text-summarisation low-resource-languages multilinguality summarization-corpora summarization-dataset multilingual-text-summarization text-summarization-dataset text-summarization-model low-resource-summarization low-resource-text-summarizarion multilingual-summarization

Updated Mar 26, 2024
Python

Andrews2017 / africanlp-public-datasets

A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages.

natural-language-processing african-languages datasets low-resource-languages

Updated Apr 26, 2024

hausanlp / NaijaSenti

This is a repository for NaijaSenti, a Lacuna-funded project for the development of a sentiment corpus for four Nigerian languages: Igbo, Hausa, Yoruba, and Pidgin.

nlp sentiment-analysis sentiment dataset african-languages yoruba hausa sentiment-classification nigeria yorubaname-dictionary igbo low-resource-languages igbo-language nigerian-data sentiment-data low-resource-nlp hausa-nlp hausanlp

Updated Jan 10, 2024

cdli-gh / Semi-Supervised-NMT-for-Sumerian-English

Exploring the Limits of Low-Resource Neural Machine Translation

translation unsupervised transformers nmt semi-supervised backtranslation xlm low-resource-languages

Updated Feb 16, 2023
Jupyter Notebook

kbatsuren / CogNet

CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates

wordnet corpus-linguistics language-resources cognate bilingual-lexicon-extraction low-resource-languages cross-lingual-simialrity multilinguality cross-lingual-transfer bilingual-lexicon-induction

Updated Jun 15, 2023

jcblaisecruz02 / Filipino-Text-Benchmarks

Open-source benchmark datasets and pretrained transformer models in the Filipino language.

benchmark deep-learning text-classification corpus transformer transfer-learning tagalog bert filipino electra nli low-resource-languages tagalog-transformers electra-models

Updated Dec 13, 2020
Python

cisnlp / GlotLID

GlotLID: Language Identification with Support for More Than 2000 Labels -- EMNLP 2023

language-detection multlingual language-detector language-recognition glot lid language-identification language-classification language-identification-toolkit low-resource-languages language-detection-library language-identifier language-detection-lib langid low-resource-nlp

Updated May 12, 2024
Python

IgnatiusEzeani / IGBONLP

This is a repository for the IGBONLP Project.

nlp deep-learning machine-translation low-resource-languages igbo-language

Updated Feb 27, 2022
Modula-3

tafseer-nayeem / BengaliSummarization

[EACL 2021] - Unsupervised Abstractive Summarization of Bengali Text Documents.

abstractive-summarization low-resource-languages bengali-nlp summarization-dataset bengali-summarization bengali-abstractive-summarization bengali-summarization-dataset

Updated Apr 26, 2021
Python

surafelml / improving-zeroshot-nmt

Improving Zero-shot Translation of Low-resource Languages [IWSLT 2017]

nmt low-resource-languages multilingual-nmt zero-shot-translation unsupervised-nmt

Updated Aug 30, 2021
Shell

uds-lsv / transfer-distant-transformer-african

Code + data for the EMNLP'20 publication "Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages"

african-languages ner topic-classification low-resource low-resource-languages transformer-models

Updated Dec 16, 2021
Python

Rumeysakeskin / Turkish-Text-to-Speech

Speech synthesis (TTS) in low-resource languages by training from scratch with FastPitch and fine-tuning with HiFi-GAN

pytorch tts speech-synthesis nvidia-docker waveform-generator low-resource-languages nvidia-nemo hifigan fastpitch turkish-text-to-speech phonetical-conversion spectrogram-generator

Updated Dec 5, 2023
Python

tafseer-nayeem / BengaliReadability

[AAAI 2021] - Simple or Complex? Learning to Predict Readability of Bengali Texts.

low-resource-languages bengali-language-processing bengali-natural-language-processing bengali-nlp bengali-readability bengali-readability-analysis bengali-readability-prediction readability-dataset bengali-readability-dataset

Updated Apr 6, 2021
Python

AsifulNobel / Metsys

Chatbot Solution for Resource-Poor Languages. Contains code and data for Journal Article 'Focused domain contextual AI chatbot framework for resource poor languages'.

nlp website natural-language-processing neural-network chatbot django-application django-channels restful-api nlp-machine-learning bangla-nlp low-resource-languages bangla-ai low-resource-nlp resource-poor-languages customer-service-chatbot

Updated Jul 25, 2021
Python

unza-speech-lab / zambezi-voice

Repository for multilingual speech data resources for native languages of Zambia.

speech-recognition speech-to-text zambia low-resource-languages

Updated Oct 2, 2023

alecokas / swahili-text-gcn

Graph Convolutional Network for Swahili News Classification: https://arxiv.org/abs/2103.09325

python natural-language-processing text-classification pytorch semi-supervised-learning swahili graph-convolutional-networks gcn graph-neural-networks low-resource-languages

Updated Jun 2, 2021
Jupyter Notebook

alecokas / BiLatticeRNN-Confidence

Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks https://arxiv.org/abs/1910.11933 or https://ieeexplore.ieee.org/document/9053264

pytorch lstm speech-recognition attention lattice speech-processing asr lattices confidence-estimation low-resource-languages pytorch-implementation confidence-scores confusion-networks latticernn confidence-estimates

Updated Apr 16, 2020
Python

RichardLitt / thesis

My thesis on "Open Source Code and Low Resource Languages" for an MSc in Language Science and Technology at Saarland University

nlp thesis dissertation saarland endangered-languages nlproc lrl saarland-university low-resource-languages

Updated Jun 29, 2018
TeX

low-resource-languages

Here are 105 public repositories matching this topic...

RichardLitt / low-resource-languages
