This repo contains a Luo corpus for Named Entity Recognition. The text comes from the news domain and was scraped from Radio Ramogi.

Pogayo/Luo-News-Dataset

LUO LANGUAGE DATASET FOR NER

About

This repo contains a Luo corpus for Named Entity Recognition. The text comes from the news domain and was scraped from Radio Ramogi. I did this as part of the Masakhane NER project.

NLP, NER, Masakhane

About Dataset

The sentences were obtained from the Ramogi FM website: https://rmsradio.co.ke/brands/ramogi-fm/

Dates published: 1/9/2018 - 10/3/2021. See README.txt for the most up-to-date information.

Categories

See README.txt for the most up-to-date information.

Repo Structure

This repo contains 3 main files of interest.

1. README.md

This file

2. README.txt

Contains a statistical description of the data: news domains, publication dates, and collection dates

3. LUO.txt

Contains a cleaned compilation of the text

The rest are just files used in the collection and cleaning process.
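
As a rough illustration of how LUO.txt might be read, here is a minimal Python sketch. It assumes the file is UTF-8 plain text with one sentence per line, which this README does not confirm, so check README.txt and the file itself first. The helper names `load_sentences` and `corpus_stats` are hypothetical and not part of this repo.

```python
from pathlib import Path


def load_sentences(path):
    """Return the non-empty, whitespace-stripped lines of a UTF-8 text file.

    Assumption: one sentence per line. Adjust if LUO.txt is laid out differently.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def corpus_stats(sentences):
    """Count sentences and whitespace-separated tokens."""
    return {
        "sentences": len(sentences),
        "tokens": sum(len(sentence.split()) for sentence in sentences),
    }


if __name__ == "__main__":
    # Run from the root of a local clone of this repo.
    sentences = load_sentences("LUO.txt")
    print(corpus_stats(sentences))
```

Whitespace splitting is only a crude token count; it is meant for a quick sanity check against the figures in README.txt, not as a tokenizer for NER.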

Clone

  • Clone this repo to your local machine using https://github.com/Pogayo/Luo-News-Dataset

Contributing

To get started...

Step 1

  • Option 1

    • 🍴 Fork this repo!
  • Option 2

    • 👯 Clone this repo to your local machine using https://github.com/Pogayo/Luo-News-Dataset

Step 2

  • HACK AWAY! 🔨🔨🔨

Step 3

  • 🔃 Create a new pull request

Team

Perez Ogayo

Verrah Otiende

  • We are a small team. Join us and let's put Luo on the NLP Map together!

FAQ

  • How do I collect the sentences?
    • Go to the Ramogi website. Typically, you will only find the latest news.
    • If you have exhausted the latest news, go to the web archive to get links of earlier news.
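
The archive lookup in the last step could be sketched in Python as below. This assumes "the web archive" means the Internet Archive's Wayback Machine and uses its public CDX search endpoint; it also assumes the older news pages live under the Ramogi FM URL prefix given earlier, which may not hold. The function names and the 2018 to 2021 year range are illustrative choices, not the script actually used for this dataset.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url_prefix, start="2018", end="2021"):
    """Build a CDX query listing captures of every URL under url_prefix."""
    params = {
        "url": url_prefix,
        "matchType": "prefix",       # match all URLs starting with url_prefix
        "output": "json",
        "from": start,               # partial timestamps (year only) are accepted
        "to": end,
        "fl": "timestamp,original",  # only return the fields we need
        "collapse": "urlkey",        # one capture per distinct URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_links(rows):
    """Turn CDX JSON rows (header row first) into Wayback snapshot URLs."""
    if not rows:
        return []
    header, *records = rows
    links = []
    for record in records:
        fields = dict(zip(header, record))
        links.append(
            f"https://web.archive.org/web/{fields['timestamp']}/{fields['original']}"
        )
    return links


def fetch_snapshot_links(url_prefix):
    """Query the archive over the network and return snapshot URLs."""
    with urlopen(build_cdx_query(url_prefix), timeout=30) as response:
        body = response.read()
    return snapshot_links(json.loads(body) if body.strip() else [])


if __name__ == "__main__":
    for link in fetch_snapshot_links("rmsradio.co.ke/brands/ramogi-fm/"):
        print(link)
```

The returned links still need to be opened and filtered by hand or by a separate scraper, since not every archived page under the prefix is a news article.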

Support me

I am in the process of setting up a wallet. Feel free to reach out to me so that I can give you other payment details in the meantime.


License

This work is licensed under a Creative Commons Attribution 4.0 International License.
