Welcome to Roberta-Marathi-MLM

language: "mr"

Model Description

This is a small masked language model for Marathi, trained on 1M data samples taken from the OSCAR corpus.

The model is hosted on the Hugging Face model hub: https://huggingface.co/deepampatel/roberta-mlm-mr
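A minimal sketch of querying the hosted checkpoint for masked-word prediction, assuming the `transformers` library is installed and that the checkpoint works with the standard `fill-mask` pipeline (this README does not confirm that). The helper names `mask_word` and `predict_masked` and the sample sentence are illustrative only.

```python
MODEL_ID = "deepampatel/roberta-mlm-mr"  # taken from the hub URL above


def mask_word(sentence: str, index: int, mask_token: str = "<mask>") -> str:
    """Replace the whitespace-separated word at `index` with the mask token."""
    words = sentence.split()
    words[index] = mask_token
    return " ".join(words)


def predict_masked(sentence: str, index: int, top_k: int = 5):
    """Return the pipeline's top_k candidates for the masked word."""
    # Imported lazily so mask_word() works without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    # Use the tokenizer's own mask token instead of assuming "<mask>".
    masked = mask_word(sentence, index, fill_mask.tokenizer.mask_token)
    return fill_mask(masked, top_k=top_k)


# Example call (downloads the model on first use, so it needs network access):
#   for p in predict_masked("मी शाळेत जातो", index=1):
#       print(p["token_str"], p["score"])
```

Reading the mask token from the tokenizer keeps the sketch independent of how the special tokens were configured during training.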

Training params

Dataset - 1M data samples from the OSCAR corpus (https://oscar-corpus.com/) were used to train this model. The data set is 2.7 GB in total, but due to resource constraints I picked only 1M samples from it. If you are interested in collaborating and have the computational resources to train on the full data set, you are most welcome to do so.
Preprocessing - ByteLevelBPETokenizer is used to tokenize the sentences at the byte level, and the vocabulary size is set to 52k as per the standard values given by 🤗 Hugging Face.
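The dataset and preprocessing steps above could be sketched roughly as follows. This is not the notebook's actual code: the README does not say how the 1M samples were picked, so taking the first non-empty lines is an assumption, as are the file paths, `min_frequency=2`, and the RoBERTa-style special tokens. Only `ByteLevelBPETokenizer` and the 52k vocabulary size come from the text.

```python
from itertools import islice
from pathlib import Path


def take_samples(src: str, dst: str, limit: int = 1_000_000) -> int:
    """Copy the first `limit` non-empty lines of `src` to `dst`; return the count."""
    count = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        non_empty = (line for line in fin if line.strip())
        for line in islice(non_empty, limit):
            # Make sure every written sample ends with exactly one newline.
            fout.write(line.rstrip("\n") + "\n")
            count += 1
    return count


def train_tokenizer(corpus: str, out_dir: str, vocab_size: int = 52_000):
    """Train a byte-level BPE tokenizer on `corpus` and save it to `out_dir`."""
    # Imported lazily so take_samples() works without `tokenizers` installed.
    from tokenizers import ByteLevelBPETokenizer

    tokenizer = ByteLevelBPETokenizer()
    tokenizer.train(
        files=[corpus],
        vocab_size=vocab_size,
        min_frequency=2,  # assumed value, not stated in this README
        special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],  # assumed
    )
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    tokenizer.save_model(out_dir)  # writes vocab.json and merges.txt
    return tokenizer


# Example call with hypothetical file names:
#   take_samples("oscar_mr.txt", "oscar_mr_1m.txt")
#   train_tokenizer("oscar_mr_1m.txt", "tokenizer_mr")
```

Streaming the file line by line keeps memory use flat, which matters when the source file is several gigabytes.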

Intended uses & limitations

This model is for anyone who wants to use a Marathi language model for tasks such as language generation, translation, and many more use cases.

If you are interested in collaborating, feel free to reach out to me. Deepam
