Offensive Language Identification in Dravidian Languages at EACL2021 Workshop

Our source code for the EACL 2021 workshop task Offensive Language Identification in Dravidian Languages. Our submissions ranked 4th, 4th, and 3rd in the Tamil, Malayalam, and Kannada tracks, respectively! πŸ₯³

Update: the source code is now released! 🀩

Repository structure

β”œβ”€β”€ README.md
β”œβ”€β”€ ckpt                        # store model weights during training
β”‚   └── README.md
β”œβ”€β”€ data                        # store the data
β”‚   └── README.md
β”œβ”€β”€ gen_data.py                 # generate the Dataset
β”œβ”€β”€ install_cli.sh              # install the required packages
β”œβ”€β”€ loss.py                     # loss function
β”œβ”€β”€ main_xlm_bert.py            # train multilingual-BERT
β”œβ”€β”€ main_xlm_roberta.py         # train XLM-RoBERTa
β”œβ”€β”€ model.py                    # model implementation
β”œβ”€β”€ pred_data                   # store prediction outputs
β”‚   └── README.md
β”œβ”€β”€ preprocessing.py            # preprocess the data
β”œβ”€β”€ pretrained_weights          # store the pretrained weights
β”‚   └── README.md
└── train.py                    # define the training and validation loop
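
For orientation, here is a minimal sketch of the kind of model that model.py plausibly implements: a pretrained multilingual encoder topped with a classification head. The class name, pooling strategy, and dropout rate below are illustrative assumptions, not the repository's actual code.

import torch.nn as nn
from transformers import AutoModel

class OffensiveClassifier(nn.Module):
    """Hypothetical name; the real class is defined in model.py."""

    def __init__(self, pretrained_name: str, num_labels: int, dropout: float = 0.1):
        super().__init__()
        # e.g. "bert-base-multilingual-cased" or "xlm-roberta-base"
        self.encoder = AutoModel.from_pretrained(pretrained_name)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        # Use the hidden state of the first token ([CLS]/<s>) as the sentence representation.
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        pooled = self.dropout(hidden[:, 0])
        return self.classifier(pooled)  # raw logits; pair with a loss from loss.py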

Installation

Use the following command to install all of the required packages:

sh install_cli.sh

Preprocessing

The first step is to preprocess the data. Run the following command:

python3 -u preprocessing.py
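
The actual cleaning rules live in preprocessing.py; as a rough idea, preparing code-mixed social-media comments typically looks something like the sketch below. The file paths, column names, and cleaning steps here are assumptions for illustration, not the script's actual behaviour.

import re
import pandas as pd

def clean_text(text: str) -> str:
    """Typical social-media cleanup: drop URLs and user mentions, collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    return re.sub(r"\s+", " ", text).strip()

# Hypothetical file layout: tab-separated (text, label) pairs.
df = pd.read_csv("data/tamil_train.tsv", sep="\t", names=["text", "label"])
df["text"] = df["text"].map(clean_text)
df.to_csv("data/tamil_train_clean.tsv", sep="\t", index=False)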

Training

The second step is to train our models. In our solution, we trained two models, using multilingual-BERT and XLM-RoBERTa as the encoder, respectively.

To train the model that uses multilingual-BERT as the encoder, use the following command:

nohup python3 -u main_xlm_bert.py \
        --base_path <your base path> \
        --batch_size 8 \
        --epochs 50 \
        > train_xlm_bert_log.log 2>&1 &

To train the model that uses XLM-RoBERTa as the encoder, use the following command:

nohup python3 -u main_xlm_roberta.py \
        --base_path <your base path> \
        --batch_size 8 \
        --epochs 50 \
        > train_xlm_roberta_log.log 2>&1 &
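
The two entry points presumably differ only in which pretrained checkpoint they load. Below is a minimal sketch of such a driver, assuming the flags shown above; the checkpoint names and the commented-out helpers are assumptions, not the scripts' actual contents.

import argparse
from transformers import AutoTokenizer

def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument("--base_path", type=str, required=True, help="root directory of the data")
    p.add_argument("--batch_size", type=int, default=8)
    p.add_argument("--epochs", type=int, default=50)
    return p.parse_args()

if __name__ == "__main__":
    args = parse_args()
    # main_xlm_bert.py and main_xlm_roberta.py would differ only in this name:
    pretrained = "bert-base-multilingual-cased"  # or "xlm-roberta-base"
    tokenizer = AutoTokenizer.from_pretrained(pretrained)
    # ... build DataLoaders from args.base_path (gen_data.py), instantiate the
    # model (model.py), and run the training/validation loop (train.py) ...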

Inference

After training, the final step is inference. Use the following command:

nohup python3 -u inference.py > inference.log 2>&1 &
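
As a rough outline, inference loads a trained checkpoint from ckpt, runs the test set through the model, and writes predictions into pred_data. The checkpoint path, output path, label count, and the OffensiveClassifier class (sketched under Repository structure) are all assumptions:

import torch
from transformers import AutoTokenizer
from model import OffensiveClassifier  # hypothetical class name (see sketch above)

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = OffensiveClassifier("xlm-roberta-base", num_labels=6).to(device)  # label count is task-dependent
model.load_state_dict(torch.load("ckpt/best_model.pt", map_location=device))  # hypothetical path
model.eval()

with torch.no_grad(), open("pred_data/predictions.tsv", "w") as out:  # hypothetical path
    for text in ["example comment"]:  # in practice, iterate over the test file
        enc = tokenizer(text, return_tensors="pt", truncation=True).to(device)
        pred = model(enc["input_ids"], enc["attention_mask"]).argmax(dim=-1).item()
        out.write(f"{text}\t{pred}\n")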

Congratulations! You now have the final results! 🀩

If you use our code, please cite this repository as the source.
