This repository has been archived by the owner on May 24, 2021. It is now read-only.

arruw/fri-2021-nlp-project

Cross-Lingual Offensive Language Identification

Authors: Nikolina Grabovica, Selma Halilčević, Matjaž Mav

Advisors: Slavko Žitnik

Organization: University of Ljubljana, Faculty of Computer and Information Science

Course: Natural Language Processing 2020/2021


Description

In this short paper, we reviewed a few publicly available datasets and several methods for offensive language identification. We explored traditional methods using handcrafted features, contextual embeddings and embedding alignment methods, and current state-of-the-art transformer models.

Report: report.pdf


Requirements

Installation

Folder structure

├── .gitignore                      Git ignore config
├── README.md                       This file
├── requirements.txt                Conda environment definition
├── data/                           Contains datasets 
├── reports/                        Contains reports
├── results/                        Contains final results and visualizations
├── checkpoints/                    !!Contains downloaded checkpoints, see installation steps!!
│   ├── elmoformanylanguages/       Contains pre-trained ELMo for EN and SI languages
│   ├── outputs/                    Contains pre-trained BERT, mBERT, T5 and mT5 models
│   └── .gitignore
└── src/                            Contains source files
    └── eval-*.ipynb                Model evaluation notebooks
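
Since the checkpoints are downloaded separately, it can help to confirm the layout before opening the evaluation notebooks. The following is a minimal sketch, not part of the repository: the directory names are taken from the folder structure above, while `EXPECTED_DIRS` and `missing_dirs` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Directories listed in the folder structure above.
# EXPECTED_DIRS and missing_dirs are hypothetical helpers, not part of the repository.
EXPECTED_DIRS = [
    "data",
    "reports",
    "results",
    "checkpoints/elmoformanylanguages",
    "checkpoints/outputs",
    "src",
]


def missing_dirs(repo_root):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```

Run from the repository root, this only reports which directories are absent; it does not check the contents of the checkpoints themselves.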