This repository has been archived by the owner on Oct 26, 2020. It is now read-only.

Experimental Data and Annotations for Master's Thesis as submitted to CUNI in July 2020

Akshayanti/Masters-Thesis-CUNI-2020

Consistency of Linguistic Annotation

Abstract

This thesis attempts to correct some errors and inconsistencies in different treebanks. The inconsistencies can stem from linguistic constructions, shortcomings in the annotation guidelines, annotators misunderstanding the guidelines, or random annotator errors, among other causes. We propose a metric to estimate the POS annotation consistency of different treebanks of the same language when the annotation guidelines remain the same. We offer solutions to some previously identified inconsistencies within the Universal Dependencies project, and check the viability of a proposed inconsistency detection tool in a low-resource setting. The solutions discussed in the thesis are language-neutral and intended to work efficiently across multiple languages.

List of Experiments

  1. Estimating POS Annotation Consistency of Different Treebanks in a Language
  2. conj_head: Head Identification in Coordinating Conjunctions
  3. Mining Errors in Low-Resource Languages by Combining LISCA And Cross-Validation
  4. AUX vs. VERB: Attempt at Separation of Verbs and Auxiliary Verbs

About Repository

The repository contains the experiment code and data, along with the experiment results. The thesis was completed in July 2020.

Supervisor: Dan Zeman, UFAL, Charles University, Prague
Co-Supervisor: Koldo Gojenola, Computer Languages and Systems, University of the Basque Country (UPV-EHU), Spain

Thesis Main Document (LaTeX Source)