Source code for EMNLP 2018 paper: "Data Augmentation via Dependency Tree Morphing for Low-Resource Languages"

gozdesahin/crop-rotate-augment

crop-rotate-augment

The code for our EMNLP18 paper "Data Augmentation via Dependency Tree Morphing for Low-Resource Languages".

First, download the UD treebanks v2.1 by running 'sh preprocess.sh'. You can then experiment with the method parameters on a single CoNLL-U file by running 'sh augment_single.sh'. The parameters are:

  • infile: UD file to augment
  • outfile: Name of the output file
  • maxrot: Maximum number of rotations per sentence
  • prob: Probability of the augmentation operation
  • operation: rotate or crop
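To make the parameters concrete, here is a minimal sketch of a rotate-style augmentation on one CoNLL-U sentence. It is not the code in this repository: the function names (read_conllu, subtree_ids, rotate, augment) are hypothetical, the root token is shuffled along with the fragments for simplicity, and the way prob and maxrot interact (up to maxrot attempts per sentence, each kept with probability prob) is an assumption. The sketch runs under both Python 2.7 and Python 3.

```python
import random


def read_conllu(text):
    """Split CoNLL-U text into sentences; each token is (id, form, head)."""
    sentences, tokens = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if tokens:
                sentences.append(tokens)
                tokens = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multiword (1-2) and empty (1.1) nodes
            continue
        tokens.append((int(cols[0]), cols[1], int(cols[6])))
    if tokens:
        sentences.append(tokens)
    return sentences


def subtree_ids(tokens, top):
    """Return the sorted ids of `top` and all of its descendants."""
    ids, frontier = set([top]), [top]
    while frontier:
        parent = frontier.pop()
        for tid, _, head in tokens:
            if head == parent and tid not in ids:
                ids.add(tid)
                frontier.append(tid)
    return sorted(ids)


def rotate(tokens, rng):
    """Shuffle the root and the subtrees attached to it (assumed behaviour)."""
    forms = dict((tid, form) for tid, form, _ in tokens)
    roots = [tid for tid, _, head in tokens if head == 0]
    if len(roots) != 1:  # leave unusual trees untouched
        return [form for _, form, _ in tokens]
    root = roots[0]
    fragments = [[root]]
    for tid, _, head in tokens:
        if head == root:
            fragments.append(subtree_ids(tokens, tid))
    rng.shuffle(fragments)
    return [forms[tid] for frag in fragments for tid in frag]


def augment(sentences, prob, maxrot, rng):
    """Assumed semantics: up to maxrot tries per sentence, each kept with prob."""
    out = []
    for tokens in sentences:
        for _ in range(maxrot):
            if rng.random() < prob:
                out.append(rotate(tokens, rng))
    return out


# Toy sentence: ID, FORM, HEAD, DEPREL (other CoNLL-U columns left as "_").
ROWS = [("1", "She", "2", "nsubj"), ("2", "wrote", "0", "root"),
        ("3", "a", "4", "det"), ("4", "book", "2", "obj"),
        ("5", "yesterday", "2", "obl")]
SAMPLE = "\n".join("\t".join([i, w, "_", "_", "_", "_", h, rel, "_", "_"])
                   for i, w, h, rel in ROWS) + "\n"

if __name__ == "__main__":
    for words in augment(read_conllu(SAMPLE), 0.7, 3, random.Random(0)):
        print(" ".join(words))
```

Each subtree stays intact, so "a book" always moves as a unit. Which fragments may move, and how labels and indices are rewritten in the output file, is decided by the actual scripts.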

We also provide the script augment_all.sh, which augments all training UD files. Its parameters are:

  • input: Root folder where UD treebanks are downloaded (e.g., ./data/ud-treebanks-v2.1)
  • maxrot: Maximum number of rotations per sentence
  • prob: Probability of the augmentation operation
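As a rough illustration of what "all training UD files" covers, the sketch below walks the input root folder and lists the training files. The helper name find_train_files and the file name pattern "*-ud-train.conllu" are assumptions made for this example; augment_all.sh may select files differently.

```python
import fnmatch
import os


def find_train_files(root, pattern="*-ud-train.conllu"):
    """Return the sorted paths under `root` whose file name matches `pattern`.

    The default pattern is an assumption about how training files are named.
    """
    matches = []
    for dirpath, _, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling find_train_files("./data/ud-treebanks-v2.1") would then return one path per treebank that ships a training split.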

Beware that this is Python 2.7 code!
