Add UD POS tagging Results #620
Added the UD POS tagging results from Substructure Substitution: Structured Data Augmentation for NLP (Shi et al., Findings of ACL 2021).
@ExplorerFreda THANK YOU FOR THIS!! This does not need to be in this PR, but note that the Penn Treebank recently got a new SOTA: https://paperswithcode.com/paper/sequence-alignment-ensemble-with-a-single The huge (multiple percent) error rate in mainstream datasets, and the fact that they do not evolve, is also demonstrated at https://labelerrors.com/
A common example of an abandonware dataset is WordNet, unlike its open-source successor https://github.com/globalwordnet/english-wordnet, which is much more complete and accurate and evolves as the language evolves. As the paper shows, 84% of errors in POS taggers are due to Penn Treebank errors.
Therefore, improving POS tagging sentence accuracy through a simple, relatively low-cost, paid expert correction of the Penn Treebank would open up an explosion of possibilities for NLU research and products, as it is the most salient bottleneck. @ExplorerFreda So my question is: is the UD English POS tagging dataset versioned and improved over time, unlike the Penn Treebank?