POS priority list

Paul Lynn edited this page Feb 22, 2019 · 5 revisions

A priority list describes which part of speech (POS) dominates another. Even though a token can be interpreted as different parts of speech depending on context, some of those parts of speech occur more frequently than others.

Of course, such lists can give wrong results in rare contexts, but that is the cost you pay.

JSON declaration

When you run a recognizer that supports priority lists, you can pass one in the following form:

[
	{
		"__what": {
			"xpos": "A",
			"upos": "B"
		},
		"__replace": {
			"xpos": "C",
			"upos": "D"
		}
	}
]

This means that every token in the text with XPOS A and UPOS B will be modified with the given property-value pairs, so here it gets XPOS C and UPOS D.
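The matching described above can be sketched in Python. This is only an illustration, not the recognizer's actual code: the function names `matches` and `apply_priority_list`, the token dictionaries, and the "first matching rule wins" behaviour are all assumptions.

```python
import json

# Rule list in the JSON format shown above (placeholder tags A, B, C, D).
RULES_JSON = """
[
    {
        "__what": {"xpos": "A", "upos": "B"},
        "__replace": {"xpos": "C", "upos": "D"}
    }
]
"""


def matches(token, what):
    """Return True when the token carries every property-value pair in `what`."""
    return all(token.get(key) == value for key, value in what.items())


def apply_priority_list(tokens, rules):
    """Return copies of the tokens, each rewritten by the first rule that matches it."""
    result = []
    for token in tokens:
        updated = dict(token)  # never mutate the caller's token
        for rule in rules:
            if matches(token, rule["__what"]):
                updated.update(rule["__replace"])
                break  # assumption: the first matching rule wins
        result.append(updated)
    return result


rules = json.loads(RULES_JSON)
tokens = [
    {"form": "foo", "xpos": "A", "upos": "B"},  # matches the rule
    {"form": "bar", "xpos": "A", "upos": "X"},  # UPOS differs, left alone
]
rewritten = apply_priority_list(tokens, rules)
```

Here only the first token is rewritten, because a rule applies only when every pair listed under `__what` matches.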

The MorhologyRecognizer class described in libs/morphology.py supports this technique, but with some restrictions:

  1. Only xpos and upos properties of the token can be changed.
  2. A modification is made only when both the "before" and the "after" xpos-upos pairs exist in the db response. This guarantees that only tokens which really can be both parts of speech are modified.
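The two restrictions can be sketched as follows. This is an illustration under assumptions, not the code in libs/morphology.py: the function `apply_with_db_check`, the shape of the db response (a list of dictionaries with `xpos` and `upos` keys), and the sample tags for the token are hypothetical.

```python
ALLOWED = ("xpos", "upos")  # restriction 1: only these properties may change


def fits(analysis, pattern):
    """Return True when the analysis agrees with every xpos/upos pair in the pattern."""
    return all(analysis.get(k) == v for k, v in pattern.items() if k in ALLOWED)


def apply_with_db_check(token, db_analyses, rules):
    """Rewrite the token only if the db lists both the old and the new reading."""
    for rule in rules:
        what, replace = rule["__what"], rule["__replace"]
        if not fits(token, what):
            continue
        # Restriction 2: both readings must be present in the db response.
        has_before = any(fits(a, what) for a in db_analyses)
        has_after = any(fits(a, replace) for a in db_analyses)
        if has_before and has_after:
            changes = {k: v for k, v in replace.items() if k in ALLOWED}
            return {**token, **changes}
    return dict(token)


rules = [{"__what": {"xpos": "Q"}, "__replace": {"xpos": "Ccs", "upos": "CCONJ"}}]
token = {"form": "та", "xpos": "Q", "upos": "PART"}  # illustrative tags

# Hypothetical db responses: one with both readings, one with the particle only.
both_readings = [{"xpos": "Q", "upos": "PART"}, {"xpos": "Ccs", "upos": "CCONJ"}]
particle_only = [{"xpos": "Q", "upos": "PART"}]

changed = apply_with_db_check(token, both_readings, rules)
unchanged = apply_with_db_check(token, particle_only, rules)
```

With both readings available the token becomes a conjunction. With only the particle reading it is left as it was, which is the guarantee the second restriction is meant to give.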

List for Ukrainian

Here is a good priority list for Ukrainian. It will be extended in the future.

[
	{
		"__what": {
			"xpos": "Q"
		},
		"__replace": {
			"xpos": "Ccs",
			"upos": "CCONJ"
		}
	},
	{
		"__what": {
			"xpos": "Y"
		},
		"__replace": {
			"xpos": "Spsl",
			"upos": "ADP"
		}
	},
	{
		"__what": {
			"xpos": "Q"
		},
		"__replace": {
			"xpos": "Spsl",
			"upos": "ADP"
		}
	},
	{
		"__what": {
			"xpos": "Ncmpan"
		},
		"__replace": {
			"xpos": "Vmen",
			"upos": "VERB"
		}
	},
	{
		"__what": {
			"xpos": "Ncmpan"
		},
		"__replace": {
			"xpos": "Vmpn",
			"upos": "VERB"
		}
	},
	{
		"__what": {
			"xpos": "Nc-piy"
		},
		"__replace": {
			"xpos": "Ncmpin",
			"upos": "NOUN"
		}
	},
	{
		"__what": {
			"xpos": "Spsi"
		},
		"__replace": {
			"xpos": "Spsa",
			"upos": "ADP"
		}
	},
	{
		"__what": {
			"xpos": "Pd--m-sga"
		},
		"__replace": {
			"xpos": "Pd--nnsgn",
			"upos": "PRON"
		}
	},
	{
		"__what": {
			"xpos": "Ncmpan"
		},
		"__replace": {
			"xpos": "Vmpn",
			"upos": "VERB"
		}
	},
	{
		"__what": {
			"xpos": "Ncmsan"
		},
		"__replace": {
			"xpos": "Ncmsgn",
			"upos": "NOUN"
		}
	}
]

Here is an explanation of some of the tags:

| Change this | To this | Examples (list) | Description |
|---|---|---|---|
| Q | CCONJ Ccs | та, і | These words can be particles in some contexts, but more often they are conjunctions. |
| Y | ADP Spsl | у | This can be a letter of an abbreviation, but it is also an adposition. |
| Q | ADP Spsl | на | |
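Since XPOS and UPOS values are easy to mix up when entries are written by hand, a small sanity check can help before a list is passed to a recognizer. The sketch below is not part of the project: `find_suspicious_rules` is a hypothetical helper that flags any `upos` value outside the 17 Universal Dependencies UPOS tags. The second sample rule has its two values swapped on purpose.

```python
# The 17 universal POS tags defined by Universal Dependencies.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def find_suspicious_rules(rules):
    """Return (rule index, section, value) for every upos value that is not a UPOS tag."""
    problems = []
    for index, rule in enumerate(rules):
        for section in ("__what", "__replace"):
            upos = rule.get(section, {}).get("upos")
            if upos is not None and upos not in UPOS_TAGS:
                problems.append((index, section, upos))
    return problems


sample = [
    {"__what": {"xpos": "Ncmpan"}, "__replace": {"xpos": "Vmen", "upos": "VERB"}},
    # Values swapped on purpose: "Vmpn" is an XPOS tag, not a UPOS tag.
    {"__what": {"xpos": "Ncmpan"}, "__replace": {"upos": "Vmpn", "xpos": "VERB"}},
]
problems = find_suspicious_rules(sample)
# problems == [(1, "__replace", "Vmpn")]
```

The check looks only at `upos`, because XPOS tag sets are language-specific and cannot be validated against a fixed list here.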

Declare using Contextual19

You can also declare a priority list using this technique:

if
	token
		xpos is ...
		upos is ...
then
	xpos becomes ...
	upos becomes ...

No syntactic sugar for now.
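For lists that already exist in JSON, an entry can be rendered into this form mechanically. The sketch below assumes that the template above is the complete syntax and that indentation is done with tabs. The function `rule_to_contextual19` is hypothetical, and how several rules are combined in one Contextual19 file is not covered here.

```python
def rule_to_contextual19(rule):
    """Render one JSON priority-list entry in the if/then form shown above."""
    lines = ["if", "\ttoken"]
    for key, value in rule["__what"].items():
        lines.append(f"\t\t{key} is {value}")
    lines.append("then")
    for key, value in rule["__replace"].items():
        lines.append(f"\t{key} becomes {value}")
    return "\n".join(lines)


rule = {
    "__what": {"xpos": "Q"},
    "__replace": {"xpos": "Ccs", "upos": "CCONJ"},
}
text = rule_to_contextual19(rule)
print(text)
```

For the first entry of the Ukrainian list this prints an `if` block with a single `xpos is Q` condition, followed by a `then` block with the two `becomes` lines.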