Possible wrong #92

Open
wellington36 opened this issue May 20, 2021 · 2 comments

Comments

@wellington36

When running the following command:

cat *.conllu | udapy -q util.Eval node='if (node.upos == "ADJ" and node.deprel == "amod" and node.parent.upos == "NOUN" and (node.feats["Gender"] != node.parent.feats["Gender"] or node.feats["Number"] != node.parent.feats["Number"])): node.parent.parent.draw(attributes="form,upos,feats,deprel")'

we get the following output:

# sent_id = CP458-6#7
# text = de normas sociais einversa e complementarmente, práticas sociais que avaliam do grau de integração de cada um
 ╭─╼ de ADP _ case
─┾ normas NOUN Gender=Fem|Number=Plur obl
 ┡─╼ sociais ADJ Gender=Fem|Number=Plur amod
 │ ╭─╼ e CCONJ _ cc
 │ ┢─┮ inversa ADJ Gender=Fem|Number=Sing amod
 │ │ │ ╭─╼ e CCONJ _ cc
 │ │ ╰─┶ complementarmente ADV _ conj
 │ ┢─╼ , PUNCT _ punct
 ╰─┾ práticas NOUN Gender=Fem|Number=Plur conj
   ┡─╼ sociais ADJ Gender=Fem|Number=Plur amod
   │ ╭─╼ que PRON Gender=Fem|Number=Plur|PronType=Rel nsubj
   ╰─┾ avaliam VERB Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin acl:relcl
     │ ╭─╼ de ADP _ case
     │ ┢─╼ o DET Definite=Def|Gender=Masc|Number=Sing|PronType=Art det
     ╰─┾ grau NOUN Gender=Masc|Number=Sing obj
       │ ╭─╼ de ADP _ case
       ╰─┾ integração NOUN Gender=Fem|Number=Sing nmod
         │ ╭─╼ de ADP _ case
         ╰─┾ cada DET Gender=Masc|Number=Sing nmod
           ╰─╼ um NUM NumType=Card fixed

However, this "text" value does not match the one in the corresponding CoNLL-U file:

text = Uma verdade subjectiva incorporada através de normas sociais e, inversa e complementarmente, práticas sociais que avaliam do grau de integração de cada um.

Udapy replaced "e, inversa" with "einversa".

@martinpopel
Contributor

node.draw() prints the subtree rooted in node. With the default setting print_text=True, the printed # text value also covers only the word forms of the subtree, not of the whole tree.
In your case, you printed the subtree rooted in "normas", but the comma between "e" and "inversa" is not part of that subtree. The node "e" has SpaceAfter=No in the MISC column, so it is printed without any space after it. I admit it might be better to ignore SpaceAfter=No when printing a subtree that has a "gap" (a word outside the subtree) right after the token; a PR is welcome.
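To make the mechanism concrete, here is a minimal sketch (not Udapi's actual code) of how concatenating a subtree's word forms while honouring SpaceAfter=No produces "einversa", and how forcing a space at a gap would avoid it. The `Token` class and the `subtree_text` function are hypothetical names introduced only for this illustration; the tokens are taken from the sentence in the issue, with the omitted comma (ID 10) missing from the subtree.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """Hypothetical stand-in for one CoNLL-U word."""
    ord: int                  # 1-based word ID (CoNLL-U ID column)
    form: str
    space_after: bool = True  # False when MISC contains SpaceAfter=No


def subtree_text(tokens: List[Token], space_at_gaps: bool = False) -> str:
    """Join the forms of a subtree's words in sentence order.

    space_at_gaps=False: SpaceAfter=No is always honoured, so a word followed
    by a gap (a word outside the subtree) is glued to the next printed word.
    space_at_gaps=True: a space is forced whenever the next printed word is
    not the immediate successor in the sentence (the suggested fix).
    """
    tokens = sorted(tokens, key=lambda t: t.ord)
    parts = []
    for i, tok in enumerate(tokens):
        parts.append(tok.form)
        if i == len(tokens) - 1:
            break  # no separator after the last word
        has_gap = tokens[i + 1].ord != tok.ord + 1
        if tok.space_after or (space_at_gaps and has_gap):
            parts.append(" ")
    return "".join(parts)


# Words 6-15 of the sentence; the comma with ID 10 is not in the subtree.
toks = [
    Token(6, "de"), Token(7, "normas"), Token(8, "sociais"),
    Token(9, "e", space_after=False), Token(11, "inversa"), Token(12, "e"),
    Token(13, "complementarmente", space_after=False), Token(14, ","),
    Token(15, "práticas"),
]
print(subtree_text(toks))                      # de normas sociais einversa e complementarmente, práticas
print(subtree_text(toks, space_at_gaps=True))  # de normas sociais e inversa e complementarmente, práticas
```

Forcing a space at a gap is only one possible policy; another would be to print a placeholder such as "…" there, which would also signal that words are missing.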

For your purposes, it may be better to print the whole tree with the given node (i.e. the ADJ node, not its grandparent) highlighted:

cat *.conllu | udapy -TMA util.Mark node='node.upos == "ADJ" and node.deprel == "amod" and node.parent.upos == "NOUN" and (node.feats["Gender"] != node.parent.feats["Gender"] or node.feats["Number"] != node.parent.feats["Number"])' | less -R

Another solution would be to keep using util.Eval, but print the whole tree with node.root.draw().
You can additionally highlight any node with node.misc["Mark"]=1 (which is what util.Mark does internally), but util.Eval uses Python eval(), which takes a single Python expression. So you would need to convert the util.Eval one-liner into a full Udapi block.
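The per-node logic such a block would run might look like the sketch below. In a real Udapi block it would live in the `process_node(self, node)` method of a `udapi.core.block.Block` subclass, followed by `node.root.draw(...)`; that part is omitted so the sketch runs without Udapi installed. `StubNode`, `agreement_mismatch` and `mark_mismatches` are hypothetical names, and the stub's behaviour of returning "" for a missing feature is an assumption about Udapi's `feats` that is worth checking.

```python
from collections import defaultdict


class StubNode:
    """Hypothetical stand-in exposing only the Node attributes used below."""

    def __init__(self, upos="", deprel="", feats=None, parent=None, root=None):
        self.upos = upos
        self.deprel = deprel
        # Assumption: like Udapi's feats, a missing feature reads as "".
        self.feats = defaultdict(str, feats or {})
        self.misc = {}
        self.parent = parent
        self.root = root


def agreement_mismatch(node) -> bool:
    """Same condition as in the util.Eval / util.Mark commands in this thread."""
    parent = node.parent
    return (node.upos == "ADJ" and node.deprel == "amod" and parent.upos == "NOUN"
            and (node.feats["Gender"] != parent.feats["Gender"]
                 or node.feats["Number"] != parent.feats["Number"]))


def mark_mismatches(nodes):
    """Set MISC Mark=1 on each mismatching node; return the affected roots once each."""
    roots = []
    for node in nodes:
        if agreement_mismatch(node):
            node.misc["Mark"] = 1
            if node.root not in roots:  # draw each tree once, even with several mismatches
                roots.append(node.root)
    return roots


# Fragment of the tree from the issue: "normas" with two amod adjectives.
root = StubNode(upos="<ROOT>")
normas = StubNode("NOUN", "obl", {"Gender": "Fem", "Number": "Plur"}, parent=root, root=root)
sociais = StubNode("ADJ", "amod", {"Gender": "Fem", "Number": "Plur"}, parent=normas, root=root)
inversa = StubNode("ADJ", "amod", {"Gender": "Fem", "Number": "Sing"}, parent=normas, root=root)

roots = mark_mismatches([normas, sociais, inversa])
print(inversa.misc, sociais.misc, len(roots))  # {'Mark': 1} {} 1
```

Collecting the roots first, instead of drawing inside the loop, is a design choice so that a sentence with two mismatching adjectives is printed only once.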

@arademaker

Thank you, @martinpopel, for your detailed explanation. @wellington36 is working with me; I hope he will eventually be able to contribute to udapi.
