Enhanced (was: effective) parents and children #10

Open
VincTheSecond opened this issue Dec 1, 2016 · 9 comments

@VincTheSecond

Implement methods eparent and echildren for the Node object.

@martinpopel
Contributor

UD uses the Stanford style of coordinations, not the Prague style used in Treex and PDT.
So we cannot use the same implementation of eparents and echildren as in Treex and thus I don't think it is a good idea to use the same names.

My original plan was not to use the Stanford style internally in Udapi (see my Czech notes), or even to persuade the UD team to adopt my style, but this is not likely to happen.
Anyway, shared dependants in UD are/will be solved by the enhanced dependencies, which makes the whole problem a bit simpler (now, there is always just one primary "effective" parent and possible other parents are marked by the enhanced edges).
Also, apposition is solved hypotactically with a special deprel, so it does not mix with coordinations.

I agree it would be nice to have some coordination-specific API, probably reconsidering my suggestions, but taking the UD v2 as the basis.

@VincTheSecond
Author

I read the page http://universaldependencies.org/u/overview/enhanced-syntax.html and I'm confused. In the section "Conjoined verbs and verb phrases" I see a subject which has two parents (two coordinated verbs). Is it just a proposal of a new UD version or something like that?

With Silvie we defined echild/eparent on this sample tree:

1 John     subj   2
2 read     root   0
3 and      cc     2
4 write    conj   2
5 books    dobj   2

In this situation:

  • eparent(John) = [read, write]
  • echild(write) = [John, books]

@martinpopel
Contributor

> Is it just a proposal of a new UD version or something like that?

It is a fresh UD version 2.0. In UD 1.0 enhanced (secondary) dependencies were not described in such detail (but they were allowed).
Note that basic dependencies are stored in the 7th column of CoNLL-U and each node has exactly one parent in the basic dependencies (the graph is a tree).
Enhanced dependencies are stored in the 9th column and there may be more parents and the graph can contain cycles.
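To make the two columns concrete, here is a minimal sketch of reading them from one CoNLL-U token line. The helper names `parse_deps` and `parse_conllu_line` and the returned dictionary layout are assumptions made for this illustration, not Udapi's API. DEPS is a `|`-separated list of `head:deprel` pairs, and heads are kept as strings because empty nodes have IDs such as `5.1`.

```python
def parse_deps(deps_field):
    """Parse a DEPS value such as '3:nsubj|5:nsubj' into (head, deprel) pairs."""
    if deps_field == "_":  # an underscore marks an empty column
        return []
    pairs = []
    for item in deps_field.split("|"):
        # Split on the first colon only: the deprel may contain colons itself.
        head, deprel = item.split(":", 1)
        pairs.append((head, deprel))
    return pairs


def parse_conllu_line(line):
    """Return the dependency-related columns of one CoNLL-U token line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 columns, got {len(cols)}")
    return {
        "id": cols[0],
        "form": cols[1],
        "head": cols[6],              # 7th column: the single basic parent
        "deprel": cols[7],
        "deps": parse_deps(cols[8]),  # 9th column: zero or more enhanced parents
    }


# Illustrative line (made up for this example, not taken from a treebank).
line = "2\tstore\tstore\tNOUN\t_\t_\t3\tnsubj\t3:nsubj|5:nsubj\t_"
token = parse_conllu_line(line)
print(token["head"], token["deps"])
```

The basic parent is always a single value, while the enhanced parents come back as a list that may be empty or hold several entries.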

As I think about it, we need first the API for enhanced dependencies.
Once we have it, the API for coordinations will be just thin syntactic sugar on top of it - it will use the enhanced dependencies, but only for coordinations (Propagation of Conjuncts), not for Ellipsis, Controlled/raised subjects etc. I hope these cases will be easy to distinguish.

We may also create a block which will heuristically add the enhanced dependencies (at least the coordination-related ones) given the basic dependencies. We cannot expect all the treebanks in UD 2.0 to have the enhanced dependencies filled in. Of course, the heuristic may not be able to distinguish "The store buys and sells cameras" (where cameras has two parents) from "She was reading or watching a movie" (where movie has just one parent), but something is better than nothing.
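A rough sketch of what such a heuristic could look like, written against a plain list of tokens rather than Udapi nodes. The `Token` tuple, the `NOT_SHARED` set and both function names are hypothetical. The rule is deliberately naive: every dependent of the first conjunct (except `cc`, `punct` and `conj`) is assumed to be shared by all conjuncts, and every later conjunct inherits the incoming edge of the first one. It handles only one level of coordination and keeps the plain `conj` label.

```python
from collections import namedtuple

# Hypothetical token record for this sketch; not Udapi's Node class.
Token = namedtuple("Token", "id form deprel head")

# Deprels that are never propagated to the other conjuncts (an assumption).
NOT_SHARED = {"cc", "punct", "conj"}


def guess_enhanced_deps(tokens):
    """Map each token id to a sorted list of guessed (head, deprel) edges."""
    by_id = {t.id: t for t in tokens}
    # For every first conjunct, collect the ids of the conjuncts attached to it.
    conjuncts = {}
    for t in tokens:
        if t.deprel == "conj":
            conjuncts.setdefault(t.head, []).append(t.id)
    # Start from the basic edge of each token.
    enhanced = {t.id: {(t.head, t.deprel)} for t in tokens}
    for t in tokens:
        if t.deprel == "conj":
            # A later conjunct inherits the incoming edge of the first conjunct.
            first = by_id[t.head]
            enhanced[t.id].add((first.head, first.deprel))
        elif t.deprel not in NOT_SHARED:
            # Naive assumption: the dependent is shared by all conjuncts.
            for other in conjuncts.get(t.head, []):
                enhanced[t.id].add((other, t.deprel))
    return {tid: sorted(edges) for tid, edges in enhanced.items()}


def format_deps(edges):
    """Serialize (head, deprel) edges into the DEPS column format."""
    return "|".join(f"{head}:{deprel}" for head, deprel in edges)


tokens = [
    Token(1, "The", "det", 2),
    Token(2, "store", "nsubj", 3),
    Token(3, "buys", "root", 0),
    Token(4, "and", "cc", 5),
    Token(5, "sells", "conj", 3),
    Token(6, "cameras", "obj", 3),
]
for tid, edges in guess_enhanced_deps(tokens).items():
    print(tid, format_deps(edges))
```

With this rule "cameras" receives an edge to both verbs. The same rule would also propagate any private dependent of a first conjunct, which is exactly the over-generation risk mentioned above.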

@VincTheSecond
Author

It seems that this will be a really big and complicated issue.
For my project, at least this will be needed:

  • Write a block which will guess the secondary dependencies (in the first version, some analogy of eparent/echild from the PML). Do you have some preference for the block name and location?
  • Write eparent/echild node methods which will look not only into the basic tree but also into the secondary dependencies.

Are you OK with this?
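To illustrate the second item, here is a small sketch of lookups that combine the basic tree with the secondary dependencies. Everything in it is hypothetical: the function names, the `heads` dictionary (node id to basic parent id) and the `deps` dictionary (node id to a list of `(parent id, deprel)` pairs). The extra edges in `deps` are written by hand for the sample tree above, as if a guessing block had already produced them.

```python
def eparents(node_id, heads, deps):
    """Basic parent first, then any additional secondary parents."""
    result = [heads[node_id]]
    for parent, _deprel in deps.get(node_id, []):
        if parent not in result:
            result.append(parent)
    return result


def echildren(node_id, heads, deps):
    """Children in the basic tree plus children via secondary edges, by id."""
    result = [child for child, head in heads.items() if head == node_id]
    for child, edges in deps.items():
        if child not in result and any(p == node_id for p, _ in edges):
            result.append(child)
    return sorted(result)


forms = {1: "John", 2: "read", 3: "and", 4: "write", 5: "books"}
heads = {1: 2, 2: 0, 3: 2, 4: 2, 5: 2}
# Hand-written secondary edges for illustration (assumed output of a guessing block).
deps = {
    1: [(2, "subj"), (4, "subj")],
    5: [(2, "dobj"), (4, "dobj")],
}

print([forms[i] for i in eparents(1, heads, deps)])   # parents of John
print([forms[i] for i in echildren(4, heads, deps)])  # children of write
```

Note that this plain union does not filter out coordination-internal children: `echildren(2, ...)` would still include "and" and "write", which a Treex-style effective-children method would skip.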

@martinpopel
Contributor

> Write a block which will guess the secondary dependencies

Yes.

> (in the first version, some analogy of eparent/echild from the PML).

In PDT/Treex, it was difficult to implement eparents/echildren correctly, but it was possible (they were uniquely defined) because of the Prague style of coordinations and the attribute is_member.
In UD basic dependencies, it is not possible without heuristics. I would not call these heuristics an analogy of eparent/echild from the PML.

> Do you have some preference for the block name and location?

Not really, but luckily this could be easily changed once we find a better naming pattern for Udapi blocks.
As there is no m-layer, a-layer, t-layer, the Treex naming pattern (W2A, A2T,...) does not make sense for Udapi.
I've drafted some first-level block namespaces in Perl Udapi, but none of these really fits here. What about Block::Util::AddEnhancedDependencies?
Note that in UD v2, these dependencies are called enhanced (I think secondary was used in some v1 docs, but maybe it was only I who used the term).

> Write eparent/echild node methods which will look not only into the basic tree but also into the secondary dependencies.

First, what should we call the method for getting all enhanced dependencies of a node (where the node is the dependent)?
The CoNLL-U column is called DEPS, but it also stores the deprels, and if we choose node.deps(), we lose the parallel with node.parent.
What about node.enhanced_parents or node.e_parents?
Then we could provide also node.e_children.
So the names will be almost the same as Treex $node->get_eparents() and $node->get_echildren(), but now the e stands for enhanced, not effective.

However, I think it will cover the cases when eparents/echildren were used in Treex.
Note that enhanced dependencies are almost always a superset of basic dependencies - there are just a few exceptions, e.g. the orphan deprels are only in basic, but not in enhanced.
Of course, enhanced dependencies are now used not only for coordinations,
but I am not sure if there is a use case when it would hurt.

We will also need a method for accessing and setting the deprels of the enhanced dependencies. Let's think about it for a while.

@VincTheSecond
Author

I'll try to summarize the previous discussion in this specification:

Accessing enhanced dependencies

I propose the same strategy as we applied to morphological features. The raw string will be stored in node.raw_deps. On the first use of node.deps, the raw string will be parsed into a list with elements {parent, deprel}.
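A minimal sketch of this lazy strategy, assuming a stand-in `Node` class that models nothing but the DEPS handling. The `deps_to_string` method name is made up for the example, and parents are kept as ID strings here, whereas a real implementation would presumably resolve them to node objects.

```python
class Node:
    """Hypothetical stand-in for a node; only the DEPS handling is modeled."""

    def __init__(self, ord, raw_deps="_"):
        self.ord = ord
        self.raw_deps = raw_deps
        self._deps = None  # None means "not parsed yet"

    @property
    def deps(self):
        """Parse raw_deps on first access and cache the resulting list."""
        if self._deps is None:
            self._deps = []
            if self.raw_deps != "_":
                for item in self.raw_deps.split("|"):
                    # The deprel may contain colons, so split only once.
                    parent, deprel = item.split(":", 1)
                    self._deps.append({"parent": parent, "deprel": deprel})
        return self._deps

    def deps_to_string(self):
        """Serialize back to the DEPS column format."""
        if self._deps is None:
            return self.raw_deps  # never parsed, so nothing can have changed
        if not self._deps:
            return "_"
        return "|".join(f"{d['parent']}:{d['deprel']}" for d in self._deps)


node = Node(2, "3:nsubj|5:nsubj")
print(node.deps)              # first access triggers the parsing
node.deps.pop()               # edit the parsed list in place
print(node.deps_to_string())  # the edit is reflected when writing
```

Nodes whose enhanced dependencies are never touched pay no parsing cost, and their raw string is written back unchanged.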

Algorithm for obtaining eparents/echildren

A first attempt was proposed by @cinkova. The proposed algorithm should be enough for our project. We do not have the ambition to implement an exact analogy of eparents/echildren from PML for now. If this is not enough to become part of the Node object, we can implement it only "locally" in our block.

Enhanced parents/children

I propose creating the methods node.get_eparents() and node.get_echildren().

@nschneid

What is the status of this? The Node class offers Node.deps, with which it is easy to access enhanced parents. But I don't see an easy way to access enhanced children.

@martinpopel
Contributor

Thanks for reminding me about this. I will try to look at it next week. I have a draft implementation with node.enh_parents and node.enh_children (already since 2017). The tricky part is to make sure that after any edits (changing basic/enhanced deps, deleting/adding/reordering nodes) everything will remain in a consistent state, without making Udapi slower for those who don't need enhanced deps.
Recently, I've added support for coreference including bridging links, which is in some aspects similar to the enhanced deps.

@dan-zeman
Collaborator

> with node.enh_parents and node.enh_children

My experience is that I rarely need a list of enhanced parents without the corresponding deprels. Typically I'm looking for parents whose deprels (meaning: the deprel of the enhanced relation between the parent and the current node) match a regular expression. So I would probably still use Node.deps (with my post-filtering) rather than Node.enh_parents.
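The kind of post-filtering I mean could look roughly like this. The helper name `deps_matching` is hypothetical, and the sample list only mimics the shape of Node.deps (dictionaries with `parent` and `deprel` keys); the parents are plain strings here instead of node objects.

```python
import re


def deps_matching(deps, pattern):
    """Return the deps entries whose deprel fully matches the regular expression."""
    regex = re.compile(pattern)
    return [d for d in deps if regex.fullmatch(d["deprel"])]


# Made-up enhanced edges of one node, in the shape of Node.deps.
deps = [
    {"parent": "3", "deprel": "nsubj"},
    {"parent": "5", "deprel": "nsubj:xsubj"},
    {"parent": "3", "deprel": "conj:and"},
]

# All subject relations, with or without a subtype.
subjects = deps_matching(deps, r"nsubj(:.+)?")
print([d["parent"] for d in subjects])
```

Because the deprel travels with each parent, the caller can still tell afterwards which relation matched.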

I would do the same thing (look at the deprels + possibly filter) when accessing enhanced children but @nschneid is right that it is not easy to get a list of what I would call childdeps or cdeps. Something like the following would be needed and maybe Udapi could do it once the enhanced graph (or specifically the childdeps) is accessed for the first time:

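nodes = node.root.descendants
# Reset the lists; include the root, which can be the parent of a 0:root edge.
for n in [node.root] + nodes:
    n.childdeps = []
# Register every node under each of its enhanced parents.
for n in nodes:
    for edep in n.deps:
        edep['parent'].childdeps.append({'child': n, 'deprel': edep['deprel']})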

I understand that it could be tricky to keep these lists up-to-date when someone modifies node.deps. Maybe a lightweight approach could be that the above code would be available as a method (Node.compute_childdeps()) and the user would be responsible for calling it when needed.
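To show what this lightweight contract would mean in practice, here is a self-contained sketch with a hypothetical minimal `Node` class (not Udapi's) and a free function instead of a method. The list of nodes passed in is assumed to include the technical root. After `deps` is edited, `childdeps` stays stale until the user recomputes it.

```python
class Node:
    """Hypothetical minimal node; real Udapi nodes carry much more state."""

    def __init__(self, form):
        self.form = form
        self.deps = []       # list of {'parent': Node, 'deprel': str}
        self.childdeps = []  # list of {'child': Node, 'deprel': str}


def compute_childdeps(nodes):
    """Rebuild childdeps for all given nodes (the technical root included)."""
    for n in nodes:
        n.childdeps = []
    for n in nodes:
        for edep in n.deps:
            edep["parent"].childdeps.append({"child": n, "deprel": edep["deprel"]})


root, buys, sells, cameras = (Node(f) for f in ("<root>", "buys", "sells", "cameras"))
buys.deps = [{"parent": root, "deprel": "root"}]
sells.deps = [{"parent": buys, "deprel": "conj"}]
cameras.deps = [{"parent": buys, "deprel": "obj"}, {"parent": sells, "deprel": "obj"}]
nodes = [root, buys, sells, cameras]

compute_childdeps(nodes)
print([c["child"].form for c in sells.childdeps])

cameras.deps.pop()        # remove the edge to "sells"
# sells.childdeps is stale at this point and still lists "cameras"
compute_childdeps(nodes)  # the user's responsibility after editing deps
print([c["child"].form for c in sells.childdeps])
```

The cost is one pass over all enhanced edges per call, and nothing is paid by users who never ask for the child lists.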

@dan-zeman dan-zeman changed the title Effective parents, effective children Enhanced (was: effective) parents and children Feb 13, 2023