This repository has been archived by the owner. It is now read-only.

[Ontologies] Find a reliable approach to manage ontology terms ID consistency #16

Open
christian-oreilly opened this issue Apr 11, 2018 · 7 comments

Comments

@christian-oreilly
Contributor

    from nat.treeData import getChildren
    print(list(getChildren("BIRNLEX:160").keys()))

produces

    ['BIRNLEX:421', 'BIRNLEX:266', 'NLXORG:20081201', 'BIRNLEX:498', 'BIRNLEX:160', 'BIRNLEX:710', 'BIRNLEX:202', 'BIRNLEX:211', 'BIRNLEX:254']

and

    print(list(getChildren("NIFORG:birnlex_160").keys()))

produces

    ['NIFORG:nlx_organ_20081201', 'NIFORG:birnlex_211', 'NIFORG:birnlex_266', 'NIFORG:birnlex_498', 'NIFORG:birnlex_421', 'NIFORG:birnlex_710', 'NIFORG:birnlex_202', 'NIFORG:birnlex_254', 'NIFORG:birnlex_160']

The terms identified by "BIRNLEX:160" and "NIFORG:birnlex_160" are identical. These alternative ways of formatting the IDs of ontology terms make basic operations difficult, such as asking "Is this model organism (e.g., Wistar rat) a subclass of another model organism (e.g., rodent)?" When checking whether "NIFORG:birnlex_211" is a subclass of "BIRNLEX:160", we currently get False when we would expect True. This happens because we compare a given ID against a list of subclass IDs, and that list does not contain every alternative way of writing the ID. We need a consistent and reliable way to check these equivalences. Most importantly, it has to be reasonably efficient: making a REST call for every equivalence check could quickly result in poor performance.
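The failure mode, and one way to avoid it, can be sketched by canonicalizing both sides of the comparison before testing membership. This is a minimal sketch, not nat's implementation: the table `EQUIVALENT_IDS` and the helpers `canonical` and `is_subclass` are hypothetical, and the three equivalences are read off the two outputs above and assumed rather than confirmed. Choosing the BIRNLEX/NLXORG spelling as canonical is arbitrary. Note that "NIFORG:nlx_organ_20081201" pairs with "NLXORG:20081201", not with a BIRNLEX ID, so an explicit table is safer than a single prefix-rewriting rule.

```python
from typing import Iterable

# Hypothetical equivalence table mapping alternative spellings to one canonical ID.
# Entries are inferred from the two getChildren outputs; a real table should come
# from an authoritative mapping, not from rewriting prefixes.
EQUIVALENT_IDS = {
    "NIFORG:birnlex_160": "BIRNLEX:160",
    "NIFORG:birnlex_211": "BIRNLEX:211",
    "NIFORG:nlx_organ_20081201": "NLXORG:20081201",
}


def canonical(term_id: str) -> str:
    """Return the canonical form of term_id, or term_id itself if no mapping is known."""
    return EQUIVALENT_IDS.get(term_id, term_id)


def is_subclass(term_id: str, subclass_ids: Iterable[str]) -> bool:
    """Compare canonical forms; building a set makes each membership test O(1)."""
    canonical_subclasses = {canonical(i) for i in subclass_ids}
    return canonical(term_id) in canonical_subclasses


# Children of "BIRNLEX:160" as returned by the first getChildren call
children = ['BIRNLEX:421', 'BIRNLEX:266', 'NLXORG:20081201', 'BIRNLEX:498',
            'BIRNLEX:160', 'BIRNLEX:710', 'BIRNLEX:202', 'BIRNLEX:211', 'BIRNLEX:254']

print("NIFORG:birnlex_211" in children)             # False: plain string comparison
print(is_subclass("NIFORG:birnlex_211", children))  # True: both sides canonicalized
```

Since the lookup is a local dictionary access, no REST call is needed per comparison. The cost moves to building the table once.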

@christian-oreilly
Contributor Author

@tgbugs Do you have any insights on the structure of NIFSTD or the scigraph client that could help us with respect to this issue?

@tgbugs

tgbugs commented Apr 11, 2018

I think this is the result of our transition of the ontology away from the ontology.neuinfo.org identifiers to the uri.neuinfo.org identifiers. See SciCrunch/NIF-Ontology@b268a6b for details. I do not load the mapping file into SciGraph to avoid confusion, though in this case it seems to have caused some. I also have not tested whether SciGraph handles owl:sameAs correctly when queries are issued against the graph, so you might have to issue two SciGraph queries even if I did load it. I would suggest switching to the new URI scheme, but I completely understand the need for human readability. Therefore, I suggest loading the mapping file into a Python dict to do the translation; that will be performant. I might also insert a translation shim that switches the representation of those identifiers whenever a call goes into or out of SciGraph. We use an equivalent implementation to do the translations in nginx for the resolver. One caveat: you should not do this computationally by trying to replace prefixes, because there are exceptions.
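Loading the mapping file into a Python dict, as suggested, could look like the following minimal sketch. The thread does not describe the file's layout, so the two-column, tab-separated `old<TAB>new` format, the function names `load_id_mapping` and `translate`, and the sample rows are all hypothetical. Every translation is an explicit dictionary lookup, so no prefix rewriting is involved.

```python
import io
from typing import Dict, Iterable, List, TextIO, Tuple


def load_id_mapping(lines: TextIO) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Build old->new and new->old dicts from "old<TAB>new" lines (assumed format)."""
    old_to_new: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        old_id, new_id, *_ = line.split("\t")  # extra columns, if any, are ignored
        old_to_new[old_id] = new_id
    # Inverse mapping; if several old IDs share one new ID, the last one read wins.
    new_to_old = {new: old for old, new in old_to_new.items()}
    return old_to_new, new_to_old


def translate(term_ids: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Translate IDs by explicit lookup; IDs without an entry pass through unchanged."""
    return [mapping.get(t, t) for t in term_ids]


# Hypothetical sample rows standing in for the real mapping file
mapping_file = io.StringIO(
    "# old_id\tnew_id\n"
    "NIFORG:birnlex_160\tBIRNLEX:160\n"
    "NIFORG:nlx_organ_20081201\tNLXORG:20081201\n"
)
old_to_new, new_to_old = load_id_mapping(mapping_file)
print(translate(["NIFORG:birnlex_160", "NIFORG:nlx_organ_20081201"], old_to_new))
print(new_to_old["BIRNLEX:160"])
```

The same `translate` call could serve as the kind of translation shim described above, applied to IDs on their way into and out of SciGraph (for example, to the keys returned by `getChildren`).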

Also, the endpoint you use here https://github.com/BlueBrain/nat/blob/master/nat/treeData.py#L107 is no longer accessible, so I'm not entirely sure where that data is coming from. If you have hardcoded the IP of old matrix in your hosts file or something similar, then you are almost certainly getting stale data. If you want to switch to our maintained endpoint (which is now finally up), see the newly added note in the README at https://github.com/SciCrunch/NIF-Ontology#using-nifstd and switch your query to:

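    import os
    import requests

    # root_id, direction, maxDepth and relationshipType are assumed to be defined by the surrounding code in treeData.py
    api_key = os.environ['SCICRUNCH_API_KEY']
    baseKS = "http://scicrunch.org/api/1/"  # already ends with "/", so the path below must not start with one
    response = requests.get(baseKS + "scigraph/graph/neighbors/" +
                            root_id + "?direction=" + direction +
                            "&depth=" + str(maxDepth) +
                            "&project=%2A&blankNodes=false&" + relationshipType +
                            "&key=" + api_key)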
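Building the same URL by hand makes it easy to drop a separator or mis-encode a value. A minimal sketch that lets the standard library encode the query string is shown below. It assumes the endpoint accepts parameters named `direction`, `depth`, `project`, `blankNodes`, `relationshipType` and `key`, and that `relationship_type` is a bare relationship name such as "subClassOf" rather than a preformatted query fragment. The helper `neighbors_url` and the example values are hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://scicrunch.org/api/1/scigraph/graph/neighbors/"


def neighbors_url(root_id, direction, max_depth, relationship_type, api_key):
    """Build the neighbors query URL with every parameter percent-encoded."""
    params = {
        "direction": direction,
        "depth": max_depth,
        "project": "*",            # urlencode turns "*" into %2A, as in the hand-built URL
        "blankNodes": "false",
        "relationshipType": relationship_type,
        "key": api_key,
    }
    # Keep ":" unescaped so CURIEs such as "BIRNLEX:160" stay readable in the path
    return BASE_URL + quote(root_id, safe=":") + "?" + urlencode(params)


# Placeholder key; in practice read it from os.environ['SCICRUNCH_API_KEY']
print(neighbors_url("BIRNLEX:160", "INCOMING", 1, "subClassOf", "dummy-key"))
```

The resulting string can then be passed to `requests.get`. Alternatively, `requests.get` can take the dict directly through its `params` argument.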

Please let me know if this addresses the issue. Best!

@pafonta
Contributor

pafonta commented Apr 12, 2018

NB: For the endpoint, we have an open issue (#11). For various reasons, it has not been fixed yet.

@christian-oreilly
Contributor Author

Thanks @tgbugs for the info. It is very useful. I'll use the mapping file you pointed us to in order to implement explicit equivalences, and I'll avoid defining general rules for prefix equivalences because of the exceptions.

@christian-oreilly
Contributor Author

@tgbugs I'm back working on things related to this issue. I went to https://github.com/SciCrunch/NIF-Ontology#using-nifstd and tried to create an API key, but both https://scicrunch.org/register and https://scicrunch.org/account/developer are currently empty pages. Were you aware of that? Is that normal? When is the situation expected to be resolved?

@tgbugs

tgbugs commented Oct 5, 2018

Definitely not normal. It looks like the UCSD data center went down sometime overnight. It should be back up later today (PDT). I will take a look at it when I get in later today and let you know.

@christian-oreilly
Contributor Author

OK, so I'll resume this work on Monday then. Thanks for the feedback @tgbugs


3 participants