Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shaclgen: Add --include-annotations option to let annotations be part of shacl shapes #2111

Merged
merged 4 commits into from
May 29, 2024

Conversation

jsheunis
Copy link
Contributor

@jsheunis jsheunis commented May 15, 2024

This is for the SHACL generator in response to #1618.

Code is added to shaclgen.py to:

  • allow users to specify the --include-annotations tag if they want annotations (on classes, slots, and types) to be included in the exported SHACL shapes
  • determine the datatype of both annotation tag and value:
    • a CURIE is identified by searching for the ':' character in the tag or value
    • if not a CURIE, the annotation tag is assumed to be a literal with type string
    • if not a CURIE, the annotation value is a literal with datatype determined using python type and extended_float, extended_int, extended_str, etc
    • add the correct triples to the shacl output (to a nodeshape for classes, and to a property shape for slots and slots with types as ranges)

In addition, the test_shaclgen test is updated to test for annotations on classes, and the kitchen sink schema is updated to include annotations on the person class. Several snapshot data files needed to be updated as well because of the change to that schema, so that existing tests would still succeed.

Uncertainties:

  • I was unsure how to test for annotations on types and slots, because the related rdf terms in the graph generated from the kitchen sink schema would be blank nodes with transient IDs, so these can't be captured in the "EXPECTED" triples.
  • Are there other elements that can also have annotations that aren't included and should ideally form part of this PR?
  • I am not sure if my approach to get the element type, and hence the XSD datatype, is sensible; happy to update this approach if there are LinkML-preferred ways of doing the same
  • I currently have a workaround in the code for a jsonasobj2 issue, mentioned here: jsonasobj2 causes inconsistent behavior on attr assignment vs. instantiation for linkml_model classes #1665 (comment)
  • UPDATE: conflicts can arise when a slot has annotations, and if that same slot has a type as a range, where the type also has annotations. If some of these annotations have the same tag, the current way that the code is implemented means that the values will be appended, not replaced. It seems to me that the desired behaviour will differ based on the use case. So the question is whether more user-defined controls should be implemented to direct such behaviour?

This is for the SHACL generator in response to linkml#1618.
Code is added to shaclgen.py to:
- allow users to specify the --include-annotations tag if they
want annotations (on classes, slots, and types) to be included
in the exported SHACL shapes
- determine the datatype of both annotation tag and value (a
CURIE is identified by searching for the ':' character)
- add the correct triples to the shacl output (to a nodeshape
for classes, and to a property shape for slots and slots with typesas ranges)
Copy link

codecov bot commented May 16, 2024

Codecov Report

Attention: Patch coverage is 78.94737% with 8 lines in your changes are missing coverage. Please review.

Project coverage is 79.86%. Comparing base (c5b0542) to head (bc364b2).
Report is 1 commits behind head on main.

Files Patch % Lines
linkml/generators/shaclgen.py 78.94% 5 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2111      +/-   ##
==========================================
- Coverage   79.86%   79.86%   -0.01%     
==========================================
  Files         110      110              
  Lines       12318    12355      +37     
  Branches     3507     3520      +13     
==========================================
+ Hits         9838     9867      +29     
- Misses       1884     1889       +5     
- Partials      596      599       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jsheunis jsheunis marked this pull request as ready for review May 16, 2024 10:34
else:
logging.error(f"No URI for type {rt.name}")

def _add_annotations(self, func: Callable, item) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be moved into schemaview (but doesn't need to happen for this PR)

Add a TODO comment
@jsheunis
Copy link
Contributor Author

jsheunis commented May 16, 2024

Regarding:

  • UPDATE: conflicts can arise when a slot has annotations, and if that same slot has a type as a range, where the type also has annotations. If some of these annotations have the same tag, the current way that the code is implemented means that the values will be appended, not replaced. It seems to me that the desired behaviour will differ based on the use case. So the question is whether more user-defined controls should be implemented to direct such behaviour?

I just came across a relevant example. Among other annotations, I want to assign sh:order to slots to define how they are represented in a user interface autogenerated from the exported SHACL. E.g.:

slot_usage:   
   distance:
      annotations:
         sh:group: mycurie:BasicPropertyGroup
         sh:order: 0

the output from gen-shacl will then be:

[ sh:datatype xsd:string ;
     sh:description "Distance of a thing from source location." ;
     sh:group mycurie:BasicPropertyGroup ;
     sh:nodeKind sh:Literal ;
     sh:order 0,
        27 ;
     sh:path mycurie:distance ],

i.e. the sh:order has two comma-separated values. This is because the code also assigns order to each slot that is being processed:

prop_pv_literal(SH.order, order)
order += 1

So, for my use case, I would want my annotation to overwrite whatever the code does internally.

I have solved this locally with the following patch:

diff --git a/linkml/generators/shaclgen.py b/linkml/generators/shaclgen.py
index dc804cb5..729d80f6 100644
--- a/linkml/generators/shaclgen.py
+++ b/linkml/generators/shaclgen.py
@@ -109,8 +109,6 @@ class ShaclGenerator(Generator):
                         g.add((pnode, p, Literal(v)))

                 prop_pv(SH.path, slot_uri)
-                prop_pv_literal(SH.order, order)
-                order += 1
                 prop_pv_literal(SH.name, s.title)
                 prop_pv_literal(SH.description, s.description)
                 if not s.multivalued:
@@ -185,6 +183,10 @@ class ShaclGenerator(Generator):
                     if s.annotations and self.include_annotations:
                         self._add_annotations(prop_pv, s)

+                if (pnode, SH.order, None) not in g:
+                    prop_pv_literal(SH.order, order)
+                    order += 1
+
                 ifabsent_processor.process_slot(prop_pv, s, class_uri)

         return g

@cmungall
Copy link
Member

Thank you!

yes, a limit of the current rdf testing is that it's incomplete when blank nodes are present.

Override is the correct behavior in the case of clashes. We'll want to implement this same behavior in other rdf-serializing generators.

It would be good to extend test_annotation in test_annotation_compliance. This wouldn't add anything that is not already in your test, but the compliance testing framework is becoming the de facto standard way to test one construct across multiple generators. I am happy for this to be done in a later PR though

@jsheunis
Copy link
Contributor Author

Does it make sense to make the SH.order-related patch highlighted above part if the PR as well? For me it does, but for my purposes I could also just apply the patch separately whenever I need to use shaclgen for that specific use case.

It would be good to extend test_annotation in test_annotation_compliance. This wouldn't add anything that is not already in your test, but the compliance testing framework is becoming the de facto standard way to test one construct across multiple generators. I am happy for this to be done in a later PR though

I can take a shot at this in a next PR 👍

@cmungall cmungall merged commit f9af01d into linkml:main May 29, 2024
14 checks passed
vincentkelleher pushed a commit to vincentkelleher/linkml that referenced this pull request Jun 5, 2024
…part of shacl shapes (linkml#2111)

* Add --include-annotations option for shaclgen

This is for the SHACL generator in response to linkml#1618.
Code is added to shaclgen.py to:
- allow users to specify the --include-annotations tag if they
want annotations (on classes, slots, and types) to be included
in the exported SHACL shapes
- determine the datatype of both annotation tag and value (a
CURIE is identified by searching for the ':' character)
- add the correct triples to the shacl output (to a nodeshape
for classes, and to a property shape for slots and slots with typesas ranges)

* fix linting

* Update snapshot data in 'test_scripts' after updating kitchen sink schema for shaclgen annotation tests

* Update shaclgen.py

Add a TODO comment

---------

Co-authored-by: Chris Mungall <[email protected]>
cmungall added a commit that referenced this pull request Jun 7, 2024
* Implement equals_string and equals_string_in

* Remove renaming§

* Add validation rules

* Add validation for equals_string and equals_string_in in schema loader

* Revert renaming

* Remove obsolete code

* Remove obsolete code

* Fix codespell errors

* Resolve flake errors

* Reforamt files

* Fix lint errors

* fix lint errors

* Add unit tests for equals_string and equals_string_in

* Make quality checks happy (#2136)

* Update poetry lockfile

* hotwo on deprecation

* `shaclgen`: Add `--include-annotations` option to let annotations be part of shacl shapes (#2111)

* Add --include-annotations option for shaclgen

This is for the SHACL generator in response to #1618.
Code is added to shaclgen.py to:
- allow users to specify the --include-annotations tag if they
want annotations (on classes, slots, and types) to be included
in the exported SHACL shapes
- determine the datatype of both annotation tag and value (a
CURIE is identified by searching for the ':' character)
- add the correct triples to the shacl output (to a nodeshape
for classes, and to a property shape for slots and slots with typesas ranges)

* fix linting

* Update snapshot data in 'test_scripts' after updating kitchen sink schema for shaclgen annotation tests

* Update shaclgen.py

Add a TODO comment

---------

Co-authored-by: Chris Mungall <[email protected]>

* Erdiagram include upstream (#2139)

* Include upstream classes into ERD diagram of selected entitites

Add docs for —include-upstream

* Fix unit test for Py3.9

* Update poetry lockfile

* Implement equals_string and equals_string_in

* Resolve flake errors

* fix lint errors

* Fix tests for equals_string_in feature

Signed-off-by: Vincent Kelleher <[email protected]>

* Fix gen shacl test

* Fix unit tests

* Reformat code

* Fix missing type

* Reformt

* Fix lint errors

* Fix lint errors

* Fix unti tests

* Format imports; ensure that tox and pre-commit agree on a ruff version

---------

Signed-off-by: Vincent Kelleher <[email protected]>
Co-authored-by: Anja Strunk <[email protected]>
Co-authored-by: Vlad Korolev <[email protected]>
Co-authored-by: cmungall <[email protected]>
Co-authored-by: Sierra Taylor Moxon <[email protected]>
Co-authored-by: Stephan Heunis <[email protected]>
Co-authored-by: Chris Mungall <[email protected]>
Co-authored-by: Vincent Kelleher <[email protected]>
Co-authored-by: anjastrunk <[email protected]>
vincentkelleher pushed a commit to vincentkelleher/linkml that referenced this pull request Jun 10, 2024
* Implement equals_string and equals_string_in

* Remove renaming§

* Add validation rules

* Add validation for equals_string and equals_string_in in schema loader

* Revert renaming

* Remove obsolete code

* Remove obsolete code

* Fix codespell errors

* Resolve flake errors

* Reforamt files

* Fix lint errors

* fix lint errors

* Add unit tests for equals_string and equals_string_in

* Make quality checks happy (linkml#2136)

* Update poetry lockfile

* hotwo on deprecation

* `shaclgen`: Add `--include-annotations` option to let annotations be part of shacl shapes (linkml#2111)

* Add --include-annotations option for shaclgen

This is for the SHACL generator in response to linkml#1618.
Code is added to shaclgen.py to:
- allow users to specify the --include-annotations tag if they
want annotations (on classes, slots, and types) to be included
in the exported SHACL shapes
- determine the datatype of both annotation tag and value (a
CURIE is identified by searching for the ':' character)
- add the correct triples to the shacl output (to a nodeshape
for classes, and to a property shape for slots and slots with typesas ranges)

* fix linting

* Update snapshot data in 'test_scripts' after updating kitchen sink schema for shaclgen annotation tests

* Update shaclgen.py

Add a TODO comment

---------

Co-authored-by: Chris Mungall <[email protected]>

* Erdiagram include upstream (linkml#2139)

* Include upstream classes into ERD diagram of selected entitites

Add docs for —include-upstream

* Fix unit test for Py3.9

* Update poetry lockfile

* Implement equals_string and equals_string_in

* Resolve flake errors

* fix lint errors

* Fix tests for equals_string_in feature

Signed-off-by: Vincent Kelleher <[email protected]>

* Fix gen shacl test

* Fix unit tests

* Reformat code

* Fix missing type

* Reformt

* Fix lint errors

* Fix lint errors

* Fix unti tests

* Format imports; ensure that tox and pre-commit agree on a ruff version

---------

Signed-off-by: Vincent Kelleher <[email protected]>
Co-authored-by: Anja Strunk <[email protected]>
Co-authored-by: Vlad Korolev <[email protected]>
Co-authored-by: cmungall <[email protected]>
Co-authored-by: Sierra Taylor Moxon <[email protected]>
Co-authored-by: Stephan Heunis <[email protected]>
Co-authored-by: Chris Mungall <[email protected]>
Co-authored-by: Vincent Kelleher <[email protected]>
Co-authored-by: anjastrunk <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants