Add support for "endpoint --patch-db-metadata" #571

Open
Aklakan opened this issue Nov 17, 2022 · 2 comments

Aklakan commented Nov 17, 2022

Description

If a db-metadata file is supplied, it seems that ontop uses it as its only source of information.
This feature request is about adding something like a --patch-db-metadata option so that ontop falls back to its default metadata extraction unless overridden by the "patch" metadata. This would make it easier to externally declare, for example, unique constraints for database views. Right now one has to manually extract and patch the metadata (which may fail, see #570).
Also, for completeness, the current approach results in a null pointer exception (NPE) if metadata is missing in the file specified for --db-metadata, as the steps below show.

Steps to Reproduce

  • Create a database with a simple table:
CREATE TABLE "mytable"("id" int not null);  -- Note: Not declared as unique
INSERT INTO "mytable" VALUES (1);
  • Create the file mapping.r2rml.ttl
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix eg: <http://www.example.org/> .

[ a rr:TriplesMap ;
  rr:logicalTable        [ rr:tableName "\"mytable\"" ] ;
  rr:subjectMap          [ rr:template  "https://www.example.org/{id}" ] ;
  rr:predicateObjectMap  [ rr:predicate eg:value ; rr:objectMap [ rr:column "\"id\"" ] ] ;
] .
  • Create an empty database metadata file empty.json:
{}
  • Works: ./ontop endpoint -p mydb.local.ontop.properties -m mapping.r2rml.ttl

  • Fails with NPE: ./ontop endpoint -p mydb.local.ontop.properties -m mapping.r2rml.ttl --db-metadata empty.json

Caused by: java.lang.NullPointerException
	at it.unibz.inf.ontop.dbschema.impl.JsonSerializedMetadataProvider.<init>(JsonSerializedMetadataProvider.java:51)
	at it.unibz.inf.ontop.dbschema.impl.JsonSerializedMetadataProvider.<init>(JsonSerializedMetadataProvider.java:42)
	at it.unibz.inf.ontop.dbschema.impl.JsonSerializedMetadataProvider$$FastClassByGuice$$54679058.GUICE$TRAMPOLINE(<generated>)
	at it.unibz.inf.ontop.dbschema.impl.JsonSerializedMetadataProvider$$FastClassByGuice$$54679058.apply(<generated>)
  • Desired: ./ontop endpoint -p mydb.local.ontop.properties -m mapping.r2rml.ttl --patch-db-metadata override.json
    Apply the specified overrides wherever applicable, e.g. with override.json declaring the id column as unique:
{
  "relations": [{
    "uniqueConstraints" : [ {
      "determinants" : [ "\"id\"" ]
    }],
    "name" : [ "\"mytable\"" ]
  }]
}

(Obviously this makes more sense if "mytable" were a view; I hope the example is sufficiently clear.)

Versions

Ontop CLI 4.2.1, Postgres Ubuntu 14.6-1.pgdg22.04+1

bcogrel (Member) commented Nov 17, 2022

Thanks @Aklakan for sharing your idea.

Since 4.2.1, we support incomplete JSON metadata files when the option ontop.allowRetrievingBlackBoxViewMetadataFromDB is enabled.

I guess the NPE in your case comes from the fact that the JSON document is an empty JSON object. We need to throw a better exception in that case.

Currently we have 2 alternatives for adding unique constraints:

  1. Create a lens (initially called an "Ontop view"): https://ontop-vkg.org/guide/advanced/views
  2. Provide a "constraint file" (legacy mechanism) like here

Patching is another alternative, but I don't know how maintainable it would be.

By patching, do you mean overriding the full definition of a relation or specifying the unique constraint to add (incremental update)? If it is the first case, it should a priori already work with the option mentioned above turned on.

Best,
Benjamin

Aklakan (Author) commented Nov 17, 2022

Hi @bcogrel, the NPE seems to appear whenever non-existent metadata is accessed - it also happens if the JSON is non-empty but does not provide an entry for "mytable".

> We need to throw a better exception in that case.

Exactly, it's just a minor issue but that would be the solution :)
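
Until such an exception exists, a small pre-flight check outside of Ontop can report the problem before the endpoint is started. The following is only a sketch: find_missing_relations is a hypothetical helper, not part of Ontop, and it assumes the file layout used in this issue's example (a top-level "relations" array whose entries carry a "name" list).

```python
def find_missing_relations(metadata, required_names):
    """Return the names in `required_names` that have no entry in `metadata`.

    `metadata` is the parsed content of the --db-metadata file. The layout
    ("relations" -> list of objects with a "name" list) is an assumption
    taken from the example in this issue, not from Ontop's documentation.
    """
    if not isinstance(metadata, dict) or "relations" not in metadata:
        # This is the empty.json case from the steps to reproduce.
        raise ValueError('metadata file has no "relations" array')
    declared = {tuple(rel.get("name", [])) for rel in metadata["relations"]}
    return [name for name in required_names if tuple(name) not in declared]
```

The parsed file can come from json.load, and the list of required names (e.g. [['"mytable"']]) has to be supplied by hand, since this sketch does not read the mapping.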

> By patching, do you mean overriding the full definition of a relation or specifying the unique constraint to add (incremental update)?

Overall I'd think that the main use case is incremental updates, i.e. supplying uniqueness and not-null constraints in cases where they cannot be inferred from the database objects, most prominently views. I'd think that this approach might be more lightweight than lenses while still giving the desired performance (e.g. preventing DISTINCTs from being injected). I doubt there is much point in completely replacing JSON objects such as the table definition, because the patch would then have to list all columns again. If there really is a use case for replacing the whole table definition, then maybe a JSON flag such as "replace": true could be supported.
Maybe "patch" is a bad choice of wording; something like override or overlay might be more accurate.
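
To make the intended overlay semantics concrete, here is a minimal sketch of how it could be done today as an external step on an already extracted metadata file. It is not Ontop code: overlay_metadata is a hypothetical name, the JSON layout is assumed from the override.json example above, the "replace" flag is the one only proposed in this comment, and only unique constraints are merged.

```python
import copy


def overlay_metadata(base, patch):
    """Return a copy of `base` with the relations from `patch` overlaid."""
    result = copy.deepcopy(base)
    relations = result.setdefault("relations", [])
    # Index extracted relations by name; names are lists of quoted
    # identifiers in the issue's example, so they are turned into tuples.
    by_name = {tuple(rel["name"]): rel for rel in relations}

    for patch_rel in patch.get("relations", []):
        key = tuple(patch_rel["name"])
        target = by_name.get(key)
        if target is None or patch_rel.get("replace", False):
            # Unknown relation, or an explicit full replacement
            # (the hypothetical "replace": true flag).
            new_rel = {k: copy.deepcopy(v)
                       for k, v in patch_rel.items() if k != "replace"}
            if target is None:
                relations.append(new_rel)
            else:
                idx = next(i for i, r in enumerate(relations) if r is target)
                relations[idx] = new_rel
            by_name[key] = new_rel
            continue
        # Incremental update: add unique constraints not already declared.
        existing = target.setdefault("uniqueConstraints", [])
        seen = {tuple(uc["determinants"]) for uc in existing}
        for uc in patch_rel.get("uniqueConstraints", []):
            determinants = tuple(uc["determinants"])
            if determinants not in seen:
                existing.append(copy.deepcopy(uc))
                seen.add(determinants)
    return result
```

With the extracted metadata and override.json both loaded via json.load, the merged result could be written back with json.dump and passed to --db-metadata. Everything the override does not mention (columns, other relations) is kept as extracted.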
