
Transformation of trustyuris in sub-names yields unexpected result #6

Open
fkleedorfer opened this issue Oct 15, 2019 · 8 comments

@fkleedorfer

Transforming the following content:

@prefix ex:<http://example.com/> .
@prefix sub:<http://example.com/myresource#> .

sub:graph1{
    ex:myresource a ex:TrustyUri .
    sub:graph1 ex:contains sub:part1 .
}

sub:graph2{
    sub:graph2 ex:contains sub:part2 .
}

in a file myresource.trig, using

 scripts/TransformRdf.sh myresource.trig http://example.com/myresource

yields

@prefix this: <http://example.com/myresource.RAg3eYqbagUqCN8-PgKHO9fGbp8aY65dMFiZIY_D4MYdc> .
@prefix sub: <http://example.com/myresource.RAg3eYqbagUqCN8-PgKHO9fGbp8aY65dMFiZIY_D4MYdc#> .
@prefix ex: <http://example.com/> .
@prefix sub1: <http://example.com/myresource#> .

sub:%23graph1 {
  ex:myresource.RAg3eYqbagUqCN8-PgKHO9fGbp8aY65dMFiZIY_D4MYdc a ex:TrustyUri .
  
  sub:%23graph1 ex:contains sub:%23part1 .
}

sub:%23graph2 {
  sub:%23graph2 ex:contains sub:%23part2 .
}

I'd expect the graph names and objects to be sub:graph[12] and sub:part[12]. The result is not incorrect, just not nicely readable and probably not what is intended.

@fkleedorfer fkleedorfer changed the title Transformation of trustyuris in graph names with unexpected result Transformation of trustyuris in sub-names yields unexpected result Oct 15, 2019
@tkuhn
Member

tkuhn commented Oct 17, 2019

Yes, I agree that this is somewhat unexpected, but it is by design. This transformation takes the base URI (http://example.com/myresource) and rewrites all URIs that start with that sequence of characters so that they include the hash right after the base URI. The remaining part is then added after the hash, with a character like # or . as a separator depending on the context, so that what follows doesn't look like part of the hash. In your case the chosen separator is #.

For http://example.com/myresource#graph1, the remaining part is then "#graph1", which becomes %23graph1 after the separator. The # is escaped as %23 because a URI can contain at most one hash character. We can't just drop it, because the result has to differ from what the URI http://example.com/myresourcegraph1 is transformed to (that URI doesn't occur here, but it might).
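To make the rule described above concrete, here is a minimal Python sketch of the prefix-based rewriting. It is not the trustyuri-java code: the function name `transform_prefix` is hypothetical, the trusty URI (base URI plus hash) is passed in ready-made, and `#` is assumed to be the separator, as in this example.

```python
def transform_prefix(uri: str, base: str, trusty: str) -> str:
    """Prefix rule as described in this thread (hypothetical sketch).

    Every URI that starts with `base` is rewritten; the remainder is appended
    after a '#' separator, with any '#' in it escaped as %23, because a URI
    may contain at most one '#'.
    """
    if not uri.startswith(base):
        return uri  # unrelated URI: left untouched
    rest = uri[len(base):]
    if not rest:
        return trusty  # the base URI itself
    # Simplification: only '#' is escaped; a remainder that already
    # contains a literal "%23" is not handled by this sketch.
    return trusty + "#" + rest.replace("#", "%23")


base = "http://example.com/myresource"
trusty = base + ".RAg3eYqbagUqCN8-PgKHO9fGbp8aY65dMFiZIY_D4MYdc"
for uri in (base, base + "#graph1", base + "graph1", "http://example.com/"):
    print(uri, "->", transform_prefix(uri, base, trusty))
```

The second and third inputs come out as `...#%23graph1` and `...#graph1` respectively, which is why the `#` cannot simply be dropped without making the two collide.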

In your case, you could just not use the hash character in the initial file, and the output is as you'd expect it to be:

@prefix sub:<http://example.com/myresource> .

The URIs before transforming then look a bit strange, but if you generate the trusty version directly that doesn't really matter.

Alternatively, you could also use http://example.com/myresource# instead of http://example.com/myresource as a base URI. But then the hash gets added after the # sign, which might not be what you want (this is the part that is not sent to the server at all for resolution, which means you can't redirect based on the hash, for example). Also, the prefixes for pretty-printing are somehow not picked up correctly in this case, which makes the output look ugly, but this should probably be fixable.

There might be workable ways to solve your initial problem without URI clashes, but I'm not sure it's worth it at this point...

@fkleedorfer
Author

Thanks for the clarification!
I am not convinced it's correct to just replace the [baseuri] in all URIs - your remark about http://example.com/myresourcegraph1 being a case in point. Example:

@prefix ex:<http://example.com/> .
@prefix sub:<http://example.com/myresource#> .

sub:graph1{
    ex:myresource a ex:TrustyUri .
    ex:myresourceSomewhereElse a ex:NormalUri .              #<--- this line is new
    <http://example.com/myresource/SubDir> a ex:NormalUri .  #<--- this line is new
    sub:graph1 ex:contains sub:part1 .
}

sub:graph2{
    sub:graph2 ex:contains sub:part2 .
}

in a file myresource.trig, using

scripts/TransformRdf.sh myresource.trig http://example.com/myresource

yields

@prefix this: <http://example.com/myresource.RASJIH5T2nJs9nSyd9W4MMe4Chqfcg53u91vDQ5n_FB2Q> .
@prefix sub: <http://example.com/myresource.RASJIH5T2nJs9nSyd9W4MMe4Chqfcg53u91vDQ5n_FB2Q#> .
@prefix ex: <http://example.com/> .
@prefix sub1: <http://example.com/myresource#> .

sub:%23graph1 {
  ex:myresource.RASJIH5T2nJs9nSyd9W4MMe4Chqfcg53u91vDQ5n_FB2Q a ex:TrustyUri .
  
  sub:%23graph1 ex:contains sub:%23part1 .
  
  <http://example.com/myresource.RASJIH5T2nJs9nSyd9W4MMe4Chqfcg53u91vDQ5n_FB2Q#/SubDir>
    a ex:NormalUri .
  
  sub:SomewhereElse a ex:NormalUri .
}

sub:%23graph2 {
  sub:%23graph2 ex:contains sub:%23part2 .
}

Here, the external references
http://example.com/myresourceSomewhereElse
and
http://example.com/myresource/SubDir
become the internal nodes
http://example.com/myresource.RASJIH5T2nJs9nSyd9W4MMe4Chqfcg53u91vDQ5n_FB2Q#SomewhereElse
and
http://example.com/myresource.RASJIH5T2nJs9nSyd9W4MMe4Chqfcg53u91vDQ5n_FB2Q#/SubDir
which breaks the references.

The remedy would be to replace baseuri with trustyuri in candidateuri only if baseuri matches the whole candidateuri, or if candidateuri continues with a # right after baseuri. This would work as long as baseuri does not contain a #, which seems safe to assume, as trustyuri is about making dereferenceable URIs immutable: the fragment identifier is not part of the dereferencing process and can therefore be stripped from baseuri before the transformation.

This would also address the initial issue:
http://example.com/myresource#graph1
would become
http://example.com/myresource.RASJIH5T2nJs9nSyd9W4MMe4Chqfcg53u91vDQ5n_FB2Q#graph1
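The proposed remedy could look roughly like this Python sketch. The function name `transform_fragment_only` is hypothetical, and the trusty URI is passed in precomputed rather than derived from a hash. Any fragment on the base URI is stripped first; then only the base URI itself, or the base URI directly followed by `#`, is rewritten, keeping the fragment verbatim.

```python
def transform_fragment_only(uri: str, base: str, trusty: str) -> str:
    """Proposed rule (hypothetical sketch): rewrite only `base` and `base#...`."""
    # The fragment plays no part in dereferencing, so strip it from the base.
    base = base.split("#", 1)[0]
    if uri == base:
        return trusty
    if uri.startswith(base + "#"):
        # Keep the original fragment as-is, e.g. '#graph1'.
        return trusty + uri[len(base):]
    # Everything else, including URIs that merely share the prefix,
    # is treated as an external reference and left unchanged.
    return uri


base = "http://example.com/myresource"
trusty = base + ".RASJIH5T2nJs9nSyd9W4MMe4Chqfcg53u91vDQ5n_FB2Q"
print(transform_fragment_only(base + "#graph1", base, trusty))
print(transform_fragment_only(base + "SomewhereElse", base, trusty))
print(transform_fragment_only(base + "/SubDir", base, trusty))
```

With this rule, `ex:myresourceSomewhereElse` and `<http://example.com/myresource/SubDir>` keep pointing to their original targets, while `sub:graph1` becomes `...#graph1` without the `%23` escape.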

@tkuhn
Member

tkuhn commented Oct 18, 2019

The assumption is that you control the base URI (like http://example.com/myresource), because otherwise you shouldn't mint Trusty URIs for it in the first place. So the assumption is also that you don't include URIs like http://example.com/myresourceSomewhereElse unless you want them to be transformed in this way.

While writing this, I see that you could apply a similar argument for removing the %23 part in the example above.

But there can in any case be different rules and procedures for this that all lead to valid trusty URIs and associated resources. So this is not about the core structure and validity of trusty URIs; these are just practical matters of how to generate them. In other words, we could add parameters to have different strategies for producing trusty URIs, without the need to touch the checking of trusty URIs at all.

A simple one could be to only transform the base URI and nothing else. Another one could be the one you sketched.

So, I guess in the end the question is: How important is this for you at this point? :) If it's important we could work on adding these parameters to achieve different transformations.

@fkleedorfer
Author

fkleedorfer commented Oct 19, 2019

First off, I guess the issue here has never come up in the relevant scenarios so far, so it's really not a big thing - on the other hand, I also doubt that any application relies on the behaviour as it is now. If that's the case, it could just be changed.

However, in its current form it's equivalent to a tacit rule saying: "You have to make sure none of your URIs is the prefix of any other" (or we can guarantee for nothing) - and that's a tough thing to check. DBpedia, for example, would not be able to use trustyURIs in their current form. Consider:

http://dbpedia.org/resource/Star_Wars
http://dbpedia.org/resource/Star_Wars:_Episode_I_%E2%80%93_The_Phantom_Menace

The links to Episode 1 would be broken in the Star Wars content. While some people might actually be quite OK with that ;-), it illustrates that these collisions happen, probably more often in systems in which URIs carry semantics. It's fair to say that DBpedia controls the URI space, but each URI depends on user input, and they would first have to do a prefix scan of all their URIs when minting a new one to avoid the issue. EDIT: they could use a special terminator char. That would definitely be simpler, but also weird.
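The prefix scan mentioned above can be sketched in Python as follows. `prefix_collisions` is a hypothetical helper: it lists the URIs that the current prefix rule would rewrite even though they are neither the base URI itself nor one of its fragments.

```python
def prefix_collisions(base: str, uris: list[str]) -> list[str]:
    """URIs that start with `base` but are neither `base` nor `base#...`."""
    return sorted(
        u for u in uris
        if u.startswith(base) and u != base and not u.startswith(base + "#")
    )


uris = [
    "http://dbpedia.org/resource/Star_Wars",
    "http://dbpedia.org/resource/Star_Wars:_Episode_I_%E2%80%93_The_Phantom_Menace",
]
print(prefix_collisions("http://dbpedia.org/resource/Star_Wars", uris))
```

This is a plain linear scan; for a URI space the size of DBpedia, a sorted index or trie would be the more practical choice. A base URI ending in a terminator character, such as a trailing `/`, yields an empty list for these two URIs.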

I'm thinking about using trustyURIs for the webofneeds project, as you once proposed. In experimenting with it, I stumbled upon a few things, like the issue here, and also the complication with SHACL being incompatible with skolemization, which is mandatory for trustyURI content. If trustyURI were used for WoN, I'd need it without skolemization and with the prefix replacement as described above. As I haven't made up my mind yet, I wouldn't want to cause any work on your side, though.

@tkuhn
Member

tkuhn commented Oct 28, 2019

The important point here is that whatever your base URI is, it's a temporary one anyway that will be transformed into the new trusty URI. So a new URI is minted, and there is no reason why the pre-trusty base URI should correspond to anything that is out there already. So if http://dbpedia.org/resource/Star_Wars might occur as a prefix of another URI then just don't use that URI as base URI (but for example http://dbpedia.org/resource/Star_Wars/).

I could add more features to control which URIs are transformed and how. But skolemization will stay. Trusty URIs won't support unskolemized blank nodes (well, I am open to being convinced otherwise, but I don't see how that could happen at the moment). Skolemization is just the cleanest way (theoretically and practically) to deal with blank nodes.
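For readers unfamiliar with the term: skolemization replaces each blank node with a freshly minted URI. Here is a minimal Python sketch, assuming triples are tuples of plain strings, with blank nodes written as `_:label` and literals kept in quotes. The `#_1`, `#_2`, ... naming scheme is hypothetical and not necessarily the one trustyuri-java uses.

```python
def skolemize(triples, base):
    """Replace every blank node label with a URI under `base`.

    Each distinct label gets one URI, numbered in order of first appearance
    (hypothetical naming scheme: base#_1, base#_2, ...).
    """
    mapping = {}

    def skolem(term):
        if not term.startswith("_:"):
            return term  # URI or quoted literal: unchanged
        if term not in mapping:
            mapping[term] = f"{base}#_{len(mapping) + 1}"
        return mapping[term]

    return [tuple(skolem(term) for term in triple) for triple in triples]


triples = [
    ("_:shape", "http://www.w3.org/ns/shacl#property", "_:prop"),
    ("_:prop", "http://www.w3.org/ns/shacl#minCount", '"1"'),
]
print(skolemize(triples, "http://example.com/myresource"))
```

In this sketch the numbering depends on input order. That is only harmless if skolemization happens once, before hashing, so that the minted URIs become a fixed part of the content.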

@fkleedorfer
Author

Concerning the skolemization: maybe our use case is a little different. I can think of reasons for skolemization in other use cases, for example, if you have an application that must be able to address any piece of information unambiguously and with minimal effort - that's just not our concern. For our use case, we just want the self-references to be "trusty" and that the trusty URI can be verified. And, it seems, we need the blank nodes for SHACL.

I guess making skolemization optional would incur a performance hit because you'd have to do the skolemization when verifying, and you probably don't want that for the existing applications, but maybe this would warrant another module for as-is RDF content except for self-references?

@tkuhn
Member

tkuhn commented Oct 29, 2019

The main problem with blank nodes is not addressability but graph normalization, which is intractable in the general case when blank nodes are involved. So one would have to use a normalization algorithm that works reasonably well on graphs found in real data, but it will break (i.e. not terminate) for certain inputs.
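To illustrate why skolemization sidesteps this problem: once no blank nodes remain, normalization reduces to sorting the statements, which always terminates. The Python sketch below uses a hypothetical `graph_digest` helper that computes SHA-256 over a JSON encoding of the sorted quads. It is not the serialization that trustyuri-java actually hashes.

```python
import hashlib
import json


def graph_digest(quads):
    """Order-independent SHA-256 digest of a set of quads without blank nodes."""
    for quad in quads:
        if any(term.startswith("_:") for term in quad):
            raise ValueError("blank node found; skolemize first")
    # RDF graphs are sets: drop duplicates, then sort into one canonical order.
    canonical = sorted(set(map(tuple, quads)))
    # JSON keeps term boundaries unambiguous in the hashed bytes.
    payload = json.dumps(canonical).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Quads as (subject, predicate, object, graph), loosely based on the first example.
ex = "http://example.com/"
g = ex + "myresource#graph1"
a = [
    (g, ex + "contains", ex + "myresource#part1", g),
    (ex + "myresource", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", ex + "TrustyUri", g),
]
print(graph_digest(a) == graph_digest(list(reversed(a))))  # True
```

With blank nodes, isomorphic graphs can use different labels, so sorting alone no longer yields a canonical form.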

@Crispae

Crispae commented Jan 18, 2024

Hi, I am using the FIP Wizard to create a nanopublication for a community template, and I was getting an error mentioning TrustyURI.
The URL I was entering is just an arbitrary GitHub repo.

