Add scheme-dependent and file-system dependent URI normalization to URIResolverRegistry #1944
Comments
Note that file name normalization is an "IO feature". We need to know all about the implementation parameters of the file system under the hood and not just the opaque location. There will be some normalization steps that can be done without knowledge of the filesystem implementation, but predicting which is which should not be up to the user. A general normalize function that works specifically for each scheme will be easier to implement than some parallel hierarchy. If a normalization scheme cannot be implemented, we are probably missing a specific scheme identifier. For example |
Is your feature request related to a problem? Please describe.
There are many reasons for aliases in source locations:
These are semantic properties of file systems, not syntactic. It means that you have to have
an actively running filesystem with a file on it, to be able to know what the aliases are and how
they might be normalized.
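To illustrate the syntactic/semantic distinction, here is a minimal Java sketch using only the JDK (it is not Rascal code, and the class name `AliasDemo` and its method names are made up for this example). `URI.normalize()` only rewrites the text of the location, whereas `Path.toRealPath()` has to ask a live file system and fails when the file is not there.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Path;
import java.util.Optional;

public class AliasDemo {
    /** Purely syntactic: collapses "." and ".." segments without touching any file system. */
    public static URI syntactic(URI uri) {
        return uri.normalize();
    }

    /**
     * Semantic: asks the file system for the real path (resolving symbolic links and
     * redundant segments). Returns empty when the file system cannot answer,
     * for example because the file does not exist.
     */
    public static Optional<Path> semantic(Path path) {
        try {
            return Optional.of(path.toRealPath());
        } catch (IOException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Works on any string, whether or not the file exists.
        System.out.println(syntactic(URI.create("file:///a/./b/../c.txt")).getPath());

        // Two spellings of the same existing directory.
        Path dir = Path.of(System.getProperty("java.io.tmpdir"));
        Path alias = dir.resolve(".");
        System.out.println("equal as values: " + alias.equals(dir));
        System.out.println("equal after asking the file system: "
                + semantic(alias).equals(semantic(dir)));

        // No file, no semantic answer.
        System.out.println(semantic(Path.of("no-such-file-for-alias-demo")).isPresent());
    }
}
```

The two spellings of the directory differ as plain values and only become equal once the file system has been consulted, which is the kind of alias a purely syntactic `loc` cannot detect.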
Loc aliases are detrimental to downstream analysis in Rascal, as `loc` values are pretty much always used as identities.
Describe the solution you'd like
I'd like an additional method on URIResolverRegistry: `normalize(ISourceLocation x)`, which would be implemented by dispatching to `ISourceLocationInput::normalize(ISourceLocation x)` via the scheme, and then making this available to Rascal users via `loc Location::normalize(loc l)`. This way the user is able to fix possible issues with aliasing easily, without having to consider every different way files could be aliases. Also, they are not forced to use it.
Maybe normalize should also replace logical schemes by physical schemes (since that is also a source of aliases). But the jury is still out on this.
Describe alternatives you've considered
There is something to be said for normalizing at location creation time; however, there is not always a file system available to normalize against, so this is impossible. It's better to let source locations remain purely syntactical, and leave it to a `normalize` function to deal with the semantics of aliases.
Additional context
Typically people run into these things with case sensitive file systems, but there are many ways to alias files. The more
we use Rascal for IO, and on different systems with different OSes and file systems, the more often we run into these issues.
Bad news
Implementing normalization is a lot of detailed research work for each scheme.
Good news
We might implement a default that does nothing, and start incrementally adding normalization. If we start with the `file` scheme, then we quickly saturate at 80% of all the schemes.