Parameterization #661

joepie91 · 2023-11-02T21:16:25Z

Is your feature request related to a problem? Please describe.
I'm currently evaluating Oxigraph for a project, but it seems to be missing a pretty important feature for dealing with untrusted data - query parameterization. In SQL implementations, this is a widespread feature nowadays, allowing for dynamically specified values to be provided separately from the query itself, so that malicious actors cannot modify the query structurally by specifying maliciously-formatted values.

Describe the solution you'd like
A way to specify placeholder bindings in the query that parameters can then be separately specified for, in such a way that the parameter values are guaranteed to never go through a SPARQL parser (i.e. there is an entirely separate processing path from specification to query execution, with no string concatenation/formatting step in between). This would be useful both for library use and for the server implementation (Stardog's approach may be a useful reference here).

Describe alternatives you've considered

String-concatenation into queries before sending them off: Prone to SPARQL injection, even with the appropriate escaping (as escaping query values is notoriously difficult to get right).
Never allowing dynamic values: Not viable for many applications, as especially in a linked-data environment, it is highly likely that you will be dealing with untrusted third-party data.
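To make the first alternative concrete, here is a minimal sketch of how string concatenation lets a value change the structure of a query. It uses only the Rust standard library; `naive_query`, the FOAF predicate, and the hostile value are illustrative assumptions, not anything from Oxigraph.

```rust
/// Naively splices an untrusted value into a SPARQL query string.
/// This is the vulnerable pattern, shown only to illustrate the problem.
fn naive_query(name: &str) -> String {
    format!(
        "SELECT ?p WHERE {{ ?p <http://xmlns.com/foaf/0.1/name> \"{}\" }}",
        name
    )
}

fn main() {
    // A benign value yields the intended single-pattern query.
    println!("{}", naive_query("Alice"));

    // A hostile value closes the string literal early and appends a
    // second triple pattern, so the query no longer has the shape the
    // application author intended.
    let hostile = "x\" . ?p ?q \"";
    println!("{}", naive_query(hostile));
}
```

With the hostile value, the generated text contains two triple patterns instead of one, which is exactly the structural change that parameterization is meant to rule out.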

Additional context
Many other triplestores, like Stardog, Jena, or dotNetRDF already seem to implement such functionality. It also seems to have been considered for standardization at some point, but it's not clear to me how that played out in the end.

Tpt · 2023-11-03T08:25:55Z

Hi! That's a great point. Parameterization is indeed not implemented yet (and not standardized yet, there are two conflicting approaches...).

> String-concatenation into queries before sending them off: Prone to SPARQL injection, even with the appropriate escaping (as escaping query values is notoriously difficult to get right).

There is already a fairly simple solution: use the Oxigraph-provided RDF term objects (NamedNode, BlankNode and Literal), which already perform validation (no invalid data) and proper escaping on serialization (the default display/print of these objects is compatible with SPARQL). But indeed, a parametrization feature would be great to have.
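The idea of a term object that escapes itself when printed can be sketched with the standard library alone. `SimpleLiteral` below is a hypothetical stand-in, not Oxigraph's `Literal`, and the set of escapes it handles is an assumption chosen for illustration; the real types should be preferred in practice.

```rust
use std::fmt;

/// Hypothetical stand-in for a plain string literal term.
/// This is NOT Oxigraph's `Literal`; it only illustrates the idea.
struct SimpleLiteral(String);

impl fmt::Display for SimpleLiteral {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let mut out = String::with_capacity(self.0.len() + 2);
        out.push('"');
        for c in self.0.chars() {
            match c {
                // Escape the characters that could end or break the literal.
                '\\' => out.push_str("\\\\"),
                '"' => out.push_str("\\\""),
                '\n' => out.push_str("\\n"),
                '\r' => out.push_str("\\r"),
                other => out.push(other),
            }
        }
        out.push('"');
        f.write_str(&out)
    }
}

fn main() {
    // The same hostile value now stays inside a single string literal.
    let value = SimpleLiteral("x\" . ?p ?q \"".to_string());
    println!(
        "SELECT ?p WHERE {{ ?p <http://xmlns.com/foaf/0.1/name> {} }}",
        value
    );
}
```

Because the escaping lives in the type's `Display` implementation, callers cannot forget to apply it when formatting the value into a query.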

joepie91 · 2023-11-03T15:07:30Z

Hi, thanks for the quick response :)

> Parameterization is indeed not implemented yet (and not standardized yet, there are two conflicting approaches...).

If the lack of standardization is a concern for implementation, perhaps a solution might be to require specifying an "unstable, might break in the future" flag? Something like unstable_parameters_might_break or so. That way the functionality is there for those who need it, at the cost of maybe needing to change their code down the line, while it remains possible to implement it in a standards-compliant manner once a standard is decided upon, without being burdened by existing uses.

> There is already a fairly simple solution: use the Oxigraph-provided RDF term objects (NamedNode, BlankNode and Literal), which already perform validation (no invalid data) and proper escaping on serialization (the default display/print of these objects is compatible with SPARQL).

Ah, I hadn't noticed that that was an option. I'm not sure if it's mentioned in the documentation - I may have overlooked it.

I assume this would involve using a TripleRef to represent and serialize the entire triple, e.g. for storing new triples? Or would it be no different from manually concatenating the NamedNode/BlankNode/Literal values?

An additional issue would be that if using oxigraph_server rather than the Rust bindings, there does not seem to be a way to access these types through the API - would it be possible to expose (again, possibly behind an unstable flag) some endpoint that allows for encoding values in the prescribed manner via those types, if a full-blown parameterization implementation is not viable?

(The reason I'm looking to use oxigraph_server is that the WASM implementation does not seem to support persistent databases on disk yet, which I need for my use case. I'll probably end up writing N-API bindings at some point, but that would take quite a while.)

Tpt · 2023-11-04T08:57:27Z

> I assume this would involve using a TripleRef to represent and serialize the entire triple, e.g. for storing new triples? Or would it be no different from manually concatenating the NamedNode/BlankNode/Literal values?

Both work. The serialization of TripleRef is the subject-predicate-object concatenation with spaces between them.
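As a rough illustration of why both routes give the same text, the sketch below models a triple whose printed form is just its three terms joined by spaces. `Term`, `Triple`, and `insert_data` are hypothetical stand-ins written for this example, not Oxigraph's types or API, and the IRI case deliberately skips validation.

```rust
use std::fmt;

// Hypothetical stand-ins for RDF terms; these are NOT Oxigraph's types.
enum Term {
    Iri(String),
    Literal(String),
}

impl fmt::Display for Term {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // No IRI validation here; a real term type would reject invalid IRIs.
            Term::Iri(iri) => write!(f, "<{}>", iri),
            // Escape backslashes first, then quotes and line breaks.
            Term::Literal(value) => {
                let escaped = value
                    .replace('\\', "\\\\")
                    .replace('"', "\\\"")
                    .replace('\n', "\\n")
                    .replace('\r', "\\r");
                write!(f, "\"{}\"", escaped)
            }
        }
    }
}

struct Triple {
    subject: Term,
    predicate: Term,
    object: Term,
}

impl fmt::Display for Triple {
    // Subject, predicate and object separated by single spaces.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} {} {}", self.subject, self.predicate, self.object)
    }
}

/// Wraps one serialized triple in a SPARQL update.
fn insert_data(triple: &Triple) -> String {
    format!("INSERT DATA {{ {} . }}", triple)
}

fn main() {
    let triple = Triple {
        subject: Term::Iri("http://example.org/s".to_string()),
        predicate: Term::Iri("http://example.org/p".to_string()),
        object: Term::Literal("a \"b\"".to_string()),
    };
    // Printing the triple and joining its terms by hand give the same text.
    let manual = format!("{} {} {}", triple.subject, triple.predicate, triple.object);
    assert_eq!(triple.to_string(), manual);
    println!("{}", insert_data(&triple));
}
```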

I would tend to think that parametrization is possible. The way I would implement it is to follow the SPARQL-dev substitution proposal. It is the approach that seems the best to me (easy to implement because it follows how EXISTS works, and it is performance-friendly). We could maybe name the substitution parameter something like "subtitute-variable"; this way it's crystal clear what is happening, and there is a very small chance SPARQL would standardize on the same name.
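The general idea of substitution, replacing variables with terms after the query has been parsed so that values never pass through the SPARQL parser, can be sketched on a toy pattern representation. Everything here (`Term`, `TriplePattern`, `substitute`) is a hypothetical model for illustration; it does not reproduce the exact semantics of the SPARQL-dev proposal or any Oxigraph internals.

```rust
use std::collections::HashMap;

// Toy model of parsed query terms; NOT Oxigraph's algebra types.
#[derive(Clone, Debug, PartialEq)]
enum Term {
    Variable(String),
    Iri(String),
    Literal(String),
}

#[derive(Clone, Debug, PartialEq)]
struct TriplePattern {
    subject: Term,
    predicate: Term,
    object: Term,
}

/// Replaces a variable with its parameter value, if one was supplied.
fn substitute_term(term: &Term, params: &HashMap<String, Term>) -> Term {
    match term {
        Term::Variable(name) => params.get(name).cloned().unwrap_or_else(|| term.clone()),
        other => other.clone(),
    }
}

/// Substitutes parameters into already-parsed patterns. The values are
/// plain data at this point, so they cannot add or remove patterns.
fn substitute(patterns: &[TriplePattern], params: &HashMap<String, Term>) -> Vec<TriplePattern> {
    patterns
        .iter()
        .map(|p| TriplePattern {
            subject: substitute_term(&p.subject, params),
            predicate: substitute_term(&p.predicate, params),
            object: substitute_term(&p.object, params),
        })
        .collect()
}

fn main() {
    // Stands in for the parsed form of: ?p foaf:name ?name
    let patterns = vec![TriplePattern {
        subject: Term::Variable("p".to_string()),
        predicate: Term::Iri("http://xmlns.com/foaf/0.1/name".to_string()),
        object: Term::Variable("name".to_string()),
    }];

    let mut params = HashMap::new();
    params.insert(
        "name".to_string(),
        Term::Literal("x\" . ?p ?q \"".to_string()),
    );

    let bound = substitute(&patterns, &params);
    // Still exactly one pattern; the hostile text is just a literal value.
    assert_eq!(bound.len(), 1);
    println!("{:?}", bound);
}
```

Since the parameter is inserted as a term rather than as text, no escaping step is involved at all, which is the property the original request asks for.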

joepie91 added the "enhancement" (New feature or request) label on Nov 2, 2023
