Execute a SPARQL basic federated query as if it was a constant to be … #365
…propagated like a values clause
So this is the most basic idea regarding federated queries. At parse time we invoke the basic federated query and use its results to populate a BindingSetAssignment, which is then fed into the normal query machinery. This won't scale well, as these result sets can be humongous and lead to all kinds of out-of-resource exceptions.
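To make the "constant propagated like a values clause" idea concrete, here is a minimal sketch that renders already-fetched remote solutions as a SPARQL VALUES block. It is not the code from this pull request and does not use RDF4J's BindingSetAssignment; the class `ValuesClauseSketch`, the method `toValuesClause`, and the assumption that every cell is a plain IRI are all hypothetical and chosen only for illustration.

```java
import java.util.List;
import java.util.StringJoiner;

public class ValuesClauseSketch {

    /**
     * Renders remote solutions as a SPARQL VALUES block.
     * Assumption: vars is non-empty and every cell is an IRI that needs no escaping.
     */
    static String toValuesClause(List<String> vars, List<List<String>> rows) {
        // Header such as "(?ex ?type)"
        StringJoiner header = new StringJoiner(" ?", "(?", ")");
        vars.forEach(header::add);

        StringBuilder sb = new StringBuilder("VALUES ").append(header).append(" {\n");
        for (List<String> row : rows) {
            // One parenthesised row per remote solution, duplicates included
            StringJoiner cells = new StringJoiner(" ", "  (", ")\n");
            for (String iri : row) {
                cells.add("<" + iri + ">");
            }
            sb.append(cells);
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("http://example.org/ours/1", "http://example.org/OurType"),
                List.of("http://example.org/ours/2", "http://example.org/OurType"));
        System.out.println(toValuesClause(List.of("ex", "type"), rows));
    }
}
```

Because every remote row becomes one inline row, the size of the rewritten query grows linearly with the remote result set, which is the scaling concern raised above.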
…s. Fix some implementation issues, regarding ordering consistency
So the next step would be to still execute this in advance, but at a different point: insert the remote query results into a temporary table and join on that. Then, for select databases, add a function that would do the federated query itself, which would be faster in the common cases. Either way this seems like a bit of a layer violation, so I would love to discuss how best to go about it in the Ontop team's opinion.
}
}
}
Set<BindingSet> bs = new LinkedHashSet<>();
Shouldn't it be a List instead of a Set?
In principle, the SPARQL subquery could return duplicates.
Good question. I always assumed it should be a set, because the SPARQL 1.1 Federated Query specification (sparql-11-federation) mentions "the multiset of solution mappings corresponding to the results of executing query".
Let me confirm that for you.
Multiset is another name for bag; it accepts duplicates (unlike sets).
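The difference discussed in this thread can be shown with a small, self-contained sketch. It uses plain `Map<String, String>` values as stand-ins for solution mappings (an assumption; the pull request uses RDF4J's `BindingSet`), and the class name `SolutionMultisetDemo` is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SolutionMultisetDemo {

    /** Simulated remote result with one duplicated solution (stand-in for BindingSet rows). */
    static List<Map<String, String>> remoteSolutions() {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(Map.of("type", "http://example.org/A"));
        rows.add(Map.of("type", "http://example.org/A")); // duplicate solution
        rows.add(Map.of("type", "http://example.org/B"));
        return rows;
    }

    public static void main(String[] args) {
        // A List keeps the multiset (bag) semantics: duplicates survive.
        List<Map<String, String>> asList = new ArrayList<>(remoteSolutions());
        // A LinkedHashSet keeps insertion order but silently drops the duplicate.
        Set<Map<String, String>> asSet = new LinkedHashSet<>(remoteSolutions());

        System.out.println("list size: " + asList.size());
        System.out.println("set size:  " + asSet.size());
    }
}
```

With a set, a query that counts or aggregates over the SERVICE results could return a different answer than the remote endpoint intended, which is why a list is the safer container here.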
Hi Jerven,
Thanks for sharing this interesting work!
The first solution makes sense to me as a first implementation. At the moment, the BindingSetAssignment will be translated into a large union, which is obviously not very efficient. However, we started to work on expanding our internal algebra to an in-memory table (called ValuesNode, see https://github.com/ontop/ontop/tree/feature/values-node), which should help.
In terms of integration, I think I would prefer to have it disabled by default, so as to make sure the endpoint administrator is well aware of the presence of this feature. As you said, this feature could have a significant impact on the performance of the endpoint and on its dimensioning. We could return an explicit error message explaining how to turn on this feature to users issuing a query with a SERVICE clause. We would also need to set up a query timeout when specified.
The second solution based on a temporary table is interesting but definitely challenging. Here are a few points to consider:
So yes, definitely, there is a bit of a layer violation. I am curious to see what we can expect in terms of performance. Best,
Yes. PostgreSQL can call REST/HTTP in stored procedures, so we should be able to do the same to fetch the results of a SPARQL query.
I think there are ways around this. We know which IRI patterns need to match in the next result, so we could make a temporary table accordingly. E.g. we do a federated query like: ...
WHERE
{
SERVICE <http://example/sparql> {
?ex a ?type .
}
?ex a ex:OurType .
}
We have a mapping on our side that says:
[] rr:subjectMap [ rr:template "http://example.org/ours/{id}" ; rr:class ex:OurType ] .
We can generate a temp table with three columns: the first holds the result, the second whether it matches the template (as a function/virtual column), and the third the template decomposed (as a function/virtual column).
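As a rough illustration of the second and third columns, the sketch below checks a remote IRI against the template and extracts the `{id}` part. It runs in Java rather than in the database, handles only a template with exactly one placeholder, and assumes the placeholder value contains no `/`; the class `TemplateMatchSketch` and its methods are hypothetical and are not part of Ontop.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateMatchSketch {

    /** Compiles a template with exactly one {placeholder} into a regex (assumption: one placeholder). */
    static Pattern compile(String template) {
        int open = template.indexOf('{');
        int close = template.indexOf('}', open);
        String prefix = template.substring(0, open);
        String suffix = template.substring(close + 1);
        // Quote the literal parts; capture the placeholder as a single path segment.
        return Pattern.compile(Pattern.quote(prefix) + "([^/]+)" + Pattern.quote(suffix));
    }

    /** Returns the placeholder value if the IRI matches the template, otherwise empty. */
    static Optional<String> extractId(Pattern pattern, String iri) {
        Matcher m = pattern.matcher(iri);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        Pattern ours = compile("http://example.org/ours/{id}");
        List<String> remoteResults = List.of(
                "http://example.org/ours/42",
                "http://example.org/theirs/42");
        // Three "columns": the result, whether it matches, and the decomposed id.
        for (String iri : remoteResults) {
            Optional<String> id = extractId(ours, iri);
            System.out.println(iri + " | " + id.isPresent() + " | " + id.orElse("NULL"));
        }
    }
}
```

Rows whose second column is false can never join with `?ex a ex:OurType`, so they could be filtered out before the join.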
Indeed. Probably ways around it, but would require some experimentation.
I think going for the stored procedures will be more successful (performance wise and stability wise)
Ok, I see better now, thanks. The second solution seems feasible but quite involved. If I understand correctly, Ontop would propagate the structural constraints coming from the mapping, such as the IRI templates, to the SPARQL subqueries and their corresponding temporary tables. Ease of deployment would in my view be a crucial aspect for the success of this solution. I have limited experience with stored procedures; let's see how it goes. Best,
FYI, I won't have time to work on this for quite a while (last week was the Elixir European Biohackathon), but calling a SPARQL endpoint in a function from PostgreSQL would depend on the basic HTTP/REST call idea, as shown in this Stack Overflow answer.
I have not had time to work on this, and it looks unlikely I will :(