Issue querying the Bioregistry SPARQL endpoint from various triplestores #775

vemonet · 2023-03-17T19:37:54Z

This issue follows up on #686 and #773

I tried to run federated queries to the new Bioregistry SPARQL endpoint from various triplestores using a simple SPARQL query:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?o WHERE {
    SERVICE <https://bioregistry.io/sparql> {
        <http://purl.obolibrary.org/obo/CHEBI_24867> owl:sameAs ?o
    }
}
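To rule out the triplestore itself, the inner pattern of the SERVICE clause can also be sent straight to the endpoint. The following is a minimal sketch using only the Python standard library. The helper name `build_sparql_request` and the choice of `application/sparql-results+json` as the Accept value are assumptions made for illustration, not something taken from this thread.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://bioregistry.io/sparql"

# Same triple pattern as above, without the SERVICE wrapper.
QUERY = """\
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?o WHERE {
    <http://purl.obolibrary.org/obo/CHEBI_24867> owl:sameAs ?o
}
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL protocol GET request."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Hypothetical choice: ask explicitly for SPARQL JSON results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)

# To actually send it (requires network access):
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       print(response.headers.get("Content-Type"))
```

Printing the Content-Type of the response is the quickest way to see what a federating triplestore would receive.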

From OpenLink Virtuoso

From a Virtuoso v7.2.9 triplestore (the latest open-source Virtuoso version) at https://bio2rdf.org/sparql, we get the following response:

Virtuoso RDFZZ Error DB.DBA.SPARQL_REXEC('https://bioregistry.io/sparql', ...) returned Content-Type 'text/html' status 'HTTP/1.1 200 OK
'
{"results": {"bindings": [{"o": {"type": "uri", "value": "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:24867"}}, {"o": {"type": "uri", "value": "http://bioregistry.io/CHEBI:24867"}}, {"o": {"type": "uri", "value": "http://bioregistry.io/CHEBIID:24867"}}, {"o": {"type": "uri", "value": "http://bioregistry.io/ChEBI:24867"}}, {"o": {"type": "uri", "value": "http://bioregistry.io/chebi:24867"}}, {"o": {"type": "uri", "value": "http://identifiers.org/CHEBI/24867"}}, {"o": {"type": "uri", "value": "http://identifiers.org/CHEBI:24867"}}, {"o": {"type": "uri", "value": "http://identifiers.org/chebi/CHEBI:24867"}}, {"o": {"type": "uri", "value": "http://n2t.net/chebi:24867"}}, {"o": {"type": "uri", "value": "http://purl.obolibrary.org/obo/CHEBI_24867"}}, {"o": {"type": "uri", "value": "http://www.ebi.ac.uk/chebi/displayImage.do?defaultImage=true&imageIndex=0&chebiId=24867"}}, {"o": {"type": "uri", "value": "http://www.ebi.ac.uk/chebi/searchId.do?chebiId=24867"}}, {"o": {"type": "uri", 

SPARQL query:
define sql:big-data-const 0
#output-format:text/html
define sql:signal-void-variables 1
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?o WHERE {
    SERVICE <https://bioregistry.io/sparql> {
        <http://purl.obolibrary.org/obo/CHEBI_24867> owl:sameAs ?o
    }
}

It seems the query itself is processed correctly, since the response body contains valid SPARQL JSON results, but the response is sent with the wrong Content-Type (text/html), so Virtuoso rejects it.
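A rough sketch of the kind of check a federating client might apply to the response header is shown below. How Virtuoso actually validates the header is not known from this thread, so the function names and the set of accepted types (the standard W3C formats for SELECT results) are assumptions.

```python
# Standard media types for SPARQL SELECT results (W3C formats).
SPARQL_RESULT_TYPES = {
    "application/sparql-results+json",
    "application/sparql-results+xml",
    "text/csv",
    "text/tab-separated-values",
}


def media_type(content_type_header: str) -> str:
    """Strip parameters such as charset and normalize the case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_sparql_results(content_type_header: str) -> bool:
    """Return True if the header names a SPARQL results format."""
    return media_type(content_type_header) in SPARQL_RESULT_TYPES


print(is_sparql_results("text/html"))
print(is_sparql_results("application/sparql-results+json; charset=utf-8"))
```

Under this check, a body that is perfectly valid JSON is still rejected when it is labelled text/html, which matches the error above.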

From Ontotext GraphDB

GraphDB 10.1.0 using RDF4J 4.2.0: https://graphdb.dumontierlab.com/repositories/test

Error 500: Internal Server Error
Query evaluation error: <!doctype html>
<html lang=en>
<title>400 Bad Request</title>
<h1>Bad Request</h1>
<p>The browser (or proxy) sent a request that this server could not understand.</p> (HTTP status 500)

From Blazegraph

Not sure which version: http://kg-hub-rdf.berkeleybop.io/blazegraph/sparql

Server Error (#500)

SPARQL-QUERY: queryStr=PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?o WHERE {
    SERVICE <https://bioregistry.io/sparql> {
        <http://purl.obolibrary.org/obo/CHEBI_24867> owl:sameAs ?o
    }
}
java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.openrdf.query.QueryEvaluationException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.Exception: task=ChunkTask{query=48bf76d9-daf6-452b-a5b6-62e11c08cab6,bopId=1,partitionId=-1,sinkId=2,altSinkId=null}, cause=java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: com.bigdata.rdf.sail.webapp.client.HttpException: Status Code=400, Status Line=BAD REQUEST, Response=<!doctype html>
<html lang=en>
<title>400 Bad Request</title>
<h1>Bad Request</h1>
<p>The browser (or proxy) sent a request that this server could not understand.</p>

I think these failures are mostly related to the content types each triplestore sends and expects.
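One possible cause of the 400 responses, which this thread does not confirm, is that the triplestores send the query in a request form the endpoint does not parse. The SPARQL 1.1 Protocol allows three forms: GET with a `query` parameter, POST with a URL-encoded form, and POST with the query as the raw body. Below is a minimal sketch of an extractor that accepts all three; the function name `extract_query` and its signature are hypothetical.

```python
from urllib.parse import parse_qs


def extract_query(method: str, query_string: str, content_type: str, body: bytes) -> str:
    """Return the SPARQL query string from any of the three protocol forms."""
    method = method.upper()
    if method == "GET":
        # Form 1: query passed as a URL parameter.
        values = parse_qs(query_string).get("query", [])
    elif method == "POST":
        media = content_type.split(";", 1)[0].strip().lower()
        if media == "application/x-www-form-urlencoded":
            # Form 2: query passed as a URL-encoded form field.
            values = parse_qs(body.decode("utf-8")).get("query", [])
        elif media == "application/sparql-query":
            # Form 3: the request body is the query itself.
            values = [body.decode("utf-8")]
        else:
            raise ValueError(f"unsupported content type: {content_type!r}")
    else:
        raise ValueError(f"unsupported method: {method!r}")
    if len(values) != 1:
        raise ValueError("expected exactly one query")
    return values[0]
```

An endpoint that handles only the GET form would answer the other two with an error, so checking which form each triplestore uses for SERVICE calls is a reasonable first debugging step.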


vemonet · 2023-03-18T13:07:22Z

Hi @cthoyt, I realized after writing vemonet/rdflib-endpoint#8 that you were talking about integrating the curie mapping endpoint to the bioregistry flask app 😅

I used rdflib-endpoint to serve the SPARQL endpoint in place of Flask in curies, and deployed the Flask app together with rdflib-endpoint in the bioregistry app. The only thing left to change is that some of the parameters I passed as strings should use proper variables.

You can find the changes done in the branch add-rdflib-endpoint-for-mappings on my fork of bioregistry and curies:

Let me know if it fits your requirements. I did not see any impact on performance when serving the Flask app locally through FastAPI.

Using SparqlEndpoint should fix SERVICE queries from most triplestores, and a YASGUI interface will automatically be served to users who open /sparql in a browser.

I am running into issues when deploying with the current gunicorn config, though. I think you will need to change the worker class (gunicorn -k uvicorn.workers.UvicornWorker). For now I added a quick fix to the CLI option so that the web app starts with uvicorn in development, which also gives fast hot reload and is really convenient when developing.
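As a sketch of what that change could look like, the worker class can also be set in a gunicorn configuration file instead of on the command line. The file name, bind address, and worker count below are illustrative assumptions and do not reflect the actual Bioregistry deployment.

```python
# gunicorn.conf.py (hypothetical file; values are illustrative only)

# Address and port to listen on.
bind = "0.0.0.0:8000"

# Number of worker processes.
workers = 2

# Use uvicorn's worker so gunicorn can serve an ASGI app such as FastAPI.
# Equivalent to passing `-k uvicorn.workers.UvicornWorker` on the command line.
worker_class = "uvicorn.workers.UvicornWorker"
```

It would be loaded with `gunicorn -c gunicorn.conf.py module:app`, where `module:app` is a placeholder for wherever the ASGI application object lives.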

I added rdflib-endpoint to the fastapi optional dependencies in curies, but I am not sure this is the right place for it!

I also implemented the custom processor in SparqlEndpoint, so queries with VALUES on a left join now work:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT REDUCED * WHERE {
    ?child owl:sameAs ?child_mapped .
}
VALUES (?child) {
    (<http://purl.obolibrary.org/obo/CHEBI_1>) 
    (<http://purl.obolibrary.org/obo/CHEBI_2>)
}
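For testing with more than two identifiers, a query of this shape can be generated from a list of IRIs. This is only an illustrative helper; the name `build_mapping_query` and the character check are assumptions, not part of curies or rdflib-endpoint.

```python
# Characters that are not allowed inside a SPARQL IRI reference.
FORBIDDEN = set('<>"{}|^`\\')


def build_mapping_query(iris: list[str]) -> str:
    """Build an owl:sameAs mapping query with one VALUES row per IRI."""
    for iri in iris:
        if any(ch in FORBIDDEN or ch.isspace() for ch in iri):
            raise ValueError(f"not a valid IRI: {iri!r}")
    rows = "\n".join(f"    (<{iri}>)" for iri in iris)
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "SELECT REDUCED * WHERE {\n"
        "    ?child owl:sameAs ?child_mapped .\n"
        "}\n"
        "VALUES (?child) {\n"
        f"{rows}\n"
        "}\n"
    )


print(build_mapping_query([
    "http://purl.obolibrary.org/obo/CHEBI_1",
    "http://purl.obolibrary.org/obo/CHEBI_2",
]))
```

Rejecting angle brackets and whitespace keeps a malformed identifier from breaking out of the VALUES block.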

I'll clean up the code and send a pull request if this implementation works for you

cthoyt · 2023-03-18T13:11:09Z

Let's plan to chat on Monday morning. I want to make sure any changes in curies are supported for both Flask and FastAPI, on the blueprint/router level and explicitly not by making full-fledged apps. This needs to be easy to mount on existing apps (whether they're in Flask or FastAPI), and I don't want to spend time on the minutiae of gunicorn/uvicorn/etc.

vemonet · 2023-03-18T13:23:55Z

Yes, let's do this on Monday

> This needs to be possible to easily mount on existing apps (whether they're in flask or fastapi)

It is possible with FastAPI :) As far as I know you can mount anything on it, then just serve it with uvicorn/gunicorn (FastAPI simply leverages existing standards to describe your API, and it's quite well built). I even serve pre-compiled React progressive web apps with it in production without issues!

I am not sure it is possible to do with Flask though (could be, but I did not find anything in my searches)

The switch to FastAPI adds just 2 clear lines of code in bioregistry, and does not lose any existing capabilities of the web app (to be tested more, though!)

> and I don't want to spend time on the minutiae of gunicorn/uvicorn/etc

Serving with gunicorn/uvicorn is quite simple, and you have already done 99% of the job since the app is already served through gunicorn. You just need to enable the uvicorn worker class in your gunicorn setup.

@vemonet

Closes biopragmatics/bioregistry#775. This PR adds handling of headers to both the Flask and FastAPI implementations of the apps.

- [x] Add Flask implementation
- [x] Add FastAPI implementation
- [x] Add Flask tests
- [x] Add FastAPI tests
- [ ] Should the `output` parameter be supported? CC @vemonet.

Ideally, I'd like to use https://github.com/vemonet/rdflib-endpoint and not re-implement this code, but we'll have to work through a few issues first (improving code modularity, documentation, and figuring out Flask support) before I can give that a try.

cthoyt · 2023-03-18T13:47:07Z

I implemented a more principled approach for handling content types in biopragmatics/curies#46 and improved response types, but I will re-open this since there are other solutions possible.

@vemonet how about 13:00 CET on Monday? I'll email you a Zoom link

cthoyt · 2023-03-18T14:04:52Z

I think https://flask.palletsprojects.com/en/2.2.x/patterns/appdispatch/#combining-applications might be appropriate for mounting FastAPI onto Flask
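The linked pattern combines applications by path prefix using werkzeug's `DispatcherMiddleware`. The idea can be sketched with the standard library alone. The `PrefixDispatcher` class below is a simplified, hypothetical stand-in and not werkzeug's implementation, and it only works for WSGI callables.

```python
def make_text_app(text: str):
    """Create a tiny WSGI app that always answers with the given text."""
    def app(environ, start_response):
        body = text.encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app


class PrefixDispatcher:
    """Route requests to a mounted WSGI app based on the path prefix."""

    def __init__(self, default_app, mounts):
        self.default_app = default_app
        # Try the longest prefix first so nested mounts resolve correctly.
        self.mounts = sorted(mounts.items(), key=lambda item: len(item[0]), reverse=True)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, app in self.mounts:
            if path == prefix or path.startswith(prefix + "/"):
                # Shift the prefix from PATH_INFO to SCRIPT_NAME for the sub-app.
                environ = dict(environ)
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                environ["PATH_INFO"] = path[len(prefix):]
                return app(environ, start_response)
        return self.default_app(environ, start_response)


application = PrefixDispatcher(
    make_text_app("main site"),
    {"/sparql": make_text_app("sparql endpoint")},
)


def call(app, path):
    """Invoke a WSGI app with a minimal environ and return (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"PATH_INFO": path, "SCRIPT_NAME": ""}, start_response))
    return captured["status"], body.decode("utf-8")
```

Calling `call(application, "/sparql")` reaches the mounted app, while any other path falls through to the default app.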

vemonet · 2023-03-18T17:24:50Z

Interesting, that might be a solution but will probably require some additional patching, because FastAPI is ASGI, and this is for WSGI apps

This question seems to contain some interesting remarks: https://stackoverflow.com/questions/68769247/how-do-i-write-an-asgi-compliant-middleware-while-staying-framework-agnostic
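To make the mismatch concrete, here is a small sketch of the two calling conventions using only the standard library. The helper names `run_wsgi` and `run_asgi` are made up for this example, and the environ and scope dictionaries are deliberately minimal, far from what a real server passes.

```python
import asyncio


def wsgi_app(environ, start_response):
    """WSGI: a synchronous callable that returns an iterable of bytes."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from WSGI"]


async def asgi_app(scope, receive, send):
    """ASGI: an async callable that sends response events."""
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello from ASGI"})


def run_wsgi(app):
    """Drive a WSGI app once and return (status, body)."""
    statuses = []
    chunks = app(
        {"REQUEST_METHOD": "GET", "PATH_INFO": "/"},
        lambda status, headers: statuses.append(status),
    )
    return statuses[0], b"".join(chunks)


def run_asgi(app):
    """Drive an ASGI app once and return (status, body)."""
    messages = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    asyncio.run(app({"type": "http", "method": "GET", "path": "/"}, receive, send))
    body = b"".join(
        m.get("body", b"") for m in messages if m["type"] == "http.response.body"
    )
    return messages[0]["status"], body
```

Because the two signatures differ, a WSGI dispatcher cannot call an ASGI app directly; an adapter has to translate between the synchronous environ/start_response model and the asynchronous scope/receive/send model.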

cthoyt · 2023-03-26T09:24:27Z

As of #780, this appears to be fixed 🚀

vemonet changed the title ~~Issue querying the Bioregistry SPARQL endpoint from a Virtuoso triplestore~~ Issue querying the Bioregistry SPARQL endpoint from various triplestores Mar 17, 2023

cthoyt mentioned this issue Mar 18, 2023

Better handling of content types biopragmatics/curies#46

Merged

5 tasks

cthoyt closed this as completed in biopragmatics/curies#46 Mar 18, 2023

cthoyt reopened this Mar 18, 2023

cthoyt added the website label Mar 18, 2023

cthoyt closed this as completed Mar 26, 2023
