Implement a Blaze-PostgreSQL adapter #1543

davidmscholz · 2024-03-11T09:17:39Z

As stated in the documentation (https://github.com/samply/blaze/blob/master/docs/architecture.md), it should be possible to use PostgreSQL as the Resource Store for Blaze.

I think having that option would be very beneficial to us if it would enable users to perform SQL queries on the resources in order to more easily assess data quality in their Blaze instance.

On the other hand, using PostgreSQL as the Resource Store might come with decreased CQL-query performance. I imagine the performance hit wouldn't be too large for us since the datasets we (CCP) use are typically not very large (< 1GB) and are not expected to grow very rapidly.

Could we implement a Blaze-PostgreSQL adapter and compare the CQL-query performances on different datasets between two instances of Blaze? One instance would use RocksDB as the Resource Store as a gold standard, and the other would use PostgreSQL as the Resource Store.

alexanderkiel · 2024-03-11T13:52:19Z

Hi David,

it is perfectly possible to implement a Postgres resource store adapter. Blaze is designed to support more than one implementation. For example, the distributed storage variant uses Cassandra as resource store.

However I would not use the resource store for queries without the index. Let me explain that. In the resource store, blaze stores versions of resources by their content hash. That means that a resource with an update appears twice in the resource store and there is no way to tell which version is the current one. That information is only available in the index.

For data quality queries, I would suggest to use CQL directly. I have some examples in the CQL Documentation. More in depth data quality queries should be possible with CQL.

Another way, I could imagine is to implement SQL on FHIR in Blaze.

So at the end, I don't see a performance problem using Postgres as resource store, but the way Blaze stores resources in the resource store is not applicable for queries.

davidmscholz · 2024-03-11T18:54:54Z

Hi Alex, I didn't know about SQL on FHIR. That sounds definitely useful! However, would that mean that we would need to duplicate the Blaze contents in order to be able to analyze them using standard SQL tools?

If so wouldn't it be possible to "just" store some version-counter or timestamp together with the hash and FHIR resource upon insert when using Postgres as Resource Store?

alexanderkiel · 2024-03-12T07:54:32Z

SQL on FHIR should be implemented on top of the Blaze storage architecture, the same as I did with CQL. So no duplicate storage.

Why isn't CQL an option for the data quality queries?

davidmscholz · 2024-03-12T09:15:07Z

Very nice.
SQL is just what people are used to and the official CQL docs aren't great to be honest.

alexanderkiel · 2024-03-12T09:25:43Z

Yes ok, got the point with SQL. But implementing SQL on FHIR will not go fast, even if you like to do it. So can I help you with solving your data quality queries with CQL?

davidmscholz · 2024-03-12T09:47:18Z

Yes, I can imagine that implementing SQL on FHIR will not be accomplished overnight. Thank you for offering to help with the CQL queries. I have built a workaround solution for the time being. That was mainly motivated by other limitations that have nothing to do with Blaze. My main goal is to make the raw data more accessible to our users (bridgehead responsibles) so that they can more easily perform exploratory analyses themselves.

alexanderkiel · 2024-03-12T09:57:33Z

I would try to tech people CQL. We already did one such event at the TMF last autumn and we plan a new event for this year. So I would be interested in examples of data quality queries we can use for the lectures. In my opinion CQL has advantages over SQL for FHIR. It's also available in HAPI and Firefly Server. So I would suggest not to ignore it.

davidmscholz · 2024-03-12T11:26:44Z

I'd be interested in such a CQL event. Especially if it was possible to participate remotely since I currently don't have much time for travel. I will try to write down the most important questions and examples and send them to you.

alexanderkiel added the question Further information is requested label Mar 11, 2024

alexanderkiel self-assigned this Mar 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement a Blaze-PostgreSQL adapter #1543

Implement a Blaze-PostgreSQL adapter #1543

davidmscholz commented Mar 11, 2024 •

edited

alexanderkiel commented Mar 11, 2024

davidmscholz commented Mar 11, 2024 •

edited

alexanderkiel commented Mar 12, 2024

davidmscholz commented Mar 12, 2024 •

edited

alexanderkiel commented Mar 12, 2024

davidmscholz commented Mar 12, 2024

alexanderkiel commented Mar 12, 2024

davidmscholz commented Mar 12, 2024

Implement a Blaze-PostgreSQL adapter #1543

Implement a Blaze-PostgreSQL adapter #1543

Comments

davidmscholz commented Mar 11, 2024 • edited

alexanderkiel commented Mar 11, 2024

davidmscholz commented Mar 11, 2024 • edited

alexanderkiel commented Mar 12, 2024

davidmscholz commented Mar 12, 2024 • edited

alexanderkiel commented Mar 12, 2024

davidmscholz commented Mar 12, 2024

alexanderkiel commented Mar 12, 2024

davidmscholz commented Mar 12, 2024

davidmscholz commented Mar 11, 2024 •

edited

davidmscholz commented Mar 11, 2024 •

edited

davidmscholz commented Mar 12, 2024 •

edited