Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making live-queries (ie. subscription-lds) more efficient/scalable #725

Open
4 of 6 tasks
Venryx opened this issue Mar 19, 2021 · 17 comments
Open
4 of 6 tasks

Making live-queries (ie. subscription-lds) more efficient/scalable #725

Venryx opened this issue Mar 19, 2021 · 17 comments

Comments

@Venryx
Copy link

Venryx commented Mar 19, 2021

Feature description

The live-query feature is impressive, but it appears to be lacking a number of optimizations needed for high scalability.

I know that most functionality that's achieved through live-queries can be approximated through use of the more efficient TRIGGER/NOTIFY subscriptions; however, the live-query approach is much more "elegant" in my view, and it requires less "wiring" to set up each time.

Because live-queries are not a high-priority feature for most developers, I expect that I will need to develop most of the optimizations myself. This will probably be a lot of work, but I'd like to give a try at it, as if I manage it, I can then benefit from the optimizations in all my future projects. (as well as the ~half dozen projects I currently have on Firestore that I plan to migrate)

This thread is my attempt to list those potential optimizations, discuss them, and eventually attempt to develop support for them. (either by enhancing the existing subscription-lds plugin, or creating a new plugin that sits on top of it)

Motivating example

What motivated this feature request for me, is that I am attempting to transition a project from Google Firestore to Postgraphile + Postgres, and its codebase is heavily reliant on the live-query functionality that Firestore supports.

I'm not in a rush to complete the transition (the current site works fine for now), but I would like to gradually extend Postgraphile until its live-query support can scale to a level comparable to Firestore. (granted, it will likely be more expensive and require horizontal-scaling/dedicated-live-query-servers, but that's acceptable)

Like mentioned, most developers seem to get by with the simpler TRIGGER/NOTIFY subscriptions. However, I think if the live-query functionality were optimized enough that you could use it on larger production sites, a lot more developers would be interested in using it.

From what I understand, Firebase/Firestore and Baqend/InvaliDB are the only two database solutions which have thoroughly
scalable live-queries
at the moment. And as Firestore and InvaliDB are both proprietary, this makes scalable live-query support in Postgraphile particularly desirable.

The Medium article above, written by Wolfram Wingerath (the developer of InvaliDB's live-query solution), explains what some of those necessary optimizations are. (beyond the Medium article, he's also written papers on the subject, which describe the challenges and solutions in greater depth)

Breaking changes

It depends on the implementation details. The optimizations will likely require some architectural extensions (eg. the concept of "partitioned live-query servers"), but most of these changes should be able to be implemented in external packages, rather than the subscription-lds and lds packages themselves. (For now, I'll just maintain an independent fork with the changes needed.)

Supporting development

I [tick all that apply]:

  • am interested in building this feature myself
  • am interested in collaborating on building this feature
  • am willing to help testing this feature before it's released
  • am willing to write a test-driven test suite for this feature (before it exists)
  • am a Graphile sponsor ❤️
  • have an active support or consultancy contract with Graphile

My next post will attempt to describe what some of the desired optimizations are.

@Venryx
Copy link
Author

Venryx commented Mar 19, 2021

Desired optimizations:

1) Cache subscription-query results, and apply row-changes to its results directly (ie. mutating the cache)

Rather than re-running the full query for a collection each time a row changes, instead read the change-data within the replication-stream entry, find the subscription-queries that are affected by the change, and directly apply those changes in memory.

Of course, this is not possible for every query (eg. if the query depends on computed-values from functions), but it is possible for a large portion of queries, and -- at least in my project -- that subset is worth the effort to support. (in fact, for my current codebase, 100% of the live-queries should be supportable in this way, due to the much less expressive querying that Firestore supports)

Some posts/repos that are relevant:

Granted, this caching system is expected to add complications with regard to permissions; you have to make sure that the cache is invalidated when the user's permissions change, otherwise it may result in unauthorized data leakage/persistence. The InvaliDB papers discuss this issue, and the solution it uses. (two of the papers: short version, long version)

The permission issue is likely an area where the solution settled on will not work universally (due to the flexibility of the Postgres RLS system). However, if the solution found can handle the use-cases of 80+% of projects/queries (for example, by declaring certain rules the developer has to follow for reliable usage), it can still be useful.

Admittedly, this non-universality will likely keep the advanced live-query system in the domain of a third-party addition, rather than something in the core; this thread is still relevant, however, in that it can guide discussion on what "primitives" are necessary within the subscription-lds and lds packages in order to support a third-party solution.

2) Batch equivalent subscriptions from multiple users

Given the above caching system, it's expected that there will be many concurrent queries amassed, for systems with many users. This implies significant overhead, as each user (and for each subscribed collection) will require their own "query match" check, when a row is changed.

To help reduce this query-matching overhead, it is important to be able to merge equivalent subscriptions wherever possible. For example, if there are 1000 users and each subscribes to a shared list of events for the month, we want only one subscription made instead of 1000.

Again, this functionality requires that the permission system be constrained enough that the system can meaningfully batch subscriptions without breaking security rules. This means that the solution will not work for every project (at least not without certain restrictions or hints being added). While this makes it problematic to include within Postgraphile proper, the optimization can be very beneficial for projects (or portions of them) where maximum performance is more important than RLS's full flexibility, and the developers are willing to use a third-party solution for it.

3) Add partitioning system, so the query-matching itself doesn't overload the live-query servers

As a project scales, the number of concurrent users will make it infeasible for a single server to manage the subscriptions for every client (assuming that they access different parts of the database, and/or that the user count is so high that merely pushing out the deltas of shared tables to everyone is too much). Thus, it will be necessary to support having multiple live-query servers running, each handling a subset of the subscriptions.

That's not enough though; you have to also partition the notifications of changes to each live-query server, so that the query-matching itself does not create a bottleneck.

The fully horizontally-scalable solution (with multiple servers, partitioned by both query and object-id) is briefly described and visualized here.

4) When a subscription-query's result changes, send only the delta to the client, rather than the whole list

This may require holding a buffer of recent changes, to reliably send it to all of the subscribed clients. This would have some overhead; but, one way or another, Firestore and InvaliDB have proven that the delta-sending approach is workable. (I've confirmed with chrome's dev-tools that Firestore only sends the deltas, and I know the same is true for InvaliDB.)

It also requires a change on the client side, to be able to receive the "delta" and apply it to its own in-browser cache. (it's possible there exists an Apollo library for this sort of thing that could be utilized)

That said, this optimization still seems to be the easiest of the four to implement, as it doesn't require special permission checks, nor does it require communication between the database server and "live query" servers.

It's also arguably the most relevant for early usage, as you're going to see immediate reductions in network bandwidth/costs (eg. by 99%, for single-row changes to a 100 item collection that has live-query subscriptions), versus performance limits you'd merely hit later down the road (relating to query-matching and such within your servers, as optimized by entries 1-3).


All in all, there's quite a lot of work required to get the performance of Postgraphile's live-queries on par with that of Firestore or InvaliDB. Realistically, I will probably give up before I implement all four. 😅 That said, even if only one or two are managed to be implemented, they should still provide some meaningful performance benefits -- perhaps enough to make live-queries more generally usable.

Anyway, if anyone's interested in taking on one or more or these tasks, let me know and I'll focus on the others.

In the meantime, I'll look into what steps are required for implementing optimization 4. (since it seems the most straightforward)

@benjie
Copy link
Member

benjie commented Mar 19, 2021

It also requires a change on the client side, to be able to receive the "delta" and apply it to its own in-browser cache. (it's possible there exists an Apollo library for this sort of thing that could be utilized)

It's been clear to me for a while that the main issue with live queries as they currently stand is that they send the entire response rather than a delta. This prevents a tonne of optimisations, for example we could easily say "user 21's username is now 'Flibble'" but if that was a subset of a larger query then we can't send the rest of the data without recalculating it for the various reasons you outline (and doing so may "roll back" changes that the client's cache has already received due to mutations they've performed since the live query started). This is also clear to others working on live queries in the GraphQL ecosystem, and there's even beginning to be discussions around using JSON Patch for this. I very much encourage you to engage with the other people working on this problem in the GraphQL ecosystem so that we end up with a standard live queries solution rather than a vendor-specific one; I build live queries in the way that I did for PostGraphile because it maximized compatibility - all GraphQL clients that support GraphQL subscriptions automatically support PostGraphile live queries - but the trade-offs are heavy.

This is great stuff; I should warn you that what I'm working on for V5 may invalidate work that you might take on attempting to fix optimizations 1-3 in V4; but optimization 4 will be beneficial for both (and for the GraphQL ecosystem in general!) so I definitely encourage you to work on that.

@Venryx
Copy link
Author

Venryx commented Mar 19, 2021

Ah yes, good point; I recall reading some discussions in the GraphQL repos relating to live queries and JSON patches. I'll look for those discussions again, and link the relevant ones below. (will update this post as I find them)

Relevant discussions:

Relevant repos:

I'll also keep in mind your note about there being changes in V5 that may affect optimizations 1-3.

@Venryx Venryx mentioned this issue Mar 19, 2021
6 tasks
@Venryx
Copy link
Author

Venryx commented May 16, 2021

I'm starting work on trying to integrate the graphql-live-query-patch library (and its related packages) with Postgraphile.

The library is designed to make live-query result-streaming more efficient by only sending json patches of the "deltas" between the result set, each time it changes. (so optimization no. 4 above)

The integration instructions, on the server side, say to wrap/run a function on the results of the graphql execute function.

However, I do not know how to do this for Postgraphile.

I did a search in the postgraphile repo for uses of the graphql execute function: https://github.com/graphile/postgraphile/search?q=execute

This search result, on line 87 of liveSubscribe.ts, seems like it may be one of the places where the call/result needs to be wrapped.

However, I do not know what the recommended approach is for adding that wrapping/intercepting.

Should I:

  1. Fork Postgraphile, and modify the code directly.
  2. Use some sort of "trick" to modify the Postgraphile (or graphql) code, without needing to maintain a fork. (eg. patch-package tool, find-and-replace plugin for webpack/snowpack, etc.)
  3. Look for some sort of hook/plugin API that would let me intercept the execute function's results.
  4. Try to figure out a way to use express middleware code to intercept the full result-set being sent to the client, to apply the graphql-live-query-patch behavior after Postgraphile has finished processing it?

No. 3 seems ideal to me, but I don't currently know of any "hooks" in Postgraphile that do what I need. (wrapping/intercepting the result of the graphql execute function)

@benjie
Copy link
Member

benjie commented May 16, 2021

Please add a PR that adds just a new pluginHook call wherever you need it; see the pluginHooks.ts file for existing hooks.

(You may want to experiment via a fork before doing this, but having your project run via a plugin means you won’t require my authorisation to get changes merged which means you can probably advance it faster/further and release new major versions over time that don’t have to sync with PostGraphile’s majors.)

@Venryx
Copy link
Author

Venryx commented May 16, 2021

Taking a closer look, it seems like the least-intrusive way to integrate the functionality needed would be to let the user supply a custom value for the subscribe field of the websocket-server configurations in subscriptions.ts.

I noticed that there are two different websocket-server options:

Anyway, it looks like letting the user supply a custom subscribe field/function -- which is then sent to the websocket-server selected above -- is all that's needed (on the server side) for integrating the graphql-live-query-patch library, as my plugin can supply a custom version of it with the needed execute-function wrapping.

Does this integration approach sound fine?

If so, I'll put together a pull-request with a new option for it. (and then I can adjust the option name and transferring code, if needed, based on your feedback/review)

@Venryx
Copy link
Author

Venryx commented May 16, 2021

I've confirmed that adding that single custom field to the postgraphile options (to change the subscribe function sent to the websocket-servers) is enough for the graphql-live-query-patch library to be integrated. (well, at least as far as I've seen when comparing the network data between the library's official demo, and my patched version of postgraphile)

Here is a screenshot of the network data for my new test app:

The entry second from the bottom is a regular "full result set" supplied by postgraphile.

The entry at the very bottom is a "json patch" result-set, describing only the delta betwen the initial "full result set", and the latest results.

I'll now work on integrating the graphql-live-query-patch library on the client-side.

@Venryx
Copy link
Author

Venryx commented May 16, 2021

It works!

I've got json-patches being generated on the server, and applied on the client, for the updated result-sets of live-queries (as triggered automatically by the subscription-lds plugin).

Here is a screenshot of the initial result-set, transferred over websockets, followed by three "json patch" updates (which were applied successfully):

I tried to make use of existing libraries where possible, relying on graphql-live-query-patch for the patch generation and application logic, and not changing any of the default transport protocols used by Apollo and Postgraphile.

The only change needed in the Postgraphile codebase is:

  1. The addition of the option to customize the subscribe function passed to the websocket-server.

And the only thing the user will need to do (once I create the helper package) is:

  1. Run npm install X, for the helper package.
  2. Add its ApolloLink entry to their Apollo client setup code.
  3. Pass its customized subscribe function to their Postgraphile server, using the new setting.

I'll add a comment here once I've created the helper package, and link to the instructions on how to use it.

@benjie
Copy link
Member

benjie commented May 16, 2021

Very cool; instead of adding a subscribe option, could you add a generic pluginHook which can supply/augment it instead?

@Venryx
Copy link
Author

Venryx commented May 17, 2021

Working on it.

I think I'm almost there; the only issue is that the postgraphile options object is not available from within liveSubscribe.ts, and this appears to be necessary to set up a hook location. (to call pluginHookFromOptions(options))

How should I make the options object available to the liveSubscribeImpl function?

@Venryx
Copy link
Author

Venryx commented May 17, 2021

I've finished updating the code to use a new postgraphile:ws:executionResult hook, rather than an additional field on the Postgraphile options object.

It's working fine, however I'm currently using a hack to make the options object evailable to the liveSubscribeImpl function. (using essentially a global/singleton atm)

I'll wait until @benjie recommends a transfer approach before proceeding, as there are a few approaches with different side-effects.

Example approaches:

  1. Using a createLiveSubscribe() approach. (but this could change the current API)
  2. Using some kind of singleton. (inelegant, and may cause issues if there is more than one postgraphile instance)
  3. Creating a "wrapper" subscribe function in subscriptions.ts which passes on the arguments it receives from the websocket-server, while adding an options argument to the end.

No. 3 is currently my preferred, though it seems like there should still be a better way.

For reference, here is the current delta between my branch and upstream: graphile/crystal@main...Venryx:main

@Venryx
Copy link
Author

Venryx commented May 20, 2021

I've made a Github repo for the Postgraphile and Apollo plugins: https://github.com/pg-lq/postgraphile-live-query

The two plugins are published as the following packages (see above repo for more info):
@pg-lq/postgraphile-plugin
@pg-lq/apollo-plugin

And here is a small "todo app" demo repo that uses the plugins above: https://github.com/Venryx/postgraphile-live-query-demos

The main difficulty of installation atm is that the new postgraphile hook the plugin relies on has not been merged into postgraphile's upstream yet. Once benjie has time to look the changes over, I'll put together a pull-request, so the plugin becomes easy to use for anyone without messing with postgraphile forks.

@benjie
Copy link
Member

benjie commented May 26, 2021

I think I'm almost there; the only issue is that the postgraphile options object is not available from within liveSubscribe.ts, and this appears to be necessary to set up a hook location. (to call pluginHookFromOptions(options))

It seems like liveSubscribe is only used internally, so I'd wrap it with a factory function makeLiveSubscribe(pluginHook) which you can call after this line: https://github.com/graphile/postgraphile/blob/601a1086a2c10ebae4747364e199f2517d653861/src/postgraphile/http/subscriptions.ts#L72 (this is basically your option 1)

@benjie
Copy link
Member

benjie commented May 26, 2021

I've done the refactoring in graphile/crystal#1482 to help your PR stay small; so you just need to add the relevant plugin hooks.

@Venryx
Copy link
Author

Venryx commented May 26, 2021

I've done the refactoring in graphile/postgraphile#1482 to help your PR stay small; so you just need to add the relevant plugin hooks.

Awesome, thanks; I will do that soon. (in a day or two most likely)

@benjie
Copy link
Member

benjie commented May 26, 2021

Basic implementation here: graphile/crystal#1483

I did rename the hook because it's not strictly ws specific.... sorry 😬

@Venryx
Copy link
Author

Venryx commented May 26, 2021

Sounds good.

After renaming the hook in my plugin's code, it looks like it'll be ready to go in future Postgraphile versions. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants