Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] @graphile/lds2 #662

Open
4 of 6 tasks
wesselvdv opened this issue Sep 18, 2020 · 12 comments
Open
4 of 6 tasks

[RFC] @graphile/lds2 #662

wesselvdv opened this issue Sep 18, 2020 · 12 comments

Comments

@wesselvdv
Copy link
Contributor

Feature description

The new package @graphile/lds2 is in essence a copy of the current implementation with two changes (1 & 2) and one addition (3):

  1. Minimal required version of wal2json is at this time of writing 2.3, due to requiring the include-pk parameter to be available. This is required for @graphile/subscriptions-lds to properly work in cases where REPLICA IDENTITY FULL is enabled on a table.

  2. Dropping support for version-format 1, and moving to version-format 2 which no longer reports changes on a transaction level, but on a tuple level. Additional bonus is that in my opinion the json object is structured more like what the industry standard is today.

  3. Adding watch capability much to what I believe graphile-build has for building it schema. Currently @graphile/lds is not able to actively listen to schema changes, adding this would allow us to circumvent the need for the include-pk parameter and as such also keep supporting version-format 1 and wal2json <= 2.3, yet this is not the main goal of this addition.

    I believe the most useful purpose of adding a watch capability is to be able to support subscriptions on native pg >=11 partitioned tables and timescale hypertables. These table structures are providing a virtual table that one can do insert,update,delete,select upon, but in actuality these operations are redirected to one or more sub tables (partitions). wal2json does NOT report changes on this (virtual) parent table level, but on the actual subtable that the change occurred. This makes it impossible to subscribe to these type of tables with the current implementation of @graphile/lds. I am aware of the fact that native pg >= 11 partitioned tables are NOT very well supported in postgraphile atm (Only the partitions are showing up in the schema, not the virtual table), but due to the different technical implementation of timescale hypertables these are actually coming out in the generated schema just fine (No partitions are shown, just the parent table).

    Having this watch capability allows us to figure out that when a change occurs in one of these sub tables, we also need to fabricate a same change for the (virtual) parent table.

Motivating example

Motivation for this alternative version is twofold, being that the the current implementation of @graphile/lds:

  1. Assumes that the primary key information of a change is always included in the property oldkeys if the property oldkeys is included. This is NOT always the case due to the possibilty of enabling REPLICA IDENTITY FULL on a table which changes the contents of oldkeys from not only containing the primary key(s), but to containing the entire record as it was before said change (much like the OLD value in update/delete triggers). As such, enabling this feature on a table breaks the current implementation of @graphile/lds.

    Fixing the above issue requires the use of a parameter include-pk which is only available in version >=2.3 of wal2json, and as of right now we haven't been able to find a method to detect which version of wal2json is installed. The only way seems to be that of try {} catch {} the code with the parameter and falling back on one without if it fails with a parameter unknown error. Problem I have with this approach is that we'd have to resort to string matching to determine the cause, which feels a bit too iffy to me.

  2. Is built on version-format 1 of wal2json which produces a JSON object per transaction. All of the new/old tuples are available in the JSON object. Whereas the newer version-format 2 produces a JSON object per tuple, supports the truncate operation and provides in my opinion a more logical constructed object.

    Below are both formats as defined in typescript definitions, these are both the default format which can be augmented with additional metadata based on parameters. One of which is the much needed include-pk which is only available in wal2json >= 2.3

    format-version 1:

    export interface Change {
     kind:          Kind;
     schema:        string;
     table:         string;
     columnnames?:  string[];
     columntypes?:  string[];
     columnvalues?: Columnvalue[];
     oldkeys?:      Oldkeys;
    }
    
    enum Kind {
     Insert = 'insert',
     Update = 'update',
     Delete = 'delete'
    }
    
    export type Columnvalue = any;
    
    export interface Oldkeys {
     keynames:  string[];
     keytypes:  string[];
     keyvalues: Keyvalue[];
    }
    
    export type Keyvalue = string;

    format-version 2:

    export interface Change {
     action:    Action;
     schema:    string;
     table:     string;
     columns?:  Column[];
     identity?: Column[];
    }
    
    enum Action {
     Insert = 'I',
     Update = 'U',
     Delete = 'D',
     Truncate = 'T'
    }
    
    export interface Column {
     name:  string;
     type:  string;
     value: Value;
    }
    
    export type Value = any;

As we discussed in Discord fixing/adding the above 2 points would result in such breaking changes that are whole new package is warranted. This new package @graphile/lds2 would have more strict requirements on which version of wal2json is used.

Breaking changes

None, because it is a new package. Yet, in order to make this package useful an update to @graphile/subscription-lds or alternative package that does the same but using @graphile/lds2 is required.

Supporting development

I [tick all that apply]:

  • am interested in building this feature myself
  • am interested in collaborating on building this feature
  • am willing to help testing this feature before it's released
  • am willing to write a test-driven test suite for this feature (before it exists)
  • am a Graphile sponsor ❤️
  • have an active support or consultancy contract with Graphile
@wesselvdv
Copy link
Contributor Author

wesselvdv commented Sep 18, 2020

I guess I'll get this party starting with an immediate comment on my own proposal. Having read everything meticulously again, I think there might be a way to prevent breaking changes by implementing point nr. 3 (watch capability) and using the additional schema information to retrieve the pk in case of REPLICA IDENTITY FULL.

I do have to say that I am not familiar with the technical implementation of this watching, so I am not sure how much (duplicate) effort it would be to incorporate this in either @graphile/lds or @graphile/lds2

@wesselvdv wesselvdv changed the title [RFC] @graphile/lds2 [RFC] @graphile/lds2 Sep 18, 2020
@benjie
Copy link
Member

benjie commented Sep 28, 2020

Excellent outline 🙌

Most of this week (except Thursday) is dedicated to OSS from me, so ping me on Discord if you want to chat about this in real time so we can try and make some headway. I'm on UK time (UTC+1).

Questions

Q1: Do we need include-pk?

I think you're saying that when you have REPLICA IDENTITY FULL, the entire row (rather than just the PKs) comes through via oldkeys. We could presumably detect this by seeing that the length of the data is !== primaryKeys.length and === columns.length and automatically pick the relevant values for the PKs. If this isn't the case, please could you include copies of the relevant payloads to make the differences clearer.

If we can fix this particular issue in @graphile/lds via this means, we should (since it's a bug), so perhaps we can accomplish this via a standalone PR to reduce the difference LDS2 introduces?

Q2: What specifically do you think we need to monitor for in watch mode?

The list of columns/PK obviously, but not sure what else?

We could/should add watch mode support to @graphile/lds anyway, so perhaps we can accomplish this via a standalone PR to reduce the difference LDS2 introduces?

Q3: What's involved in adding virtual table support; can we add this to @graphile/lds?

If we can add support in @graphile/lds in a non-breaking way, this seems optimal. Would be nice to extend this to PostGraphile core too; I think the challenge here is that the introspection query will need different versions for different versions of PostgreSQL.

For PostgreSQL native partitions, I think step1 would be adding native partition support to PostGraphile.

Other comments

C1: Platform support

It would be nice to have a table of wal2json support across the various cloud vendors: Amazon RDS, Google Cloud SQL, etc - what versions of Postgres does it support, what versions of wal2json within those does it support, etc. I'm not keen to require a version of wal2json that few cloud providers support, but if support now exists (even in only the latest PostgreSQL versions), I'm definitely open to considering it.

C2: wal2json version detection

Someone should reach out and find if there's a better approach to detecting the wal2json version. This would be optimal, versus having two modules.

Separately, I'm also uncomfortable with matching error message strings, but if they have specific error codes then this hesitation goes away. Is this something you could research?

C3: testing

It'd be really handy if we could test different combinations of wal2json and PostgreSQL, to see what comes out. This seems like a perfect use case for Docker; are you aware of a docker PostgreSQL-wal2json matrix we can leverage here, or do we need to roll our own?

C4: new message format

I definitely see the appeal. I'd be keen to address as many issues as we can in @graphile/lds before creating an official lds2 module; in an ideal world there'd be just one module and we'd auto-switch; the idea is that the LDS module abstracts the PostgreSQL internals and only tells us what we care about so it should be the right abstraction point for doing this.

@benjie
Copy link
Member

benjie commented Sep 28, 2020

Link to the previous Discord discussion:

https://discord.com/channels/489127045289476126/489717163280826374/756222484172243054

@wesselvdv
Copy link
Contributor Author

wesselvdv commented Sep 29, 2020

Excellent outline 🙌

Most of this week (except Thursday) is dedicated to OSS from me, so ping me on Discord if you want to chat about this in real time so we can try and make some headway. I'm on UK time (UTC+1).

Thanks! I'll try and see if I can make some time this week.

C2: wal2json version detection
Someone should reach out and find if there's a better approach to detecting the wal2json version. This would be optimal, versus having two modules.

Separately, I'm also uncomfortable with matching error message strings, but if they have specific error codes then this hesitation goes away. Is this something you could research?

I already opened up an issue in the wal2json repo, but unfortunately there's no mechanism in place to detect versions from SQL. See eulerto/wal2json#181

@eulerto Just pinging you to this discussion, if you have any useful ideas on how to tackle detecting the version of wal2json that'd be very helpful!

Q1: Do we need include-pk?
I think you're saying that when you have REPLICA IDENTITY FULL, the entire row (rather than just the PKs) comes through via oldkeys. We could presumably detect this by seeing that the length of the data is !== primaryKeys.length and === columns.length and automatically pick the relevant values for the PKs. If this isn't the case, please could you include copies of the relevant payloads to make the differences clearer.

If we can fix this particular issue in @graphile/lds via this means, we should (since it's a bug), so perhaps we can accomplish this via a standalone PR to reduce the difference LDS2 introduces?

I think we can easily solve this using said watch capability in combination with the logic you're describing to detect that there's more in oldkeys than just the primary keys.

Q2: What specifically do you think we need to monitor for in watch mode?
The list of columns/PK obviously, but not sure what else?

We could/should add watch mode support to @graphile/lds anyway, so perhaps we can accomplish this via a standalone PR to reduce the difference LDS2 introduces?

I would also argue that we make sure that we watch the relations of any partitioned tables, or other constructs (timescale) that look like it. This will allow us to correctly announce for the parent table as well for any child ones. I built a small fix for timescale once to make sure that a change in a chunk of a table would also trigger the table itself:

  • hypertable A
    • chunk A
    • chunk B <-- change occurs here, then announcement references table chunk A, we also need to announce for hypertable A.
    • chunk C

Above fix was quite trivial, because I could just register another callback on the system table that was keeping track of this relation between hypertables and their chunks. So on startup of the plugin I would gather all links, and afterwards place a callback to listen to any changes on this system table so I would stay up to date:

    this.subscriptions = {
      '["_timescaledb_catalog", "hypertable"]': [
        [updateHyperTableChunks, undefined],
      ],
    };

Q3: What's involved in adding virtual table support; can we add this to @graphile/lds?
If we can add support in @graphile/lds in a non-breaking way, this seems optimal. Would be nice to extend this to PostGraphile core too; I think the challenge here is that the introspection query will need different versions for different versions of PostgreSQL.

For PostgreSQL native partitions, I think step1 would be adding native partition support to PostGraphile.

I believe it could be as simple as this:

    this.subscriptions = {
      '["pg_catalog", "pg_inherits"]': [
        [updateTablePartitions, undefined],
      ],
    };

The table pg_catalog.pg_inherits contains all information that we need.inhrelid is the column that contains the actual partition table name, and inhparent contains the schema that you could use in the predicate to filter more strictly.

@benjie
Copy link
Member

benjie commented Sep 29, 2020

Regarding Q3; are you suggesting that pg_catalog and pg_inherits are streamed via logical decoding too, and that we could subscribe to this? That's pretty cool! What I meant, though, is we should support these partitioned/hyper tables via PgTablesPlugin and the like.

@wesselvdv
Copy link
Contributor Author

wesselvdv commented Sep 29, 2020

Regarding Q3; are you suggesting that pg_catalog and pg_inherits are streamed via logical decoding too, and that we could subscribe to this? That's pretty cool! What I meant, though, is we should support these partitioned/hyper tables via PgTablesPlugin and the like.

Hmm, I just tried this locally but I couldn't listen on any pg_catalog.* table. It probably only works because the way timescale does it isn't regarded as being a system table.

@benjie
Copy link
Member

benjie commented Sep 29, 2020

@wesselvdv To keep this moving forward; can I suggest that you raise a PR just to solve the problem of Question 1; i.e. look at the length of oldKeys and use that to determine which fields are the PK.

In general, I'm thinking that we could/should use a flag rather than a separate version in order to support different wal2json features. This is slightly complicated by LDS being independent of Graphile Engine, but I think we can still achieve it 👍

@wesselvdv
Copy link
Contributor Author

@benjie I agree, what's the best approach for incorporating the watch capabilities? I have yet to take a look at graphile-build to see how it's incorporated.

@benjie
Copy link
Member

benjie commented Sep 30, 2020

The LDS module is completely standalone to graphile-build (you can use it without caring about GraphQL at all), so it would have to implement its own watch mode. Rather than having watch mode, I think I would do just-in-time introspection of the tables. So when the first request comes through for a table, quickly look up it's primary keys, and cache it. Then all the changeToPk calls would pass through the details of the table as a second argument. A very simple watch mode would simply not cache this data at all ;)

@wesselvdv
Copy link
Contributor Author

wesselvdv commented Sep 30, 2020

@benji Wouldn't it be more useful to register events for ddl changes, and store these changes in a central audit table. Said audit table can then be listened upon using the in place callback system.

I just tested the below, seems to work nicely. I'll try to incorporate timescale hypertable detection in this aswel.

create schema graphile_lds_watch;

CREATE TABLE graphile_lds_watch.tables
(

    "table"      regclass primary key NOT NULL,
    primary_keys TEXT[]               NOT NULL DEFAULT '{}'::TEXT[],
    parent       REGCLASS             NULL,
    last_updated TIMESTAMPTZ          NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION graphile_lds_watch.track_primary_keys()
    RETURNS event_trigger AS
$$
DECLARE
    r            RECORD;
    primary_keys TEXT[];
    parent       regclass;
    pg_version   numeric = current_setting('server_version_num');
BEGIN
    FOR r IN SELECT * FROM pg_event_trigger_ddl_commands()
        LOOP
            IF r.command_tag = 'ALTER TABLE' OR r.command_tag = 'CREATE TABLE' then
                primary_keys := (
                    SELECT array_agg(a.attname)
                    FROM pg_index i
                             JOIN pg_attribute a ON a.attrelid = i.indrelid
                        AND a.attnum = ANY (i.indkey)
                    WHERE i.indisprimary
                      AND i.indrelid = r.objid
                );
                IF pg_version >= 10000 THEN
                    parent := (
                        SELECT inhparent::regclass
                        FROM pg_catalog.pg_inherits
                        WHERE inhrelid = r.objid
                    );
                end if;

                INSERT
                INTO graphile_lds_watch.tables("table", primary_keys, parent, last_updated)
                VALUES (r.object_identity, coalesce(primary_keys, '{}'::TEXT[]), parent,
                        statement_timestamp())
                ON CONFLICT ("table")
                    DO UPDATE SET primary_keys = excluded.primary_keys,
                                  parent       = excluded.parent;
            end if;
        END LOOP;
END ;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER track_ddl ON ddl_command_end
EXECUTE PROCEDURE graphile_lds_watch.track_primary_keys();
CREATE EVENT TRIGGER track_ddl_drops ON sql_drop
EXECUTE PROCEDURE graphile_lds_watch.track_primary_keys();

DROP TABLE IF EXISTS attributes;
CREATE TABLE IF NOT EXISTS attributes
(
    id   serial                                      NOT NULL,
    date TIMESTAMPTZ DEFAULT 'infinity'::TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (id, date)
);

ALTER TABLE attributes
    DROP CONSTRAINT attributes_pkey;

ALTER TABLE attributes
    ADD CONSTRAINT attributes_pkey_two PRIMARY KEY (id);

DROP TABLE IF EXISTS measurement;
CREATE TABLE IF NOT EXISTS measurement
(
    city_id   int  not null,
    logdate   date not null,
    peaktemp  int,
    unitsales int
) PARTITION BY RANGE (logdate);

DROP TABLE IF EXISTS measurement_y2006m02;
CREATE TABLE IF NOT EXISTS measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

DROP TABLE IF EXISTS measurement_y2006m03;
CREATE TABLE IF NOT EXISTS measurement_y2006m03 PARTITION OF measurement
    FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

DROP TABLE IF EXISTS measurement_y2007m11;
CREATE TABLE IF NOT EXISTS measurement_y2007m11 PARTITION OF measurement
    FOR VALUES FROM ('2007-11-01') TO ('2007-12-01');

SELECT * FROM graphile_lds_watch.tables;

@benjie
Copy link
Member

benjie commented Oct 9, 2020

I'm not sure I see the value of mutating the database on every mutation in the database to be able to track mutations in the database; could you expand? In PostGraphile the event triggers come through to Node which then checks the latest state of the DB; I'd expect something similar to work well here?

I'm not super comfortable with any tool mutating the database, there has to be a really good reason to do it. Event triggers in particular are frustrating because you can't easily omit them from pg_dumps so they pollute user's schema dumps.

@Venryx
Copy link

Venryx commented Mar 19, 2021

@wesselvdv Are you still interested in working on this enhancement?

I'm looking to make some optimizations to the live-query system (as described here), and this enhancement appears to be a necessary step for entries 1 and 3 in that thread.

Specifically, they rely on having the full data of the changed row (for efficient matching against active live-queries), which requires REPLICA IDENTITY FULL to be enabled, which you've stated is incompatible with Postgraphile without this thread's proposed changes. (the main change being the upped version-requirement of wal2json, to support the include-pk parameter, which subscription-lds needs to be compatible with the full-replica flag, I gather)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants