Roadmap to synchronizing multiple replicas or undo/redo support #503

johncalvinyoung · 2024-01-29T21:11:45Z

Hello, Streamich!

My team is exploring using json-joy/json-crdt, we're really excited about your performant JSON CRDT implementation. We have a few questions about where the library is going:

1. Automerge and others provide an easy way to have a fork request changes from another fork whose tip is farther along in history. I see lots of ways to emit patches as they're applied locally, but no way to get a diff or set of patches comparing two clocks, unless I'm missing it? I see the beginnings of PatchLog and Draft in the codebase, but it's not clear how to use them yet.
2. Is there a way to convert JSON-CRDT patches to RFC6902 patches? I see there's support for applying an RFC6902 patch to a Model and flushing to a JSON-CRDT patch, but it'd be really nice for our application if there was a way to emit the changes actually applied to the CRDT (such as when we synchronize a JSON-CRDT patch across the network). Right now we're reading the whole view() after each broadcast patch is applied, which is inefficient on very large documents that change only a few keys at a time.
3. Is there a plan to invert or unpatch a JSON-CRDT patch? Other libraries support inverting RFC6902 patches. Would be really nice for building an undo-redo on top of the library, or do you have an alternative architecture in mind?

I really appreciate it--doing my best to understand what the current codebase offers and where the library might go in future. Might be able to contribute to the library, as well, in future.

Thanks!


streamich · 2024-02-14T11:02:02Z

Hello @johncalvinyoung,

Sorry for the late reply, I had some family issues to take care of. Those are all good questions, let me try to address them.

I see lots of ways to emit patches as they're applied locally, but no way to get a diff or set of patches comparing two clocks, unless I'm missing it? I see the beginnings of PatchLog and Draft in the codebase, but it's not clear how to use them yet.

Currently, the idea is that the developers have to figure out the missing patches in their code. There are two major directions I see: (1) in the peer-to-peer model one peer needs to send its logical clock vector model.clock to the other peer and the other peer needs to compute the missing patches and send them to the first peer; (2) in the client-server model, the server would be the central authority, which could order the patches and send the client the missing "tip".

You are right, the PatchLog is the "native" code piece that could provide this calculation from the library side. Something like:

const patches = patchLog.getTip(anotherPeerLogicClockVector);
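As a rough illustration of what such a tip calculation could do over an in-memory log, here is a minimal sketch. The `ClockVectorSketch` and `StoredPatchSketch` shapes and the `getTipSketch` function are hypothetical and are not json-joy types or APIs; the sketch assumes the clock vector maps each session ID to the next logical time the peer has not yet seen from that session.

```typescript
// Hypothetical shapes for illustration only; not json-joy's actual types.
// Maps a session ID to the next logical time NOT yet seen from that session.
type ClockVectorSketch = Map<number, number>;

interface StoredPatchSketch {
  sid: number; // session ID of the peer that authored the patch
  time: number; // logical time of the patch's first operation
  bytes: Uint8Array; // encoded patch, treated as opaque here
}

// Returns the patches the remote peer has not seen yet.
// `log` is assumed to be in the order patches were applied locally,
// so filtering (without re-sorting) keeps that order intact.
function getTipSketch(
  log: StoredPatchSketch[],
  remote: ClockVectorSketch,
): StoredPatchSketch[] {
  return log.filter((patch) => {
    const seenUpTo = remote.get(patch.sid) ?? 0; // unknown session: nothing seen
    return patch.time >= seenUpTo;
  });
}
```

In the peer-to-peer flow described above, the first peer would send its clock vector, and the second peer would reply with the result of a call like `getTipSketch(log, receivedVector)`.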

The complexity here is that the PatchLog is currently all in memory, whereas in a real application one could imagine the patches being stored on disk. Hence there is a decision to make: (1) either let the PatchLog implementation work only on in-memory data structures, which means all patches need to be loaded in memory; or (2) add some sort of persistence support to the PatchLog.

I am leaning towards the first approach (at least in the initial implementation), where the PatchLog is a memory-only class: it knows nothing about how patches are stored, and they all have to be loaded in memory. That means the on-disk storage needs to store all patches compactly (in one file or just a handful of files), so that there is no disk hammering from reading a file per patch. There is json-joy/json-crdt/file, which will eventually be able to provide that; however, the file format there is mostly intended for debugging or low-volume peer-to-peer use cases, as it is not the most efficient one. But it will be very convenient for debugging, since it will store the whole history.

The above should work fine for low-volume peer-to-peer use. However, for server-client setups, or wherever the most efficiency is desired, storing all patches in a single file is not the most performant approach. There should be some custom patch storage layer. Also, not all patches should be loaded into memory every time a clock diff is computed, only the necessary ones. For those use cases, there should be some clock diff function implemented in the json-joy/json-crdt-patch library which, given two logical clock vectors, computes the patch IDs that are the difference. That difference can then be used to load the necessary patches from the custom storage layer.
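To make the clock diff idea concrete, here is a minimal sketch that compares two clock vectors without touching any patches. `VectorClockSketch`, `MissingRangeSketch`, and `clockDiffSketch` are hypothetical names, not something json-joy/json-crdt-patch is known to export; the sketch assumes each vector maps a session ID to the number of logical ticks seen from that session.

```typescript
// Hypothetical helper for illustration; not part of json-joy/json-crdt-patch.
// Maps a session ID to the number of logical ticks seen from that session.
type VectorClockSketch = Map<number, number>;

interface MissingRangeSketch {
  sid: number; // session whose operations are missing on the remote side
  from: number; // first missing logical time (inclusive)
  to: number; // end of the missing range (exclusive)
}

// Computes, per session, the range of logical time that `local` has seen
// but `remote` has not. Sessions where remote is equal or ahead are skipped.
function clockDiffSketch(
  local: VectorClockSketch,
  remote: VectorClockSketch,
): MissingRangeSketch[] {
  const ranges: MissingRangeSketch[] = [];
  local.forEach((localTime, sid) => {
    const remoteTime = remote.get(sid) ?? 0;
    if (localTime > remoteTime) {
      ranges.push({sid, from: remoteTime, to: localTime});
    }
  });
  return ranges;
}
```

A storage layer could then turn each range into a lookup by session ID and time interval, so that only the patches inside the missing ranges are read from disk.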

Is there a way to convert JSON-CRDT patches to RFC6902 patches?

No, but the other way around is possible. The code in json-joy/json-crdt/json-patch allows constructing JSON CRDT Patch patches from JSON Patch patches.

Right now we're reading the whole view() after each broadcast patch is applied, which is inefficient on very large documents that change only a few keys at a time.

I don't completely understand your issue here, but below are a few ideas:

The view() is fast: it caches everything it can and also tries to preserve object identity (i.e. if an object has not changed, it returns the same object, so comparing the old and new values with === evaluates to true). You can use object equality for your rendering caching.
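A minimal sketch of how identity-based caching could look on the application side. `memoizeByIdentity` is a hypothetical helper, and the plain objects below only stand in for two successive view() results in which an unchanged subtree keeps the same reference; no json-joy API is used here.

```typescript
// Hypothetical helper: recompute a derived value only when the input
// object is a different reference than the one seen last time.
function memoizeByIdentity<In extends object, Out>(
  compute: (input: In) => Out,
): (input: In) => Out {
  let hasValue = false;
  let lastInput: In | undefined;
  let lastOutput: Out | undefined;
  return (input: In): Out => {
    if (hasValue && input === lastInput) return lastOutput as Out;
    const output = compute(input);
    hasValue = true;
    lastInput = input;
    lastOutput = output;
    return output;
  };
}

// Stand-ins for two successive views: only `title` changed, so the
// `settings` subtree is assumed to be returned as the same object.
const settings = {theme: 'dark'};
const viewBefore = {title: 'a', settings};
const viewAfter = {title: 'b', settings};

let renderCount = 0;
const renderSettings = memoizeByIdentity((s: {theme: string}) => {
  renderCount++;
  return 'theme=' + s.theme;
});

renderSettings(viewBefore.settings); // computes
renderSettings(viewAfter.settings); // same reference: compute is not called again
```

The same check is what reference-equality based rendering layers rely on, so subtrees that keep their identity can be skipped without diffing their contents.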

Also, there is a way to subscribe deeply to JSON CRDT objects:

model.api.obj(['path', 'to', 'my', 'object']).onChanges.listen(() => {});

Also, JSON CRDT Patch patches have a way to store custom information in the meta field, you can put anything there, like your JSON Patch patch.

Also, if your patches are arriving too fast, say more often than every 200ms, maybe you could buffer them and apply them in batches every 200ms.
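The buffering idea could be sketched roughly as below. `PatchBatcher` is a hypothetical class, generic over whatever patch representation the application uses; it does not call any json-joy API itself and leaves the actual application of patches to the supplied callback.

```typescript
// Hypothetical batching helper: collects incoming patches and hands them
// to `applyBatch` at most once per interval (200ms by default).
class PatchBatcher<P> {
  private queue: P[] = [];
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  private readonly applyBatch: (patches: P[]) => void;
  private readonly intervalMs: number;

  constructor(applyBatch: (patches: P[]) => void, intervalMs: number = 200) {
    this.applyBatch = applyBatch;
    this.intervalMs = intervalMs;
  }

  // Queue a patch; start the flush timer if one is not already running.
  push(patch: P): void {
    this.queue.push(patch);
    if (this.timer === undefined) {
      this.timer = setTimeout(() => this.flush(), this.intervalMs);
    }
  }

  // Deliver everything queued so far as one batch (no-op when empty).
  flush(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.applyBatch(batch);
  }
}
```

In use, the callback would apply every patch in the batch to the model and then read the view once, so a burst of patches costs a single downstream update instead of one per patch.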

Finally, this may defeat the purpose for you, but I will mention it anyway: there are libraries that can compute a JSON Patch from two documents.

Is there a plan to invert or unpatch a JSON-CRDT patch? [..] Would be really nice for building an undo-redo on top of the library, or do you have an alternative architecture in mind?

I am not sure what will be the final form here, but in the next few months I will be adding rich-text support to this library and it will definitely have undo-redo.

Regarding operation inversion, the trick to being able to invert an operation is to store enough meta information in the operation itself. For example, a delete operation needs to know what content was deleted. I'm not sure it is possible in all cases, or whether this library will support it out of the box.
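To illustrate the point about storing enough information, here is a minimal sketch on a plain string rather than on a CRDT. The `TextOpSketch` type and both functions are hypothetical and position-based; they are not json-joy operations, and a CRDT-aware undo would have to deal with concurrent edits that this sketch ignores.

```typescript
// Hypothetical operations on a plain string; not json-joy's operation types.
// The delete operation records the deleted text so that it can be inverted.
type TextOpSketch =
  | {type: 'ins'; pos: number; text: string}
  | {type: 'del'; pos: number; text: string};

function applyTextOp(doc: string, op: TextOpSketch): string {
  if (op.type === 'ins') {
    return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
  }
  // Delete removes exactly as many characters as were recorded in `text`.
  return doc.slice(0, op.pos) + doc.slice(op.pos + op.text.length);
}

// Inversion is only possible because `del` carries the removed content.
function invertTextOp(op: TextOpSketch): TextOpSketch {
  return op.type === 'ins'
    ? {type: 'del', pos: op.pos, text: op.text}
    : {type: 'ins', pos: op.pos, text: op.text};
}
```

If the delete operation stored only a position and a length, `invertTextOp` would have nothing to re-insert, which is the limitation described above.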

[..] doing my best to understand what the current codebase offers and where the library might go in future.

Lots of UI debugging tools will be released shortly. Also, lots of UI integrations, specifically for text editing and potentially for JSON too, will be released. Finally, the major feature for the next few months will be an implementation of the Peritext rich-text algorithm; it will work on top of the Quill editor, and eventually a custom CRDT-native rich-text UI will be developed.

johncalvinyoung · 2024-02-14T20:44:28Z

Thanks, @streamich, that's very detailed and very helpful!

The code in json-joy/json-crdt/json-patch allows constructing JSON CRDT Patch patches from JSON Patch patches.

Already using this, thanks!

Also, JSON CRDT Patch patches have a way to store custom information in the meta field, you can put anything there, like your JSON Patch patch.

That's a very good point, and one I suspect we'll be making use of!

Finally, this may defeat the purpose for you, but I will mention it anyway: there are libraries that can compute a JSON Patch from two documents.

That's exactly what I'm doing right now, sadly enough. The problem is that this application's state is actually stored in a very rich non-JSON tree structure in memory, making heavy use of prototypes and inheritance. Our experimental synchronization process therefore has to serialize to a (much flatter) JSON structure, mutate the CRDT, synchronize the CRDT to the other client, work out the diff between the CRDT and our current state on that client (which is where calling view() comes in), and apply it to the live application tree with application-specific optimizations and conflict resolution. Writing it out like that, I'm surprised that even works. It's sort of layering OT on top of CRDT the way it's currently written. But it does work so far, though we haven't tackled undo/redo synchronization as of yet.

Also, there is a way to subscribe deeply to JSON CRDT objects:

Technically we could be observing changes deep in that JSON layer... but when the JSON layer is tens of megabytes of JSON with thousands of objects represented... I'm not sure how performant that would be.

This experimental feature was originally written on top of Automerge, and used Automerge basically for reconciling patches, then derived patches back out to apply to our in-memory tree. Your library here is MUCH faster and actually viable in a way Automerge hasn't been for us, but there are definitely a few differences we're trying to figure out! In particular, I suspect I'm going to be tracking in our code a stack of the JSON patches and CRDT patches applied, with the inverted JSON patch associated with each entry for working out how to invert the CRDT change. But we're not going to have budget to tackle this at greater depth for a few months, so I've got time to think about that architecture.
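One possible shape for that stack, as a rough sketch only. `JsonPatchOpSketch`, `HistoryEntrySketch`, and `HistoryStackSketch` are hypothetical names; the CRDT patch is kept as opaque bytes, and the inverse JSON Patch is assumed to be computed by the application at the time the edit is recorded.

```typescript
// Hypothetical types for illustration; the CRDT patch is treated as opaque bytes.
interface JsonPatchOpSketch {
  op: 'add' | 'remove' | 'replace';
  path: string;
  value?: unknown;
}

interface HistoryEntrySketch {
  crdtPatch: Uint8Array; // encoded JSON CRDT patch that was applied
  forward: JsonPatchOpSketch[]; // RFC 6902 operations that were applied
  inverse: JsonPatchOpSketch[]; // RFC 6902 operations that revert `forward`
}

class HistoryStackSketch {
  private undoStack: HistoryEntrySketch[] = [];
  private redoStack: HistoryEntrySketch[] = [];

  // Record a newly applied edit; a new edit discards the redo branch.
  record(entry: HistoryEntrySketch): void {
    this.undoStack.push(entry);
    this.redoStack = [];
  }

  // Returns the JSON Patch to apply in order to undo the latest edit.
  undo(): JsonPatchOpSketch[] | undefined {
    const entry = this.undoStack.pop();
    if (entry === undefined) return undefined;
    this.redoStack.push(entry);
    return entry.inverse;
  }

  // Returns the JSON Patch to re-apply the most recently undone edit.
  redo(): JsonPatchOpSketch[] | undefined {
    const entry = this.redoStack.pop();
    if (entry === undefined) return undefined;
    this.undoStack.push(entry);
    return entry.forward;
  }
}
```

The operations returned by `undo()` or `redo()` would then go through the existing JSON Patch to JSON CRDT Patch path, so that an undo is synchronized to other replicas as an ordinary new patch.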
