
Incorrect Garbage Collection Due to MinSyncedSeq Determination with Lamport Clocks #723

chacha912 commented Dec 13, 2023

What happened:

1. Inaccurate Garbage Collection using Lamport Clocks

In Yorkie, the Garbage Collection (GC) feature is designed to purge tombstone nodes once every peer has acknowledged the deletion operation (meaning these nodes are no longer referenced by any remote peer). Whether all peers have received a deletion operation is decided by the time of the min synced sequence change (minSyncedSeq), which is determined by taking the smallest Lamport clock among the clients' synced changes. However, while Lamport clocks provide a total order, they cannot distinguish concurrency from causality, which leads to incorrect GC decisions.
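For illustration, here is a minimal, self-contained Go sketch of the as-is behavior described above. The types and names (Ticket, minSyncedSeq) are hypothetical simplifications, not Yorkie's actual API: the server keeps the latest synced ticket per client and takes the smallest Lamport clock as minSyncedSeq.

package main

import "fmt"

// Ticket is a simplified logical timestamp: a Lamport clock plus an actor ID.
type Ticket struct {
	Lamport int64
	ActorID string
}

// minSyncedSeq returns the ticket with the smallest Lamport clock among the
// latest synced tickets of all attached clients (ties broken by actor ID).
func minSyncedSeq(synced map[string]Ticket) Ticket {
	var min Ticket
	first := true
	for _, t := range synced {
		if first || t.Lamport < min.Lamport ||
			(t.Lamport == min.Lamport && t.ActorID < min.ActorID) {
			min, first = t, false
		}
	}
	return min
}

func main() {
	// Steps 3-4 of the scenario below: B synced 4@B, A synced 4@A.
	synced := map[string]Ticket{
		"A": {Lamport: 4, ActorID: "A"},
		"B": {Lamport: 4, ActorID: "B"},
	}
	// minSyncedSeq is 4@A, so nodes removed at 4@A are purged even though
	// B has not actually received the change 4@A yet.
	fmt.Println(minSyncedSeq(synced)) // {4 A}
}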

Consider the scenario where concurrent edits occur:

[Figure: GC (as-is)]

  1. Initially, the document contains the string abd, and clients A and B are both synced up to the change 3@B.
  2. A and B edit concurrently: A deletes bd, and B inserts c between b and d.
  3. B syncs first; the server marks 4@B as B's synced sequence.
  4. A syncs next, pulling 4@B and pushing 4@A; the server marks 4@A as A's synced sequence.
  5. A applies the edit 4@B and triggers GC, since the min synced seq is 4@A and the nodes' removedAt matches 4@A.
  6. B inserts 1 after b.
  7. B syncs again, pushing 5@B and pulling 4@A.
  8. A syncs, pulling 5@B, but encounters an error because it cannot find the node 2@A (already purged by GC).

The bug stems from the fact that at step 3 B has not yet received 4@A, yet the server treats it as synced because the min synced sequence is determined with Lamport clocks. Although 4@A < 4@B in Lamport order, this does not guarantee that 4@A has been synced, because Lamport clocks cannot distinguish causally ordered events from concurrent ones.

  • Lamport clocks (illustrated in the sketch below):
    • a → b ⇒ L(a) < L(b)
    • but L(a) < L(b) ⇒ a → b or a ∥ b (concurrent)
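A tiny self-contained Go illustration of this one-way implication (hypothetical types, not Yorkie code): both 4@A and 4@B extend the same synced state 3@B, so they are concurrent, yet Lamport order still places 4@A before 4@B.

package main

import "fmt"

// Event carries a Lamport timestamp plus an explicit causal history,
// so the difference between the two relations is visible.
type Event struct {
	Lamport int64
	Actor   string
	Causes  map[string]bool // IDs of events in this event's causal past
}

func id(e Event) string { return fmt.Sprintf("%d@%s", e.Lamport, e.Actor) }

// happenedBefore reports a -> b: a is in b's causal history.
func happenedBefore(a, b Event) bool { return b.Causes[id(a)] }

func main() {
	base := map[string]bool{"3@B": true}             // both clients saw 3@B
	a := Event{Lamport: 4, Actor: "A", Causes: base} // A deletes "bd"
	b := Event{Lamport: 4, Actor: "B", Causes: base} // B inserts "c"

	// Lamport order (actor ID breaks ties): 4@A < 4@B ...
	fmt.Println(a.Lamport < b.Lamport || (a.Lamport == b.Lamport && a.Actor < b.Actor)) // true
	// ... yet neither happened before the other: they are concurrent.
	fmt.Println(happenedBefore(a, b), happenedBefore(b, a)) // false false
}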

2. Proposed Solution Ideas

2-1. Use of Vector Clocks

Replace Lamport clocks with vector clocks to track the concurrency of changes. A vector clock maintains one counter per client, capturing the causal ordering of changes.

[Figure: GC (vector clocks)]

  • Clients A and B start with the vector clock [2,3].
  • During the concurrent edits, each client advances its own entry; e.g., B's sync carries [2,4].
  • During GC, derive the min synced vector by taking the minimum value at each position across all clients' vectors.
    • Rules for comparing vector timestamps:
      • V(a) = V(b) iff a_k = b_k for all k
      • V(a) < V(b) iff a_k ≤ b_k for all k and V(a) ≠ V(b)
      • concurrency: a ∥ b iff a_i < b_i and a_j > b_j for some i, j

This approach ensures accurate concurrency tracking, but it may pose memory challenges because every time ticket would have to be managed as a vector clock.
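As a rough sketch of how the comparison rules above could look in code (hypothetical VectorClock type with per-actor counters keyed by actor ID; not a concrete design proposal):

package main

import "fmt"

// VectorClock maps an actor ID to that actor's counter.
type VectorClock map[string]int64

// keys returns the union of the actors appearing in a and b.
func keys(a, b VectorClock) map[string]struct{} {
	u := map[string]struct{}{}
	for k := range a {
		u[k] = struct{}{}
	}
	for k := range b {
		u[k] = struct{}{}
	}
	return u
}

// Compare returns (-1, true) for a < b, (1, true) for a > b, (0, true) for
// equal clocks, and (0, false) when a and b are concurrent.
func Compare(a, b VectorClock) (int, bool) {
	less, greater := false, false
	for k := range keys(a, b) {
		if a[k] < b[k] {
			less = true
		} else if a[k] > b[k] {
			greater = true
		}
	}
	switch {
	case less && greater:
		return 0, false // concurrent
	case less:
		return -1, true
	case greater:
		return 1, true
	}
	return 0, true
}

// Min takes the minimum at each position, yielding a safe lower bound for GC.
func Min(a, b VectorClock) VectorClock {
	out := VectorClock{}
	for k := range keys(a, b) {
		if a[k] < b[k] {
			out[k] = a[k]
		} else {
			out[k] = b[k]
		}
	}
	return out
}

func main() {
	a := VectorClock{"A": 3, "B": 3} // A after its local delete
	b := VectorClock{"A": 2, "B": 4} // B after its local insert
	if _, ok := Compare(a, b); !ok {
		fmt.Println("concurrent") // printed: neither clock dominates
	}
	fmt.Println(Min(a, b)) // map[A:2 B:3]: only changes up to [2,3] are safe to GC
}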

2-2. Hybrid Use of Vector and Lamport Clocks

Introduce a hybrid approach: manage vector clocks for the concurrency of changes at the document-replica level, while continuing to use Lamport clocks for element ID generation.

[Figure: GC (vector + Lamport clocks)]

  • In the document replica, a vector clock is maintained in a synced vector map and updated whenever a change is received.
  • During GC, compare the Lamport time of the deleted element with the min synced Lamport time derived from the synced vector map. If the min synced time covers (is greater than or equal to) the element's removal time, GC can proceed.

This approach balances the benefits of vector clocks for concurrency tracking and Lamport clocks for efficient element ID generation. However, it may still incur increased memory usage as the number of clients grows.
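A minimal sketch of the hybrid idea, under the assumption that each vector entry records the largest Lamport time of an actor's changes that a client has acknowledged (all names hypothetical):

package main

import "fmt"

type VectorClock map[string]int64

// minSyncedVector folds each client's acknowledged vector into a per-actor
// minimum: entry X holds the largest Lamport time of actor X's changes that
// every client has received.
func minSyncedVector(synced map[string]VectorClock) VectorClock {
	min := VectorClock{}
	first := true
	for _, v := range synced {
		if first {
			for k, n := range v {
				min[k] = n
			}
			first = false
			continue
		}
		for k := range min {
			if v[k] < min[k] {
				min[k] = v[k]
			}
		}
	}
	return min
}

// canPurge: a tombstone removed at Lamport time seq by actor may be purged
// only once every peer's vector covers that change.
func canPurge(min VectorClock, actor string, seq int64) bool {
	return seq <= min[actor]
}

func main() {
	// Hypothetical numbers from the scenario: A has applied its own 4@A,
	// but B has only acknowledged A's changes up to Lamport 2.
	synced := map[string]VectorClock{
		"A": {"A": 4, "B": 4},
		"B": {"A": 2, "B": 4},
	}
	min := minSyncedVector(synced)     // map[A:2 B:4]
	fmt.Println(canPurge(min, "A", 4)) // false: keep the tombstones
}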

2-2-1. Handling Concurrent Edits in Text Delete

With the adoption of the 2-2 approach, the latestCreatedAtMap currently used to handle concurrency in text deletes can be eliminated.

  • Changes now carry vector clocks, so the order of changes can be determined without latestCreatedAtMap.
  • During a delete operation, compare the Lamport time of each deleted character's creation with the corresponding entry of the deleter's vector clock, so that concurrently inserted characters are preserved (see the sketch after this list).
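A minimal sketch of this delete-time check (hypothetical types and numbers): a remote delete carries the deleter's vector clock, so each target character can be tested directly instead of consulting latestCreatedAtMap.

package main

import "fmt"

// Ticket is a simplified creation timestamp for a character node.
type Ticket struct {
	Lamport int64
	Actor   string
}

type VectorClock map[string]int64

// visibleToDeleter reports whether the deleter had already received the
// character when it issued the delete: the character's creation Lamport
// time must be covered by the deleter's vector entry for the creating actor.
func visibleToDeleter(createdAt Ticket, deleterVec VectorClock) bool {
	return createdAt.Lamport <= deleterVec[createdAt.Actor]
}

func main() {
	deleterVec := VectorClock{"A": 3, "B": 3}    // A's vector when deleting "bd"
	b := Ticket{Lamport: 2, Actor: "A"}          // "b", created by A
	c := Ticket{Lamport: 4, Actor: "B"}          // "c", inserted concurrently by B
	fmt.Println(visibleToDeleter(b, deleterVec)) // true: "b" is deleted
	fmt.Println(visibleToDeleter(c, deleterVec)) // false: "c" survives
}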

Figures:

  • [Figure: Text delete (Lamport clock limitation)]
  • [Figure: Text delete (latestCreatedAtMap), as-is]
  • [Figure: Text delete (vector + Lamport clocks)]

2-3. Exploration of Other Approaches

Note: The proposed solutions are conceptual ideas, and a detailed design and validation are necessary for actual implementation.

What you expected to happen:

GC should occur only when all peers have received the delete operation.

How to reproduce it (as minimally and precisely as possible):

Execute the following test code corresponding to the scenario:

// gc_test.go
t.Run("should not garbage collection", func(t *testing.T) {
	ctx := context.Background()
	d1 := document.New(helper.TestDocKey(t))
	err := c1.Attach(ctx, d1)
	assert.NoError(t, err)

	d2 := document.New(helper.TestDocKey(t))
	err = c2.Attach(ctx, d2)
	assert.NoError(t, err)

	// Step 1: d1 creates the text "abd" and both clients sync.
	err = d1.Update(func(root *json.Object, p *presence.Presence) error {
		root.SetNewText("text").Edit(0, 0, "a").Edit(1, 1, "b").Edit(2, 2, "d") // abd
		return nil
	}, "sets text")
	assert.NoError(t, err)
	err = c1.Sync(ctx)
	assert.NoError(t, err)
	err = c2.Sync(ctx)
	assert.NoError(t, err)

	// Step 2 (B's side): d2 concurrently inserts "c" between "b" and "d".
	err = d2.Update(func(root *json.Object, p *presence.Presence) error {
		root.GetText("text").Edit(2, 2, "c")  // abcd
		return nil
	}, "insert c")
	assert.NoError(t, err)

	// Step 2 (A's side): d1 concurrently deletes "bd".
	err = d1.Update(func(root *json.Object, p *presence.Presence) error {
		root.GetText("text").Edit(1, 3, "") // a
		return nil
	}, "delete bd")
	assert.NoError(t, err)
	assert.Equal(t, 2, d1.GarbageLen())
	assert.Equal(t, 0, d2.GarbageLen())

	// Steps 3-5: B syncs, then A syncs (each twice, so every change is both
	// pushed and pulled). Once the min synced seq reaches 4@A, GC purges
	// d1's two tombstones.
	err = c2.Sync(ctx)
	assert.NoError(t, err)
	err = c2.Sync(ctx)
	assert.NoError(t, err)
	err = c1.Sync(ctx)
	assert.NoError(t, err)
	err = c1.Sync(ctx)
	assert.NoError(t, err)
	assert.Equal(t, 0, d1.GarbageLen())
	assert.Equal(t, 0, d2.GarbageLen())

	// Step 6: d2 inserts "1" after "b".
	err = d2.Update(func(root *json.Object, p *presence.Presence) error {
		root.GetText("text").Edit(2, 2, "1")
		return nil
	}, "insert 1")
	assert.NoError(t, err)

	// Step 7: B syncs again, pushing 5@B.
	err = c2.Sync(ctx)
	assert.NoError(t, err)
	err = c2.Sync(ctx)
	assert.NoError(t, err)
	assert.Equal(t, 0, d1.GarbageLen())
	assert.Equal(t, 0, d2.GarbageLen())

	// Step 8: A pulls 5@B.
	err = c1.Sync(ctx)
	assert.NoError(t, err) // 👾 Encounters an error since the node is purged in GC.
	assert.Equal(t, 0, d1.GarbageLen())
	assert.Equal(t, 0, d2.GarbageLen())
})


Anything else we need to know?:

Environment:

  • Operating system:
  • Browser and version:
  • Yorkie version (use yorkie version): v0.4.9
  • Yorkie JS SDK version: