-
Notifications
You must be signed in to change notification settings - Fork 331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read from branch with compaction data #7701
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Really exciting to see how we can separate the two functions of "the" (currently single) branch metarange ID: right now it is used both as the parent for the next commit, and also as the base for differences in staging.
I would prefer to avoid the term "compaction" here, and instead name the new metarange ID on a branch by some other name - maybe BaseMetarangeID. It may make our lives easier if we ever have to add user-requested "partial commits". The name we pick right now matters, because it it stored to the KV. So it will be expensive to change in future!
I am not sure why we need a "compaction manager". All it does is duplicate the code that we already have in Graveler, just for a different metarange. We even need the same if
statements to select the right base metarange. Furthermore, AFAICT this compaction manager leads to these bugs:
- An SSTable handle leaks whenever a compacted metarange exists on the branch. This leaks a file descriptor as well as memory.
- The code opens an unused metarange whenever a compacted metarange exists on the branch. This is a loss of efficiency. IOPs (I/O operations) are sometimes an expensive resource on lakeFS installations.
Can we try to select the correct metarange in if
statements before accessing the commit metarange, instead of having the manager? I believe that that code would be easier to understand, and provide fewer opportunities for bugs.
Agree with @arielshaqed comments. Is this PR goal to cover all branch reading scenarios? What about functions that are not covered in this PR, like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Ariel, I would expect compaction to be handled as any other metarange. furthermore, compaction made the term staging a tricky one
i.e some would expect listStagingArea to return only data existing under the staging token and others may expect it to return all staging data (IMO all staging data is the trivial one). Also, we still have many places not considering the compacted data, for example graveler.Get and graverler.Commit which should return "dirty" if staging is empty, etc...
pkg/graveler/graveler_test.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the tests! With Commit, StagingToken and SealedTokens already present, and with the addition of BaseMetaRange
concept, the number of test scenarios is huge. I think we missed some important ones, like testing with tombstones, testing with all 4 (or 5 for multiple sealed tokens) keys, querying for empty staging area, etc. The reason we need to be pedantic about changes here, is that bugs in the versioning engine can be catastrophic. We intend on introducing the compaction operation itself later and that's even more worrying! Finding the bugs are normally easier during the PR itself, and it gets harder as we build on top of it.
You can look at graveler_v2_test.go
file too. I find it easier to write the tests with proper generated mocks instead of limited fakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed everything except graveler_test.go
(will get back to it later).
Overall we're almost there! Bunch of medium/small comments..
pkg/graveler/graveler.go
Outdated
@@ -3109,6 +3164,14 @@ func (g *Graveler) Diff(ctx context.Context, repository *RepositoryRecord, left, | |||
leftValueIterator.Close() | |||
return nil, err | |||
} | |||
if rightBranch.CompactedBaseMetaRangeID != "" { | |||
compactedDiffIterator, err := g.CommittedManager.Diff(ctx, repository.StorageNamespace, leftCommit.MetaRangeID, rightBranch.CompactedBaseMetaRangeID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftCommit
& rightBranch.CompactedBaseMetaRangeID
? Can you explain why this is true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case there's no compaction data on the right we return a combined iterator with:
- diff iterator between the left committed data and the right committed data
- iterator of committed data in the left branch
- iterator of the staging data of the right branch
So in case there's compacted data on the right branch we need to change the first iterator to contain also changes that have already compacted to the right branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case there's no compaction data on the right we return a combined iterator with:
But this path handles the case where rightBranch.CompactedBaseMetaRangeID != ""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, why isn't it JoinedDiff(stagingIterator, compactedDiffIterator)
? Isn't iterator of committed data in the left branch
already being diffed in compactedDiffIterator
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review, but I think there's some valuable comments already
} | ||
} | ||
|
||
if updatedValue == nil && reference.CompactedBaseMetaRangeID != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we get here with updatedValue != nil
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in case the object didn't change in staging
pkg/graveler/graveler.go
Outdated
@@ -2492,7 +2512,7 @@ func (g *Graveler) ResetPrefix(ctx context.Context, repository *RepositoryRecord | |||
newSealedTokens = append(newSealedTokens, branch.SealedTokens...) | |||
|
|||
// Reset keys by prefix on the new staging token | |||
itr, err := g.listStagingArea(ctx, branch, 0) | |||
itr, err := g.listStagingAreaWithoutCompaction(ctx, branch, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't compaction be involved too? AFAIR ResetPrefix
resets all the uncommitted changes in a prefix. Why shouldn't it reset changes that exist in compacted area too?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pushing this one through, it's certainly wasn't an easy one.
Once it's merged, please open an issue containing the tests table as a dependency for the next compaction task.
…d-compacted-branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only blocking diffs are on JoinedDiffIterator. It is really important code, and needs to be as simple as possible. I think we may have simpler code if we revisit some decisions.
One way might be this. "Advance inner" (now, after seeing the code) seems like the wrong strategy. AFAICT it forces us to assume a particular special DiffIterator.Value()
return of nil
when the iterator is over. And then it looks like we check almost the same cases but not quite. Because there are so many, I worry that we will miss some edge cases.
pkg/catalog/catalog_test.go
Outdated
numBranch: 3, | ||
numRecords: 0, | ||
expectedCalls: 1, | ||
compactBranch: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Rather than double each test scenario, I would probably just loop over both scenarios by saying for _, compactBranch := range{false, true} { t.Run(...) }
on l. 585.
pkg/catalog/catalog_test.go
Outdated
} | ||
|
||
if numRecords > 0 { | ||
test.GarbageCollectionManager.EXPECT(). | ||
GetUncommittedLocation(gomock.Any(), gomock.Any()). | ||
Times(expectedCalls). | ||
AnyTimes(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Particularly worried that we might never call this. If no UGC occurs, will the test fail somewhere else?
func normalizeStorageNamespace(namespace string) string { | ||
if !strings.HasSuffix(namespace, DefaultPathDelimiter) { | ||
namespace += DefaultPathDelimiter | ||
} | ||
return namespace | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we must already have this function somewhere!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have found in the controller the function 'normalizePhysicalAddress,' and it looks similar, but it doesn't perform the same function.
// CompactedBaseMetaRangeID - the MetaRangeID of the last compaction's | ||
CompactedBaseMetaRangeID MetaRangeID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really not a fan of the "compacted" part of the name. It describes how it happened not what it is. I would prefer to use "BaseMetaRangeID". It is important to get the name right before any users - because if we get it wrong it will be hard to fix existing KVs.
In future we might have other reasons to change the base. For one example: during concurrent merges, one of the merges will fail when it tries to update the branch record of the destination. That is inefficient and slow. We could use your work here to speed up concurrent merges! After the merge fails like this we might optionally update the base metarange of the source branch to point at the intended result. Now the source branch looks like the destination branch got merged into it, so retrying the merge will be faster.
(This is just one example, I have many plans for this feature. I don't want to tie them to the word "compacted"!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @itaiad200 was worried that the word "compact" was missing for this variable.
Decision time 😈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@itaiad200 can you chime in, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not tying ourselves to Compacted
means that we add another term on top of staging, sealed, compaction and uncommitted.. For example, LastStagingSnapshot
- Pro: It describes what it is, not obscure. Con: Another term.
I still hold my opinion but not firmly, I'd rather it be CompactedStaging
over BaseMetaRangeID
. I believe it does describe what it is, not necessarily the action that happened.
So up to you, not blocking :)
pkg/graveler/joined_diff_iterator.go
Outdated
type JoinedDiffIterator struct { | ||
iterA DiffIterator | ||
iterB DiffIterator | ||
p DiffIterator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document p
?
pkg/graveler/joined_diff_iterator.go
Outdated
// first | ||
nextA = c.iterA.Next() | ||
nextB = c.iterB.Next() | ||
case valA == nil && valB == nil: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer not to depend on nil
values like this. I think that the problem is you already called Next
on the iterators by the time you call this function; perhaps we should figure out how to avoid that?
pkg/graveler/joined_diff_iterator.go
Outdated
if !c.advanceInnerIterators() { | ||
return false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is more complex than it should be. You have to call c.Next()
the first time, before using the iterator. That's why you need to have the special case for p == nil
in advanceInnerIterators
. This function:
- Relies on DiffIterator returning
nil
value in certain conditions. Regardless of whether or not it does this, relying on this makes it a very special case - most Go iterators do not support this! - We repeatedly call
Value()
on each iterator. This is weird. - We have 2
switch
statements. The first has 7 cases and 2 special cases inif
statements, the second has 5 cases. This is too complex.
pkg/graveler/joined_diff_iterator.go
Outdated
if c.p != nil && c.p.Err() != nil { | ||
return c.p.Err() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. c.p
is always either c.iterA
or c.iterB
. Why do we need to check it separately?
pkg/graveler/joined_diff_iterator.go
Outdated
if c.p != nil && c.p.Err() != nil { | ||
return c.p.Err() | ||
} | ||
if c.iterA != nil && c.iterA.Err() != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is c.iterA == nil
?
…d-compacted-branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still worried about the iterator. The switch statement in there is huge, which makes it too hard for me to understand. I think it has issues, but TBH I really don't know.
So still requesting changes, if only to documentation.
// CompactedBaseMetaRangeID - the MetaRangeID of the last compaction's | ||
CompactedBaseMetaRangeID MetaRangeID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@itaiad200 can you chime in, please?
if branchRecord.Branch.CompactedBaseMetaRangeID != "" { | ||
metaRangeID = branchRecord.Branch.CompactedBaseMetaRangeID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand: AFAICT this is a performance degradation if there has been no compaction! Why wouldn't it be sufficient to take the "cachedMetaRangeID" in the "else"? (This is very important because otherwise we GetCommit. It's cached, but that's still unnecessary overhead on the commit cache to get a commit we might already have had!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We call deleteUnsafe
function from two places, both pass nil for cachedMetaRangeID
pkg/graveler/joined_diff_iterator.go
Outdated
iterAHasMore bool | ||
iterB DiffIterator | ||
iterBHasMore bool | ||
p DiffIterator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, please document p. The name is very unintuitive.
pkg/graveler/joined_diff_iterator.go
Outdated
case bytes.Equal(valA.Key, valB.Key): | ||
c.iterAHasMore = c.iterA.Next() | ||
c.iterBHasMore = c.iterB.Next() | ||
case bytes.Compare(valA.Key, valB.Key) < 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This traverses the keys twice. Please compare once; it is fine to call a separate routine here, but that will need to be documented. In fact you can return immediately in all the above cases, and then compute bytes.Compare(val!.Key, valB.Key)
once.
pkg/graveler/joined_diff_iterator.go
Outdated
c.iterBHasMore = c.iterB.Next() | ||
case bytes.Compare(valA.Key, valB.Key) < 0: | ||
c.iterAHasMore = c.iterA.Next() | ||
c.iterBHasMore = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this line? After line 43 we know that iterBHasMore.
pkg/graveler/joined_diff_iterator.go
Outdated
default: | ||
// value of iterA < value of iterB | ||
c.iterBHasMore = c.iterB.Next() | ||
c.iterAHasMore = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
pkg/graveler/joined_diff_iterator.go
Outdated
return false | ||
} | ||
if !c.iterAHasMore && !c.iterBHasMore { | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. Suppose that the last values of iterA and iterB refer to the same key. Then these lines will make this condition true -- and the iterator misses the last value!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We process only values we already returned in the first switch-case statement.
I'll add a test case just to make sure.
pkg/graveler/joined_diff_iterator.go
Outdated
valA = c.iterA.Value() | ||
valB = c.iterB.Value() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If c.iterAHasMore then it is incorrect to call c.iterA.Value(), and similarly for iterB.
c.iterA.SeekGE(id) | ||
c.iterB.SeekGE(id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems incorrect: AFAIR you must call Next() after SeekGE().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true.
This is the same behavior we have in our CombinedIterator
.
I don't know if we want to have the same behavior here or change it.
…d-compacted-branch
No description provided.