tiered compaction: identify_levels bails too early #7775
No tests were run or test report is not available. Test coverage report is not available. The comment gets automatically updated with the latest test results.
6593ce0 at 2024-05-15T22:46:26.289Z
I believe the code is correct. It took me a while to re-understand how it works, though, so more comments would probably be in order. The repro test case returns 0 layers, not 3 as in the assertion, and that is correct.
The trace from the test:
That is correct. As soon as the function sees the too-large layer, it discards the "candidate" it was building, and returns with the "current_best", which is LSN 0/9000 and no layers. The sort function happens to reorder the layers to A, C, B, but you would get the same result with the A, B, C ordering. It would just bail out of the loop earlier.
@@ -135,6 +135,7 @@ where
    // Is it small enough to be considered part of this level?
    if r.end.0 - r.start.0 > lsn_max_size {
        // Too large, this layer belongs to next level. Stop.
        // Due to the sorting bug pointed out above there could still be smaller layers at same key range
An important point to notice here is that when we see a too large layer and bail out of the loop, we discard the candidate that we were building, and return with the "current best" safe stopping point that we had seen earlier.
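The candidate / "current best" bookkeeping described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the actual `identify_levels` code: the `Layer` struct, the `identify_level_sketch` function, the use of plain `u64` LSNs, and the assumption that every input layer ends at or below `end_lsn` are all hypothetical simplifications made for illustration.

```rust
/// Hypothetical, simplified layer: only a name and an LSN range (no key range).
#[derive(Debug, Clone, PartialEq)]
struct Layer {
    name: &'static str,
    lsn_start: u64,
    lsn_end: u64,
}

/// Simplified model of the loop. Returns the start LSN of the accepted level
/// and the layers in it. Assumes every layer satisfies
/// `lsn_start <= lsn_end <= end_lsn`.
fn identify_level_sketch(
    mut layers: Vec<Layer>,
    end_lsn: u64,
    lsn_max_size: u64,
) -> (u64, Vec<Layer>) {
    // Visit layers from the top of the LSN space downwards (stable sort).
    layers.sort_by_key(|l| std::cmp::Reverse(l.lsn_end));

    // The candidate is the set we are still building; it may yet be discarded.
    let mut candidate_start = end_lsn;
    let mut candidate: Vec<Layer> = Vec::new();
    // The current best is the last safe stopping point we have seen.
    let mut best_start = end_lsn;
    let mut best: Vec<Layer> = Vec::new();

    for l in layers {
        // If this layer lies entirely below the candidate, no remaining layer
        // can cross `candidate_start`, so the candidate becomes the new best.
        if l.lsn_end <= candidate_start {
            best_start = candidate_start;
            best.append(&mut candidate);
        }
        // Too large: stop, discard the candidate, return the current best.
        if l.lsn_end - l.lsn_start > lsn_max_size {
            return (best_start, best);
        }
        // Otherwise the layer joins the candidate.
        candidate_start = candidate_start.min(l.lsn_start);
        candidate.push(l);
    }

    // Ran out of layers: the last candidate is accepted as well.
    best.append(&mut candidate);
    (candidate_start, best)
}
```

With made-up layers A = 0x8000..0x9000, C = 0x8000..0x9000, B = 0x5000..0x9000, `end_lsn = 0x9000` and `lsn_max_size = 0x2000`, this sketch returns `(0x9000, [])` for both the A, C, B and the A, B, C input order: B overlaps the candidate and is too large, so the candidate is dropped and the earlier safe point (no layers) is returned.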
// The `identify_levels` loop will bail out at the first layer that is too large,
// i.e., layer B (log message "too large").
// That leaves layer C out of the level, even though it belongs to it.
It will in fact leave out all the layers from the returned Level. That's correct. The layers overlap, so it must include either all of them, or none. Because B is too large, they are all left out.
I drew this diagram to help myself walk through the scenario from the test case:
Read the diagram above from left to right. Walk through the iterations:
I wrote a comment with an overview description of how the loop in identify_levels() works: #7777. I hope that clarifies this to future readers.
@@ -308,6 +309,26 @@ mod tests {
        Ok(())
    }

    #[tokio::test]
    async fn repro_identify_levels_bails_too_ealy_if_partitioned_keyspace_same_lsn() -> anyhow::Result<()> {
        tracing_subscriber::fmt::init(); // so that RUST_LOG=trace works
This will fail if multiple tests use it, because the global subscriber can be set only once per process.
Instead, I suggest copy-pasting from tests.rs:
static LOG_HANDLE: OnceCell<()> = OnceCell::new();

pub(crate) fn setup_logging() {
    LOG_HANDLE.get_or_init(|| {
        logging::init(
            logging::LogFormat::Test,
            logging::TracingErrorLayerEnablement::EnableWithRustLogFilter,
            logging::Output::Stdout,
        )
        .expect("Failed to init test logging")
    });
}
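The "initialize once per process" guard can also be shown with the standard library alone. The following is a minimal sketch, not the project's helper: `setup_logging_sketch` and `INIT_COUNT` are hypothetical names, `std::sync::OnceLock` stands in for the `OnceCell` used above, and the counter increment is a placeholder for the real logging initialization.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how often the initializer body actually ran (for demonstration only).
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

// Process-wide guard; the closure passed to `get_or_init` runs at most once.
static LOG_HANDLE: OnceLock<()> = OnceLock::new();

fn setup_logging_sketch() {
    LOG_HANDLE.get_or_init(|| {
        // Placeholder for the real one-time logging setup.
        INIT_COUNT.fetch_add(1, Ordering::SeqCst);
    });
}
```

Every test can call `setup_logging_sketch()` at its start; only the first call runs the initializer, and later calls return immediately.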
Spent some time reading tiered compaction code, found this bug with identify_levels.
@hlinnaka please confirm that this is indeed a bug / unintended behavior.