Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tiered compaction: identify_levels bails too early #7775

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

problame
Copy link
Contributor

Spent some time reading tiered compaction code, found this bug with identify_levels.

@hlinnaka please confirm that this is indeed a bug / unintended behavior.

Copy link

No tests were run or test report is not available

Test coverage report is not available

The comment gets automatically updated with the latest test results
6593ce0 at 2024-05-15T22:46:26.289Z :recycle:

@hlinnaka
Copy link
Contributor

I believe the code is correct. It took me a while to re-understand how it works though, so more comments would probable be in order.

The repro test case returns 0 layers, not 3 as in the assertion, and that is correct.

@hlinnaka
Copy link
Contributor

hlinnaka commented May 16, 2024

The trace from the test:

2024-05-16T09:05:47.647670Z  INFO pageserver_compaction::identify_levels: identify level at 0/10000, size 8192, num layers below: 3
2024-05-16T09:05:47.647687Z TRACE pageserver_compaction::identify_levels: inspecting 0000000000003000-0000000000004000__00008000-00009000 for candidate 0/10000, current best 0/10000
2024-05-16T09:05:47.647699Z TRACE pageserver_compaction::identify_levels: inspecting 0000000000002000-0000000000003000__00005000-00009000 for candidate 0/8000, current best 0/9000
2024-05-16T09:05:47.647710Z TRACE pageserver_compaction::identify_levels: too large 0000000000002000-0000000000003000__00005000-00009000, size 16384 vs 8192

That is correct. As soon as the function sees the too-large layer, it discards the "candidate" it was building, and returns with the "current_best", which is LSN 0/9000 and no layers. The sort function happens to reorder the layers to A, C, B, but you would get the same result with the A, B, C ordering. It would just bail out of the loop earlier.

@@ -135,6 +135,7 @@ where
// Is it small enough to be considered part of this level?
if r.end.0 - r.start.0 > lsn_max_size {
// Too large, this layer belongs to next level. Stop.
// Due to the sorting bug pointed out above there could still be smaller layers at same key range
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An important point to notice here is that when we see a too large layer and bail out of the loop, we discard the candidate that we were building, and return with the "current best" safe stopping point that we had seen earlier.

Comment on lines +317 to +319
// The `identify_levels` loop will bails out at the first layer that is too large.
// , i.e., layer B. (log message "too large").
// That leaves layer C out of the level, even though it belongs to it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will in fact leave out all the layers from the returned Level. That's correct. The layers overlap, so it must include either all of them, or none. Because B is too large, they are all left out.

@hlinnaka
Copy link
Contributor

I drew this diagram to help myself walk through the scenario from the test case:

        // The layers are processed in order A, B, C
        //
        //                                A           B            C
        //                       10000
        //
        //                        9000    +           +            +
        //                                |           |            |
        //                        8000    +           |            +
        //                        7000                |
        //                        6000                |
        //                        5000                +
        //                        4000
        //
        // current_best_start_lsn: 10000    9000
        // current_best_layers:       []      []
        // candidate_start_lsn:    10000    8000         
        // candidate_layers:          []     [A]
        //
        // A: Hooray, there are no crossing LSNs.
        // B: Too large. Discard 'candidate' and return with 'current_best'
        //

Read the diagram above from left to right. Walk through the iterations:

  1. At entry to the function, current_best_start_lsn and candidate_start_lsn are set to 10000, and current_best_layers and `candidate_layers' are empty.
  2. Process layer A. Because r.end (9000) <= candidate_start_lsn (10000), it takes the "Hooray, there are crossing LSNs" codepath. It updates current_best_start_lsn to r.end==9000, and starts building a new candidate with candidate_start_lsn set to r.start==8000.
  3. Process layer B. It overlaps with candidate_start_lsn (8000), so "current best" is not updated. It is found to be too large, and we break out of the loop, and return with the "current best" of 9000 and empty layer set.

@hlinnaka
Copy link
Contributor

I wrote a comment with an overview description of how the loop in identify_levels() works: #7777. I hope that clarifies this to future readers.

@@ -308,6 +309,26 @@ mod tests {
Ok(())
}

#[tokio::test]
async fn repro_identify_levels_bails_too_ealy_if_partitioned_keyspace_same_lsn() -> anyhow::Result<()> {
tracing_subscriber::fmt::init(); // so that RUST_LOG=trace works
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail if there is multiple tests using it because you can set it only once per process.

Instead, I suggest copy-pasting from tests.rs:

static LOG_HANDLE: OnceCell<()> = OnceCell::new();

pub(crate) fn setup_logging() {
    LOG_HANDLE.get_or_init(|| {
        logging::init(
            logging::LogFormat::Test,
            logging::TracingErrorLayerEnablement::EnableWithRustLogFilter,
            logging::Output::Stdout,
        )
        .expect("Failed to init test logging")
    });
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants