MDEV-34156 InnoDB fails to apply the redo log for compressed tablespace #3254

Open
wants to merge 1 commit into
base: 10.11
Conversation

Thirunarayanan
Member

@Thirunarayanan Thirunarayanan commented May 14, 2024

  • The Jira issue number for this PR is: MDEV-34156

Description

Problem:

  • During recovery, InnoDB fails to apply the redo log for a compressed tablespace. The reason is that
    InnoDB wrongly assumes that a page has been freed while applying the redo log records for it. Because the buffer pool is small, InnoDB scans the redo log multiple times. The problematic page was freed and reinitialized multiple times. InnoDB stores the freed page information only until it runs out of memory.
    However, InnoDB still assigns the freed page ranges to the tablespace in recv_init_crash_recovery_spaces(), even though
    it does not have complete freed range information.

Solution:

Store the freed page information regardless of whether InnoDB has run out of memory for redo log records.
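
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Rec`, `FreedTracker`, `scan`, the `budget` parameter and the `store_all` flag are all hypothetical names invented for illustration and do not exist in InnoDB. It shows how a page that is freed and reinitialized repeatedly can be left marked as freed if tracking stops partway through the log.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified record types; not the real InnoDB redo log format.
enum class RecType { FREE_PAGE, INIT_PAGE, WRITE };

struct Rec {
  RecType type;
  std::uint32_t page_no;
};

// Tracks which pages are currently considered freed while scanning records.
struct FreedTracker {
  std::set<std::uint32_t> freed;

  void store_freed_or_init(const Rec &r) {
    if (r.type == RecType::FREE_PAGE)
      freed.insert(r.page_no);
    else if (r.type == RecType::INIT_PAGE)
      freed.erase(r.page_no);
  }
};

// Scans the toy log. `budget` stands in for the memory available for parsed
// records (an assumption of this sketch). When `store_all` is false, freed/init
// tracking stops once the budget is exhausted; when true, it always continues.
inline std::set<std::uint32_t> scan(const std::vector<Rec> &log,
                                    std::size_t budget, bool store_all) {
  FreedTracker tracker;
  std::size_t seen = 0;
  for (const Rec &r : log) {
    const bool in_budget = seen++ < budget;
    if (in_budget || store_all)
      tracker.store_freed_or_init(r);
  }
  return tracker.freed;
}
```

With a log of FREE, INIT, WRITE, FREE, INIT, WRITE for page 5 and a budget of 4 records, `scan(log, 4, false)` still reports page 5 as freed, while `scan(log, 4, true)` reports no freed pages. In this model, treating the truncated result as complete would cause the later writes to page 5 to be skipped.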

Basing the PR against the correct MariaDB version

  • This is a new feature and the PR is based against the latest MariaDB development branch.
  • This is a bug fix and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@Thirunarayanan Thirunarayanan requested a review from dr-m May 14, 2024 15:31
Contributor

@dr-m dr-m left a comment

The scope and exact impact of this bug are unclear to me.

Is it possible to provide any form of test case for reproducing this?

Comment on lines 3387 to 4413
  /* Add the freed page ranges in the respective
- tablespace */
- if (!rs.second.freed_ranges.empty()
+ tablespace only if InnoDB doesn't need to
+ rescan the redo logs */
+ if (!rescan
+     && !rs.second.freed_ranges.empty()
      && (srv_immediate_scrub_data_uncompressed
          || rs.second.space->is_compressed())) {

Based on this code change, this has something to do with page_compressed=1 tables, and possibly also with innodb_immediate_scrub_data_uncompressed=ON.
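
To make the dependency on those two settings explicit, the condition in the quoted hunk can be restated as a standalone predicate. This is a minimal sketch: `SpaceInfo` and `should_assign_freed_ranges` are hypothetical names introduced here for illustration, not InnoDB code; only the shape of the boolean expression is taken from the hunk.

```cpp
// Hypothetical stand-in for the per-tablespace state consulted in the hunk.
struct SpaceInfo {
  bool has_freed_ranges;  // corresponds to !rs.second.freed_ranges.empty()
  bool is_compressed;     // corresponds to rs.second.space->is_compressed()
};

// Mirrors the condition in the quoted hunk: freed ranges are assigned only
// when no rescan is pending, some ranges were collected, and either
// immediate scrubbing is enabled or the tablespace is compressed.
inline bool should_assign_freed_ranges(bool rescan, bool scrub_uncompressed,
                                       const SpaceInfo &space) {
  return !rescan && space.has_freed_ranges &&
         (scrub_uncompressed || space.is_compressed);
}
```

For an uncompressed tablespace with scrubbing disabled, the predicate is false regardless of `rescan`, which is consistent with the bug being limited to the configurations named above.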

@Thirunarayanan Thirunarayanan changed the base branch from 10.5 to 10.11 May 20, 2024 10:13
Contributor

@dr-m dr-m left a comment

Thank you! This is tricky to review, because I was not able to reproduce this on my own system, no matter what I tried. On the shared environment where this was reproduced in the first place, I debugged the recovery of the data directory whose copy is attached to MDEV-34156, with and without the fix. The problem was that store_freed_or_init_rec was not being called on the BLOB page 0x2c:5 in the function with the template parameter store=false.

Unfortunately, this patch turns out to be an exact reversion of 941af1f (MDEV-31803). We need to understand this scenario better and revise the fix accordingly.

A possible answer is that the previous fix was incorrect, and something needs to be adjusted for the innodb_immediate_scrub_data_uncompressed=ON case elsewhere.
