Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PS-9092: Data inconsistencies when high rate of pages split/merge #5249

Closed
wants to merge 1 commit into from

Conversation

dlenev
Copy link
Contributor

@dlenev dlenev commented Mar 6, 2024

https://perconadev.atlassian.net/browse/PS-9092

Problem:

Query over InnoDB table that uses backward scan over the index occasionally
might return incorrect/incomplete results when changes to table (for example,
DELETEs in other or even the same connection followed by asynchronous purge)
cause concurrent B-tree page merges.

Cause:

The problem occurs when persistent cursor which is used to scan over index
in backwards direction stops on infimum record of the page to which it points
currently and releases all latches it has, before moving to the previous page.
At this point merge from the previous page to cursor's current one can happen
(because cursor doesn't hold latch on current or previous page). During this
merge records from the previous page are moved over infimum record and placed
before any old user records in the current page. When later our persistent
cursor resumes its iteration it might use optimistic approach to cursor
restoration which won't detect this kind of page update and resumes the
iteration right from infimum record, effectively skipping the moved records.

Solution:

This patch solves the problem by forcing persisted cursor to use pessimistic
approach to cursor restoration in such cases. With this approach cursor
restoration is performed by looking up and continuing from user record
which preceded infimum record when cursor stopped iteration and released
the latches. Indeed, in this case records which were moved during the merge
will be visited by cursor as they precede this old-post-infimum record
in the page.

This forcing of pessimistic restore is achieved by increasing page's
modify_clock version counter for the page merged into, when merge happens
from the previous page (normally this version counter is only incremented
when we delete records from the page or the whole page).

Theoretically, this might be also done when we are merging into page the
page which follows it. But it is not clear if it is really required, as
forward scan over the index is not affected by this problem. In forward
scan case different approach to latching is used when we switch
between B-tree leaf pages - we always acquire latch on the next page
before releasing latch on the current one. As result concurrent merges
from the next page to the current one are blocked.

Note that the same approach to latching can't be used for backward
iteration as it will mean that latching happens into opposite order
which will lead to deadlocks.

It is quite possible that there are move scenarios which should be covered
by this patch and there is a better way to solve this issue. But we feel
that required investigation and bigger changes are more appropriate for
Upstream.

https://perconadev.atlassian.net/browse/PS-9092

Problem:

  Query over InnoDB table that uses backward scan over the index occasionally
  might return incorrect/incomplete results when changes to table (for example,
  DELETEs in other or even the same connection followed by asynchronous purge)
  cause concurrent B-tree page merges.

Cause:

  The problem occurs when persistent cursor which is used to scan over index
  in backwards direction stops on infimum record of the page to which it points
  currently and releases all latches it has, before moving to the previous page.
  At this point merge from the previous page to cursor's current one can happen
  (because cursor doesn't hold latch on current or previous page). During this
  merge records from the previous page are moved over infimum record and placed
  before any old user records in the current page. When later our persistent
  cursor resumes its iteration it might use optimistic approach to cursor
  restoration which won't detect this kind of page update and resumes the
  iteration right from infimum record, effectively skipping the moved records.

Solution:

  This patch solves the problem by forcing persisted cursor to use pessimistic
  approach to cursor restoration in such cases. With this approach cursor
  restoration is performed by looking up and continuing from user record
  which preceded infimum record when cursor stopped iteration and released
  the latches. Indeed, in this case records which were moved during the merge
  will be visited by cursor as they precede this old-post-infimum record
  in the page.

  This forcing of pessimistic restore is achieved by increasing page's
  modify_clock version counter for the page merged into, when merge happens
  from the previous page (normally this version counter is only incremented
  when we delete records from the page or the whole page).

  Theoretically, this might be also done when we are merging into page the
  page which follows it. But it is not clear if it is really required, as
  forward scan over the index is not affected by this problem. In forward
  scan case different approach to latching is used when we switch
  between B-tree leaf pages - we always acquire latch on the next page
  before releasing latch on the current one. As result concurrent merges
  from the next page to the current one are blocked.

  Note that the same approach to latching can't be used for backward
  iteration as it will mean that latching happens into opposite order
  which will lead to deadlocks.

  It is quite possible that there are move scenarios which should be covered
  by this patch and there is a better way to solve this issue. But we feel
  that required investigation and bigger changes are more appropriate for
  Upstream.
@dlenev
Copy link
Contributor Author

dlenev commented Jun 12, 2024

Created PRs for 8.0.37 and 8.4.0 release branches instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant