PS-9092: Data inconsistencies when high rate of pages split/merge #5249

dlenev · 2024-03-06T14:05:46Z

https://perconadev.atlassian.net/browse/PS-9092

Problem:

Query over InnoDB table that uses backward scan over the index occasionally
might return incorrect/incomplete results when changes to table (for example,
DELETEs in other or even the same connection followed by asynchronous purge)
cause concurrent B-tree page merges.

Cause:

The problem occurs when persistent cursor which is used to scan over index
in backwards direction stops on infimum record of the page to which it points
currently and releases all latches it has, before moving to the previous page.
At this point merge from the previous page to cursor's current one can happen
(because cursor doesn't hold latch on current or previous page). During this
merge records from the previous page are moved over infimum record and placed
before any old user records in the current page. When later our persistent
cursor resumes its iteration it might use optimistic approach to cursor
restoration which won't detect this kind of page update and resumes the
iteration right from infimum record, effectively skipping the moved records.

Solution:

This patch solves the problem by forcing persisted cursor to use pessimistic
approach to cursor restoration in such cases. With this approach cursor
restoration is performed by looking up and continuing from user record
which preceded infimum record when cursor stopped iteration and released
the latches. Indeed, in this case records which were moved during the merge
will be visited by cursor as they precede this old-post-infimum record
in the page.

This forcing of pessimistic restore is achieved by increasing page's
modify_clock version counter for the page merged into, when merge happens
from the previous page (normally this version counter is only incremented
when we delete records from the page or the whole page).

Theoretically, this might be also done when we are merging into page the
page which follows it. But it is not clear if it is really required, as
forward scan over the index is not affected by this problem. In forward
scan case different approach to latching is used when we switch
between B-tree leaf pages - we always acquire latch on the next page
before releasing latch on the current one. As result concurrent merges
from the next page to the current one are blocked.

Note that the same approach to latching can't be used for backward
iteration as it will mean that latching happens into opposite order
which will lead to deadlocks.

It is quite possible that there are move scenarios which should be covered
by this patch and there is a better way to solve this issue. But we feel
that required investigation and bigger changes are more appropriate for
Upstream.

https://perconadev.atlassian.net/browse/PS-9092 Problem: Query over InnoDB table that uses backward scan over the index occasionally might return incorrect/incomplete results when changes to table (for example, DELETEs in other or even the same connection followed by asynchronous purge) cause concurrent B-tree page merges. Cause: The problem occurs when persistent cursor which is used to scan over index in backwards direction stops on infimum record of the page to which it points currently and releases all latches it has, before moving to the previous page. At this point merge from the previous page to cursor's current one can happen (because cursor doesn't hold latch on current or previous page). During this merge records from the previous page are moved over infimum record and placed before any old user records in the current page. When later our persistent cursor resumes its iteration it might use optimistic approach to cursor restoration which won't detect this kind of page update and resumes the iteration right from infimum record, effectively skipping the moved records. Solution: This patch solves the problem by forcing persisted cursor to use pessimistic approach to cursor restoration in such cases. With this approach cursor restoration is performed by looking up and continuing from user record which preceded infimum record when cursor stopped iteration and released the latches. Indeed, in this case records which were moved during the merge will be visited by cursor as they precede this old-post-infimum record in the page. This forcing of pessimistic restore is achieved by increasing page's modify_clock version counter for the page merged into, when merge happens from the previous page (normally this version counter is only incremented when we delete records from the page or the whole page). Theoretically, this might be also done when we are merging into page the page which follows it. But it is not clear if it is really required, as forward scan over the index is not affected by this problem. In forward scan case different approach to latching is used when we switch between B-tree leaf pages - we always acquire latch on the next page before releasing latch on the current one. As result concurrent merges from the next page to the current one are blocked. Note that the same approach to latching can't be used for backward iteration as it will mean that latching happens into opposite order which will lead to deadlocks. It is quite possible that there are move scenarios which should be covered by this patch and there is a better way to solve this issue. But we feel that required investigation and bigger changes are more appropriate for Upstream.

dlenev · 2024-06-12T21:14:22Z

Created PRs for 8.0.37 and 8.4.0 release branches instead.

dlenev requested a review from satya-bodapati March 6, 2024 14:06

satya-bodapati assigned dlenev Mar 6, 2024

dlenev closed this Jun 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PS-9092: Data inconsistencies when high rate of pages split/merge #5249

PS-9092: Data inconsistencies when high rate of pages split/merge #5249

dlenev commented Mar 6, 2024

dlenev commented Jun 12, 2024

PS-9092: Data inconsistencies when high rate of pages split/merge #5249

PS-9092: Data inconsistencies when high rate of pages split/merge #5249

Conversation

dlenev commented Mar 6, 2024

dlenev commented Jun 12, 2024