PS-9092: Data inconsistencies when high rate of pages split/merge #5249
https://perconadev.atlassian.net/browse/PS-9092
Problem:
A query on an InnoDB table that scans an index backward can occasionally
return incorrect or incomplete results when changes to the table (for example,
DELETEs in another or even the same connection, followed by asynchronous
purge) cause concurrent B-tree page merges.
Cause:
The problem occurs when a persistent cursor that scans an index backward
stops on the infimum record of its current page and releases all latches it
holds before moving to the previous page. At this point a merge from the
previous page into the cursor's current page can happen, because the cursor
holds no latch on either page. During this merge, records from the previous
page are moved past the infimum record and placed before the existing user
records in the current page. When the persistent cursor later resumes its
iteration, it may use optimistic cursor restoration, which does not detect
this kind of page update, and resume right from the infimum record,
effectively skipping the moved records.
Solution:
This patch solves the problem by forcing the persistent cursor to use
pessimistic cursor restoration in such cases. With this approach, the cursor
is restored by looking up the user record that it visited just before the
infimum record (that is, the first user record in the page at the time the
cursor stopped iterating and released its latches) and continuing from it.
The records moved during the merge are then visited by the cursor, because
they precede this old post-infimum record in the page.
Pessimistic restoration is forced by incrementing the modify_clock version
counter of the page being merged into when the merge comes from the previous
page (normally this counter is incremented only when records are deleted
from the page or the whole page is deleted).
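
The cause and the fix can be illustrated with a small toy model. This is
only a sketch and not InnoDB code: Page, merge_from_previous() and
backward_scan_with_merge() are hypothetical names, a page is reduced to a
sorted vector of keys, and the bump_clock flag stands in for the behavior
with and without this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy leaf page (hypothetical, not the InnoDB structure).
struct Page {
    std::vector<int> keys;           // user records in ascending key order
    std::uint64_t modify_clock = 0;  // version counter checked on restore
};

// Moves all records of `prev` in front of the records of `cur`.
// bump_clock == true models the change described above.
void merge_from_previous(Page& prev, Page& cur, bool bump_clock) {
    cur.keys.insert(cur.keys.begin(), prev.keys.begin(), prev.keys.end());
    prev.keys.clear();
    if (bump_clock) ++cur.modify_clock;
}

// Scans two neighbouring pages backward while a merge happens at the
// moment the cursor sits on the infimum with no latches held.
std::vector<int> backward_scan_with_merge(bool bump_clock) {
    Page prev;
    prev.keys = {1, 2, 3};
    Page cur;
    cur.keys = {4, 5, 6};
    std::vector<int> visited;

    // Visit the current page backward until the infimum is reached.
    for (auto it = cur.keys.rbegin(); it != cur.keys.rend(); ++it)
        visited.push_back(*it);

    // Store the cursor position, then "release the latches".
    const std::uint64_t stored_clock = cur.modify_clock;
    const int first_user_key = cur.keys.front();

    // The concurrent merge runs in the unlatched window.
    merge_from_previous(prev, cur, bump_clock);

    if (stored_clock != cur.modify_clock) {
        // Pessimistic restore: look up the stored record and continue
        // backward from it, which covers the records moved by the merge.
        std::size_t pos = 0;
        while (pos < cur.keys.size() && cur.keys[pos] != first_user_key) ++pos;
        assert(pos < cur.keys.size());  // the stored record survives in this toy
        while (pos > 0) visited.push_back(cur.keys[--pos]);
    }
    // Otherwise the optimistic restore resumes on the infimum, so nothing
    // else on this page is visited.

    // Move on to the previous page, which the merge has emptied.
    for (auto it = prev.keys.rbegin(); it != prev.keys.rend(); ++it)
        visited.push_back(*it);
    return visited;
}
```

In this model, backward_scan_with_merge(false) visits only 6, 5, 4 and
misses the moved records, while backward_scan_with_merge(true) visits
6, 5, 4, 3, 2, 1.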
Theoretically, the same could be done when the page that follows a page is
merged into it. It is not clear that this is required, however, because
forward scans over the index are not affected by this problem. A forward
scan uses a different latching approach when it switches between B-tree
leaf pages: it always acquires the latch on the next page before releasing
the latch on the current one. As a result, concurrent merges from the next
page into the current one are blocked.
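
The difference between the two page-switch protocols can be sketched as
follows. This is a single-threaded toy under stated assumptions, not the
InnoDB latching code: Latch, try_merge() and the two *_switch_allows_merge()
functions are hypothetical, and a merge is assumed to need the latches of
both pages involved.

```cpp
#include <cassert>

// Toy single-threaded latch standing in for a page latch (hypothetical).
struct Latch {
    bool held = false;
    bool try_acquire() {
        if (held) return false;
        held = true;
        return true;
    }
    void release() {
        assert(held);
        held = false;
    }
};

struct Page {
    Latch latch;
};

// A merge of two neighbouring pages is assumed to need both latches.
// Returns true if the merge could run.
bool try_merge(Page& left, Page& right) {
    if (!left.latch.try_acquire()) return false;
    if (!right.latch.try_acquire()) {
        left.latch.release();
        return false;
    }
    // ... records would be moved between the pages here ...
    right.latch.release();
    left.latch.release();
    return true;
}

// Forward switch: latch the next page before releasing the current one.
// Returns true if a merge could run at any point during the switch.
bool forward_switch_allows_merge(Page& cur, Page& next) {
    bool merged = false;
    cur.latch.try_acquire();          // scanner is positioned on cur
    next.latch.try_acquire();         // couple: take next first
    merged |= try_merge(cur, next);   // both latches held by the scanner
    cur.latch.release();
    merged |= try_merge(cur, next);   // next is still held by the scanner
    next.latch.release();
    return merged;
}

// Backward switch: release everything, then latch the previous page.
bool backward_switch_allows_merge(Page& prev, Page& cur) {
    cur.latch.try_acquire();          // scanner is positioned on cur
    cur.latch.release();              // all latches released
    bool merged = try_merge(prev, cur);  // window with no latches held
    prev.latch.try_acquire();         // scanner moves to the previous page
    prev.latch.release();
    return merged;
}
```

In this model the forward switch never leaves a moment in which both pages
are unlatched, so the merge attempt fails, whereas the backward switch has
such a window and the merge goes through.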
Note that the same latching approach cannot be used for backward iteration,
because latches would then be acquired in the opposite order, which would
lead to deadlocks.
It is quite possible that there are more scenarios that should be covered
by this patch and that there is a better way to solve this issue. But we
feel that the required investigation and bigger changes are more appropriate
for Upstream.