Strengthen locking in FsBlobContainer register impl #107830


DaveCTurner
Contributor

Expands the JVM-wide mutex to prevent all concurrent operations on
file-based registers, but then introduces an artificial mechanism for
emulating write/write contention within a single JVM.
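A minimal sketch of the two ideas in this description, not the code in this PR: an in-memory map stands in for the register files, and every name here (`RegisterSketch`, `JVM_WIDE_MUTEX`, `WRITES_IN_FLIGHT`, `uncontendedCompareAndExchange`) is hypothetical. All reads and writes take one JVM-wide lock, while a separate per-key marker makes a second concurrent writer give up immediately, roughly as a writer to a cloud repository might when it loses a race.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class RegisterSketch {
    // Hypothetical JVM-wide mutex: every register read and write goes through it.
    static final ReentrantLock JVM_WIDE_MUTEX = new ReentrantLock();
    // Keys with a compare-and-exchange in flight; used only to emulate contention.
    static final Map<String, Boolean> WRITES_IN_FLIGHT = new ConcurrentHashMap<>();
    // In-memory stand-in for the register files (an assumption of this sketch).
    static final Map<String, Long> REGISTERS = new ConcurrentHashMap<>();

    // Reads are never reported as contended: they simply wait for the mutex.
    static long read(String key) {
        JVM_WIDE_MUTEX.lock();
        try {
            return REGISTERS.getOrDefault(key, 0L);
        } finally {
            JVM_WIDE_MUTEX.unlock();
        }
    }

    // Returns the witnessed value, or empty if another write to the same key is in flight.
    static OptionalLong compareAndExchange(String key, long expected, long updated) {
        if (WRITES_IN_FLIGHT.putIfAbsent(key, Boolean.TRUE) != null) {
            return OptionalLong.empty(); // emulated write/write contention
        }
        try {
            return OptionalLong.of(uncontendedCompareAndExchange(key, expected, updated));
        } finally {
            WRITES_IN_FLIGHT.remove(key);
        }
    }

    // The actual exchange, performed under the JVM-wide mutex.
    static long uncontendedCompareAndExchange(String key, long expected, long updated) {
        JVM_WIDE_MUTEX.lock();
        try {
            long witness = REGISTERS.getOrDefault(key, 0L);
            if (witness == expected) {
                REGISTERS.put(key, updated);
            }
            return witness;
        } finally {
            JVM_WIDE_MUTEX.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(compareAndExchange("register", 0L, 1L));
        System.out.println(read("register"));
    }
}
```

Keeping the contention marker separate from the mutex means contention is reported only to writers; a read that overlaps a write blocks briefly and then returns a value.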

@DaveCTurner added the >test, :Distributed/Snapshot/Restore, and v8.15.0 labels on Apr 24, 2024
@elasticsearchmachine added the Team:Distributed label on Apr 24, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Contributor

@henningandersen henningandersen left a comment

Looks good; nice idea to simulate write contention. Can we add tests to demonstrate that contended reads no longer report MISSING (or fail on the assertion)?

@DaveCTurner
Contributor Author

Yep, and I think we can strengthen the repo analysis too

Contributor

@henningandersen henningandersen left a comment

LGTM.

try {
    return doUncontendedCompareAndExchangeRegister(registerPath, expected, updated);
} finally {
    mutex.close();
Contributor

Is this not handled by the try-with-resource above?

Contributor Author

d'oh yes it is
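
To illustrate the point in this thread, here is a small sketch under stated assumptions: `CountingMutex` is a hypothetical stand-in for the releasable mutex, and the exact shape of the reviewed code is not reproduced. A resource declared in a try-with-resources header is closed automatically when the block exits, so an extra `close()` in a `finally` closes it a second time.

```java
class TryWithResourcesSketch {
    // Hypothetical resource that counts how many times it has been closed.
    static final class CountingMutex implements AutoCloseable {
        int closeCount = 0;

        @Override
        public void close() {
            closeCount++;
        }
    }

    // Only try-with-resources: the resource is closed exactly once on exit.
    static int closesWithTryWithResourcesOnly() {
        CountingMutex mutex = new CountingMutex();
        try (CountingMutex m = mutex) {
            // protected work would go here
        }
        return mutex.closeCount;
    }

    // An explicit close() in a nested finally runs in addition to the automatic one.
    static int closesWithRedundantFinally() {
        CountingMutex mutex = new CountingMutex();
        try (CountingMutex m = mutex) {
            try {
                // protected work would go here
            } finally {
                m.close(); // redundant: try-with-resources closes m anyway
            }
        }
        return mutex.closeCount;
    }

    public static void main(String[] args) {
        System.out.println(closesWithTryWithResourcesOnly());
        System.out.println(closesWithRedundantFinally());
    }
}
```

Whether a double close is harmful depends on the resource; for a lock-like resource it is at best redundant, which is why the explicit call can simply be dropped.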

@DaveCTurner added the auto-merge label on Apr 24, 2024
@elasticsearchmachine elasticsearchmachine merged commit fee9097 into elastic:main Apr 24, 2024
14 checks passed
@DaveCTurner DaveCTurner deleted the 2024/04/24/FsBlobContainer-register-locking branch April 24, 2024 13:29
if (r.isPresent()) {
    l.onResponse(r);
} else {
    l.onFailure(new IllegalStateException("register read failed due to contention"));
Member

@DaveCTurner, flagging that this seems to cause AzureSnapshotRepoTestKitIT/testRepositoryAnalysis to fail (see #108504).

Contributor Author

Thanks for the ping, clearly a bug: #108900

@DaveCTurner DaveCTurner restored the 2024/04/24/FsBlobContainer-register-locking branch June 17, 2024 06:16
Labels
auto-merge: Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!)
:Distributed/Snapshot/Restore: Anything directly related to the `_snapshot/*` APIs
Team:Distributed: Meta label for distributed team
>test: Issues or PRs that are addressing/adding tests
v8.15.0

4 participants