Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] Atomic locking of createExecDir #1353

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jerrymarino
Copy link
Contributor

Summary: today we reference an Entry after it has been "downloaded". Today we expire during execution and InputFetch stage when Entries for keys either existed on the file system or were loaded from the distributed store in the InputFetch operation. Will also mutate existing entries due to deleting the reference to the last inode of a file. The latter manfiests in several ways e.g. like failing the report stage of a concurrent action.

Prior to CASFileCache.putDirectory we increment keyReferences to prevent eviction during execution. This provides atomicity guarantees around the keys within directories while minimizing mutexes.

Fixes my contention with createExecDir / and my write-to-CAS failures which manifested as a result.

TLDR:

Logical and atomic locking of an execDir by gaining references to keys via the new method lockDirectoryKeys. Because there are many references to a single Entry this needs to be reference counted for each concurrent execDir and must upheld through the entire lifetime of the directory.

Testing:

I have a number of manual integration tests that need reworking so I can add them here and will add these as part of the PR to get it out of the Draft state

@jerrymarino jerrymarino requested a review from werkt as a code owner May 30, 2023 19:26
@jerrymarino jerrymarino force-pushed the jmarino/exec_dir_locking branch 2 times, most recently from 92bdeac to 5e37811 Compare May 30, 2023 19:35
Summary: today we reference an `Entry` after it has been "downloaded".
Today we expire _during_ execution and `InputFetch` stage when `Entries`
for keys either existed on the file system or were loaded from the
distributed store in the `InputFetch` operation. Will also mutate
existing entries due to `deleting` the reference to the last inode of a
file. The latter manfiests in several ways e.g. like failing the report
stage of a concurrent action.

Prior to `CASFileCache.putDirectory` we increment `keyReferences` to
prevent eviction during execution. This provides atomicity gaurentees
around the _keys_ within  directories while minimizing mutexes.

Fixes my contention with `createExecDir` / and my write-to-CAS failures
which manifested as a result.

TLDR:

Logical and atomic locking of an `execDir` by gaining references to keys
via the new method `lockDirectoryKeys`. Because there are many
references to a single `Entry` this needs to be refernce counted for
each concurrent `execDir` and must upheld through the entire lifetime of
the directory.

Testing:

I have a number of manual integration tests that need reworking so I can
add them here and will add these as part of the PR to get it out of the
`Draft` state
@jerrymarino jerrymarino force-pushed the jmarino/exec_dir_locking branch 2 times, most recently from c020ae0 to 70a2b6d Compare May 30, 2023 22:45
- move loop directory out of critical section
- improve logging/method naming
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant