fsspec-cache-clear: add config variables for clearing cache up #31

Open
yarikoptic opened this issue Dec 14, 2021 · 3 comments

@yarikoptic
Member

ATM the entire cache is cleaned up by the command.

  • Analogously to https://github.com/joblib/joblib/pull/1200/files, we need to allow for age (or atime_age, as in "last accessed", not "when created") and size limits here, so that we can keep caches "under control".
    • I think those should be DataLad configuration options. Then they could be made universally applicable (through centralized configuration) or specified on the command line (or in the Python interface, although not as easily there ATM, IIRC). So let's make them configuration options.
    • I think we do not have a formalized way to extend the dictionary of known config variables (filed as datalad#6314, "formalize extending common_cfg.definitions (and other things) in extensions?"), so we can simply use/document such config variables in this extension: datalad.fsspec.cache-clear.age and datalad.fsspec.cache-clear.size, which would default to None (not considered).
  • Ideally we would also allow for an option only_in_file_tree to remove cached items which are no longer available in the current git (annex) file tree (of the current commit). But 1) I do not think it would be easily possible without adding some extra layer of tracking; 2) such items would eventually be picked up through age anyway. So, let's not bother with this one.
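
A minimal sketch of how the two limits could combine, assuming age is given in seconds since last access and size in bytes (the units, the CacheEntry type, and the select_for_removal name are all hypothetical and not part of datalad-fuse or fsspec). Entries whose last access is older than the age limit are selected first; if the rest still exceeds the size limit, the least recently accessed entries are selected until the total fits. None means "not considered", matching the proposed defaults.

```python
import time
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class CacheEntry:
    """Hypothetical record describing one cached item."""
    path: str
    atime: float  # last access time, seconds since the epoch
    size: int     # size in bytes


def select_for_removal(
    entries: Iterable[CacheEntry],
    max_age: Optional[float] = None,   # assumed unit: seconds
    max_size: Optional[int] = None,    # assumed unit: bytes
    now: Optional[float] = None,
) -> List[CacheEntry]:
    """Return the entries that should be deleted under the given limits."""
    now = time.time() if now is None else now
    doomed: List[CacheEntry] = []
    kept: List[CacheEntry] = []

    # Age limit: anything not accessed within max_age seconds goes.
    for entry in entries:
        if max_age is not None and now - entry.atime > max_age:
            doomed.append(entry)
        else:
            kept.append(entry)

    # Size limit: drop least recently accessed entries until we fit.
    if max_size is not None:
        kept.sort(key=lambda e: e.atime)
        total = sum(e.size for e in kept)
        while kept and total > max_size:
            oldest = kept.pop(0)
            total -= oldest.size
            doomed.append(oldest)

    return doomed
```

With both limits left at None the function selects nothing, so a caller would have to treat that case as "clear everything" explicitly if the current behavior of the command is to be preserved.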
@yarikoptic changed the title from "fsspec-cache-clear: add options for clearing cache up" to "fsspec-cache-clear: add config variables for clearing cache up" Dec 14, 2021
@jwodder
Member

jwodder commented Dec 14, 2021

@yarikoptic While fsspec has the necessary pieces to implement such a cleanup in datalad-fuse, the code would end up relying largely on undocumented implementation details of how fsspec structures its cache metadata. Are you sure that's the best idea?

@yarikoptic
Member Author

I agree that we must not implement something based on undocumented implementation details. I think an issue (or better, a PR) should then be filed with fsspec to expose (and document) interface(s) providing such functionality. In the minimal case it could be some get_cache_entries function which would return the entries in the cache with all needed attributes (path, atime, size), so that our code could then implement the rest.
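
To illustrate the shape such an interface might have, here is a rough sketch. It is not an fsspec API: get_cache_entries and CacheEntry are hypothetical names, and this version simply stats every file under a directory, whereas a real implementation inside fsspec would presumably consult its own cache metadata and skip its bookkeeping files.

```python
import os
from typing import Iterator, NamedTuple


class CacheEntry(NamedTuple):
    """Hypothetical record with the attributes mentioned above."""
    path: str
    atime: float  # last access time, seconds since the epoch
    size: int     # size in bytes


def get_cache_entries(cache_dir: str) -> Iterator[CacheEntry]:
    """Yield one CacheEntry per regular file found under cache_dir.

    Assumption: every file in the directory is a cached item. That is
    a simplification for illustration only.
    """
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            yield CacheEntry(path=path, atime=st.st_atime, size=st.st_size)
```

Note that st_atime is only as reliable as the file system's access-time tracking (mounts with noatime, for instance, will not update it), which is one more reason to have the cache layer report access times itself.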

@jwodder
Member

jwodder commented Dec 14, 2021

@yarikoptic Issue created: fsspec/filesystem_spec#857
