Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow not using symlinks when fetching files #2476

Draft
wants to merge 20 commits into
base: develop
Choose a base branch
from

Conversation

asumagic
Copy link
Collaborator

@asumagic asumagic commented Mar 27, 2024

What does this PR do?

Currently this is just mocking up APIs and writing proofs-of-concept.

The goal of this PR would ultimately be to explore ways to avoid using symbolic links in most cases in SB (which should e.g. definitely not be done when using load_audio against local paths in inference APIs...)

As of the current state of the PR:

  • fetch is extended to accept a local_strategy parameter. This parameter controls the behavior for downloading and copying/linking files around.
  • Most stuff within SB that indirectly calls fetch now accepts and forwards the same parameter.
  • The default strategy is switched from SYMLINK to NO_LINK on all platforms (can be debated).
  • Inference interfaces now explicitly use NO_LINK by default since polluting the working directory with audio files is largely undesired.
  • Fixed up parameter transfer logic to handle NO_LINK behavior.
  • Explicitly set the local strategy the in some recipe dataset download scripts.

Resolves #1070, resolves #1278, resolves #2197, resolves #1885, resolves #1155, resolves #1586

Supersedes #1303

Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@Adel-Moumen
Copy link
Collaborator

Ping @pplantinga and @Gastron as they both have a good understanding of the current issue and SB codebase. Could you please share your thoughts on this issue so that we can move in a valuable direction please? Thanks :)

Copy link
Collaborator

@pplantinga pplantinga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great change, LGTM!

@@ -58,38 +158,68 @@ def fetch(
use_auth_token=False,
revision=None,
huggingface_cache_dir=None,
local_strategy: LocalStrategy = LocalStrategy.SYMLINK,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious if we still want the default to be SYMLINK here. I appreciate that we might want to keep the default the same but perhaps we could add a warning and change it in the next version?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this more and I think it might make sense to switch to NO_LINK by default. I was wary of doing such a change but I don't think most things implicitly rely on this, and it's usually a simple fix or workaround.

CI passes when changing the default after one change in parameter transfer logic.

It should be plenty documented though... I don't want users to be too surprised when they see an empty directory where the pretrained stuff should be.

speechbrain/utils/fetching.py Show resolved Hide resolved
@Adel-Moumen Adel-Moumen added this to the v1.0.2 milestone May 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment