
Enabling hf-transfer by default #2279

Closed
AlpinDale opened this issue May 13, 2024 · 3 comments

Comments

@AlpinDale

hf_transfer, to my knowledge, has become very stable recently. I use it daily, and I find it a bit cumbersome that we have to manually install the package and then export a very long env variable just to get access to faster downloads. I believe it's about time it was made the default behaviour for huggingface_hub. Thoughts? I'm sure many others in the community share the same belief.

@Wauplin
Contributor

Wauplin commented May 22, 2024

Thanks @AlpinDale for raising the question. hf_transfer is indeed quite stable (at least we don't make changes to it very often). For the record, it is enabled by default on all Spaces for example. However, it is not the best solution for everyone for several reasons:

  • hf_transfer is faster only when the bandwidth allows it. This is the case on clusters and machines with good connections, but on normal or slow connections it brings no benefit; in some cases it even degrades speed because of the multi-process overhead.
  • hf_transfer maxes out the CPU cores, which makes it very unsuitable for parallelism. In huggingface_hub we make sure not to parallelize downloads, but users could launch several processes in parallel, in which case it would spawn "N_user_processes * N_cores" processes and completely saturate the CPU. Maxing out the CPU can also badly degrade the UX on the user's machine (think "everything is frozen").
  • hf_transfer does not handle proxies and does not have a retry mechanism. It is also not possible to resume a stopped download. All of this is doable with the normal implementation based on requests.
  • Progress bars are not as good as with the requests implementation (updates only every 50MB). For slow connections, that means a poor user experience.
  • In some cases, if the Python process gets terminated, daemon hf_transfer processes keep running.

So all things considered, hf_transfer is stable enough for a lot of use cases, but we are not aiming to make it the default. The best way to enable it is to:

  • set HF_HUB_ENABLE_HF_TRANSFER=1 in your .bashrc-like file on machines you manage
  • add huggingface_hub[hf_transfer] to your requirements.txt-like file
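As a sketch, the two steps above look like this on a machine you manage (the commands mirror the standard opt-in flow; nothing beyond the extra and the env variable is required):

```shell
# Install huggingface_hub together with the optional hf_transfer extra
pip install "huggingface_hub[hf_transfer]"

# Opt in explicitly; add this line to ~/.bashrc (or equivalent) so it
# applies to every session on this machine
export HF_HUB_ENABLE_HF_TRANSFER=1
```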

@Wauplin Wauplin closed this as completed May 22, 2024
@julien-c
Member

@AlpinDale out of curiosity, is your use case in the context of a CLI command, or from Python code?

@AlpinDale
Author

Sorry, I was away for a bit. Thanks for answering, @julien-c

I use both the CLI and the Python API. For now, I can manage by exporting the hf_transfer env variable in my bashrc.
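For the Python API side, a minimal sketch of opting in programmatically (instead of via bashrc) is to set the variable before any download starts; the snapshot_download call below is illustrative and is commented out since it performs a network download:

```python
import os

# Must be set before huggingface_hub starts a download, so it is
# picked up when the library decides which transfer backend to use.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

# Requires: pip install "huggingface_hub[hf_transfer]"
# from huggingface_hub import snapshot_download
# snapshot_download(repo_id="gpt2")  # illustrative repo_id
```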
