New feature: Skip existed files when downloading folders #335

Closed
GeekDream-x wants to merge 3 commits

@GeekDream-x GeekDream-x commented Mar 18, 2024

Background

Download tasks sometimes break off unexpectedly. When downloading a single file, it is acceptable to re-download it from the beginning. When downloading a folder, however, re-downloading the files that had already finished before the interruption is a waste of time.

Update:

  • Add a new parameter skip_existed to the function download_folder().
  • Print the skipped files if quiet is set to False.

Example Python Snippet:

import gdown
import os

output_dir = "./xxx/your_target_dir"
# Create the target directory if it does not exist yet.
os.makedirs(output_dir, exist_ok=True)
url = "https://drive.google.com/drive/folders/xxxxxxxxxxxxxxxxxxxxx=sharing"
# skip_existed is the parameter proposed in this PR; quiet=False prints the skipped files.
gdown.download_folder(url, output=output_dir, remaining_ok=True, quiet=False, skip_existed=True)

@genghisun

Thanks! This feature is very useful.
But why not set skip_existed default to True?

@GeekDream-x
Author

@genghisun Thanks! You are right and I have changed the default value to True in the new commit.

@genghisun

Hi~ Sorry to bother you again.
Today I realized that single-file download also needs the skip_existed parameter, especially when downloading a list of files, as in the example below. If a download in the middle of the list fails, running the code again re-downloads all the earlier files, which is unnecessary.

import gdown

url_list = [
    'https://drive.google.com/file/d/xxxx',
    'https://drive.google.com/file/d/xxxx',
    'https://drive.google.com/file/d/xxxx',
]

for url in url_list:
    gdown.download(url, fuzzy=True, skip_existed=True)

Also, it would be more consistent to have this parameter on both download and download_folder.
The implementation can be done by checking if the output file exists at https://github.com/wkentaro/gdown/blob/main/gdown/download.py#L280.
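
The existence check suggested here can be sketched outside the library as a small wrapper. This is only an illustration, not the code from this PR: download_missing and its fetch callback are hypothetical names, and the check is a plain os.path.exists on the output path.

```python
import os
from typing import Callable, Iterable, List, Tuple


def download_missing(
    targets: Iterable[Tuple[str, str]],
    fetch: Callable[[str, str], None],
) -> List[str]:
    """Call fetch(url, path) only for paths that do not exist yet.

    Returns the list of paths that were skipped because they already exist.
    Hypothetical helper for illustration; not part of gdown.
    """
    skipped = []
    for url, path in targets:
        if os.path.exists(path):
            # Output is already on disk, so do not download it again.
            skipped.append(path)
            continue
        fetch(url, path)
    return skipped
```

With gdown, fetch could be something like lambda url, path: gdown.download(url, output=path, fuzzy=True). Note that a bare existence check cannot tell a complete file from a truncated one, so it is only safe if interrupted downloads never leave a file at the final output path.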

@GeekDream-x
Author

@genghisun Sounds useful. I have updated download.py; please help check whether it is reliable.

@wkentaro
Owner

Use resume=True, introduced by #288; it skips downloading files that already exist.
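
For the folder case from the original description, the call might look roughly like the sketch below. It assumes a gdown release that includes #288 and that download_folder accepts resume there; fetch_folder is a hypothetical wrapper name, and the URL is whatever shared-folder link you already use.

```python
import os


def fetch_folder(url: str, output_dir: str) -> None:
    """Download a shared Drive folder, skipping files already present locally.

    Hypothetical wrapper; assumes gdown.download_folder supports resume=True.
    """
    # Imported lazily so the module can be loaded without gdown installed.
    import gdown

    os.makedirs(output_dir, exist_ok=True)
    gdown.download_folder(
        url,
        output=output_dir,
        quiet=False,        # print progress
        remaining_ok=True,  # same flag as in the snippet in the PR description
        resume=True,        # skip files that already exist
    )
```

Re-running fetch_folder with the same output_dir after an interruption should then leave finished files alone instead of fetching them again.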

@wkentaro wkentaro closed this May 12, 2024
@wkentaro wkentaro self-assigned this May 12, 2024