New feature: Skip existed files when downloading folders #335

GeekDream-x · 2024-03-18T02:30:29Z

Background

Download tasks are sometimes interrupted unexpectedly. When downloading a single file, re-downloading it from the beginning is acceptable. When downloading a folder, however, re-downloading the files that had already finished before the interruption is a waste of time.

Update:

Add a new parameter skip_existed to the function download_folder().
Print the skipped files if quiet is set to False.

Example Python Snippet:

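import gdown
import os

# Create the target directory if it does not already exist.
output_dir = "./xxx/your_target_dir"
os.makedirs(output_dir, exist_ok=True)

url = "https://drive.google.com/drive/folders/xxxxxxxxxxxxxxxxxxxxx=sharing"
# skip_existed is the parameter proposed in this PR: files already present in output_dir are not re-downloaded.
gdown.download_folder(url, output=output_dir, remaining_ok=True, quiet=False, skip_existed=True)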

genghisun · 2024-04-08T06:29:40Z

Thanks! This feature is very useful.
But why not set skip_existed default to True?

GeekDream-x · 2024-04-08T07:06:56Z

@genghisun Thanks! You are right and I have changed the default value to True in the new commit.

genghisun · 2024-04-09T14:54:49Z

Hi~ Sorry to bother you again.
Today I realized that single-file download also needs the skip_existed parameter, especially when downloading a list of files, as in the example below. If a file in the middle of the list fails to download, running the code again re-downloads all the earlier files, which is unnecessary.

import gdown

url_list = [
    'https://drive.google.com/file/d/xxxx',
    'https://drive.google.com/file/d/xxxx',
    'https://drive.google.com/file/d/xxxx',
]

for url in url_list:
    gdown.download(url, fuzzy=True, skip_existed=True)

Also, it would be more consistent to have this parameter for both download and download_folder.
This could be implemented by adding a check for an existing output file at https://github.com/wkentaro/gdown/blob/main/gdown/download.py#L280.
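
The check described here could look roughly like the sketch below. It is not the code from this PR or from gdown itself: download_missing and its download_fn argument are hypothetical names used only for illustration, and the caller is assumed to know each output path in advance. The download function is passed in so the skip logic can be tried without network access.

```python
import os
from typing import Callable, Iterable, List, Tuple


def download_missing(
    items: Iterable[Tuple[str, str]],
    download_fn: Callable[[str, str], object],
    quiet: bool = False,
) -> List[str]:
    """Download each (url, output_path) pair unless output_path already exists.

    download_fn is a hypothetical callable taking (url, output_path); with
    gdown it could be: lambda url, path: gdown.download(url, output=path, fuzzy=True)
    Returns the list of paths that were skipped.
    """
    skipped = []
    for url, output_path in items:
        if os.path.isfile(output_path):
            # The file is already on disk, so do not fetch it again.
            if not quiet:
                print(f"Skipping existing file: {output_path}")
            skipped.append(output_path)
            continue
        download_fn(url, output_path)
    return skipped
```

Note that this only tests whether the file exists. A file left truncated by an interrupted download would also be skipped, so a real implementation may want to write to a temporary name first and rename on success.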

GeekDream-x · 2024-04-11T07:19:11Z

@genghisun Sounds useful. I have updated download.py; please help check whether it is reliable.

wkentaro · 2024-05-12T06:37:55Z

Use resume=True, introduced by #288; it skips downloading files that already exist.
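
As a rough sketch of that suggestion applied to the URL-list case discussed earlier in the thread: download_with_resume and its download argument are hypothetical names, not part of gdown, and the sketch assumes a gdown release that already includes the resume parameter. It also assumes that an output path ending in a path separator is treated as a directory. The download callable is injectable so the wrapper can be exercised offline.

```python
import os
from typing import Callable, Iterable, Optional


def download_with_resume(
    url_list: Iterable[str],
    output_dir: str,
    download: Optional[Callable[..., object]] = None,
) -> None:
    """Download every URL into output_dir, passing resume=True on each call."""
    if download is None:
        # Imported lazily so the function can be tested without gdown installed.
        import gdown

        download = gdown.download

    os.makedirs(output_dir, exist_ok=True)
    # A trailing separator marks the output as a directory rather than a file name.
    output = os.path.join(output_dir, "")
    for url in url_list:
        # resume=True asks gdown not to re-fetch files that are already present.
        download(url, output=output, fuzzy=True, resume=True)
```

Running the same call again after an interruption should then only fetch the files that are still missing or incomplete, assuming resume behaves as described in the comment above.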

Skip existed files when downloading folders

63685ed

Update the default value for "skip_existed"

f65a33d

Add skip_existed in download.py

c8693d8

wkentaro closed this May 12, 2024

wkentaro self-assigned this May 12, 2024
