Time and resource usage to download the dataset #55

Open
aiPenguin opened this issue May 16, 2024 · 1 comment

Comments

@aiPenguin

Hi,

Can anyone share their resource usage and settings for downloading the dataset?

We tried to download the 10M sub-dataset in 720p without audio, but it requires more than 15K CPU hours.

Is there anything wrong?

Thx.

@bonlime

bonlime commented Jun 12, 2024

This is because you're re-encoding the downloaded videos when splitting them into clips. Try adding:

subsampling:
    ClippingSubsampler:
        args:
            precision: keyframe_adjusted

to the config, and it should be orders of magnitude faster.
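
For intuition on why this matters, here is a rough sketch of the two ways a clip can be cut with ffmpeg. It is only an illustration and not the downloader's actual code: `build_clip_command` and the file names are hypothetical, and the use of `libx264` for the re-encode path is an assumption. A frame-exact cut has to decode and re-encode every frame, while a stream copy just copies packets and therefore starts at a keyframe.

```python
# Illustrative sketch only: build_clip_command is a hypothetical helper,
# not part of the dataset tooling discussed in this thread.
def build_clip_command(src, dst, start, end, reencode):
    """Return an ffmpeg argument list that cuts [start, end] seconds from src."""
    # -ss/-to before -i seek on the input; -y overwrites an existing output.
    cmd = ["ffmpeg", "-y", "-ss", str(start), "-to", str(end), "-i", src]
    if reencode:
        # Frame-exact cut: video is decoded and encoded again (CPU heavy).
        # libx264 is an assumed encoder choice for this example.
        cmd += ["-c:v", "libx264", "-an"]
    else:
        # Stream copy: packets are copied unchanged, so the clip starts
        # at a keyframe and almost no CPU time is spent.
        cmd += ["-c", "copy", "-an"]
    cmd.append(dst)
    return cmd


if __name__ == "__main__":
    # Print both variants for a clip from 12.0 s to 20.5 s (no audio).
    print(" ".join(build_clip_command("video.mp4", "clip.mp4", 12.0, 20.5, True)))
    print(" ".join(build_clip_command("video.mp4", "clip.mp4", 12.0, 20.5, False)))
```

The trade-off is that keyframe-aligned clips may begin slightly off the requested timestamp, which is usually acceptable when the alternative is re-encoding millions of 720p videos.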
