The downloaded videos are compressed to low quality #48

Open
jianzongwu opened this issue Apr 23, 2024 · 9 comments


@jianzongwu

jianzongwu commented Apr 23, 2024

[Attached frames: 00005, 00016]

Hello, I used the download script to download the validation videos, but found that they are heavily compressed and, as a result, low quality. Do you have any idea why? I suspect YouTube may be compressing the videos during download.

The images above are extracted frames; they show many blocky pixel patches, especially in the background.

Does anyone have the same issue as I do?
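The post does not say how the frames were extracted. If they are saved as JPEG, that step adds its own compression on top of the video's, which makes it harder to tell where the artifacts come from. A minimal sketch, assuming ffmpeg is on PATH, that builds a frame-extraction command writing lossless PNGs instead; the paths and the `frame_extract_cmd` helper are hypothetical:

```python
import shlex


def frame_extract_cmd(video_path: str, out_dir: str, fps: int = 1) -> list[str]:
    """Build an ffmpeg command that samples frames from a video as PNG files.

    PNG is lossless, so the saved frames show the video's own artifacts
    without adding a second layer of JPEG compression.
    """
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"fps={fps}",    # keep `fps` frames per second of video
        f"{out_dir}/%05d.png",  # zero-padded names: 00001.png, 00002.png, ...
    ]


if __name__ == "__main__":
    # Hypothetical paths; the output directory must exist before ffmpeg runs.
    print(shlex.join(frame_extract_cmd("clip.mp4", "frames")))
```

Passing the returned list to `subprocess.run(..., check=True)` runs it without going through a shell.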

@tsaishien-chen
Contributor

Hi @jianzongwu,
Thanks for your interest in our dataset, and sorry for the late reply.
What is the resolution of the downloaded video? You can set the target resolution here.
You can also check the yt-dlp download format here. I think the problem can be solved if you always download "the best video" (i.e., using the "bv" tag).
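As a rough illustration of how a `bv`-style selector differs from a height-constrained `wv*[height>=N]` selector, here is a toy model. It is not yt-dlp's implementation: real format sorting weighs many more fields, and the `*` in `wv*` (formats that may also contain audio) is ignored here. The `Fmt` class, `pick` function, and sample formats are all made up for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fmt:
    format_id: str
    height: int
    ext: str
    tbr: float  # total bitrate in kbit/s (secondary ranking key in this toy model)


def pick(formats: list[Fmt], best: bool, min_height: int = 0,
         ext: str = "mp4") -> Optional[Fmt]:
    """Toy model of `bv[ext=mp4]` (best=True) vs `wv*[height>=N][ext=mp4]` (best=False).

    Keeps formats matching the extension and minimum height, then returns
    the highest- or lowest-ranked one by (height, bitrate).
    """
    candidates = [f for f in formats if f.ext == ext and f.height >= min_height]
    if not candidates:
        return None

    def rank(f: Fmt) -> tuple[int, float]:
        return (f.height, f.tbr)

    return max(candidates, key=rank) if best else min(candidates, key=rank)


if __name__ == "__main__":
    # Made-up format list, not real data for any video.
    formats = [
        Fmt("a", 144, "mp4", 100.0),
        Fmt("b", 360, "mp4", 400.0),
        Fmt("c", 720, "mp4", 1500.0),
        Fmt("d", 1080, "webm", 2500.0),
    ]
    print(pick(formats, best=True).format_id)                   # c
    print(pick(formats, best=False, min_height=360).format_id)  # b
```

In this model the "best" selector returns the highest-resolution MP4, while the `wv*` form returns the smallest MP4 that is still at least the requested height.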

@jianzongwu
Author

Hello, I tried setting the target resolution to 720p and 360p. The downloaded videos have the requested resolution, but both show compressed patches.

How do I use the 'bv' tag in the download script?

@jianzongwu
Author

jianzongwu commented Apr 28, 2024

[Attached video: config-720.mp4]

I tried again to download a 720p video. The downloaded video is 720x1280, but it is still severely compressed.

It's not a resolution problem. The problem is that the downloaded videos do not look the same as the YouTube videos viewed online.

I'm wondering if anyone has the same problem as I have.

@tsaishien-chen
Contributor

Hi @jianzongwu,

Could you please try to run this command:
yt-dlp -f "bv[ext=mp4]" "https://www.youtube.com/watch?v=gsnqXt7d1mU"
and see whether the downloaded video has severe compression?
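The same format selection can also be made from Python through yt-dlp's `YoutubeDL` class, which may be easier to wire into a custom script. A minimal sketch; the helper names and the output template `%(id)s.%(ext)s` are placeholder choices, not taken from the project's code:

```python
def best_video_opts(outtmpl: str = "%(id)s.%(ext)s") -> dict:
    """yt-dlp options equivalent to `yt-dlp -f "bv[ext=mp4]" URL`."""
    return {
        "format": "bv[ext=mp4]",  # best video-only stream in an MP4 container
        "outtmpl": outtmpl,       # output filename template (placeholder default)
    }


def download_best_video(url: str) -> None:
    # Imported here so the options helper can be used without yt-dlp installed.
    import yt_dlp

    with yt_dlp.YoutubeDL(best_video_opts()) as ydl:
        ydl.download([url])
```

Calling `download_best_video("https://www.youtube.com/watch?v=gsnqXt7d1mU")` should fetch the same stream as the command above. Note that `bv` selects a video-only stream, so the result has no audio.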

@jianzongwu
Author

Interestingly, it is not compressed. I followed your command and downloaded the sunrise video with bv. It gave me a high-quality video at the highest available resolution, with no visible compression.

I also downloaded the 360p version of this video with this command:
yt-dlp -f "wv*[height>=360][ext=mp4]" "https://www.youtube.com/watch?v=gsnqXt7d1mU"
This downloaded the 360p video, again with no visible compression.

I noticed a difference between my previous downloads and this one. VSCode could not recognize or open the compressed videos; I had to copy them to my local machine and open them with an MP4 player. The videos downloaded this time, without compression, open directly in VSCode.

So, the problem is in the codebase, where it calls yt-dlp and saves the videos?
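The thread does not establish why VSCode could not open the compressed files. A plausible first check is whether they use a different video codec or a much lower bitrate than the clean downloads. A sketch assuming ffprobe (shipped with ffmpeg) is on PATH; the helper names are hypothetical:

```python
import json
import subprocess


def ffprobe_cmd(path: str) -> list[str]:
    """ffprobe command printing the first video stream's codec, size and bitrate as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,width,height,bit_rate",
        "-of", "json",
        path,
    ]


def parse_stream_info(ffprobe_json: str) -> dict:
    """Return the first stream's fields from ffprobe JSON output, or {} if none."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return streams[0] if streams else {}


def probe(path: str) -> dict:
    # Requires ffprobe on PATH; raises CalledProcessError if ffprobe fails.
    out = subprocess.run(ffprobe_cmd(path), capture_output=True,
                         text=True, check=True).stdout
    return parse_stream_info(out)
```

Running `probe()` on one compressed file and one clean file and comparing `codec_name` and `bit_rate` would show whether the two were encoded differently.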

@jianzongwu
Author

subsampling: {}

reading:
    yt_args:
        download_size: 360
        download_audio: False
        yt_metadata_args:
            writesubtitles: False
            subtitleslangs: ['en']
            writeautomaticsub: False
            get_info: False
    timeout: 60
    sampler: null

storage:
    number_sample_per_shard: 100
    oom_shard_count: 5
    captions_are_subtitles: False

distribution:
    processes_count: 32
    thread_count: 32
    subjob_size: 10000
    distributor: "multiprocessing"

This is my config.

@tsaishien-chen
Contributor

tsaishien-chen commented Apr 29, 2024

Hi @jianzongwu,

So, the problem is in the codebase, where it calls yt-dlp and saves the videos?

Yes. Could you please print these lines, run the yt-dlp command with the same options, and see whether you get compressed videos?
One more thing you can try: check whether the original video (before splitting) is also compressed.
To do that, you can reduce the number of parallel processes to 1 and set a breakpoint after the first video is processed (before the original video is deleted).
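The lines referred to above are not reproduced in this thread, so this sketch assumes only that the script's format string and target URL can be printed. It turns them into a shell-safe command that can be pasted into a terminal for a side-by-side test; `yt_dlp_cli` and its default output template are hypothetical:

```python
import shlex


def yt_dlp_cli(url: str, fmt: str, output: str = "%(id)s.%(ext)s") -> str:
    """Render a yt-dlp call as a shell-safe command string for manual testing."""
    args = ["yt-dlp", "-f", fmt, "-o", output, url]
    return shlex.join(args)  # quotes selectors like wv*[...] and URLs containing '?'


if __name__ == "__main__":
    # Placeholder values: substitute the format string printed by the script.
    print(yt_dlp_cli("https://www.youtube.com/watch?v=gsnqXt7d1mU",
                     "wv*[height>=360][ext=mp4]"))
```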

@jianzongwu
Author

jianzongwu commented Apr 29, 2024

Hi, I have solved the problem.

It is caused by ffmpeg, which compresses the videos when splitting them by timestamps.

I could not find where the codebase calls ffmpeg, so I rewrote a download script myself based on hf_download.py and manually set the ffmpeg "-q:v" parameter to "0" (meaning no compression). The extracted frames now look as good as the original video.

Feel free to close this issue.
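The thread does not show the codebase's actual ffmpeg call, so the following is not it. It is a sketch of three common ways to cut a clip by timestamps, including the `-q:v` route described above. Which encoder ffmpeg picks by default for an MP4 output depends on how ffmpeg was built, and libx264 controls quality with `-crf` rather than `-q:v`. Assumes ffmpeg on PATH; `cut_clip_cmd`, paths, and timestamps are placeholders:

```python
def cut_clip_cmd(src: str, dst: str, start: str, end: str,
                 mode: str = "crf", quality: int = 18) -> list[str]:
    """Build an ffmpeg command that writes the [start, end] segment of `src` to `dst`.

    mode="copy":   stream copy, no re-encode (no quality loss; cut points
                   snap to keyframes).
    mode="crf":    re-encode video with libx264; lower CRF means higher quality.
    mode="qscale": re-encode with the build's default encoder and set -q:v,
                   the parameter changed in this thread.
    """
    cmd = ["ffmpeg", "-y", "-ss", start, "-to", end, "-i", src]
    if mode == "copy":
        cmd += ["-c", "copy"]
    elif mode == "crf":
        cmd += ["-c:v", "libx264", "-crf", str(quality), "-c:a", "copy"]
    elif mode == "qscale":
        cmd += ["-q:v", str(quality)]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return cmd + [dst]
```

Stream copy avoids re-encoding entirely but may start a clip slightly before the requested timestamp; re-encoding gives exact cuts at the cost of choosing a quality setting explicitly.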

@tsaishien-chen
Contributor

Hi @jianzongwu,

Great to hear you solved the problem! And thanks for providing the useful information.
