data process for pre-training and fine-tuning #393
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.
Here you said to prepare a 10M dataset. What is it composed of: panda-10m and HD-VG-130M? How much of the HD-VG dataset was used? Pre-training used 9.7M videos. Does this mean the processing pipeline filtered out only 3% of the videos? Which processing steps were applied for pre-training, and which for fine-tuning? What filtering thresholds were used for each?