Downloading gets stuck at some particular points #21
Hi @fgvfgfg564, are there any error messages?
There is no error message; the downloading process simply gets stuck there. We are guessing that some of the data entries may be too long and exceed the maximum command length that Windows supports, resulting in the error.
Hello, I am a co-worker of the questioner, thanks for your reply. There is no error message during the download, and the same situation occurs on Ubuntu 22.04. However, our later tests revealed that the CSV file was not the problem.
@tsaishien-chen @AliaksandrSiarohin @Secant1998
Hi @fgvfgfg564, @Secant1998, @kuno989,
Hi @tsaishien-chen, here is a graph of usage when downloading the panda70m_training_full.csv data through Spark.

[Screenshot: resource usage while downloading panda70m_training_full.csv]

[Screenshot: resource usage for panda70m_testing.csv in the same configuration]
Hi @kuno989,
Hi @tsaishien-chen, as you can see, the CPU usage spikes up to 90% at the beginning, but after a certain time it does very little work. Below is htop when the CPU usage drops.

[Screenshot: htop output after the CPU usage drops]

I'm currently checking and it is still working, but it seems to be thread-locked at a certain moment. What do you think? Here is the version information.

I ran a total of 48 hours of testing, with the following results.

[Screenshot: results after 48 hours of testing]

I say this because when I exit video2dataset with Ctrl+C, it momentarily works again.

[Screenshot: video2dataset resuming work after Ctrl+C]
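One way to see where the stuck process is blocked (a diagnostic sketch, not part of video2dataset; it assumes you can add these lines to the launching script and that you are on Linux, e.g. the Ubuntu 22.04 setup mentioned above):

```python
import faulthandler
import signal

# Dump every thread's traceback when the process receives SIGUSR1; from another
# shell, run `kill -USR1 <pid>` on the hung process to see where each thread is
# blocked.
faulthandler.register(signal.SIGUSR1, all_threads=True)
```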
Hi @kuno989,
If downloading the test set (a smaller subset) works, I think one way to fix this issue is to split the whole csv file into multiple smaller ones and download them one by one with a script (sketched below).
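For example, a minimal sketch of that idea in Python (the chunk size is arbitrary and the download command is a placeholder; substitute the exact video2dataset invocation you already use for the full csv):

```python
import subprocess
import pandas as pd

df = pd.read_csv("panda70m_training_full.csv")
chunk_size = 50_000  # rows per piece; pick whatever is convenient

for i, start in enumerate(range(0, len(df), chunk_size)):
    part = f"panda70m_training_part{i:04d}.csv"
    df.iloc[start:start + chunk_size].to_csv(part, index=False)
    # Placeholder command: replace with the download invocation you normally use.
    subprocess.run(["video2dataset", f"--url_list={part}",
                    f"--output_folder=dataset/part{i:04d}"], check=True)
```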
As you can see in the wandb logs, video2dataset processed 5993~6100 pieces of data in 48 hours.

[Screenshot: wandb logs]

[Screenshot: wandb logs (continued)]
Hi @tsaishien-chen, I have encountered the same issue. We have tried three different datasets: training_full, training_2m, and training_10m. They all get stuck after downloading some content until we manually stop them with Ctrl+C. It seems that the problem is not caused by the CPU or RAM.
Hi @tsaishien-chen, is there any update?
Hi @itorone: when you terminated the processes, did you see at which lines the code got stuck by checking the command window or htop?

Hi @kuno989: for the screenshot below, was it captured after the processes got stuck?
This is a screenshot of when CPU utilization dropped. Here's another YouTube dataset that I'm using.
I used ffmpeg-4.4.1-amd64-static. But since your machine works for hdvila-100m, which also splits the videos with ffmpeg, I don't think ffmpeg is the problem.
Hi @fgvfgfg564 and @Secant1998, have you solved the problem? Could you please share how you fixed the issue?
I had the same problem on one of my servers and it didn't occur on another server. I found that the problem is that, after the video finishes downloading with yt_download, some thread keeps the file open and holds a lock on it, so the main Python thread cannot continue reading this video file and waits for it to become free, and then it gets stuck. One obvious symptom is that when you Ctrl+C the Python process, it throws a file-not-found error even though the file has already been completely downloaded to your tmp dir. I don't have a real solution to this problem, and I cannot figure out which process is holding the downloaded video in the tmp dir. But I tried another way to work around it: since the downloading is actually done, I can just skip this read operation and continue downloading the remaining files, so the program won't get stuck. After downloading is finished, I do the remaining split and subsample operations.
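A rough sketch of that workaround, assuming the download loop can be reorganized this way (samples, download_one, and split_and_subsample are hypothetical placeholders, not the actual video2dataset functions):

```python
# Hypothetical sketch: defer the blocking read instead of waiting on a file
# that another thread may still have locked.
deferred = []

for sample in samples:                   # placeholder iterable of download jobs
    path = download_one(sample)          # placeholder for the yt_dlp download step
    deferred.append(path)                # skip the immediate read that can block

# Once every download has finished, whatever held the locks has released them,
# so the remaining split/subsample work can proceed.
for path in deferred:
    with open(path, "rb") as f:
        data = f.read()
    split_and_subsample(data)            # placeholder for the deferred processing
```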
Hi @Qianjx,
Is that correct? May I know your solution for that? Do you set a timeout so that if the video cannot be read within a certain time, it is ignored and the next video is processed?

Hi @kuno989: does this information help you solve the issue? And may I know your solution? Thanks!
Hi @tsaishien-chen,

```python
import os
import portalocker

...
streams = {}
for modality, modality_path in modality_paths.items():
    try:
        # Try to lock and read the downloaded file, but give up after 125 s
        # instead of blocking forever.
        with portalocker.Lock(modality_path, 'rb', timeout=125) as locked_file:
            streams[modality] = locked_file.read()
        os.remove(modality_path)
    except portalocker.exceptions.LockException:
        print(f"Timeout occurred trying to lock the file: {modality_path}")
    except IOError as e:
        print(f"Failed to delete the file: {modality_path}. Error: {e}")
```

And you're right, at line 268 it's still occupying the system, so of course it can't do any more work, and the CPU and RAM utilization will drop over time.
If you have a better solution, please share it! Thanks!
I tried to download the dataset and it got stuck after downloading 13.1 GB of files. The command line just hangs with no updates, and the network stats show that downloading has also stopped. I have no idea what happened. Perhaps some entry in the csv file causes this?

I shuffled the csv file several times (sketched below). Each time the stop point is different, ranging from 11 GB to 14 GB.

We have tried downloading on Windows and on WSL; both lead to the same error. There is no problem with the network or disk.
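A minimal sketch of that shuffling test (file names are just examples): reorder the rows and retry the download, so a hang caused by one specific entry would move along with it.

```python
import pandas as pd

# Shuffle the row order and write it back out; if a single bad entry caused the
# hang, the stop point would move together with that row on the next attempt.
df = pd.read_csv("panda70m_training_full.csv")
df.sample(frac=1).to_csv("panda70m_training_shuffled.csv", index=False)
```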