
running train_op took too long ?? #24

Open

auzyze opened this issue Apr 5, 2019 · 2 comments

auzyze commented Apr 5, 2019

Thanks for sharing this great work!

I ran into this issue when training ours_savp on the KTH dataset: training appears to proceed correctly, but it is very slow.

running train_op took too long (7.2s)
running train_op took too long (7.2s)
.....
progress  global step 100  epoch 0.5
          image/sec 1.1  remaining 37520m (625.3h) (26.1d)
d_loss 0.10482973
   discrim_video_sn_gan_loss (0.5204395, 0.1)
   discrim_video_sn_vae_gan_loss (0.5278577, 0.1)
g_loss 2.0725453
   gen_l1_loss (0.016228592, 100.0)
   gen_video_sn_gan_loss (0.32749984, 0.1)
   gen_video_sn_vae_gan_loss (0.35494953, 0.1)
   gen_video_sn_vae_gan_feature_cdist_loss (0.038144115, 10.0)
   gen_kl_loss (0.6190775, 0.0)
learning_rate 0.0002
running train_op took too long (7.2s)
running train_op took too long (7.2s)
running train_op took too long (7.3s)
......
......

My configuration:
tensorflow: 1.10.0
cuda: 9.0
cudnn: 7.3.0.29

I'm running the KTH dataset with the ours_savp model. With the default hparams I got an out-of-memory error, so I changed batch_size to 8.

My GPU appears to be working properly:
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40c Off | 00000000:02:00.0 Off | 0 |
| 37% 73C P0 124W / 235W | 10963MiB / 11441MiB | 76% Default |
+-------------------------------+----------------------+----------------------+
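
(For completeness, here is a generic TF 1.x sanity check, not specific to this repo, to confirm that this TensorFlow build itself sees the GPU and places ops on it:)

```python
import tensorflow as tf

# Check that this TensorFlow build detects a CUDA-capable GPU.
print("GPU available:", tf.test.is_gpu_available())

# Run a tiny op pinned to the GPU with device placement logging,
# so the log shows whether it really lands on /device:GPU:0.
with tf.device("/device:GPU:0"):
    a = tf.random_normal([1000, 1000])
    b = tf.matmul(a, a)

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(b)
```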

TensorBoard refreshes when summary_freq is reached.

I'd appreciate any suggestions.
Regards,

@nishokkumars

@alexlee-gk, could you please help? I am facing the same issue.

@Berndinio

I am also facing the same issue. It seems to be just a print statement in train.py, line 267.
Since the message is only printed, nothing else happens inside that if block, and the measured time isn't used later on, I assume it is training correctly. The model was probably originally trained on faster GPUs or TPUs.
As you can see, the sess.run() call (which the timing is measured around) is always executed anyway. You can simply wrap the for-loop it sits in with tqdm to see the progress, as in the sketch below.
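
(A minimal, self-contained sketch of that pattern. The dummy train_step() stands in for the sess.run(fetches) call in scripts/train.py, and the 5-second threshold is only illustrative; the repo's actual loop and threshold may differ:)

```python
import time
from tqdm import tqdm  # pip install tqdm

def train_step():
    # Stand-in for sess.run(fetches) in scripts/train.py; replace with the real call.
    time.sleep(0.1)

num_steps = 1000
for step in tqdm(range(num_steps)):  # tqdm shows steps/sec and an ETA
    start = time.time()
    train_step()
    elapsed = time.time() - start
    # Mirrors the warning in train.py: it only prints, nothing else depends on it,
    # so training keeps going normally even when the message shows up.
    if elapsed > 5.0:
        print("running train_op took too long (%0.1fs)" % elapsed)
```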
