Blitz tutorial takes too long to run (GD instead of SGD?) #336

gdalle · 2022-02-18T10:15:30Z

Hi there!
In the 60-minute blitz tutorial (https://fluxml.ai/tutorials/2020/09/15/deep-learning-flux.html), the part where we train a network on CIFAR10 takes longer than expected. Could it be because we actually go through every minibatch in each epoch, instead of sampling only one?
I am specifically referring to this line (line 366 of model-zoo/tutorials/60-minute-blitz/60-minute-blitz.jl at commit 52a7b89):

for d in train

Because of it, I feel like we are actually doing non-stochastic gradient descent, which would explain the long runtime.

darsnack · 2022-02-18T16:54:27Z

train is already a vector of batches (see here), so iterating it in a for-loop will do mini-batch SGD.

But our mini-batches appear to not be so mini...the batch size is 1000! Fixing that should help.
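
To make the point about iterating over a vector of batches concrete, here is a minimal sketch in plain Julia (no Flux). It is not the tutorial's code: the toy regression data, the `train_epoch!` helper, the learning rate, and the batch size of 100 are all assumptions chosen for illustration. It only shows that looping over pre-built batches performs one parameter update per batch, which is mini-batch SGD rather than full-batch gradient descent.

```julia
using Random

# Toy data for y = 3x + noise. Everything here is illustrative, not from the tutorial.
Random.seed!(0)
n = 10_000
x = randn(n)
y = 3 .* x .+ 0.1 .* randn(n)

# Pre-build a vector of mini-batches by partitioning the sample indices.
batchsize = 100
batches = [(x[i], y[i]) for i in Iterators.partition(1:n, batchsize)]

# One epoch = one pass over every mini-batch, with one update per batch.
# `train_epoch!` is a hypothetical helper, not a Flux function.
function train_epoch!(w::Ref{Float64}, batches, lr)
    updates = 0
    for (xb, yb) in batches
        # Gradient of the mean squared error on this batch only.
        grad = 2 * sum((w[] .* xb .- yb) .* xb) / length(xb)
        w[] -= lr * grad
        updates += 1
    end
    return updates
end

w = Ref(0.0)
updates = train_epoch!(w, batches, 0.05)
println("updates in one epoch: ", updates, ", fitted w = ", w[])
```

Each pass through the `for` loop sees only one batch, so a single epoch already applies as many updates as there are batches.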

gdalle · 2022-02-18T21:52:14Z

My bad, I was wrong about the meaning of a training epoch! Thanks for your answer, I will try reducing the batch size.
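
For reference, the relation between batch size and the number of updates per epoch can be sketched as follows. The sample count of 49,000 and the helper name `updates_per_epoch` are assumptions for illustration only; substitute the number of samples you actually train on. An epoch always covers every sample once, so a smaller batch size means more (and smaller) updates per epoch.

```julia
# Hypothetical helper: number of mini-batches (= parameter updates) in one epoch.
# Ceiling division is used because the last batch may be smaller than the rest.
updates_per_epoch(nsamples, batchsize) = cld(nsamples, batchsize)

nsamples = 49_000  # assumed training-set size, not taken from the thread
for batchsize in (1000, 128, 32)
    println("batchsize = ", batchsize, " -> ",
            updates_per_epoch(nsamples, batchsize), " updates per epoch")
end
```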
