Cross mini-batch gradients average #826
Hey, sorry for the late reply. Your implementation looks correct, except that having a separate …
Thinking again, while a modular implementation would be feasible, it'd need an interface like this:

```python
def accumulate_gradients(loss_or_grads, params):
    grads = get_gradients(...)  # as in the other update functions
    accumulate = OrderedDict()
    reset = OrderedDict()
    for param, grad in zip(params, grads):
        accu = theano.shared(np.zeros_like(...))  # as in some other update functions
        accumulate[accu] = accu + grad
        reset[accu] = accu * 0
    return accumulate, reset
```

Then you could do:

```python
accu, reset = lasagne.updates.accumulate_gradients(loss, params)
apply = lasagne.updates.adam(accu.keys(), params, ...)
apply.update(reset)
train_fn1 = theano.function([input_var, target_var], loss, updates=accu)
train_fn2 = theano.function([], [], updates=apply)
```

And voilà, there's one function to call on every minibatch and another to call every now and then. (It now sums the gradients instead of averaging them, but this could be fixed as well using a counter and Welford's method.)
Hi, I'm really interested in the same implementation for Adam. I tried to implement the solution of @f0k, but it didn't work. May I have the full implementation with Adam that you've done? Thank you.
@dataintensiveapplication the following is my code; enjoy it, and happy coding.
GPU RAM is always a scarce resource.
Although momentum acts as a kind of cross-mini-batch gradient average, I have learned that it is not good enough. Would you consider adding update rules that support averaging gradients across mini-batches?
I tried to write some code along these lines:
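What follows is only a hypothetical sketch of the idea being described, not the poster's original code: a momentum-style update rule that accumulates gradients in shared variables guarded by a `count` and applies one step with their average every `m` mini-batches. The name `momentum_every_m`, the `m` argument, and the reuse of `lasagne.updates.get_or_compute_grads` are all assumptions made for illustration.

```python
from collections import OrderedDict

import numpy as np
import theano
import theano.tensor as T
import lasagne


def momentum_every_m(loss, params, m, learning_rate=0.01, momentum=0.9):
    """Hypothetical sketch: accumulate gradients for m mini-batches, then
    apply a single momentum step using their average."""
    grads = lasagne.updates.get_or_compute_grads(loss, params)
    updates = OrderedDict()
    count = theano.shared(np.float32(0.0))
    new_count = count + np.float32(1.0)
    do_step = T.ge(new_count, m)  # true on every m-th mini-batch
    updates[count] = T.switch(do_step, np.float32(0.0), new_count)
    for param, grad in zip(params, grads):
        value = param.get_value(borrow=True)
        accu = theano.shared(np.zeros(value.shape, dtype=value.dtype),
                             broadcastable=param.broadcastable)
        velocity = theano.shared(np.zeros(value.shape, dtype=value.dtype),
                                 broadcastable=param.broadcastable)
        new_accu = accu + grad
        avg_grad = new_accu / new_count
        new_velocity = momentum * velocity - learning_rate * avg_grad
        # only step (and reset the accumulator) once m gradients are summed
        updates[accu] = T.switch(do_step, T.zeros_like(new_accu), new_accu)
        updates[velocity] = T.switch(do_step, new_velocity, velocity)
        updates[param] = T.switch(do_step, param + new_velocity, param)
    return updates
```

Compiled as `theano.function([input_var, target_var], loss, updates=momentum_every_m(loss, params, m=4))`, each call accumulates one mini-batch's gradient, and only every fourth call actually moves the parameters.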
I know the `count` part needs optimization; it was just quick coding so that I could test it. The following are the results of the test:
The test uses the CNN from the Lasagne example mnist.py with dropout removed (I may run a test with dropout enabled later). In the results:
- minibatch=N is the mini-batch size (the default in mnist.py is 500);
- minibatch=N(M) means M mini-batches are averaged, as in the code above;
- x100 means each epoch runs only 100 mini-batches rather than the full training set;
- 100/500 iters is the number of epochs;
- the following two float values are the test loss and test accuracy;
- the rightmost value is the momentum used.
I also wrote another Adam along the same lines. It works fine, so I won't have to worry about running out of GPU memory for a while.