Trainer #759
base: master

Conversation
Completed tests in `test_batch`.
Implemented alternative `mnist_trainer` example that uses `Trainer`, along with the changes necessary to fit the `Trainer` model. Documentation for the `trainer` and `batch` modules. Fixes to `Trainer`.
Ready for review (@f0k).
@f0k, @benanne This PR contains the code that I am offering with respect to issue #756. @benanne: I like your proposal with respect to having multiple levels of abstraction. This PR does not go that far; it provides the `Trainer` class: you provide functions to train and evaluate a single mini-batch, along with the number of epochs, batch size, callbacks, etc., and the `Trainer` runs the training loop for you. Would you like to review this code and guide me as to how to proceed?
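For context, a sketch of the per-mini-batch functions that `Trainer` consumes, roughly following the standard Lasagne MNIST example (`build_network` is an assumed helper, not part of this PR):

```python
import lasagne
import theano
import theano.tensor as T

input_var = T.tensor4('inputs')
target_var = T.ivector('targets')
network = build_network(input_var)  # assumed helper that builds the model

prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.categorical_crossentropy(prediction,
                                                   target_var).mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params,
                                            learning_rate=0.01, momentum=0.9)

# train_fn processes one mini-batch and returns its loss, applying updates;
# eval_fn computes the loss without updating the parameters.
train_fn = theano.function([input_var, target_var], loss, updates=updates)
eval_fn = theano.function([input_var, target_var], loss)
```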
I'm currently abroad, so it might be a while until I can take a look at it, unfortunately; if anyone else wants to review, be my guest. I'll make a note on my calendar to take a look when I'm back, though!
…reporting batch-by-batch progress. This makes the `VERBOSITY_BATCH` verbosity option obsolete, hence its removal.
…sets. Partial refactor of tests for improved understandability.
```python
train_batch_func=train_fn, train_log_func=train_log_func,
eval_batch_func=eval_fn, eval_log_func=eval_log_func,
num_epochs=num_epochs, verbosity=lasagne.trainer.VERBOSITY_EPOCH,
layer_to_restore=network, shuffle_rng=lasagne.random.get_rng())
```
This seems like a pretty complicated API, especially with the log functions. I'm not a big fan. It hides the for-loop, which is nice if you're happy with a lot of defaults, but in this case it ends up requiring a bunch of lambdas to be able to customise things, which is a lot less readable than just having that stuff in the body of a for loop. I think this is probably not the best API for this use case.
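The explicit loop being referred to looks roughly like the standard Lasagne MNIST example (`iterate_minibatches` is the helper from that example; everything else is as defined in the sketch above):

```python
num_epochs = 10
batch_size = 500

for epoch in range(num_epochs):
    train_err, n_batches = 0.0, 0
    for inputs, targets in iterate_minibatches(X_train, y_train,
                                               batch_size, shuffle=True):
        train_err += train_fn(inputs, targets)
        n_batches += 1
    # Logging, checkpointing, learning-rate schedules, and early stopping
    # all live directly in this loop body instead of behind callbacks.
    print("Epoch %d: training loss %.6f" % (epoch + 1, train_err / n_batches))
```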
```python
post_epoch_callback=None, progress_iter_func=None,
verbosity=VERBOSITY_EPOCH, log_stream=sys.stdout,
log_final_result=True, get_state_func=None, set_state_func=None,
layer_to_restore=None, updates_to_restore=None, shuffle_rng=None):
```
Agree with @benanne, this may be a bit heavy on options. My idea was to have a relatively simple `train` function that calls a `train_epoch` function in a loop; when you need anything more complex, you'd call the `train_epoch` function yourself and extend the loop body with whatever you need (learning rate updates, early stopping, ...). I'd need more time to think this through, though, and I won't have that time soon. Sorry for delaying this for so long; we certainly appreciate your efforts and I would love to continue this discussion in one or two months! We definitely want training code in Lasagne, but first we need to manage releasing v0.2 at all...
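A minimal sketch of that two-level design (all names and signatures here are assumptions, not existing Lasagne API; `iterate_minibatches` is the MNIST-example helper):

```python
def train_epoch(train_fn, X, y, batch_size):
    # One pass over the data; returns the mean mini-batch training loss.
    losses = []
    for inputs, targets in iterate_minibatches(X, y, batch_size, shuffle=True):
        losses.append(train_fn(inputs, targets))
    return sum(losses) / len(losses)

def train(train_fn, X, y, batch_size, num_epochs):
    # The simple wrapper; users who need learning-rate schedules or early
    # stopping would call train_epoch directly and write their own loop.
    for epoch in range(num_epochs):
        loss = train_epoch(train_fn, X, y, batch_size)
        print("Epoch %d: loss %.6f" % (epoch + 1, loss))
```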
… `batch` module and renamed it to `mean_batch_apply`. The `mean_batch_apply` and `batch_apply` functions in the `batch` module extract mini-batches of data from a dataset and pass them to a supplied function. They also gather the results and return their mean or the results as is, respectively.
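A rough sketch of those two helpers as described (the real implementations are in this PR's `batch` module; this simplified version assumes in-memory arrays and the `iterate_minibatches` helper):

```python
def batch_apply(func, X, y, batch_size):
    # Apply func to each mini-batch and return the per-batch results as-is.
    return [func(inputs, targets)
            for inputs, targets in iterate_minibatches(X, y, batch_size,
                                                       shuffle=False)]

def mean_batch_apply(func, X, y, batch_size):
    # Apply func to each mini-batch and return the mean of the results
    # (unweighted across batches, for simplicity of the sketch).
    results = batch_apply(func, X, y, batch_size)
    return sum(results) / len(results)
```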
…tomised by passing a string template rather than a callable / lambda. `mnist_trainer`: simplified `train()` call as some args were redundant due to using default values.
…pply` functions to `batch_map` and `mean_batch_map` respectively.
@benanne @f0k With a view to (maybe only partially) addressing some of your concerns, I have made some changes. There are the new `batch_map` and `mean_batch_map` functions in the `batch` module. In an attempt to address @benanne's specific concerns, I have simplified the logging: log output can now be customised by passing a string template rather than a callable / lambda.
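For illustration, such a template works like standard Python `str.format` (the placeholder names shown here are hypothetical):

```python
log_template = 'Epoch {epoch}: train loss {train_loss:.6f}'
print(log_template.format(epoch=3, train_loss=0.1234))
# prints: Epoch 3: train loss 0.123400
```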
I realise that this is not quite what you want, as is. I understand that you are looking for an API that works along the lines of an explicit training loop. Perhaps a good way forward would be to split this into two PRs? The first PR could contain just the `batch` module.
…w accept iterators as data sets. `batch` module: `mean_batch_map` now takes the `sum_axis` argument instead of `func_returns_sum` as it is more flexible and easier to understand. Fixed usages of the above APIs in `trainer` to account for changes.
Removed unnecessary imports from `trainer`.
Yes, I think that's about what I had in mind. Although maybe the user should still provide the updates dictionary. The API for creating the updates dictionary is not too complicated, and it's difficult to customize if it's created inside the `train` function.
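Creating the updates dictionary outside the training code is a one-liner with the existing `lasagne.updates` API (reusing the symbols from the earlier sketch):

```python
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.adam(loss, params, learning_rate=1e-3)

# The caller then hands the compiled function (or the updates dict
# itself) to the training loop rather than having it built internally.
train_fn = theano.function([input_var, target_var], loss, updates=updates)
```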
I'm not convinced yet these should be separate modules. The only argument is that the batch iteration code can also be used for testing, so it doesn't really fit the `trainer` module.
A nicer API for this might be something like:

```python
for epoch, errors in training_loop(...):
    # print whatever you want
    # save whatever you want
    # update the learning rate when you want
    # early stop when you want
    pass
```

There could be convenience functions for some of these. And well, yes, there could also be a function that encapsulates all this (that's the multiple levels of abstraction Sander and I were referring to). But then maybe the training loop should become a class that you can register such callbacks with, so you don't have a function with a myriad of arguments, but something like:

```python
trainer = Trainer(...)
trainer.after_epoch(LogResults(your_format_string))
trainer.after_epoch(ScaleParam(eta, 0.95))
trainer.at_epoch(100, ScaleParam(eta, 0.1))
trainer.run()
```

Again, the design space is huge, and we need to properly sit down to decide what to include in Lasagne.
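A minimal sketch of how such a callback-registering class might look (every name here is hypothetical, mirroring the snippet above):

```python
class Trainer(object):
    def __init__(self, train_epoch_fn, num_epochs):
        self.train_epoch_fn = train_epoch_fn  # runs one epoch, returns errors
        self.num_epochs = num_epochs
        self._after_epoch = []   # callbacks run after every epoch
        self._at_epoch = {}      # callbacks run at one specific epoch

    def after_epoch(self, callback):
        self._after_epoch.append(callback)

    def at_epoch(self, epoch, callback):
        self._at_epoch.setdefault(epoch, []).append(callback)

    def run(self):
        for epoch in range(1, self.num_epochs + 1):
            errors = self.train_epoch_fn()
            for callback in self._after_epoch:
                callback(epoch, errors)
            for callback in self._at_epoch.get(epoch, []):
                callback(epoch, errors)
```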
…at prevents the state from being stored or the validation score improvements from being detected until a specified epoch.
@f0k, I hope you don't mind, but I decided to go ahead with the separate PR approach. I think that the design space is sufficiently large that it warrants building the training loop up in small pieces. The first PR contains just the `batch` module.
New `batch` and `trainer` modules with batch iteration helper and `Trainer` class respectively, as per issue #756.