
Trainer #759 (Open)

Britefury wants to merge 25 commits into master.

Conversation

Britefury (Contributor)

New `batch` and `trainer` modules with a batch iteration helper and a `Trainer` class, respectively, as per issue #756.

Britefury and others added 4 commits October 23, 2016 11:44
Completed tests in `test_batch`.
Implemented alternative `mnist_trainer` example that uses `Trainer`, along with changes necessary to fit Trainer model.
Documentation for `trainer` and `batch` modules.
Fixes to `Trainer`.
@Britefury (Contributor, Author)

Ready for review (@f0k).
There is a slight drop in coverage, due to one `RuntimeError` that is raised only under conditions that should never arise, plus a couple of logging statements that aren't exercised by the tests.

@Britefury (Contributor, Author)

@f0k, @benanne This PR contains the code that I am offering with respect to issue #756.

@benanne: I like your proposal of having a train function to which you provide a model, a loss to minimise, data, and a function for computing updates (adam/nesterov/etc.).

This PR does not go that far; it implements the fit loop, in Keras terminology.

You provide functions to train and evaluate a single mini-batch, along with the number of epochs, batch size, callbacks, etc. The supplied train function (perhaps it should be renamed train_loop) runs the loop (optionally performing early termination), monitors performance on the validation and test sets, and so on.

Would you like to review this code and guide me as to how to proceed?
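
For context, the per-mini-batch functions referred to here are ordinary compiled Theano functions. A minimal sketch of how they might be built, assuming a standard Lasagne/Theano setup (illustrative only, not code from this PR; the tiny network and variable names are made up):

import theano
import theano.tensor as T
import lasagne

# Symbolic variables for one mini-batch of MNIST-like data (illustrative shapes).
input_var = T.tensor4('inputs')
target_var = T.ivector('targets')

# A tiny stand-in model; the PR's mnist_trainer example builds its own network.
network = lasagne.layers.InputLayer((None, 1, 28, 28), input_var=input_var)
network = lasagne.layers.DenseLayer(network, num_units=256,
                                    nonlinearity=lasagne.nonlinearities.rectify)
network = lasagne.layers.DenseLayer(network, num_units=10,
                                    nonlinearity=lasagne.nonlinearities.softmax)

# Training loss and Adam updates.
prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var).mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.adam(loss, params)

# Deterministic pass for evaluation (disables dropout and the like).
test_prediction = lasagne.layers.get_output(network, deterministic=True)
test_loss = lasagne.objectives.categorical_crossentropy(test_prediction,
                                                        target_var).mean()
test_err = T.mean(T.neq(T.argmax(test_prediction, axis=1), target_var),
                  dtype=theano.config.floatX)

# Each compiled function processes a single mini-batch; these are the kind of
# callables passed as train_batch_func / eval_batch_func below.
train_fn = theano.function([input_var, target_var], loss, updates=updates)
eval_fn = theano.function([input_var, target_var], [test_loss, test_err])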

@benanne (Member) commented Nov 12, 2016

I'm currently abroad so it might be a while until I can take a look at it, unfortunately... so if anyone else wants to review be my guest. I'll make a note on my calendar to take a look when I'm back, though!

…reporting batch-by-batch progress. This makes the `VERBOSITY_BATCH` verbosity option obsolete, hence its removal.
…sets.

Partial refactor of tests for improved understandability.
train_batch_func=train_fn, train_log_func=train_log_func,
eval_batch_func=eval_fn, eval_log_func=eval_log_func,
num_epochs=num_epochs, verbosity=lasagne.trainer.VERBOSITY_EPOCH,
layer_to_restore=network, shuffle_rng=lasagne.random.get_rng())
Review comment on the excerpt above (Member):

This seems like a pretty complicated API, especially with the log functions. I'm not a big fan. It hides the for-loop, which is nice if you're happy with a lot of defaults, but in this case it ends up requiring a bunch of lambdas to be able to customise things, which is a lot less readable than just having that stuff in the body of a for loop. I think this is probably not the best API for this use case.

post_epoch_callback=None, progress_iter_func=None,
verbosity=VERBOSITY_EPOCH, log_stream=sys.stdout,
log_final_result=True, get_state_func=None, set_state_func=None,
layer_to_restore=None, updates_to_restore=None, shuffle_rng=None):
@f0k (Member) commented on the excerpt above, Nov 27, 2016:

Agree with @benanne: this may be a bit heavy on options. My idea was to have a relatively simple train function that calls a train_epoch function in a loop; when you need anything more complex, you'd call the train_epoch function yourself and extend the loop body with whatever you need (learning rate updates, early stopping, ...). I'd need more time to think this through, though, and I won't have that time soon. Sorry for delaying this for so long; we certainly appreciate your efforts, and I would love to continue this discussion in one or two months! We definitely want training code in Lasagne, but first we need to manage to release v0.2 at all...
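
As a rough illustration of the pattern described above (hypothetical names throughout; iterate_minibatches stands in for a batch iteration helper like the one in Lasagne's mnist.py example, and none of this is code from the PR):

def train_epoch(train_fn, X, y, batchsize):
    # One pass over the data: call the compiled per-batch function and
    # return the mean training loss for the epoch.
    total_loss, n_batches = 0.0, 0
    for inputs, targets in iterate_minibatches(X, y, batchsize, shuffle=True):
        total_loss += train_fn(inputs, targets)
        n_batches += 1
    return total_loss / max(n_batches, 1)

def train(train_fn, X, y, batchsize, num_epochs):
    # The simple wrapper: call train_epoch in a plain loop.  Anything more
    # complex (early stopping, learning-rate schedules, ...) would call
    # train_epoch directly and put that logic in its own loop body.
    for epoch in range(num_epochs):
        loss = train_epoch(train_fn, X, y, batchsize)
        print("epoch {}: training loss {:.6f}".format(epoch + 1, loss))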

… `batch` module and renamed it to `mean_batch_apply`.

The `mean_batch_apply` and `batch_apply` functions in the `batch` module extract mini-batches of data from a dataset and pass them to a supplied function.
They also gather the results and return their mean or the results as is, respectively.
…tomised by passing a string template rather than a callable / lambda.

`mnist_trainer`: simplified `train()` call as some args were redundant due to using default values.
…pply` functions to `batch_map` and `mean_batch_map` respectively.
@Britefury (Contributor, Author) commented Nov 28, 2016

@benanne @f0k With a view to (maybe only partially) addressing some of your concerns, I have made some changes.

There are new batch_map and mean_batch_map functions in the batch module; they are mini-batch-driven versions of map. mean_batch_map differs in that it returns the mean result across all samples, e.g. the mean of the losses or error rates. I have modified the mnist example as a demo; the batch loops are gone, replaced by calls to mean_batch_map. I think this goes some way towards @f0k's train_epoch suggestion.

In an attempt to address @benanne's specific concerns, I have simplified the train_log_func and epoch_log_func arguments of the train function so that you can pass a string (template) whose format method is invoked to generate the log message. These arguments are not strictly necessary; the defaults will report the results just fine, but customising the output makes things more readable. The mnist_trainer example has also been simplified: the verbosity and shuffle_rng arguments were redundant, since the values passed were the defaults anyway.

train certainly does have a lot of arguments. In many cases you won't need most of them, as the defaults will do. I have, however, found early stopping, network state saving and restoring, and various other features to be very useful, hence their inclusion.

I realise that this is not quite what you want as it stands. I understand that you are looking for an API that works along the lines of train_network(final_layer, loss_to_minimise, metrics_to_evaluate, training_set, val_set, test_set, batch_size, ...). It is my intention to get there. Such an API would be very handy in most cases, but it would not be so helpful in scenarios such as training GANs, where there isn't a single loss to minimise. In such cases, something like the training loop that I have implemented can be used, as it gives you greater ability to customise. I would therefore suggest that your final goal API could be built on top of a training loop that works somewhat like the one I have implemented.

Perhaps a good way forward would be to split this into two PRs? The first PR could contain just the batch module with batch_map and mean_batch_map, with the second containing the trainer module that builds on top of batch.
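
To make the batch_map / mean_batch_map idea concrete, here is a simplified stand-in for what a mean_batch_map-style helper does. This is an illustration of the concept only, not the PR's implementation, and the signature is invented:

import numpy as np

def mean_batch_map(func, batchsize, *arrays):
    # Apply func to consecutive mini-batches drawn from arrays and return
    # the per-sample mean of its results (e.g. mean loss and error rate).
    n_samples = len(arrays[0])
    totals = None
    for start in range(0, n_samples, batchsize):
        batch = [a[start:start + batchsize] for a in arrays]
        results = np.atleast_1d(np.asarray(func(*batch), dtype=float))
        weighted = results * len(batch[0])   # weight by the actual batch size
        totals = weighted if totals is None else totals + weighted
    return totals / n_samples

# With the eval_fn sketched earlier, the whole validation loop collapses to:
#   val_loss, val_err = mean_batch_map(eval_fn, 500, X_val, y_val)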

…w accept iterators as data sets.

`batch` module: `mean_batch_map` now takes the `sum_axis` argument instead of `func_returns_sum` as it is more flexible and easier to understand.
Fixed usages of the above APIs in `trainer` to account for changes.
Removed unnecessary imports from `trainer`.
@f0k (Member) commented Dec 13, 2016

> I realise that this is not quite what you want, as is. I understand that you are looking for an API that works along the lines of train_network(final_layer, loss_to_minimise, metrics_to_evaluate, training_set, val_set, test_set, batch_size, ...).

Yes, I think that's about what I had in mind. Although maybe the user should still provide the updates dictionary. The API for creating the updates dictionary is not too complicated, and it's difficult to customize if it's created inside train_network. But a similar argument can be made for the compiled functions -- maybe train_network shouldn't do everything, but accept the compiled functions (train_fn, test_fn), and there should be an API for creating those. We really need to sit down and think through different options and use cases. But I don't know when I'll come to this.

> Perhaps a good way forward would be to split this into two PRs? The first PR could contain just the batch module with batch_map and mean_batch_map, with the second containing the trainer module that builds on top of batch.

I'm not convinced yet these should be separate modules. The only argument is that the batch iteration code can also be used for testing, so it doesn't really fit the lasagne.trainer module name.

> train certainly does have a lot of arguments. In many cases you won't need to use most of them as the defaults will do. I have however found early stopping, network state saving and restoring and various other features there to be very useful, hence their inclusion.

A nicer API for this might be something like:

for epoch, errors in training_loop(...):
    # print whatever you want
    # save whatever you want
    # update the learning rate when you want
    # early stop when you want
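
As a sketch of what such a generator-based loop could look like (hypothetical code reusing the train_fn/eval_fn and mean_batch_map sketches above, and assuming X_train/y_train/X_val/y_val are already loaded as numpy arrays; none of this is in the PR):

def training_loop(train_fn, eval_fn, train_set, val_set, batchsize, num_epochs):
    # Yield (epoch, errors) each epoch, so the caller keeps the loop body and
    # decides what to print, save, schedule or stop on.
    for epoch in range(num_epochs):
        train_loss, = mean_batch_map(train_fn, batchsize, *train_set)
        val_loss, val_err = mean_batch_map(eval_fn, batchsize, *val_set)
        yield epoch, {'train': train_loss, 'val': val_loss, 'val_err': val_err}

best_val = float('inf')
for epoch, errors in training_loop(train_fn, eval_fn, (X_train, y_train),
                                   (X_val, y_val), 500, num_epochs=25):
    print('epoch {}: {}'.format(epoch + 1, errors))   # print whatever you want
    if errors['val'] < best_val:
        best_val = errors['val']                      # save whatever you want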

There could be convenience functions for some of these. And well, yes, there could also be a function that encapsulates all this (that's the multiple levels of abstraction Sander and I were referring to). But then maybe the training loop should become a class that you can register such callbacks with, so you don't have a function with a myriad of arguments, but something like:

trainer = Trainer(...)
trainer.after_epoch(LogResults(your_format_string))
trainer.after_epoch(ScaleParam(eta, 0.95))
trainer.at_epoch(100, ScaleParam(eta, 0.1))
trainer.run()

Again, the design space is huge, and we need to properly sit down to decide what to include in Lasagne.
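
A hedged sketch of what such a callback-registering Trainer might look like (hypothetical code, not settled API; LogResults and ScaleParam would simply be small callables, e.g. one printing with a format string and one rescaling a shared learning-rate variable such as eta):

class Trainer(object):
    def __init__(self, run_epoch):
        # run_epoch is any callable that performs one epoch of training and
        # returns its results (losses, error rates, ...).
        self.run_epoch = run_epoch
        self._after_epoch = []   # callbacks run after every epoch
        self._at_epoch = {}      # callbacks run at one specific epoch only

    def after_epoch(self, callback):
        self._after_epoch.append(callback)

    def at_epoch(self, epoch, callback):
        self._at_epoch.setdefault(epoch, []).append(callback)

    def run(self, num_epochs):
        for epoch in range(num_epochs):
            results = self.run_epoch()
            for callback in self._after_epoch + self._at_epoch.get(epoch, []):
                callback(epoch, results)

The point is only that the myriad of options moves out of one long argument list and into composable callbacks.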

Britefury and others added 3 commits January 17, 2017 18:30
…at prevents the state from being stored or the validation score improvements from being detected until a specified epoch.
@Britefury (Contributor, Author)

@f0k, I hope you don't mind, but I decided to go ahead with the separate-PR approach. I think the design space is sufficiently large that it warrants building the training loop up in small pieces; the data_source module in PR #801 works toward this. I would like to take this 'smaller steps' approach, as getting there in one leap could involve quite a lot of back and forth, with all the refactoring that goes along with it. If we can build and agree on it piece by piece, I think I will find the process more manageable. :)
