
Trainer #759 (Open)

Britefury wants to merge 25 commits into master.

Conversation

Britefury (Contributor)

New `batch` and `trainer` modules with a batch iteration helper and a `Trainer` class, respectively, as per issue #756.

Britefury and others added 4 commits October 23, 2016 11:44
Completed tests in `test_batch`.
Implemented alternative `mnist_trainer` example that uses `Trainer`, along with changes necessary to fit Trainer model.
Documentation for `trainer` and `batch` modules.
Fixes to `Trainer`.
@Britefury (Contributor, Author)

Ready for review (@f0k).
There is a slight drop in coverage, due to one `RuntimeError` that is raised only under conditions that should never arise, plus a couple of logging statements that aren't exercised by the tests.

@Britefury (Contributor, Author)

@f0k, @benanne This PR contains the code that I am offering with respect to issue #756.

@benanne: I like your proposal of having a train function to which you provide a model, a loss to minimise, data, and a function for computing updates (adam/nesterov/etc.).

This PR does not go that far; it implements the fit loop, in Keras terminology.

You provide functions to train and evaluate a single mini-batch, along with the number of epochs, batch size, callbacks, etc. The supplied train function (perhaps it should be renamed train_loop) runs the loop (optionally performing early termination), monitors performance on the validation and test sets, and so on.

Would you like to review this code and guide me as to how to proceed?
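
For context, the per-mini-batch functions referred to here are ordinary compiled Theano functions. A minimal sketch of how they might be built, assuming a standard Lasagne/Theano setup (illustrative only, not code from this PR; the tiny network and variable names are made up):

import theano
import theano.tensor as T
import lasagne

# Symbolic variables for one mini-batch of MNIST-like data (illustrative shapes).
input_var = T.tensor4('inputs')
target_var = T.ivector('targets')

# A tiny stand-in model; the PR's mnist_trainer example builds its own network.
network = lasagne.layers.InputLayer((None, 1, 28, 28), input_var=input_var)
network = lasagne.layers.DenseLayer(network, num_units=256,
                                    nonlinearity=lasagne.nonlinearities.rectify)
network = lasagne.layers.DenseLayer(network, num_units=10,
                                    nonlinearity=lasagne.nonlinearities.softmax)

# Training loss and Adam updates.
prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var).mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.adam(loss, params)

# Deterministic pass for evaluation (disables dropout and the like).
test_prediction = lasagne.layers.get_output(network, deterministic=True)
test_loss = lasagne.objectives.categorical_crossentropy(test_prediction,
                                                        target_var).mean()
test_err = T.mean(T.neq(T.argmax(test_prediction, axis=1), target_var),
                  dtype=theano.config.floatX)

# Each compiled function processes a single mini-batch; these are the kind of
# callables passed as train_batch_func / eval_batch_func below.
train_fn = theano.function([input_var, target_var], loss, updates=updates)
eval_fn = theano.function([input_var, target_var], [test_loss, test_err])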

@benanne (Member) commented Nov 12, 2016

I'm currently abroad so it might be a while until I can take a look at it, unfortunately... so if anyone else wants to review be my guest. I'll make a note on my calendar to take a look when I'm back, though!

…reporting batch-by-batch progress. This makes the `VERBOSITY_BATCH` verbosity option obsolete, hence its removal.
…sets.

Partial refactor of tests for improved understandability.
train_batch_func=train_fn, train_log_func=train_log_func,
eval_batch_func=eval_fn, eval_log_func=eval_log_func,
num_epochs=num_epochs, verbosity=lasagne.trainer.VERBOSITY_EPOCH,
layer_to_restore=network, shuffle_rng=lasagne.random.get_rng())
Review comment on the excerpt above (Member):

This seems like a pretty complicated API, especially with the log functions. I'm not a big fan. It hides the for-loop, which is nice if you're happy with a lot of defaults, but in this case it ends up requiring a bunch of lambdas to be able to customise things, which is a lot less readable than just having that stuff in the body of a for loop. I think this is probably not the best API for this use case.

post_epoch_callback=None, progress_iter_func=None,
verbosity=VERBOSITY_EPOCH, log_stream=sys.stdout,
log_final_result=True, get_state_func=None, set_state_func=None,
layer_to_restore=None, updates_to_restore=None, shuffle_rng=None):
@f0k (Member) commented on the excerpt above, Nov 27, 2016:

Agree with @benanne: this may be a bit heavy on options. My idea was to have a relatively simple train function that calls a train_epoch function in a loop; when you need anything more complex, you'd call the train_epoch function yourself and extend the loop body with whatever you need (learning rate updates, early stopping, ...). I'd need more time to think this through, though, and I won't have that time soon. Sorry for delaying this for so long; we certainly appreciate your efforts, and I would love to continue this discussion in one or two months! We definitely want training code in Lasagne, but first we need to manage to release v0.2 at all...
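
As a rough illustration of the pattern described above (hypothetical names throughout; iterate_minibatches stands in for a batch iteration helper like the one in Lasagne's mnist.py example, and none of this is code from the PR):

def train_epoch(train_fn, X, y, batchsize):
    # One pass over the data: call the compiled per-batch function and
    # return the mean training loss for the epoch.
    total_loss, n_batches = 0.0, 0
    for inputs, targets in iterate_minibatches(X, y, batchsize, shuffle=True):
        total_loss += train_fn(inputs, targets)
        n_batches += 1
    return total_loss / max(n_batches, 1)

def train(train_fn, X, y, batchsize, num_epochs):
    # The simple wrapper: call train_epoch in a plain loop.  Anything more
    # complex (early stopping, learning-rate schedules, ...) would call
    # train_epoch directly and put that logic in its own loop body.
    for epoch in range(num_epochs):
        loss = train_epoch(train_fn, X, y, batchsize)
        print("epoch {}: training loss {:.6f}".format(epoch + 1, loss))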

… `batch` module and renamed it to `mean_batch_apply`.

The `mean_batch_apply` and `batch_apply` functions in the `batch` module extract mini-batches of data from a dataset and pass them to a supplied function.
They also gather the results and return their mean or the results as is, respectively.
…tomised by passing a string template rather than a callable / lambda.

`mnist_trainer`: simplified `train()` call as some args were redundant due to using default values.
…pply` functions to `batch_map` and `mean_batch_map` respectively.
@Britefury (Contributor, Author) commented Nov 28, 2016

@benanne @f0k With a view to (maybe only partially) addressing some of your concerns, I have made some changes.

There are new batch_map and mean_batch_map functions in the batch module; they are mini-batch-driven versions of map. mean_batch_map differs in that it returns the mean result across all samples, e.g. the mean of the losses or error rates. I have modified the mnist example as a demo; the batch loops are gone, replaced by calls to mean_batch_map. I think this goes some way towards @f0k's train_epoch suggestion.

In an attempt to address @benanne's specific concerns, I have simplified the train_log_func and epoch_log_func arguments of the train function so that you can pass a string (template) whose format method is invoked to generate the log message. These arguments are not strictly necessary; the defaults will report the results just fine, but customising the output makes things more readable. The mnist_trainer example has also been simplified: the verbosity and shuffle_rng arguments were redundant, since the values passed were the defaults anyway.

train certainly does have a lot of arguments. In many cases you won't need most of them, as the defaults will do. I have, however, found early stopping, network state saving and restoring, and various other features to be very useful, hence their inclusion.

I realise that this is not quite what you want as it stands. I understand that you are looking for an API that works along the lines of train_network(final_layer, loss_to_minimise, metrics_to_evaluate, training_set, val_set, test_set, batch_size, ...). It is my intention to get there. Such an API would be very handy in most cases, but it would not be so helpful in scenarios such as training GANs, where there isn't a single loss to minimise. In such cases, something like the training loop that I have implemented can be used, as it gives you greater ability to customise. I would therefore suggest that your final goal API could be built on top of a training loop that works somewhat like the one I have implemented.

Perhaps a good way forward would be to split this into two PRs? The first PR could contain just the batch module with batch_map and mean_batch_map, with the second containing the trainer module that builds on top of batch.
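
To make the batch_map / mean_batch_map idea concrete, here is a simplified stand-in for what a mean_batch_map-style helper does. This is an illustration of the concept only, not the PR's implementation, and the signature is invented:

import numpy as np

def mean_batch_map(func, batchsize, *arrays):
    # Apply func to consecutive mini-batches drawn from arrays and return
    # the per-sample mean of its results (e.g. mean loss and error rate).
    n_samples = len(arrays[0])
    totals = None
    for start in range(0, n_samples, batchsize):
        batch = [a[start:start + batchsize] for a in arrays]
        results = np.atleast_1d(np.asarray(func(*batch), dtype=float))
        weighted = results * len(batch[0])   # weight by the actual batch size
        totals = weighted if totals is None else totals + weighted
    return totals / n_samples

# With the eval_fn sketched earlier, the whole validation loop collapses to:
#   val_loss, val_err = mean_batch_map(eval_fn, 500, X_val, y_val)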

…w accept iterators as data sets.

`batch` module: `mean_batch_map` now takes the `sum_axis` argument instead of `func_returns_sum` as it is more flexible and easier to understand.
Fixed usages of the above APIs in `trainer` to account for changes.
Removed unnecessary imports from `trainer`.
@f0k (Member) commented Dec 13, 2016

> I realise that this is not quite what you want, as is. I understand that you are looking for an API that works along the lines of train_network(final_layer, loss_to_minimise, metrics_to_evaluate, training_set, val_set, test_set, batch_size, ...).

Yes, I think that's about what I had in mind. Although maybe the user should still provide the updates dictionary. The API for creating the updates dictionary is not too complicated, and it's difficult to customize if it's created inside train_network. But a similar argument can be made for the compiled functions -- maybe train_network shouldn't do everything, but accept the compiled functions (train_fn, test_fn), and there should be an API for creating those. We really need to sit down and think through different options and use cases. But I don't know when I'll come to this.

> Perhaps a good way forward would be to split this into two PRs? The first PR could contain just the batch module with batch_map and mean_batch_map, with the second containing the trainer module that builds on top of batch.

I'm not convinced yet these should be separate modules. The only argument is that the batch iteration code can also be used for testing, so it doesn't really fit the lasagne.trainer module name.

> train certainly does have a lot of arguments. In many cases you won't need to use most of them as the defaults will do. I have however found early stopping, network state saving and restoring and various other features there to be very useful, hence their inclusion.

A nicer API for this might be something like:

for epoch, errors in training_loop(...):
    # print whatever you want
    # save whatever you want
    # update the learning rate when you want
    # early stop when you want
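
As a sketch of what such a generator-based loop could look like (hypothetical code reusing the train_fn/eval_fn and mean_batch_map sketches above, and assuming X_train/y_train/X_val/y_val are already loaded as numpy arrays; none of this is in the PR):

def training_loop(train_fn, eval_fn, train_set, val_set, batchsize, num_epochs):
    # Yield (epoch, errors) each epoch, so the caller keeps the loop body and
    # decides what to print, save, schedule or stop on.
    for epoch in range(num_epochs):
        train_loss, = mean_batch_map(train_fn, batchsize, *train_set)
        val_loss, val_err = mean_batch_map(eval_fn, batchsize, *val_set)
        yield epoch, {'train': train_loss, 'val': val_loss, 'val_err': val_err}

best_val = float('inf')
for epoch, errors in training_loop(train_fn, eval_fn, (X_train, y_train),
                                   (X_val, y_val), 500, num_epochs=25):
    print('epoch {}: {}'.format(epoch + 1, errors))   # print whatever you want
    if errors['val'] < best_val:
        best_val = errors['val']                      # save whatever you want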

There could be convenience functions for some of these. And well, yes, there could also be a function that encapsulates all this (that's the multiple levels of abstraction Sander and I were referring to). But then maybe the training loop should become a class that you can register such callbacks with, so you don't have a function with a myriad of arguments, but something like:

trainer = Trainer(...)
trainer.after_epoch(LogResults(your_format_string))
trainer.after_epoch(ScaleParam(eta, 0.95))
trainer.at_epoch(100, ScaleParam(eta, 0.1))
trainer.run()

Again, the design space is huge, and we need to properly sit down to decide what to include in Lasagne.
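
A hedged sketch of what such a callback-registering Trainer might look like (hypothetical code, not settled API; LogResults and ScaleParam would simply be small callables, e.g. one printing with a format string and one rescaling a shared learning-rate variable such as eta):

class Trainer(object):
    def __init__(self, run_epoch):
        # run_epoch is any callable that performs one epoch of training and
        # returns its results (losses, error rates, ...).
        self.run_epoch = run_epoch
        self._after_epoch = []   # callbacks run after every epoch
        self._at_epoch = {}      # callbacks run at one specific epoch only

    def after_epoch(self, callback):
        self._after_epoch.append(callback)

    def at_epoch(self, epoch, callback):
        self._at_epoch.setdefault(epoch, []).append(callback)

    def run(self, num_epochs):
        for epoch in range(num_epochs):
            results = self.run_epoch()
            for callback in self._after_epoch + self._at_epoch.get(epoch, []):
                callback(epoch, results)

The point is only that the myriad of options moves out of one long argument list and into composable callbacks.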

Britefury and others added 3 commits January 17, 2017 18:30
…at prevents the state from being stored or the validation score improvements from being detected until a specified epoch.
@Britefury (Contributor, Author)

@f0k, I hope you don't mind, but I decided to go ahead with the separate-PR approach. I think the design space is sufficiently large that it warrants building the training loop up in small pieces; the data_source module in PR #801 works toward this. I would like to take this 'smaller steps' approach, as getting there in one leap could involve quite a lot of back and forth, with all the refactoring that goes along with it. If we can build and agree on it piece by piece, I think I will find the process more manageable. :)
