Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added feature: Multi-input with dictionaries #195

Open
wants to merge 35 commits into
base: master
Choose a base branch
from

Conversation

bklebel
Copy link

@bklebel bklebel commented Apr 11, 2018

This is an improvement to the MultiInputProcessor, establishing correct communication between keras-rl and a gym-environment, which has a dictionary-like observation space.

Code examples of model and environment can be found here https://gist.github.com/bklebel/913d8f155e6ed23f8a35fba989c70140. It is not a minimal working example, but contains the important parts of model and environment (corresponding names of input layers and spaces in the observation).

merging keras-rl/keras-rl into bklebel/keras-rl
This is an improvement to the MultiInputProcessor, establishing correct communication between keras-rl and a gym-environment, which has a dictionary-like observation space.
@bklebel
Copy link
Author

bklebel commented May 7, 2018

Currently only 2D input is supported (images, matrices, ...) as this was what I needed - an update to variable input dimensions (1d vectors, scalars, cubic matrices, ...) is under way, but not high up in my priorities

previously only 2D inputs (matrices) could be handled, now alll kind of inputs, vectors, 2D matrices or higher dimensions should be handled
@bklebel
Copy link
Author

bklebel commented May 10, 2018

PR updated for functionality - now also non-2D arrays can be processed.
the bug failing all checks was removed - should be fully functional now!

@bklebel
Copy link
Author

bklebel commented May 14, 2018

For anyone who wants to use this already, in /rl/callbacks.py, in the lines 191-193, the np.mean/min/max need to be exchanged (e.g. with 0) because numpy cannot calculate the mean of dictionaries, which is all to understandable.
Apart from this little piece of code, everything should run now as intended.

@bklebel bklebel changed the title Add feature for the MultiInputProcessor Added feature: Multi-input with dictionaries May 14, 2018
@bklebel
Copy link
Author

bklebel commented May 15, 2018

I found some problems with nonzero window lengths in memories, I will continue working on it

@RaphaelMeudec
Copy link
Contributor

@bklebel Feel free to ping me if you need help/advice on specific points

@bklebel
Copy link
Author

bklebel commented May 15, 2018

@RaphaelMeudec Thank you!
It is very good and heartening to know that there is some attention to this topic.

@bklebel
Copy link
Author

bklebel commented Jul 23, 2018

@RaphaelMeudec The nonzero window length option works now, however with an ugly workaround for a rather weird problem: lines 48-50 (in processors.py) exist, because at a nonzero window length, the "last instance in the window" is being wrapped in an additional 0-dim numpy array. I did not find the place in the code where this happens (so arbitrarily), and I could not think of a good implicit and array-like way how to solve this, so I ended up with going through the whole batch and window instances individually. I am sure this can be solved nicer (e.g. finding the place where the mischief is contrived i.e. the dict wrapped in 0-dim array). So, please help!

Test cases for this dict-like multi-input are still on my list.

a) One single dict entry could not properly would not properly work
b) channels_first is now a possible format of input
c) the keras.train_on_batch function was not correctly loaded with target values in case of a dict-input
I also completely removed the output of mean, max and min observations at output-verbose=2,
since it essentially becomes meaningless as soon as non-scalar observations are used
@RaphaelMeudec
Copy link
Contributor

@bklebel I'll check all this soon, I'll keep you updated!

@bklebel
Copy link
Author

bklebel commented Aug 3, 2018

Thanks - I think I solved it in the commits after my comment, but I am not entirely sure.
I have tested the dqn_cartpole example, rewriting everything to dictionaries, and got good results, I am just about to show them here and upload the altered environment+agent, maybe this will make everything easier for checking weird behaviour.

@bklebel
Copy link
Author

bklebel commented Aug 3, 2018

In this gist, the there is the code for the keras-rl cartpole example, rewritten for dictionary inputs. The corresponding results are displayed, once for the whole observation put into one value of a dict (dqn_onedict_cartpole.py):
log_onedict

And the same for all inputs in separate values of the dictionary (dqn_multidict_cartpole.py)
log_multidict

In both cases, the agent learns a working policy.

I could integrate them in the tests, there is a TODO to use an environment to see whether it learns something, however I am not quite sure whether the cartpole environment is simple enough for travis.

I'm looking forward to your assessment, @RaphaelMeudec :)

Merge pull request #3 from bklebel/bklebel-codacy-1
I will not change line 60 `order[idx_state, idx_window, i] = state_batch[idx_state][idx_window][key][i]`, in spite of codacy, because I think it is easier to understand what happens if all indices are visible as is.
@pittnerf
Copy link

pittnerf commented Dec 8, 2018

The link to your examples changed: https://gist.github.com/bklebel/e3bd43ce228a53d27de119c639ac61ee

update from keras-rl master
@bklebel bklebel mentioned this pull request Jan 9, 2019
using `state` in order to satisfy Codacy
line 63 (and following) remains unchanged (with all indices), to ensure the clarity of what is happening.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants