Added feature: Multi-input with dictionaries #195
base: master
merging keras-rl/keras-rl into bklebel/keras-rl
This is an improvement to the MultiInputProcessor that establishes correct communication between keras-rl and a gym environment with a dictionary-like observation space.
Currently only 2D input (images, matrices, ...) is supported, as this was what I needed. Support for other input dimensions (scalars, 1D vectors, 3D arrays, ...) is under way, but it is not high on my list of priorities.
Previously only 2D inputs (matrices) could be handled; now all kinds of inputs (vectors, 2D matrices or higher-dimensional arrays) should be handled.
PR updated: non-2D arrays can now be processed as well.
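A minimal sketch of what handling inputs of any dimension can look like, assuming each entry of a sampled batch is a window of dict observations with the same keys and shapes throughout. The function name `process_dict_state_batch` is hypothetical and this is not the PR's actual code; it only shows that one `np.array` call per key yields an array of shape `(batch, window, *observation_shape)` for scalars, vectors and matrices alike.

```python
import numpy as np


def process_dict_state_batch(state_batch):
    """Turn a batch of windowed dict observations into {key: array}.

    Assumes state_batch[b][w] is a dict mapping observation names to
    array-likes of any shape (scalars, vectors, matrices, ...), with the
    same keys and shapes for every entry. Each key maps to an array of
    shape (batch, window, *observation_shape).
    """
    keys = list(state_batch[0][0].keys())
    return {
        key: np.array(
            [[np.asarray(obs[key]) for obs in window] for window in state_batch]
        )
        for key in keys
    }


# Example: batch of 3 states, window length 2, mixed observation shapes.
observation = {"image": np.zeros((4, 4)), "position": np.zeros(2), "speed": 0.5}
batch = [[observation, observation] for _ in range(3)]
for key, value in process_dict_state_batch(batch).items():
    print(key, value.shape)
# image (3, 2, 4, 4)
# position (3, 2, 2)
# speed (3, 2)
```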
For anyone who wants to use this already: in /rl/callbacks.py, lines 191-193, the np.mean/min/max calls need to be replaced (e.g. with 0), because numpy cannot calculate the mean of dictionaries, which is all too understandable.
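The workaround above can be sketched as a small guard around the statistics. `observation_stats` is a hypothetical helper, not part of keras-rl; it assumes an episode's observations are stored as a list and simply reports zeros when they are dicts, as the comment suggests.

```python
import numpy as np


def observation_stats(observations):
    """Return (mean, min, max) over one episode's observations.

    Hypothetical helper, not keras-rl API. For dict observations it returns
    zeros, mirroring the workaround of replacing np.mean/min/max with 0,
    because numpy cannot reduce over dictionaries.
    """
    if len(observations) > 0 and isinstance(observations[0], dict):
        return 0.0, 0.0, 0.0
    values = np.asarray(observations, dtype=float)
    return float(values.mean()), float(values.min()), float(values.max())


print(observation_stats([[1, 2], [3, 4]]))      # (2.5, 1.0, 4.0)
print(observation_stats([{"a": 1}, {"a": 2}]))  # (0.0, 0.0, 0.0)
```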
I found some problems with nonzero window lengths in memories; I will continue working on it.
@bklebel Feel free to ping me if you need help or advice on specific points.
@RaphaelMeudec Thank you!
@RaphaelMeudec The nonzero window length option works now, but with an ugly workaround for a rather weird problem: lines 48-50 (in processors.py) exist because, at a nonzero window length, the "last instance in the window" is wrapped in an additional 0-dim numpy array. I did not find the place in the code where this (seemingly arbitrary) wrapping happens, and I could not think of a good, implicit, array-like way to solve it, so I ended up iterating over all batch and window instances individually. I am sure this can be solved more cleanly (e.g. by finding the place where the dict gets wrapped in a 0-dim array). So, please help! Test cases for this dict-like multi-input are still on my list.
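The symptom described above matches what numpy does when `np.array` is called on a dict: the result is a 0-dim array of dtype `object`, and `.item()` returns the dict again. A minimal sketch of an element-wise unwrap, assuming this is the wrapping in question (the actual source in the memory code is not confirmed here); `unwrap_observation` is a hypothetical name.

```python
import numpy as np


def unwrap_observation(obs):
    """Return the object stored in a 0-dim object array, else obs unchanged."""
    if isinstance(obs, np.ndarray) and obs.ndim == 0 and obs.dtype == object:
        return obs.item()
    return obs


wrapped = np.array({"image": [1, 2]})  # numpy stores the dict in a 0-dim array
print(wrapped.shape, wrapped.dtype)     # () object
print(unwrap_observation(wrapped))      # {'image': [1, 2]}

# Applied over every window entry of a batch, like the current workaround.
state_batch = [[wrapped, {"image": [3, 4]}]]
clean = [[unwrap_observation(o) for o in window] for window in state_batch]
```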
updating fork
a) A single dict entry did not work properly.
b) channels_first is now a possible input format.
c) The Keras train_on_batch function was not fed the correct target values in the case of dict input.
I also completely removed the output of the mean, max and min observations with verbose=2, since it essentially becomes meaningless as soon as non-scalar observations are used.
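A sketch of what supporting channels_first versus channels_last can mean for windowed image inputs, assuming per-key arrays of shape `(batch, window, height, width)`. With channels_first the window axis already sits where the channels go; with channels_last it has to move to the end. `arrange_window_axis` is a hypothetical helper; inside a model, a Keras `Permute` layer can do the same reordering.

```python
import numpy as np


def arrange_window_axis(batch, data_format="channels_last"):
    """Place the memory window axis where a conv layer expects channels.

    Assumes batch has shape (batch, window, *spatial). For 'channels_first'
    the window axis is already at position 1; for 'channels_last' it is
    moved to the last position.
    """
    if data_format == "channels_first":
        return batch
    if data_format == "channels_last":
        return np.moveaxis(batch, 1, -1)
    raise ValueError(f"unknown data_format: {data_format!r}")


frames = np.zeros((8, 4, 84, 84))  # 8 samples, window of 4 frames, 84x84 pixels
print(arrange_window_axis(frames, "channels_first").shape)  # (8, 4, 84, 84)
print(arrange_window_axis(frames, "channels_last").shape)   # (8, 84, 84, 4)
```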
@bklebel I'll check all this soon and keep you updated!
Thanks! I think I solved it in the commits after my comment, but I am not entirely sure.
This gist contains the code for the keras-rl cartpole example, rewritten for dictionary inputs. Results were posted for two variants: once with the whole observation put into a single value of a dict (dqn_onedict_cartpole.py), and once with all inputs in separate values of the dict (dqn_multidict_cartpole.py). In both cases the agent learns a working policy. I could integrate them into the tests (there is a TODO to use an environment to check whether the agent learns something), but I am not sure whether the cartpole environment is simple enough for Travis. I'm looking forward to your assessment, @RaphaelMeudec :)
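The two variants can be illustrated by how a plain CartPole observation (cart position, cart velocity, pole angle, pole angular velocity) would be mapped into a dict. This is a hypothetical sketch, not the gist's code; the key names and the choice of 1-element vectors per entry are assumptions.

```python
import numpy as np

# Hypothetical key names for CartPole's four observation components.
CARTPOLE_KEYS = ("cart_position", "cart_velocity", "pole_angle", "pole_velocity")


def one_dict(obs):
    """Whole observation under a single key (the 'onedict' variant)."""
    return {"observation": np.asarray(obs, dtype=float)}


def multi_dict(obs):
    """One 1-element vector per component (the 'multidict' variant)."""
    obs = np.asarray(obs, dtype=float)
    return {key: obs[i:i + 1] for i, key in enumerate(CARTPOLE_KEYS)}


raw = [0.01, -0.2, 0.03, 0.15]
print(one_dict(raw)["observation"].shape)  # (4,)
print(sorted(multi_dict(raw)))             # alphabetically sorted key names
```

Each dict key would then need a model input layer of the matching name and shape.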
Merge pull request #3 from bklebel/bklebel-codacy-1
I will not change line 60, `order[idx_state, idx_window, i] = state_batch[idx_state][idx_window][key][i]`, despite Codacy, because I think it is easier to understand what happens when all indices are visible.
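For comparison, a sketch of the explicit-index loop in the style of that line next to a single `np.array` call over the nested lists, assuming 1-D dict entries. Both function names are hypothetical; the check only shows that the readable loop and the compact form produce the same array.

```python
import numpy as np


def gather_key_explicit(state_batch, key):
    """Copy state_batch[b][w][key] (1-D entries) into one array, index by index."""
    n_states, n_window = len(state_batch), len(state_batch[0])
    size = len(state_batch[0][0][key])
    order = np.zeros((n_states, n_window, size))
    for idx_state in range(n_states):
        for idx_window in range(n_window):
            for i in range(size):
                order[idx_state, idx_window, i] = state_batch[idx_state][idx_window][key][i]
    return order


def gather_key_stacked(state_batch, key):
    """Same result from a single np.array call over the nested lists."""
    return np.array([[obs[key] for obs in window] for window in state_batch], dtype=float)


batch = [[{"v": [b, w, b + w]} for w in range(2)] for b in range(3)]
print(np.array_equal(gather_key_explicit(batch, "v"), gather_key_stacked(batch, "v")))  # True
```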
updating with changes
The link to your examples changed: https://gist.github.com/bklebel/e3bd43ce228a53d27de119c639ac61ee
update from keras-rl master
Use `state` to satisfy Codacy; line 63 (and following) remains unchanged (with all indices) to keep it clear what is happening.
Code examples of the model and environment can be found here: https://gist.github.com/bklebel/913d8f155e6ed23f8a35fba989c70140. It is not a minimal working example, but it contains the important parts of the model and environment (the names of the input layers correspond to the keys of the observation space).
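Because the input layer names correspond to the observation-space keys, per-key arrays can be ordered to match a multi-input model. A minimal sketch, assuming the model's input names are available as a list (in Keras 2 functional models, for example, via `model.input_names`); `to_model_inputs` is a hypothetical helper, not the PR's code.

```python
def to_model_inputs(per_key_arrays, input_names):
    """Order {key: array} as a list matching the model's input layers.

    Assumes each model input layer is named after its observation-space key.
    Raises KeyError if an input has no matching observation entry.
    """
    missing = set(input_names) - set(per_key_arrays)
    if missing:
        raise KeyError(f"no observation for model inputs: {sorted(missing)}")
    return [per_key_arrays[name] for name in input_names]


arrays = {"image": "image batch", "speed": "speed batch"}
print(to_model_inputs(arrays, ["speed", "image"]))  # ['speed batch', 'image batch']
```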