Fix CEM agent (prefer 'good' weights instead of 'lucky' weights) #330
Consider weights for `CEMAgent`. Given these weights, there are two options for action selection:

1. stochastic: sample an action from the probability distribution induced by the weights;
2. deterministic: select the most probable action.

At the testing stage, an agent has only the latter option. However, at the training stage, the former option is always used. This is inconsistent, because it means the agent is trained to do something other than what is expected from it during testing/deployment.
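For reference, here is a minimal sketch of the two selection modes, assuming the policy produces a probability vector `probs` over discrete actions (the names `select_action`, `probs`, and `stochastic` are illustrative, not the keras-rl API):

```python
import numpy as np

def select_action(probs, stochastic):
    """Pick an action from a probability vector over discrete actions.

    stochastic=True samples from the distribution (training behaviour),
    stochastic=False picks the most probable action (testing behaviour).
    """
    if stochastic:
        return np.random.choice(len(probs), p=probs)
    return int(np.argmax(probs))
```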
Given weights, stochastic selection of actions may, by chance, result in an abnormally high reward. However, there is no guarantee that deterministic selection of the most probable action yields the same reward; in general it does not. As a result, 'lucky' weights can be preferred over 'good' weights during training. Evidence of this is that rewards at testing are lower than the rewards recorded in `self.best_seen`.

This pull request fixes the problem above and also fixes a minor issue with `EpisodeParameterMemory`.
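To illustrate the idea (not the actual code changed in this PR), a candidate set of weights can be scored with the same deterministic action rule that is used at testing time, so the score reflects how 'good' the weights are rather than how 'lucky' a particular stochastic rollout happened to be. The helper below is a hypothetical sketch assuming a gym-style `env` and a `policy` object with `set_weights` and `predict` methods:

```python
import numpy as np

def score_weights(env, policy, weights, n_episodes=1):
    """Score candidate weights with deterministic action selection,
    matching the behaviour expected at testing/deployment."""
    policy.set_weights(weights)
    total = 0.0
    for _ in range(n_episodes):
        obs, done = env.reset(), False
        while not done:
            probs = policy.predict(obs)
            action = int(np.argmax(probs))  # same rule as at testing time
            obs, reward, done, _ = env.step(action)
            total += reward
    return total / n_episodes
```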