
Fix CEM agent (prefer 'good' weights instead of 'lucky' weights) #330

Open · wants to merge 2 commits into master

Conversation

Nikolay-Lysenko

Consider the weights learned by CEMAgent. Given these weights, there are two options for action selection:

  • to choose one at random with respect to output probabilities;
  • to choose the one with the highest predicted probability.

At the testing stage, an agent has only the latter option. However, at the training stage, the former option is always used. This is inconsistent, because it means the agent is trained to behave differently from what is expected of it during testing/deployment.
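The two selection modes can be sketched as follows. This is an illustrative, stdlib-only sketch, not keras-rl's actual code; the function name and signature are made up for the example:

```python
import random

def select_action(probabilities, stochastic):
    """Pick an action index from a probability vector.

    stochastic=True  -> sample according to the probabilities
                        (what the agent does during training);
    stochastic=False -> greedy argmax
                        (what the agent does during testing).
    """
    if stochastic:
        return random.choices(range(len(probabilities)),
                              weights=probabilities, k=1)[0]
    return max(range(len(probabilities)), key=lambda i: probabilities[i])

# Greedy selection always returns the most probable action.
print(select_action([0.2, 0.5, 0.3], stochastic=False))  # -> 1
```

With these two branches side by side, the inconsistency is that candidate weights are scored under the `stochastic=True` branch but deployed under the `stochastic=False` branch.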

Given weights, stochastic selection of actions may, by chance, result in abnormally high reward. However, there is no guarantee that deterministic selection of the most probable action leads to the same reward; in the general case, it does not. As a result, 'lucky' weights can be preferred over 'good' weights during training. Evidence of this is that rewards at testing are lower than the rewards recorded in self.best_seen.
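A toy example of the 'lucky weights' effect. The per-action rewards and probabilities below are made up for illustration and have nothing to do with keras-rl internals; the point is that the best stochastic rollout can score far above the deterministic (greedy) return for the same weights:

```python
import random

random.seed(0)

ACTION_REWARD = [1.0, 0.0]   # hypothetical per-step reward of each action
PROBS = [0.3, 0.7]           # policy output for some candidate weights

def rollout(stochastic, steps=10):
    """Return the total reward of one episode under the given policy mode."""
    total = 0.0
    for _ in range(steps):
        if stochastic:
            action = random.choices([0, 1], weights=PROBS, k=1)[0]
        else:
            action = max(range(2), key=lambda i: PROBS[i])  # greedy -> action 1
        total += ACTION_REWARD[action]
    return total

greedy_return = rollout(stochastic=False)  # deterministic policy earns 0.0
# The luckiest of 100 stochastic episodes happens to draw the rare
# high-reward action often, so its return exceeds the greedy return.
best_lucky = max(rollout(stochastic=True) for _ in range(100))
```

If candidate weights are ranked by their stochastic rollout scores, weights like these win on luck, yet at test time (greedy selection) they earn the low deterministic return.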

This pull request fixes the problem described above and also fixes a minor issue with EpisodeParameterMemory.
