Decrease in reward during training with MaskablePPO #207

vahidqo · 2023-09-01T07:07:01Z

❓ Question

Hi,

During training in a custom environment with MaskablePPO, the reward decreased and then converged. Is there a specific reason for this? Does it mean the algorithm found a better policy but ended up outputting a different one?

My environment has two normalized rewards, which are combined as a weighted sum to give the final reward. I have 19 timesteps and my gamma was set to 0.001.
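To show what such a small gamma does over 19 timesteps, here is a minimal sketch. It is not my environment: the `discounted_return` helper and the constant reward of 0.5 per step are made up for illustration, and 0.99 is only included as a comparison value.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one episode."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# Hypothetical episode: 19 steps, each with a normalized reward of 0.5.
rewards = [0.5] * 19

for gamma in (0.0001, 0.001, 0.99):
    ret = discounted_return(rewards, gamma)
    # Fraction of the discounted return contributed by the first step alone.
    first_step_share = rewards[0] / ret
    print(f"gamma={gamma}: return={ret:.6f}, first-step share={first_step_share:.4f}")
```

With gamma this small, nearly all of the discounted return comes from the immediate reward, so later steps in the episode contribute almost nothing to the value targets.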

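from sb3_contrib import MaskablePPO
from sb3_contrib.common.maskable.policies import MaskableActorCriticPolicy
from sb3_contrib.common.wrappers import ActionMasker

class customenv(gym.Env):....
env = customenv()
env = ActionMasker(env, mask_fn)  # mask_fn takes the env and returns the valid-action mask
model = MaskablePPO(MaskableActorCriticPolicy, env, gamma=0.0001, verbose=0)
model.learn(4000000)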

Thank you!

Checklist

I have checked that there is no similar issue in the repo
I have read the documentation
If code there is, it is minimal and working
If code there is, it is formatted using the markdown code blocks for both code and stack traces.


vahidqo added the "question" (Further information is requested) label on Sep 1, 2023

araffin added the "more information needed" (Please fill the issue template completely) and "custom gym env" (Issue related to Custom Gym Env) labels on Sep 1, 2023
