Why I am using Keras-RL2
I am using Keras-RL2 (v1.0.5) because my use case requires the AdamW optimiser, which is actively supported in TensorFlow 2 and therefore incompatible with Keras-RL, since that library is based on TensorFlow 1. I understand this repo is not Keras-RL2, but that repo is archived (so I can't post an issue there), which is why I am posting the issue here; please do not remove it.
Problem
The results from the trained DQN agent in Keras-RL2 show that two Atari environments (Pong and Boxing) completely fail to learn, while one environment (Freeway) learns correctly. However, I tested the same DQN configuration (i.e., the same hyperparameters etc.) in several other RL frameworks, such as Stable Baselines 3 and RLlib, and observed that all three environments learned correctly there. This leads me to think that there is either a bug in the DQN agent or a mistake in how I configured the DQN agent in Keras-RL2.
Keras-RL2 reward graph for comparison
Stable Baselines 3 reward graph for comparison
The DQN agent is meant to conform to the algorithm in the Nature DQN paper. Listed below are the hyperparameters I set to be the same across all RL frameworks.
Gymnasium Environment Configuration
Max episode frames: 108k frames (default for ALE/*-v5 envs)
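For concreteness, here is a minimal sketch of this setup using the standard Gymnasium wrappers (the exact calls may differ slightly from my actual code):

```python
import gymnasium as gym
from gymnasium.wrappers import AtariPreprocessing

# ALE/*-v5 envs already cap episodes at 108k emulator frames by default.
# frameskip=1 on the base env so AtariPreprocessing does the 4-frame skip.
env = gym.make("ALE/Pong-v5", frameskip=1)
env = AtariPreprocessing(env, frame_skip=4, screen_size=84,
                         grayscale_obs=True, scale_obs=False)
# No frame-stacking wrapper here: Keras-RL2 stacks frames itself through
# SequentialMemory(window_length=4), shown further down.
```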
Network Configuration
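The network follows the Nature paper's architecture. Below is a sketch of how the Keras-RL2 Atari example builds it in Keras (nb_actions is hard-coded here for illustration; in practice it comes from env.action_space.n):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten, Permute

WINDOW_LENGTH = 4          # frames stacked by SequentialMemory
INPUT_SHAPE = (84, 84)     # preprocessed frame size
nb_actions = 6             # e.g. ALE/Pong-v5; use env.action_space.n in practice

model = Sequential([
    # Keras-RL2 delivers observations as (window, height, width);
    # Permute reorders them to channels-last for Conv2D.
    Permute((2, 3, 1), input_shape=(WINDOW_LENGTH,) + INPUT_SHAPE),
    Conv2D(32, 8, strides=4, activation="relu"),
    Conv2D(64, 4, strides=2, activation="relu"),
    Conv2D(64, 3, strides=1, activation="relu"),
    Flatten(),
    Dense(512, activation="relu"),
    Dense(nb_actions, activation="linear"),
])
```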
Algorithm Configuration
Below is my code for Keras-RL2
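(Condensed to a sketch continuing from the snippets above. The hyperparameter values are the Nature-paper defaults standing in for my exact settings, and the learning rate is a placeholder since the paper used RMSProp rather than AdamW. Note also that Keras-RL2 expects the old 4-tuple Gym step API, so a Gymnasium env needs a compatibility shim before being passed to fit.)

```python
import tensorflow as tf
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy

# Uniform replay buffer; window_length stacks 4 consecutive frames per state.
memory = SequentialMemory(limit=1_000_000, window_length=WINDOW_LENGTH)

# Epsilon-greedy exploration annealed linearly from 1.0 to 0.1 over 1M steps.
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps',
                              value_max=1.0, value_min=0.1,
                              value_test=0.05, nb_steps=1_000_000)

dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
               policy=policy,
               enable_double_dqn=False,      # plain Nature DQN, no Double-DQN
               nb_steps_warmup=50_000,       # fill replay before learning starts
               train_interval=4,             # one gradient step per 4 env steps
               target_model_update=10_000,   # hard target-network sync
               gamma=0.99,
               delta_clip=1.0)               # Huber-style loss clipping

# AdamW: tf.keras.optimizers.AdamW exists from TF 2.11; earlier TF 2.x
# needs tensorflow_addons.optimizers.AdamW instead. The learning rate
# below is a placeholder, not a value from the Nature paper.
dqn.compile(tf.keras.optimizers.AdamW(learning_rate=1e-4), metrics=['mae'])

dqn.fit(env, nb_steps=10_000_000, log_interval=10_000)
```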
I would appreciate any thoughts/comments on this matter. Thanks!