Learning reinforcement learning (RL) by implementing and analysing different RL methods.

Directory | Game | Number of agents | RL method |
---|---|---|---|
nim-dqn | Nim-21 | 2 | Deep Q-network |
nim-a2c | Nim-21 | 2 | Advantage Actor Critic |
matching-pennies-a2c | Matching Pennies | 2 | Advantage Actor Critic |
snake-a2c | Snake | 1 | Advantage Actor Critic |
snake-ppo | Snake | 1 | Proximal Policy Optimisation |
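
For readers unfamiliar with Matching Pennies, here is a minimal sketch of its standard zero-sum payoff structure. It is not taken from the `matching-pennies-a2c` directory; the names `PAYOFF`, `payoffs`, and `expected_payoff` are hypothetical and chosen only for illustration. It assumes the usual convention that player 0 wins when the two coins match and player 1 wins when they differ.

```python
# Illustrative sketch only: names and reward scale are assumptions,
# not the code used in matching-pennies-a2c.
HEADS, TAILS = 0, 1

# Payoff to player 0 (the "matcher"); player 1 receives the negation.
PAYOFF = [
    [1, -1],   # player 0 plays HEADS
    [-1, 1],   # player 0 plays TAILS
]


def payoffs(action_0, action_1):
    """Return (reward for player 0, reward for player 1) for one round."""
    reward = PAYOFF[action_0][action_1]
    return reward, -reward


def expected_payoff(p0_heads, p1_heads):
    """Expected reward for player 0 when each player plays HEADS
    independently with the given probability."""
    p0 = [p0_heads, 1.0 - p0_heads]
    p1 = [p1_heads, 1.0 - p1_heads]
    return sum(
        p0[i] * p1[j] * PAYOFF[i][j]
        for i in range(2)
        for j in range(2)
    )
```

If either player plays heads with probability 0.5, the expected reward is zero whatever the opponent does, which is the game's mixed-strategy equilibrium and a useful sanity check when inspecting a trained policy.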