
Tidy Reinforcement Learning with TensorFlow

I am a researcher working on automation tasks using deep reinforcement learning. In practice, the papers and reality turned out to be quite different, and I ran into many difficulties along the way. I created this repository to help those who are starting out on a task using deep reinforcement learning. All of the code is written in TensorFlow and Python 3.

Objective

  • The code in this repo has a simple, clean structure that is uniform across all the algorithms. This uniformity makes it much easier to see exactly where the reinforcement learning algorithms differ from one another; a minimal sketch of such a shared structure follows this list.
  • Take advantage of the pseudo code folder. By reading the pseudo code alongside the implementations, you can see how all the algorithms fit into one consistent architecture.
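As an illustration of that shared structure, here is a minimal sketch. The Agent interface and train loop below are assumptions for exposition, not the repo's actual API:

```python
import gym

class Agent:
    """Hypothetical common interface shared by PG, A2C, DQN, DDPG,
    PPO, HER, and SAC implementations (names are illustrative,
    not the repo's actual classes)."""

    def act(self, obs):
        raise NotImplementedError   # each algorithm samples actions its own way

    def store(self, obs, action, reward, next_obs, done):
        raise NotImplementedError   # trajectory buffer or replay buffer

    def update(self):
        raise NotImplementedError   # the algorithm-specific loss lives here

def train(agent, env_name="CartPole-v1", episodes=500):
    # The same loop trains every algorithm; only the Agent internals change.
    env = gym.make(env_name)
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            action = agent.act(obs)
            next_obs, reward, done, _ = env.step(action)
            agent.store(obs, action, reward, next_obs, done)
            obs = next_obs
        agent.update()
```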

List of Implemented Algorithms

We used the OpenAI Gym CartPole environment (for discrete tasks) and the MountainCar and Pendulum environments (for continuous tasks). In the case of HER, we used the coin-flipping environment described in the paper; a minimal sketch of it is shown below.
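Since the coin-flipping environment is not part of Gym, here is a minimal sketch based on the bit-flipping setup in the HER paper. The class name and reward scale are assumptions, and the repo's own implementation may differ:

```python
import numpy as np

class CoinFlipEnv:
    """Sketch of the HER paper's bit/coin-flipping environment
    (hypothetical name; not necessarily the repo's implementation)."""

    def __init__(self, n=8):
        self.n = n  # number of coins

    def reset(self):
        # Random start state and random goal, each a binary vector.
        self.state = np.random.randint(2, size=self.n)
        self.goal = np.random.randint(2, size=self.n)
        return self.state.copy(), self.goal.copy()

    def step(self, action):
        # Each action flips exactly one coin.
        self.state[action] ^= 1
        done = np.array_equal(self.state, self.goal)
        reward = 0.0 if done else -1.0  # sparse: -1 until the goal is matched
        return self.state.copy(), reward, done
```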

  • Policy Gradient (PG) for CartPole (discrete task); see the REINFORCE sketch after this list
  • Advantage Actor-Critic (A2C) for CartPole (discrete task)
  • Advantage Actor-Critic (A2C) for MountainCar (continuous task) (still imperfect!)
  • Proximal Policy Optimization (PPO) for Pendulum (continuous task)
  • Deep Q Network (DQN) for CartPole (discrete task)
  • Deep Deterministic Policy Gradient (DDPG) for Pendulum (continuous task)
  • Hindsight Experience Replay (HER) for coin flipping (discrete task) (still imperfect!)
  • Soft Actor-Critic (SAC) for Pendulum (continuous task)
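As a concrete taste of the first item in the list, here is a minimal REINFORCE sketch for CartPole. It assumes TensorFlow 1.x-style graphs and the classic (pre-0.26) Gym API; the hyperparameters are illustrative, and this is not the repo's actual code:

```python
import gym
import numpy as np
import tensorflow as tf  # assumes TensorFlow 1.x

env = gym.make("CartPole-v1")
obs_dim = env.observation_space.shape[0]
n_actions = env.action_space.n

# Policy network: observation -> action logits.
obs_ph = tf.placeholder(tf.float32, [None, obs_dim])
act_ph = tf.placeholder(tf.int32, [None])
ret_ph = tf.placeholder(tf.float32, [None])  # discounted returns

hidden = tf.layers.dense(obs_ph, 32, activation=tf.nn.tanh)
logits = tf.layers.dense(hidden, n_actions)
sample_op = tf.squeeze(tf.random.categorical(logits, 1), axis=1)

# REINFORCE objective: maximize E[log pi(a|s) * G_t].
log_prob = -tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=act_ph, logits=logits)
loss = -tf.reduce_mean(log_prob * ret_ph)
train_op = tf.train.AdamOptimizer(1e-2).minimize(loss)

def discounted_returns(rewards, gamma=0.99):
    out, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        out.append(running)
    return out[::-1]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for episode in range(300):
        obs, done = env.reset(), False
        obs_buf, act_buf, rew_buf = [], [], []
        while not done:
            a = sess.run(sample_op, {obs_ph: [obs]})[0]
            obs_buf.append(obs)
            act_buf.append(a)
            obs, r, done, _ = env.step(a)
            rew_buf.append(r)
        # One gradient step per episode on the collected trajectory.
        sess.run(train_op, {obs_ph: np.array(obs_buf),
                            act_ph: np.array(act_buf),
                            ret_ph: np.array(discounted_returns(rew_buf))})
```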

Papers / Pseudocode of RL Algorithms

Compare the Following Algorithms

  • PG (CartPole) vs A2C (CartPole)
  • A2C (CartPole) vs A2C (Pendulum)
  • A2C (Pendulum) vs PPO (Pendulum)
  • PG (CartPole) vs DQN (CartPole)
  • A2C (Pendulum) vs DDPG (Pendulum)
  • DQN (CartPole) vs HER (coin flipping)
  • PPO (Pendulum) vs SAC (Pendulum)

Some Tips from Real-World Development

  • A positive reward is a magnet; a negative reward is a game of whack-a-mole.
  • The fastest agent to learn the Pendulum task is DDPG. So is DDPG the best reinforcement learning algorithm?
  • Which is more practical, a sparse reward or a dense reward? (See the sketch after this list.)
  • If the problem is difficult, split it up and approach the parts separately.
  • Without a simulator, there is no answer other than model-based learning.
  • In very difficult tasks, tricks built on experience replay have little effect.
  • Hindsight Experience Replay changes the reward function. Unless you are a master of reinforcement learning, do not try HER.
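To make the sparse-versus-dense trade-off concrete, here is a hypothetical pair of reward functions for MountainCarContinuous. The goal position, scaling, and shaping term are assumptions for illustration:

```python
# Hypothetical reward functions for MountainCarContinuous, contrasting
# a sparse reward with a dense (shaped) one.

def sparse_reward(position, goal=0.45):
    # Honest but hard: the agent learns nothing until it first
    # stumbles onto the goal.
    return 100.0 if position >= goal else 0.0

def dense_reward(position, velocity, goal=0.45):
    # Easier to learn from: progress is rewarded every step, but the
    # shaping term can bias the final policy.
    if position >= goal:
        return 100.0
    return abs(velocity)  # e.g., encourage building momentum
```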

Installation

pip install -r requirements.txt
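To verify the installation, you can create one of the Gym environments used in this repo and step it with random actions. This is a sketch assuming the pinned gym version uses the classic (pre-0.26) API:

```python
import gym

# Smoke test: build an environment used in this repo and run a few
# random steps. "Pendulum-v0" is the classic id; newer gym releases
# use "Pendulum-v1".
env = gym.make("Pendulum-v0")
obs = env.reset()
for _ in range(10):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()
```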

Inspired by