Baseline for variance reduction in Policy Gradient Algorithm

Modular implementation of the Vanilla Policy Gradient (VPG) algorithm with a baseline, using an RNN policy.

Dependencies

Using a value-function-based baseline to reduce the variance of the vanilla policy gradient estimate
Using an RNN policy to output the action probabilities for a reinforcement learning problem
Using a sampler that reshapes each trajectory so it can be fed into the RNN policy
Using gradient clipping to mitigate the exploding gradient problem
Using a GRU to mitigate the vanishing gradient problem
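
The baseline and gradient clipping items above can be illustrated with a minimal sketch. This is not the code from pg_reinforce.py: the function names, the default discount factor of 0.99, and the use of plain Python lists instead of framework tensors are all assumptions made for illustration. In the real project the baseline values would come from a learned value function.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode.

    gamma=0.99 is an assumed default, not a value taken from the repository.
    """
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, baseline_values):
    """Subtract the baseline prediction V(s_t) from the return G_t.

    Weighting log-probability gradients by these values instead of the raw
    returns keeps the gradient estimate unbiased while lowering its variance.
    """
    return [g - v for g, v in zip(returns, baseline_values)]


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]
```

For example, with rewards [1, 1, 1], gamma = 0.5 and a constant baseline of 1.0, the returns are [1.75, 1.5, 1.0] and the advantages are [0.75, 0.5, 0.0]. Clipping rescales the whole gradient vector rather than each component separately, so the update direction is preserved.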

To train a model for CartPole-v0:

$ python test_graph_pg.py

To view the training logs in TensorBoard:

$ tensorboard --logdir .
