🐍 Implementation of the REINFORCEjs library from Karpathy in Python

PeeteKeesel/reinforce-py

🤖 REINFORCEpy

A Python implementation of Karpathy's REINFORCEjs library. The original library is written in JavaScript. The objective of this repository is to implement its RL algorithms and demos in Python.

Note that this is not a 1-to-1 port. The idea is simply to develop algorithms and demos similar to those shown in Karpathy's library.

Value Iteration

We started by implementing the simplest algorithm, Value Iteration, from scratch.

The following shows an example of the value function for different iterations.

| After 1 iteration | After 100 iterations |
| --- | --- |
| [Figure: value function after $1$ iteration] | [Figure: value function after $100$ iterations] |
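To make the effect of the iteration count concrete, here is a minimal sketch of synchronous value iteration on a small gridworld. It is not the code from this repository: the function name `value_iteration`, the action set, the step reward of -1, the terminal goal in the top-left corner, and the discount factor of 0.9 are all assumptions chosen for illustration.

```python
# Hypothetical action set as (row delta, col delta): up, down, left, right.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def value_iteration(grid_size=10, goal=(0, 0), gamma=0.9, iterations=100):
    """Run synchronous value iteration on a deterministic gridworld.

    Assumed dynamics (not taken from the repository): every step costs -1,
    moving into a wall leaves the agent in place, and the goal is a
    terminal state with value 0.
    """
    values = [[0.0] * grid_size for _ in range(grid_size)]
    for _ in range(iterations):
        # Build the next estimate from the previous one (synchronous sweep).
        new_values = [[0.0] * grid_size for _ in range(grid_size)]
        for row in range(grid_size):
            for col in range(grid_size):
                if (row, col) == goal:
                    continue  # terminal state keeps value 0
                candidates = []
                for d_row, d_col in ACTIONS:
                    # Clip to the grid so bumping into a wall is a no-op.
                    next_row = min(max(row + d_row, 0), grid_size - 1)
                    next_col = min(max(col + d_col, 0), grid_size - 1)
                    candidates.append(-1.0 + gamma * values[next_row][next_col])
                # Bellman optimality backup: keep the best action's value.
                new_values[row][col] = max(candidates)
        values = new_values
    return values


if __name__ == "__main__":
    for n_iterations in (1, 100):
        print(f"After {n_iterations} iteration(s):")
        for row_values in value_iteration(grid_size=4, iterations=n_iterations):
            print(" ".join(f"{value:6.2f}" for value in row_values))
```

Under these assumptions, a single sweep only tells each non-goal cell that one step costs -1, while after enough sweeps each cell's value reflects its discounted distance to the goal.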

🏃 How to Run?

Several parameters can be set when running main.py. An example call looks like this:

python main.py \
    --seed=42 \
    --verbose=1 \
    --episodes=1 \
    --timesteps=1 \
    --grid_size=10 \
    --algo=value_iteration \
    --render_large=True \
    --render_with_values=True

All supported arguments are listed below:

usage: 
  main.py [--seed] [--verbose] [--episodes] [--timesteps] [--grid_size] [--algo] 
          [--render_large] [--render_with_values]
| Argument | Help | Default |
| --- | --- | --- |
| --seed | random seed | $42$ |
| --verbose | verbosity level | $1$ |
| --episodes | number of episodes | $1$ |
| --timesteps | maximal number of timesteps | $1,000$ |
| --grid_size | size of the gridworld | $10$ |
| --algo | learning algorithm | value_iteration |
| --render_large | render large gridworld | False |
| --render_with_values | render gridworld with value estimates | False |
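For readers who want to see how such a command line could be wired up, the following is a sketch of an `argparse` parser that mirrors the arguments, help texts, and defaults listed above. The actual parser in main.py may look different: the argument types, the `build_parser` function, and the `str2bool` helper are assumptions made for this example.

```python
import argparse


def str2bool(text):
    """Hypothetical helper: parse values such as 'True' or 'False'."""
    lowered = text.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def build_parser():
    """Build a parser with the documented arguments (types are assumed)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--seed", type=int, default=42, help="random seed")
    parser.add_argument("--verbose", type=int, default=1, help="verbosity level")
    parser.add_argument("--episodes", type=int, default=1, help="number of episodes")
    parser.add_argument("--timesteps", type=int, default=1000,
                        help="maximal number of timesteps")
    parser.add_argument("--grid_size", type=int, default=10,
                        help="size of the gridworld")
    parser.add_argument("--algo", type=str, default="value_iteration",
                        help="learning algorithm")
    parser.add_argument("--render_large", type=str2bool, default=False,
                        help="render large gridworld")
    parser.add_argument("--render_with_values", type=str2bool, default=False,
                        help="render gridworld with value estimates")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

The boolean flags go through an explicit converter because `type=bool` in `argparse` would treat any non-empty string, including "False", as true.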

📝 ToDo's

The to-do items have been moved to docs/changelog.md.