
SARSA Frozen Lake

Introduction

This project aims to train a SARSA agent to learn policies in the Frozen Lake environment from OpenAI gym.

Frozen Lake

Frozen Lake is a grid-world environment in which the agent moves a character across a frozen lake. Starting from the start state S, the agent tries to reach the goal state G, which yields a reward of 1. At each state the agent picks one of four actions (left, down, right, or up), but because the frozen states F are slippery, the intended move succeeds only $\frac{1}{3}$ of the time; for the remaining $\frac{2}{3}$ of the time, the agent slips to one of the other directions with equal probability. Stepping into a hole state H ends the episode with a reward of 0.

  • S: Start State
  • G: Goal State
  • F: Slippery State
  • H: Hole State

(Figure: the Frozen Lake grid layout.)
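
For concreteness, the following sketch builds the 3×3 map shown in the Output section with OpenAI gym and prints the slippery transition model for one state. The custom desc layout is taken from the program output below; the rest is an illustrative sketch under those assumptions, not the project's actual setup code.

import gym

# Build the 3x3 Frozen Lake map used in this project.
# is_slippery=True gives the 1/3 success, 2/3 slip dynamics described above.
env = gym.make('FrozenLake-v0', desc=["SFF", "FHF", "FFG"], is_slippery=True)
env.reset()
env.render()

# Each entry of the transition model is (probability, next_state, reward, done).
# Here: state 0 (top-left corner), action 2 (right).
for prob, next_state, reward, done in env.unwrapped.P[0][2]:
    print(prob, next_state, reward, done)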

Random Strategy

(Figure: results of the random strategy; saved as img/result_img_0.png.)
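
The random baseline can be reproduced with a loop like the one below, which samples actions uniformly and averages the episode reward. This is only a sketch of how such a baseline might be measured, not the code in main.py.

import gym
import numpy as np

env = gym.make('FrozenLake-v0', desc=["SFF", "FHF", "FFG"])

def run_random_episodes(env, n_episodes=100):
    # Average episode reward when actions are sampled uniformly at random.
    rewards = []
    for _ in range(n_episodes):
        state, done, total = env.reset(), False, 0.0
        while not done:
            state, reward, done, _ = env.step(env.action_space.sample())
            total += reward
        rewards.append(total)
    return np.mean(rewards)

print('Averaged reward per episode', run_random_episodes(env))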

SARSA Agent

(Figure: results of the SARSA agent; saved as img/result_img_1.png.)
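
SARSA is an on-policy method: it updates $Q(s, a)$ toward $r + \gamma Q(s', a')$, where $a'$ is the action actually selected by the same $\epsilon$-greedy policy. The snippet below is a minimal sketch of such a training loop; the hyperparameter names and values (alpha, gamma, epsilon, the episode count) are illustrative assumptions, not necessarily those used in main.py.

import gym
import numpy as np

env = gym.make('FrozenLake-v0', desc=["SFF", "FHF", "FFG"])
n_states, n_actions = env.observation_space.n, env.action_space.n
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1  # illustrative hyperparameters

def epsilon_greedy(state):
    # Explore with probability epsilon, otherwise act greedily w.r.t. Q.
    if np.random.rand() < epsilon:
        return env.action_space.sample()
    return int(np.argmax(Q[state]))

for episode in range(1000):
    state, done = env.reset(), False
    action = epsilon_greedy(state)
    while not done:
        next_state, reward, done, _ = env.step(action)
        next_action = epsilon_greedy(next_state)
        # On-policy update: bootstrap from the action that will actually be taken.
        Q[state, action] += alpha * (reward + gamma * Q[next_state, next_action] - Q[state, action])
        state, action = next_state, next_action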

Directory

  • ./img - to save the output
  • main.py - to reproduce the experiments and generate figures directly
  • main.ipynb - to view the procedure step by step
SARSA-Frozen-Lake/
├── README.md
├── img
├── main.ipynb
├── main.py
└── requirements.txt

Dependencies

  • python >= 3.7.2
  • jupyter >= 1.0.0
  • gym >= 0.15.4
  • tqdm >= 4.41.1
  • numpy >= 1.16.2
  • matplotlib >= 3.1.1

Setup

Please ensure the following are already installed. A virtual environment is recommended.

  • Python (for .py)
  • Jupyter Notebook (for .ipynb)
$ cd SARSA-Frozen-Lake/
$ pip3 install pip --upgrade
$ pip3 install -r requirements.txt

Run

To view the notebook:

$ jupyter notebook

To run the script:

$ python3 main.py

Output

If everything goes well, you should see results similar to those shown below.

Initialize environment...

SFF
FHF
FFG

An agent taking random actions:
Episode 100 Reward 1.0: 100%|█████████████████████████████| 100/100 [00:00<00:00, 2860.68it/s]
Averaged reward per episode 0.24
Saving output to img/result_img_0.png

SARSA agent:
Episode 1000 Reward 0.0: 100%|██████████████████████████| 1000/1000 [00:00<00:00, 2772.03it/s]
Trained Q Table:
[[0.11833864 0.12764726 0.11830122 0.10965143]
 [0.05782744 0.13956833 0.1052786  0.16751989]
 [0.20571471 0.19419847 0.30342657 0.20114661]
 [0.18734408 0.14452312 0.14835486 0.06992683]
 [0.         0.         0.         0.        ]
 [0.42679779 0.35017348 0.37137972 0.1893355 ]
 [0.2281178  0.3050125  0.28307348 0.2512109 ]
 [0.21656739 0.55051001 0.33186199 0.39988802]
 [0.         0.         0.         0.        ]]
Training Averaged reward per episode 0.281
Episode 100 Reward 0.0: 100%|█████████████████████████████| 100/100 [00:00<00:00, 3300.47it/s]
Training Averaged reward per episode 0.81
Saving output to img/result_img_1.png
Done.

Please find output under ./img.
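
The trained Q table above has one row per grid cell (nine for the 3×3 map) and one column per action. To read a policy out of such a table, one can take the greedy action per state; the sketch below assumes gym's FrozenLake action order (0=left, 1=down, 2=right, 3=up).

import numpy as np

def greedy_policy(q_table, n_rows=3, n_cols=3):
    # Pick the highest-valued action in each state and reshape to the grid.
    # Action indices in gym's FrozenLake: 0=left, 1=down, 2=right, 3=up.
    return np.argmax(q_table, axis=1).reshape(n_rows, n_cols)

# Example with a table of the shape printed above (9 states x 4 actions).
q_table = np.random.rand(9, 4)
print(greedy_policy(q_table))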

Authors

Reference

  1. Rummery, G. A., & Niranjan, M. (1994). On-line Q-learning using connectionist systems (Vol. 37). Cambridge, England: University of Cambridge, Department of Engineering.
  2. Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT press.