Skip to content

Deep reinforcement learning (PPO) apply in FrozenLakev1

Notifications You must be signed in to change notification settings

Elapsedf/FrozenLake

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Install

Envs

conda create -n ppo python=3.9
conda activate ppo

If you have GPU, Please install pytorch corresponding to the CUDA version because it could faster your training.

If you only have CPU for training, That's OK

I have tried the

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113		#for CUDA 11.4

and torch==2.1.0+cu118, both could Work

Requirements

git clone https://github.com/Elapsedf/FrozenLake.git
pip install -r requirements.txt

When you want to train a new model in Stochasitic Mode, Please make sure your GPU memory is Enough, If you find CUDA out of memroy, Try to decline the update_freq. As this parameter gets larger, it will consume more memory

File List

Make_gif.py

This file contain the train and test mode, you only need to change this file in the file end and run it

PPO.py

Main Algorithm

Train

Please change the work dir to your own path

save_gif_images(env_name, has_continuous_action_space, max_ep_len, action_std, pretrained)
save_gif(env_name)
list_gif_size(env_name)

Note that Remember to change the test_num when you want to train a new model

Test

test()		
save_gif(env_name)
list_gif_size(env_name)

Note that Remember to change the pre-trained model and the gif_num when you want to test a new model

Pretrained Model

I trained 2 Model:Determinstic and Stochastic, both are in the PreTrained dir

They correspond to different settings of a parameter of the environment, As the following

env = gym.make("FrozenLake-v1", is_slippery=False)	#Deterministic
env = gym.make("FrozenLake-v1")		# Stochastic

Deterministic Mode: The State Transition is depend on the Action which is chosen by the Actor Network

Stochastic Mode:State transfer in the environment is a stochastic process, i.e. the final outcome of the state transition does not depend only on the action but is also influenced by the environment

The following are the train result, and the ideal result is 1

  

Figure1: Deterministic Result

but in the Stochastic mode the result is not so good for the environment effect

So I Training in two phases

  • In the First phases,I train a model and save it
  • In the Second phases, I train a new model base on the previous model in first phases

So the Results are the following

  

Figure2:First PhaseFigure3:Second Phase(Final Result)

Result

The gif result please see the gif dir

About

Deep reinforcement learning (PPO) apply in FrozenLakev1

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages