kounelisagis/Reinforcement-Learning-Playground

Reinforcement-Learning-Playground

Mountain Car Problem - Continuous

Used the machin library to solve the continuous Mountain Car problem with PPO and TD3. Implemented suitable actor-critic networks and tuned the hyperparameters until both algorithms solved the task.

Expected return per iteration.

Pendulum swing-up - Torques

Used RobotDART and OpenAI Gym spaces, wrote a custom reward function, and trained with PPO and TD3. Applied the frame-skipping technique. The initial position is x0 = [π], the observation is the vector [cos θ, sin θ, torque], and the reward function depends on the angle θ, the torque, and the command given to the robot.
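As an illustration of the observation vector and of frame skipping, here is a minimal sketch in plain Python. It does not use RobotDART or the repository's code: `ToyPendulum`, `step_with_frame_skip`, the dynamics, the torque limit, and the reward weights are all hypothetical placeholders, and the convention that θ = 0 is the upright target is an assumption. Only the initial position x0 = [π] and the observation layout [cos θ, sin θ, torque] come from the description above.

```python
import math


class ToyPendulum:
    """Hypothetical stand-in for the simulated pendulum (not RobotDART)."""

    def __init__(self, dt=0.01, max_torque=2.0):
        self.dt = dt
        self.max_torque = max_torque  # assumed limit, for illustration only
        self.reset()

    def reset(self):
        self.theta = math.pi  # initial position x0 = [pi]
        self.omega = 0.0
        self.torque = 0.0
        self.sim_steps = 0
        return self.observe()

    def observe(self):
        # Observation vector from the text: [cos(theta), sin(theta), torque]
        return [math.cos(self.theta), math.sin(self.theta), self.torque]

    def step(self, command):
        # Assumption: the applied torque is the command clipped to a limit.
        self.torque = max(-self.max_torque, min(self.max_torque, command))
        # Placeholder dynamics (unit mass and length, theta = 0 is upright).
        self.omega += (self.torque + 9.81 * math.sin(self.theta)) * self.dt
        self.theta += self.omega * self.dt
        self.sim_steps += 1
        # Assumed reward: quadratic penalty on wrapped angle, torque, command.
        angle = math.atan2(math.sin(self.theta), math.cos(self.theta))
        reward = -(angle ** 2 + 0.1 * self.torque ** 2 + 0.001 * command ** 2)
        return self.observe(), reward


def step_with_frame_skip(env, command, skip=4):
    """Hold one command for `skip` simulation steps and sum the rewards."""
    total_reward = 0.0
    observation = env.observe()
    for _ in range(skip):
        observation, reward = env.step(command)
        total_reward += reward
    return observation, total_reward
```

With frame skipping, the agent chooses a command once per `skip` simulation steps, so each learning step covers a longer stretch of simulated time. The value `skip=4` is only an example, not the setting used for the results shown here.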

TD3 - Expected return per iteration.

PPO - Expected return per iteration.

Iiwa joint space RL-controller - Servo

Used RobotDART and OpenAI Gym spaces, wrote a custom reward function, and trained with PPO and TD3. The observation is a vector containing the positions and velocities of all the robot's joints, and the reward function is the norm of the difference between the final (target) and current joint positions.
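The observation and reward described above can be sketched in a few lines of plain Python. The function names `build_observation` and `joint_space_reward` are hypothetical, and negating the norm (so that a smaller distance to the target gives a larger reward) is an assumption about the sign convention rather than something stated here. The Iiwa arm has 7 joints, so this observation would have 14 entries.

```python
import math


def build_observation(positions, velocities):
    """Concatenate joint positions and joint velocities into one vector."""
    return list(positions) + list(velocities)


def joint_space_reward(current_positions, target_positions):
    """Negative Euclidean norm of (target - current) joint positions.

    Assumption: the norm is negated so that maximizing the reward
    drives the joints toward the target configuration.
    """
    return -math.dist(current_positions, target_positions)


# Example with 7 joints; the target values are made up for illustration.
current = [0.0] * 7
target = [0.5, -0.3, 0.0, 1.0, 0.0, 0.2, 0.0]
observation = build_observation(current, [0.0] * 7)
reward = joint_space_reward(current, target)
```

The reward reaches its maximum of 0 only when every joint is at its target position.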

TD3 - Expected return per iteration.

PPO - Expected return per iteration.
