Robot arm control with Reinforcement Learning

This project controls a 7-DOF robot arm in the panda-gym Reach environment using two reinforcement learning algorithms for continuous action spaces: DDPG (Deep Deterministic Policy Gradients) and TD3 (Twin Delayed Deep Deterministic Policy Gradients). Hindsight Experience Replay is used to improve the learning of both algorithms.

Continuous RL Algorithms

Continuous reinforcement learning deals with environments where actions are continuous, such as setting the joint positions of a robotic arm or the throttle of an autonomous vehicle. The objective is to find a policy that maps observed states to continuous actions so as to maximize the expected cumulative reward. Several algorithms have been developed for this setting, including DDPG, TD3, SAC, and PPO.

1- DDPG (Deep Deterministic Policy Gradients)

DDPG is an actor-critic algorithm designed for continuous action spaces. It combines ideas from policy-gradient methods and Q-learning: an actor network learns a deterministic policy that outputs continuous actions directly, while a critic network approximates the action-value function (Q-function). The critic evaluates the actions chosen by the actor, and the actor is updated in the direction that increases the critic's Q-value estimate, which allows fine-grained control.
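As a rough illustration of the critic side of DDPG, the sketch below computes the one-step target that the critic is trained to match and the slow "soft" update commonly used for DDPG's target networks. It is a minimal, framework-free sketch and not the code from this repository: the function names, the scalar inputs, and the default values of `gamma` and `tau` are assumptions chosen for readability.

```python
def critic_target(reward, next_q, done, gamma=0.99):
    """One-step TD target for the critic: r + gamma * Q_target(s', actor_target(s')).

    `next_q` stands for the target critic's estimate at the next state, evaluated
    on the action chosen by the target actor. No bootstrapping on terminal steps.
    """
    return reward + gamma * (1.0 - float(done)) * next_q


def soft_update(target_params, online_params, tau=0.005):
    """Move each target parameter a small step (tau) towards its online counterpart."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


# Hypothetical numbers, only to show how the pieces are called.
y = critic_target(reward=-1.0, next_q=-4.0, done=False, gamma=0.9)
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
print(y, new_target)
```

In a full agent the critic would be fit to `y` with a mean-squared error loss, and the actor would be updated to increase the critic's output for the actions it proposes.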

2- TD3 (Twin Delayed Deep Deterministic Policy Gradients)

TD3 is an enhancement of DDPG that addresses issues such as the overestimation bias of the critic. It trains two ("twin") critic networks instead of one and uses the smaller of their two Q-value estimates when computing the learning target. It also updates the actor and the target networks less often than the critics (the "delayed" updates) and adds clipped noise to the target action to smooth the value estimate. TD3 is known for its robustness and improved performance over DDPG.
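The three TD3 ingredients can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not this repository's implementation: actions are treated as a single scalar component, and the function names and default values (`noise_std`, `noise_clip`, `policy_delay`, `gamma`) are illustrative.

```python
import random


def smoothed_target_action(action, noise_std=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise, then clip to the action bounds."""
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    return max(low, min(high, action + noise))


def td3_target(reward, q1_next, q2_next, done, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two target critics."""
    return reward + gamma * (1.0 - float(done)) * min(q1_next, q2_next)


def should_update_actor(step, policy_delay=2):
    """Delayed updates: the actor and target networks update once every `policy_delay` critic updates."""
    return step % policy_delay == 0


# Hypothetical values: the two target critics disagree, so the lower estimate is used.
print(td3_target(reward=1.0, q1_next=5.0, q2_next=3.0, done=False, gamma=0.5))
```

Taking the minimum of the two critics keeps a single over-optimistic estimate from propagating into the target, which is the mechanism behind the reduced overestimation bias.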

Hindsight Experience Replay

Hindsight Experience Replay (HER) is a technique developed to address the challenge of sparse, binary rewards in RL environments. In many robotic tasks the agent rarely reaches the desired goal, and traditional RL algorithms struggle to learn from such feedback: the agent receives a reward of zero unless the robot completes the task, so it cannot tell whether the steps it took were good or not.

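HER tackles this issue by reusing past experiences even when they did not lead to the desired goal. Each stored transition is also relabeled with a goal that the agent actually achieved in the same episode, and its reward is recomputed for that substitute goal, so a failed attempt at the original goal becomes a successful example for another one. Storing both versions in the replay buffer lets the agent learn from successful and failed attempts alike, which significantly accelerates the learning process.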
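The relabeling step can be sketched as follows. This is a minimal sketch rather than the replay buffer from this repository: it uses the simplest relabeling strategy (the goal reached at the end of the episode), the -1/0 reward convention from the HER paper (which may differ from the convention used in this project), and a hypothetical transition format in which each step is a dict with `achieved_goal`, `desired_goal`, and `reward` keys. The `tolerance` value is also an assumption.

```python
import math


def sparse_reward(achieved_goal, desired_goal, tolerance=0.05):
    """Return 0 if the achieved position is within `tolerance` of the goal, otherwise -1."""
    return 0.0 if math.dist(achieved_goal, desired_goal) <= tolerance else -1.0


def relabel_with_final_goal(episode, tolerance=0.05):
    """Copy the episode, replacing the goal with the position reached at the last step."""
    new_goal = episode[-1]["achieved_goal"]
    return [
        {
            **step,
            "desired_goal": new_goal,
            "reward": sparse_reward(step["achieved_goal"], new_goal, tolerance),
        }
        for step in episode
    ]


# A failed two-step episode: the arm never gets close to the goal at (1, 1, 1).
episode = [
    {"achieved_goal": (0.0, 0.0, 0.0), "desired_goal": (1.0, 1.0, 1.0), "reward": -1.0},
    {"achieved_goal": (0.2, 0.0, 0.0), "desired_goal": (1.0, 1.0, 1.0), "reward": -1.0},
]

# Both the original and the relabeled transitions would go into the replay buffer.
replay_buffer = episode + relabel_with_final_goal(episode)
print([step["reward"] for step in replay_buffer])
```

After relabeling, the last transition counts as a success for the substitute goal, so the buffer contains at least one informative reward even though the original episode failed.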

Link to HER paper: https://arxiv.org/pdf/1707.01495.pdf

How to run

You can train a given model simply by running one of the files in the `training` folder.

DDPG With HER: ddpg_her.py

TD3 With HER: td3_her_training.py

To change the hyperparameters of either algorithm (learning rates (alpha/beta), discount factor (gamma), and so on), edit the corresponding agent class in the `agents` folder. The architecture of the Actor/Critic networks can be modified in the `networks.py` file.

Results

Both agents were trained in the Google Colab environment.

Contact

If you have any questions, feedback, or issues, please don't hesitate to open an issue or reach out to me: [email protected].

License

Distributed under the MIT License. See LICENSE.txt for more information.

kaymen99/Robot-arm-control-with-RL
