Explainability of Deep Reinforcement Learning Algorithms in Robotic Domains by using Layer-wise Relevance Propagation

Environments

Our modified versions of the robotic environments are under the ./CustomGymEnvs directory. The changed_envs subdirectory contains FetchReach-v2, a version of FetchReach-v1 with a changed action space: the actions are joint torques rather than the x, y, and z velocities of the end-effector. The envs directory contains the original environments and the environments with occluded entities, faulty_envs contains the environments with blocked joints, and graph_envs contains the environments with graph representations of the robots.
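
As a quick way to see the changed action space, the snippet below is a sketch that assumes importing CustomGymEnvs registers the custom environments with Gym (check the package's __init__.py for the actual registration mechanism):

import gym
import CustomGymEnvs  # assumption: importing the package registers the custom envs

env = gym.make('FetchReach-v2')
print(env.action_space)  # one torque command per actuated joint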

Graph Representation

The RobotGraphModel package parses the robot's XML model and converts the representation into a graph. Under this directory, the model_parser.py file parses the XML model of the environment. The robot_graph.py file identifies the nodes (<body> elements in the XML) and edges (<joint> elements in the XML) of the robot; two nested <body> elements are connected through a <joint> defined in the inner body. For each environment, we developed a class that inherits from the RobotGraph class within robot_graph.py and defines the set of node and edge features for that environment. These subclasses are used by the OpenAI Gym wrappers under CustomGymEnvs/graph_envs.
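
To illustrate the parsing step, the following standard-library sketch extracts the same body/joint structure from a MuJoCo XML file. It is a simplified illustration, not the RobotGraphModel code itself, and the file path is a placeholder:

import xml.etree.ElementTree as ET

def parse_robot_graph(xml_path):
    """Collect nodes (<body>) and edges (<joint>) from a MuJoCo XML model."""
    root = ET.parse(xml_path).getroot()
    nodes, edges = [], []

    def visit(body, parent_name):
        name = body.get('name', 'unnamed')
        nodes.append(name)
        # A <joint> defined inside this body connects it to its parent <body>.
        for joint in body.findall('joint'):
            edges.append((parent_name, name, joint.get('name')))
        for child in body.findall('body'):
            visit(child, name)

    for body in root.find('worldbody').findall('body'):
        visit(body, 'worldbody')
    return nodes, edges

nodes, edges = parse_robot_graph('assets/fetch/reach.xml')  # placeholder path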

Algorithm

Our algorithm is Soft Actor-Critic (SAC). The variant with graph representation is under ./Graph_SAC, and the original variant with fully-connected networks is under ./SAC. For the graph neural network architecture, we use the torchgraph implementation developed for the paper Explainability Techniques for Graph Convolutional Networks, and for the LRP implementation we use the repository developed for the same paper.
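
For intuition about LRP itself, the epsilon rule propagates the relevance of a layer's outputs back to its inputs in proportion to each input's contribution to the pre-activations. Below is a minimal NumPy sketch for a single linear layer; it is illustrative only, not the repository's implementation:

import numpy as np

def lrp_linear(a, w, b, relevance_out, eps=1e-6):
    """Epsilon-LRP for one linear layer z = a @ w + b.

    a: input activations, shape (n_in,)
    w: weights, shape (n_in, n_out)
    relevance_out: relevance of the outputs, shape (n_out,)
    Returns the relevance of the inputs, shape (n_in,).
    """
    z = a @ w + b                              # pre-activations
    z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilizer avoids division by zero
    s = relevance_out / z                      # relevance per unit of pre-activation
    return a * (w @ s)                         # redistribute back to the inputs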

Installation and Usage Guidelines

Setup

The Python version is 3.8.10. Before running the project, install MuJoCo 2.1:

$ wget https://github.com/deepmind/mujoco/releases/download/2.1.0/mujoco210-linux-x86_64.tar.gz
$ tar -xvf mujoco210-linux-x86_64.tar.gz
$ mkdir -p ~/.mujoco
$ mv mujoco210 ~/.mujoco/
$ pip3 install -U 'mujoco-py<2.2,>=2.1'

Download the project into the $HOME/Documents folder, then add the following lines to your ~/.bashrc file:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
export PYTHONPATH=$PYTHONPATH:$HOME/Documents/SAC_GCN

Then install the requirements of the project:

$ pip3 install -r requirements.txt
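
Once the requirements are installed, a short sanity check confirms that mujoco-py compiles and the standard environments load. This sketch assumes the pre-0.26 Gym step API that mujoco-py 2.1 is typically used with:

import gym

env = gym.make('Hopper-v2')  # triggers mujoco-py compilation on first use
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print('observation shape:', obs.shape, 'reward:', reward)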

Experiments

First Phase

To run the experiments with the graph representation of the robot, use the following command:

$ python $MAIN_FILE --env-name {ENV-NAME} --exp-type graph

where MAIN_FILE is the absolute path to the ./Controller/graph/main.py file. For the complete set of arguments, see main.py. ENV-NAME can be one of the following:

  • FetchReach-v2
  • Walker2d-v2
  • HalfCheetah-v2
  • Hopper-v2

After the agent is trained using graph networks, Layer-wise Relevance Propagation (LRP) is applied to highlight the contribution of each part of the robot to the decision making. The experiment data is saved under ./Data/{ENV-NAME}/graph.

After the policy converges, LRP is applied to the learned policy to calculate the relevance scores assigned by each action to each entity across time-steps. To run LRP for the ENV-NAME environment, run the following:

$ python $EVALUATE --env-name {ENV-NAME} --exp-type graph

where EVALUATE is the absolute path to the ./Evaluate/evaluate.py file. The results are stored under ./Data/{ENV-NAME}/graph/edge_relevance.pkl and ./Data/{ENV-NAME}/graph/global_relevance.pkl, which contain the relevance scores assigned to the edge and global units of the input graph, respectively.
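
These pickle files can be inspected directly. The structure of the stored objects is defined by evaluate.py, so the sketch below only shows how to load them (the environment name is an example):

import pickle

with open('Data/FetchReach-v2/graph/edge_relevance.pkl', 'rb') as f:
    edge_relevance = pickle.load(f)
with open('Data/FetchReach-v2/graph/global_relevance.pkl', 'rb') as f:
    global_relevance = pickle.load(f)

print(type(edge_relevance), type(global_relevance))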

Second Phase

In this phase, the results of the first phase are validated by one of the following experiments:

  1. Occluding an entity's features in the observation space, which validates that entity's relevance score.
  2. Blocking a joint, which validates the importance of that joint in the action space.

In both cases, the relevance scores are validated by the size of the resulting drop in performance. For more information, please refer to the paper. For all the following commands, $MAIN_FILE is the absolute path to the ./Controller/basic/main.py file. To run experiments in the standard setting, run the following:

$ python $MAIN_FILE --env-name {ENV-NAME} --exp-type standard

To run experiments for the occlusion case, use the following command:

$ python $MAIN_FILE --env-name {ENV-NAME} --exp-type {ENTITY-NAME}

where ENTITY-NAME is the name of the entity to occlude. The valid ENTITY-NAMEs for each environment are listed below (a conceptual sketch of occlusion follows the list):

  • FetchReach-v2
    • goal
    • shoulder_pan_joint
    • shoulder_lift_joint
    • upperarm_roll_joint
    • wrist_flex_joint
    • forearm_roll_joint
    • wrist_roll_joint
    • elbow_flex_joint
  • Walker2d-v2
    • torso
    • foot_joint
    • leg_joint
    • thigh_joint
    • foot_left_joint
    • leg_left_joint
    • thigh_left_joint
  • HalfCheetah-v2
    • torso
    • bfoot
    • bshin
    • bthigh
    • ffoot
    • fshin
    • fthigh
  • Hopper-v2
    • torso
    • foot_joint
    • leg_joint
    • thigh_joint
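
Conceptually, occlusion zeroes the occluded entity's entries in the observation before the agent sees them. The hypothetical wrapper below illustrates the idea; the repository's actual occluded environments live under ./CustomGymEnvs/envs, and the index set here is a placeholder:

import numpy as np
import gym

class OccludeEntity(gym.ObservationWrapper):
    """Zero out the observation entries that belong to one entity."""

    def __init__(self, env, entity_indices):
        super().__init__(env)
        self.entity_indices = entity_indices  # placeholder: the entity's entries are env-specific

    def observation(self, obs):
        obs = np.array(obs, copy=True)
        obs[self.entity_indices] = 0.0
        return obs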

To run experiments for the blockage case, use the following command:

$ python $MAIN_FILE --env-name {BROKEN-ENV-NAME} --exp-type {JOINT-NAME}

where BROKEN-ENV-NAME is the name of the environment with a broken joint, one of the following:

  • FetchReachBroken-v2
  • Walker2dBroken-v2
  • HalfCheetahBroken-v2
  • HopperBroken-v2

and JOINT-NAME is the name of the joint to block. The valid JOINT-NAMEs for each environment are listed below (a conceptual sketch of joint blocking follows the list):

  • FetchReachBroken-v2
    • shoulder_pan_joint
    • shoulder_lift_joint
    • upperarm_roll_joint
    • wrist_flex_joint
    • forearm_roll_joint
    • wrist_roll_joint
    • elbow_flex_joint
  • Walker2dBroken-v2
    • foot_joint
    • leg_joint
    • thigh_joint
    • foot_left_joint
    • leg_left_joint
    • thigh_left_joint
  • HalfCheetahBroken-v2
    • bfoot
    • bshin
    • bthigh
    • ffoot
    • fshin
    • fthigh
  • HopperBroken-v2
    • foot_joint
    • leg_joint
    • thigh_joint
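
Conceptually, blocking a joint forces that joint's torque to zero at every step. A hypothetical wrapper in the same spirit (the actual faulty environments are under ./CustomGymEnvs/faulty_envs):

import numpy as np
import gym

class BlockJoint(gym.ActionWrapper):
    """Force the torque of one joint to zero before each step."""

    def __init__(self, env, joint_index):
        super().__init__(env)
        self.joint_index = joint_index  # placeholder: env-specific joint ordering

    def action(self, action):
        action = np.array(action, copy=True)
        action[self.joint_index] = 0.0
        return action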

Note that these experiments use the original SAC algorithm with fully-connected networks under the ./SAC directory. For each environment, the resulting data is stored under the following directories:

  • For the occlusion case: ./Data/{ENV-NAME}/{ENTITY-NAME}
  • For the blockage case: ./Data/{BROKEN-ENV-NAME}/{JOINT-NAME}

Plots

To plot the results of the experiments, run the following command:

$ python $PLOT --env-name {ENV-NAME}

where PLOT is the absolute path to the ./Plots/plot.py file. The resulting plot is stored under ./Result/{ENV-NAME}.jpg.
