Fork of the original Meta-Learning Shared Hierarchies repository.

# Meta-Learning Shared Hierarchies

Code for Meta-Learning Shared Hierarchies.

Includes pre-trained checkpoints for the task `AntBandits-v1` (up to 2000 epochs) in `mlsh_code/savedir/`.

## Installation

Install `mujoco-py` and MuJoCo by following their installation instructions.

To test whether MuJoCo works, execute the following command in the `mjpro*/bin` folder:

```shell
./simulate ../model/humanoid.xml
```

Add the following to your `.bashrc` (replace `...` with the path to the directory):

```shell
export PYTHONPATH=$PYTHONPATH:/.../mlsh/gym;
export PYTHONPATH=$PYTHONPATH:/.../mlsh/rl-algs;
```

Install the MovementBandits environments:

```shell
cd test_envs
pip install -e .
```

Use `pip3` if `pip` links to an older version of Python (as on the university computers).

### Installing on the university computers

If you are installing on the university computers, install `mujoco-py` version 0.5.7 with MuJoCo version 1.31, as newer versions will not work there. You may also consider installing TensorFlow 1.5.0 so the code is sure to run on any university computer, although this is less important. Also remember to always use `python3` and `pip3` instead of `python` and `pip`.

```shell
pip3 install tensorflow==1.5.0 mujoco-py==0.5.7
```
## Running Experiments

```shell
cd mlsh_code
```

The MLSH script works on any Gym environment that implements the `randomizeCorrect()` function. See the `envs/` folder for examples of such environments.
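As a rough illustration of that interface (a toy sketch, not one of the actual environments in `envs/`; the class name `RandomizedGoalEnv` and the `num_goals` parameter are illustrative, not from this repository):

```python
import random


class RandomizedGoalEnv:
    """Toy sketch of an environment exposing randomizeCorrect().

    MLSH calls randomizeCorrect() when it samples a new task, so the
    environment re-draws which goal counts as "correct" while keeping
    its dynamics otherwise fixed.
    """

    def __init__(self, num_goals=2):
        self.num_goals = num_goals
        self.realgoal = 0  # index of the currently correct goal

    def randomizeCorrect(self):
        # Re-sample the hidden correct goal for the following episodes.
        self.realgoal = random.randint(0, self.num_goals - 1)

    def step(self, action):
        # Reward the agent only for choosing the currently correct goal.
        reward = 1.0 if action == self.realgoal else 0.0
        observation, done, info = None, True, {}
        return observation, reward, done, info
```

A real environment would additionally implement the usual Gym methods (`reset()`, observation/action spaces, rendering); the only MLSH-specific addition is `randomizeCorrect()`.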

To run on multiple cores:

```shell
mpirun -np 12 python main.py ...
```
### Training

```shell
python main.py --task AntBandits-v1 --num_subs 2 --num_epochs 10000 --macro_duration 1000 --num_rollouts 2000 --warmup_time 20 --train_time 30 --replay False AntAgent
```

Use `python3` if `python` links to an older version of Python (as on the university computers).

The `--save_every X` parameter controls how often checkpoints are saved (every X epochs); the default is 500.

## Visualization

Once you've trained your agent, view it by running:

```shell
python main.py [...] --replay True --continue_iter [your iteration] AntAgent
```

`[your iteration]` should be 2000 to view the latest included checkpoint.
