
Problem switching from a discrete to a continuous action space in a custom environment #217

Open
shengqie opened this issue Jan 26, 2024 · 1 comment

Comments

@shengqie

   Hello developers, I am trying to customize the aircombat environment included in MARLlib, but I ran into some problems during training after the customization. Specifically:
   First, I reduced the 2v2 scenario defined by the environment to a competitive multi-agent 1v1 air-combat scenario. Training this with IPPO produced reasonable results. Then I replaced the MultiDiscrete action space defined by the environment with a continuous action space, as follows:

        self.action_space = spaces.Box(low=-10., high=10., shape=(4,))

    However, after defining the action space as continuous, every algorithm I tried in MARLlib (IPPO, MADDPG, MAPPO, and others) trained extremely poorly: none of them produced an effective strategy, and the reward curves showed no upward trend and did not converge.
    I have built similar setups with MAPPO and other algorithms before, so I do not think the environment is what makes the algorithms fail. Do you have any thoughts on this? Could it be that MARLlib has specific code-writing conventions for continuous action spaces that I am not aware of? Thank you for your answer.
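    For reference, a common convention for continuous-control policies (this is an assumption on my part, not something documented for MARLlib or the aircombat environment) is to keep the Box bounds normalized to [-1, 1] and rescale the action inside the environment's step(), so the Gaussian policy does not have to produce large raw magnitudes. A minimal sketch of that idea, with a hypothetical class and helper name:

        # Illustrative sketch only -- the class and rescale() helper are hypothetical,
        # not part of MARLlib or the aircombat environment.
        import numpy as np
        from gym import spaces

        class NormalizedActionSketch:
            def __init__(self, low=-10.0, high=10.0, dim=4):
                self.low, self.high = low, high
                # The policy samples from a normalized [-1, 1] Box
                self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(dim,), dtype=np.float32)

            def rescale(self, action):
                # Map the normalized action back to the environment's physical range
                action = np.clip(action, -1.0, 1.0)
                return self.low + (action + 1.0) * 0.5 * (self.high - self.low)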


@shengqie
Author

I would like to add one more thing: after converting the action space to a continuous action space, the algorithm hit an error after a few thousand iterations:


Failure # 1 (occurred at 2024-01-26_02-47-08)
Traceback (most recent call last):
File "/home/user/miniconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 890, in _process_trial
results = self.trial_executor.fetch_result(trial)
File "/home/user/miniconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 788, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "/home/user/miniconda3/envs/marllib/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/user/miniconda3/envs/marllib/lib/python3.8/site-packages/ray/worker.py", line 1625, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::IPPOTrainer.train_buffered() (pid=1138520, ip=10.31.22.121, repr=IPPOTrainer)
File "/home/user/miniconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/ppo/ppo_torch_policy.py", line 46, in ppo_surrogate_loss
curr_action_dist = dist_class(logits, model)
File "/home/user/miniconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/models/torch/torch_action_dist.py", line 186, in __init__
self.dist = torch.distributions.normal.Normal(mean, torch.exp(log_std))
File "/home/user/miniconda3/envs/marllib/lib/python3.8/site-packages/torch/distributions/normal.py", line 50, in __init__
super(Normal, self).__init__(batch_shape, validate_args=validate_args)
File "/home/user/miniconda3/envs/marllib/lib/python3.8/site-packages/torch/distributions/distribution.py", line 53, in __init__
raise ValueError("The parameter {} has invalid values".format(param))
ValueError: The parameter loc has invalid values
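For context, this ValueError is raised when the mean ("loc") tensor passed to torch.distributions.Normal contains NaN or inf values, i.e. the policy's mean output diverged at some point during training. A minimal sketch that reproduces the same validation failure (the values are made up for illustration):

    # Feeding a NaN mean into Normal triggers the same parameter validation error.
    import torch

    mean = torch.tensor([float("nan"), 0.0])   # stands in for a diverged policy output
    log_std = torch.tensor([0.0, 0.0])
    torch.distributions.Normal(mean, torch.exp(log_std), validate_args=True)
    # -> ValueError: the "loc" parameter has invalid values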
