The line

```python
return value + advantage - advantage.mean()
```

appears to be wrong and should be

```python
return value + advantage - advantage.mean(dim=1, keepdim=True)
```

By definition, the advantage network's outputs should sum to zero over the action dimension, so the mean being subtracted should be the mean over the action dimension only, not the mean over all elements.
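To illustrate the point, here is a minimal dueling-head sketch (an illustrative stand-in, not necessarily the repo's exact `DuelingNet`): with `dim=1, keepdim=True`, the mean is computed per state over its actions, so the centered advantages sum to zero for every state in the batch.

```python
import torch
import torch.nn as nn

class DuelingNet(nn.Module):
    """Illustrative dueling head: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""

    def __init__(self, n_states, n_actions, hidden_dim=64):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(n_states, hidden_dim), nn.ReLU())
        self.value_head = nn.Linear(hidden_dim, 1)              # V(s): (batch, 1)
        self.advantage_head = nn.Linear(hidden_dim, n_actions)  # A(s,a): (batch, n_actions)

    def forward(self, x):
        h = self.feature(x)
        value = self.value_head(h)          # broadcasts over the action dimension
        advantage = self.advantage_head(h)
        # Per-state mean over actions (dim=1); keepdim=True keeps shape (batch, 1)
        # so it broadcasts correctly. advantage.mean() with no dim would instead
        # collapse the whole batch into a single scalar.
        return value + advantage - advantage.mean(dim=1, keepdim=True)

net = DuelingNet(n_states=4, n_actions=3)
q = net(torch.randn(5, 4))
print(q.shape)  # torch.Size([5, 3])
```

With this version, `Q(s, ·) - V(s)` sums to zero across actions for each state, which is exactly the identifiability constraint the dueling architecture relies on.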
Also,

```python
self.policy_net = model.to(self.device)
self.target_net = model.to(self.device)
```

is incorrect and should be

```python
self.policy_net = DuelingNet(cfg.n_states, cfg.n_actions, hidden_dim=cfg.hidden_dim).to(self.device)
self.target_net = DuelingNet(cfg.n_states, cfg.n_actions, hidden_dim=cfg.hidden_dim).to(self.device)
```

The original initialization makes `policy_net` and `target_net` refer to the same object in memory; the corrected version constructs two separate objects.
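The aliasing can be demonstrated directly: `nn.Module.to()` moves a module in place and returns the same object, so both attributes end up pointing at one network and the target network can never lag behind the policy network. A small sketch (using a plain `nn.Linear` as a stand-in for the repo's network):

```python
import copy
import torch.nn as nn

# The bug: .to() returns the same module object, so both names alias one network.
model = nn.Linear(4, 2)
policy_net = model.to("cpu")
target_net = model.to("cpu")
print(policy_net is target_net)  # True -- updating one "updates" both

# Fix A (as suggested in the issue): construct two separate networks,
# then sync the target's weights to the policy's initial weights.
policy_net = nn.Linear(4, 2)
target_net = nn.Linear(4, 2)
target_net.load_state_dict(policy_net.state_dict())

# Fix B: deep-copy the policy network to get an independent object.
target_net = copy.deepcopy(policy_net)
print(policy_net is target_net)  # False -- distinct objects, equal weights
```

Either fix gives two independent parameter sets, which is what the periodic target-network update in DQN-style training assumes.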