How does AlphaZero avoid placing stones at illegal positions? #113

Open
ZhangXi20181002 opened this issue Sep 26, 2020 · 2 comments

Comments

@ZhangXi20181002

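I'd like to ask how AlphaZero avoids placing a stone at an illegal position, for example one that is already occupied. During the MCTS select step, each action's probability depends on the policy output, and at the start the policy does not know which positions are legal and which are not. Could this produce illegal moves?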

@KohakuBlueleaf

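After the policy is produced, set the prob of every position where a move is not allowed to -INF or 0.
(If the output has already gone through softmax, 0 is enough; if it has not, use -INF.)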
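The two cases above can be sketched roughly as follows. This is only an illustration, not code from this repository: the function names `mask_logits` and `mask_probs` and the boolean `legal` list are hypothetical, and renormalizing after zeroing is an extra step assumed here so that the priors used by MCTS still sum to 1.

```python
import math

def mask_logits(logits, legal):
    """Case 1: raw network outputs (no softmax yet).
    Set illegal entries to -inf, then apply softmax."""
    if not any(legal):
        raise ValueError("no legal move available")
    masked = [x if ok else -math.inf for x, ok in zip(logits, legal)]
    peak = max(masked)  # finite, because at least one move is legal
    exps = [math.exp(x - peak) for x in masked]  # exp(-inf) is 0.0
    total = sum(exps)
    return [e / total for e in exps]

def mask_probs(probs, legal):
    """Case 2: outputs that already went through softmax.
    Set illegal entries to 0, then renormalize (assumed extra step)."""
    if not any(legal):
        raise ValueError("no legal move available")
    masked = [p if ok else 0.0 for p, ok in zip(probs, legal)]
    total = sum(masked)
    if total == 0.0:
        # All probability mass was on illegal moves:
        # fall back to a uniform prior over the legal ones.
        n = sum(1 for ok in legal if ok)
        return [1.0 / n if ok else 0.0 for ok in legal]
    return [p / total for p in masked]
```

With either function, an occupied position ends up with prior 0, so the select step has no reason to favor it. Many implementations also simply never create child nodes for illegal moves when expanding a node.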

@ZhangXi20181002
Author

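> After the policy is produced, set the prob of every position where a move is not allowed to -INF or 0.
> (If the output has already gone through softmax, 0 is enough; if it has not, use -INF.)

Got it, thank you for the answer!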

2 participants