value_iteration algorithm does not converge? #138
Comments
"All of these algorithms converge to an optimal policy for discounted finite MDPs." FYI, this is quoted from Reinforcement Learning: An Introduction. You could try adding a discount factor.
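For reference, here is the standard value iteration update with a discount factor, written in my own notation (the symbols $P$, $R$, and $\gamma$ are assumptions and may not match the names used in the repository code):

$$
V_{k+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V_k(s')\bigr]
$$

With $0 \le \gamma < 1$ on a finite MDP, this update is a contraction in the max norm, which is why repeated sweeps converge to $V^*$:

$$
\lVert V_{k+1} - V^* \rVert_\infty \le \gamma\, \lVert V_k - V^* \rVert_\infty
$$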
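The success rate of the value_iteration test is 0.638. Value iteration needs to iterate repeatedly, performing the evaluation step each time, but the code only does a single iteration.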
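To illustrate the point about repeated iteration, here is a minimal sketch of value iteration that sweeps until the values stop changing. It is not the repository's code: the transition format `P[s][a] -> [(prob, next_state, reward, done)]`, the names `q_value`, `value_iteration`, `gamma`, `theta`, `max_sweeps`, and the three-state toy MDP are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical transition model: P[s][a] -> list of (prob, next_state, reward, done)
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float, bool]]]]


def q_value(P: Transitions, V: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """Expected one-step return of taking action a in state s, then following V."""
    # Terminal transitions (done=True) do not bootstrap from the next state.
    return sum(p * (r + gamma * V[s2] * (not done)) for p, s2, r, done in P[s][a])


def value_iteration(P: Transitions, gamma: float = 0.9, theta: float = 1e-8,
                    max_sweeps: int = 10_000):
    """Sweep over all states until the largest value change drops below theta."""
    V = {s: 0.0 for s in P}
    sweep = 0
    for sweep in range(1, max_sweeps + 1):
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update
        if delta < theta:
            break
    # Extract the greedy policy only after the values have converged.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy, sweep


# Toy chain MDP (an assumption for illustration): states 0 -> 1 -> 2, state 2 is terminal.
# Action 0 moves left, action 1 moves right; reaching state 2 gives reward 1.
P: Transitions = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}

V, policy, sweeps = value_iteration(P, gamma=0.9)
print(V, policy, sweeps)
```

On this toy problem the value of state 0 is still 0 after the first sweep, so a greedy policy extracted at that point cannot tell left from right in state 0. Only after further sweeps does the reward propagate back, which is the same kind of effect a single-iteration implementation would suffer from.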