Does the value_iteration algorithm fail to converge? #138

Open

chensisi0730 opened this issue Jun 1, 2023 · 1 comment


chensisi0730 commented Jun 1, 2023

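The success rate in the value_iteration test is 0.638. Value iteration needs to iterate repeatedly, performing policy evaluation each time, but the code only runs a single iteration.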

qiwang067 assigned johnjim0816

sherlcok314159 commented Jun 8, 2023

"All of these algorithms converge to an optimal policy for discounted finite MDPs." FYI, this is quoted from Reinforcement Learning: An Introduction. You could try adding a discount.
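
For reference, here is a minimal sketch of the two points raised in this thread: repeating the Bellman backup until the value function stops changing, and using a discount factor below 1. It is not the repository's code. The 3-state toy MDP, the array layout (`P[s, a, s']`, `R[s, a]`), and the names `value_iteration`, `gamma`, `theta`, and `max_iters` are all assumptions made for illustration.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, theta=1e-8, max_iters=10_000):
    """Illustrative value iteration (hypothetical helper, not the repo's API).

    P: array of shape (S, A, S), P[s, a, s2] = transition probability.
    R: array of shape (S, A), expected immediate reward.
    """
    V = np.zeros(P.shape[0])
    n_iters = 0
    for n_iters in range(1, max_iters + 1):
        # Bellman optimality backup for every state at once.
        Q = R + gamma * (P @ V)          # shape (S, A)
        V_new = Q.max(axis=1)
        delta = np.max(np.abs(V_new - V))
        V = V_new
        if delta < theta:                # stop only once V has stabilised
            break
    # Greedy policy with respect to the final value estimate.
    policy = (R + gamma * (P @ V)).argmax(axis=1)
    return V, policy, n_iters

# Toy deterministic chain (assumed for illustration): states 0 -> 1 -> 2,
# state 2 is absorbing. Action 0 moves left or stays, action 1 moves right.
P = np.zeros((3, 2, 3))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, 0, 0] = 1.0
P[1, 1, 2] = 1.0
P[2, :, 2] = 1.0
R = np.zeros((3, 2))
R[1, 1] = 1.0  # reward for stepping from state 1 into the goal state 2

V, policy, n_iters = value_iteration(P, R, gamma=0.9)
print(V, policy, n_iters)
```

On this toy problem the loop needs more than one sweep before the values stop changing, which is the behaviour the original report says is missing. Whether that is the actual cause of the 0.638 success rate would have to be checked against the repository's code.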
