/chapter5/chapter5_questions&keywords #55

qiwang067 · 2021-05-24T01:16:07Z

https://datawhalechina.github.io/easy-rl/#/chapter5/chapter5_questions&keywords

Description

Strawberry47 · 2021-11-11T02:14:14Z

A really great summary. Thanks to the author!

yyysjz1997 · 2021-11-11T07:42:56Z

A really great summary. Thanks to the author!
Thanks~

Strawberry47 · 2021-11-11T08:00:21Z

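I have a small question about the code: why is the actor's output (that is, the action probabilities it produces from an input state) named dist? Is it short for distance?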

FulChou · 2021-11-11T08:02:08Z

I read it as the distribution over actions, so it stands for distribution.
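
To illustrate the naming, here is a minimal standard-library sketch of an actor whose output is a distribution over discrete actions. `CategoricalDist`, `softmax`, and the toy `actor` function are hypothetical stand-ins written for this example, not the repository's code; in a PyTorch implementation the same role is usually played by `torch.distributions.Categorical`.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class CategoricalDist:
    """Hypothetical stand-in for a categorical action distribution."""

    def __init__(self, probs):
        self.probs = probs

    def sample(self, rng=random):
        # Draw one action index according to the action probabilities.
        return rng.choices(range(len(self.probs)), weights=self.probs, k=1)[0]

    def log_prob(self, action):
        # Log-probability of the chosen action, as used in the policy loss.
        return math.log(self.probs[action])


def actor(action_scores):
    # Stand-in for the actor network: a real actor would compute these
    # scores from the state; here they are passed in directly.
    return CategoricalDist(softmax(action_scores))


dist = actor([2.0, 1.0, 0.1])      # "dist" = distribution over actions
action = dist.sample()             # pick an action from the distribution
log_prob = dist.log_prob(action)   # needed later for the probability ratio
```

The point of returning a distribution rather than a single action is that the agent can both sample from it and query the log-probability of the sampled action.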

Strawberry47 · 2021-11-11T08:04:12Z

@CSU-FulChou
I read it as the distribution over actions, so it stands for distribution.

Ah, right, that was an oversight on my part. Thanks♪(･ω･)ﾉ

Strawberry47 · 2021-11-11T09:14:52Z

I also have a question about how critic_loss is computed in the code: is it Q_value(old) - critic_value(new)? I'm not sure whether I've understood it correctly~

JimmyYoungggg · 2021-12-29T02:41:45Z

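A question for the author: how often is θk updated in the PPO algorithm? If it is updated on every iteration, wouldn't the sample efficiency still be low?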

littlebrotherdog · 2024-03-13T07:56:10Z

@JimmyYoungggg
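A question for the author: how often is θk updated in the PPO algorithm? If it is updated on every iteration, wouldn't the sample efficiency still be low?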

    # update policy every n steps
    if self.sample_count % self.update_freq != 0:
        return

Look at the code: the update frequency is something you can set yourself.
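
A minimal sketch of how such a gate behaves, assuming a hypothetical `Agent` class that counts samples and stores transitions in a list. Only `sample_count` and `update_freq` come from the snippet above; `step`, `buffer`, and `num_updates` are illustrative names, and the actual PPO optimisation is replaced by a counter.

```python
class Agent:
    """Hypothetical agent that only updates every `update_freq` samples."""

    def __init__(self, update_freq):
        self.update_freq = update_freq
        self.sample_count = 0
        self.buffer = []       # transitions collected since the last update
        self.num_updates = 0   # how many times the policy was updated

    def step(self, transition):
        # Called once per environment step.
        self.sample_count += 1
        self.buffer.append(transition)
        self.update()

    def update(self):
        # update policy every n steps
        if self.sample_count % self.update_freq != 0:
            return
        # Placeholder for the real PPO update, which would typically reuse
        # the collected batch for several optimisation passes.
        self.num_updates += 1
        self.buffer.clear()


agent = Agent(update_freq=4)
for t in range(10):
    agent.step(t)
```

With `update_freq=4` and 10 steps, the update body runs at steps 4 and 8 only, so the last two transitions are still waiting in the buffer.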

littlebrotherdog · 2024-03-13T07:58:47Z

@Strawberry47
I also have a question about how critic_loss is computed in the code: is it Q_value(old) - critic_value(new)? I'm not sure whether I've understood it correctly~

critic_loss = (returns - values).pow(2).mean()
This is a mean squared error (MSE) between returns and values; the critic network is used to estimate V.
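
A small worked example of that loss in plain Python. It assumes `returns` are discounted Monte Carlo returns computed backwards over an episode; how the repository actually builds `returns` is not shown in this thread, so `discounted_returns` and the toy numbers are illustrative only.

```python
def discounted_returns(rewards, dones, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1}, resetting at episode ends."""
    returns = []
    running = 0.0
    for r, d in zip(reversed(rewards), reversed(dones)):
        if d:
            running = 0.0  # nothing carries over past a terminal step
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def critic_loss(returns, values):
    """Plain-Python equivalent of (returns - values).pow(2).mean()."""
    return sum((g - v) ** 2 for g, v in zip(returns, values)) / len(returns)


rewards = [1.0, 1.0, 1.0]
dones = [False, False, True]
returns = discounted_returns(rewards, dones, gamma=0.5)  # [1.75, 1.5, 1.0]
values = [1.0, 1.0, 1.0]   # pretend critic predictions V(s_t)
loss = critic_loss(returns, values)
```

The loss pushes each predicted V(s_t) toward the return observed from that state, which is why it involves the state value V rather than a difference of Q-values.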

qiwang067 added Gitalk /chapter5/chapter5_questions&keywords labels May 24, 2021
