I have been trying out gSDE lately, which seems to be working well for my problem. However, when I simulate the learned model using the model.predict() approach described in the examples in the documentation (e.g., here), the behaviour is deterministic (in the sense that every simulated episode looks the same), even if I set deterministic=False in the call to predict(). After some digging, I think this is because predict() does not make use of the sde_sample_freq setting, which sort of makes sense because that function doesn't have access to the environment.
So my question is just: Am I correct in understanding that when running models learned with gSDE, if the user wants the same non-deterministic behaviour as at the end of learning, the user needs to keep track of sde_sample_freq themselves and call model.policy.reset_noise(env.num_envs) themselves at appropriate intervals? If so, it's possibly something to mention in the documentation? (Happy to have a go at contributing such edit(s) if appropriate.)
Checklist
I have checked that there is no similar issue in the repo
Am I correct in understanding that when running models learned with gSDE, if the user wants the same non-deterministic behaviour as at the end of learning, the user needs to keep track of sde_sample_freq themselves and call model.policy.reset_noise(env.num_envs) themselves at appropriate intervals?
Yes, you are correct. gSDE is mainly meant to be used during training; at test time, for continuous control, it is recommended to use the deterministic controller.
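For anyone who still wants stochastic gSDE rollouts at test time, here is a minimal sketch of the manual bookkeeping discussed above. The helper name `rollout_with_gsde_noise` and its arguments are hypothetical (not part of the library); it assumes the model was trained with `use_sde=True` so that `model.policy.reset_noise()` is available, and that the environment follows the VecEnv API. Treating a non-positive `sde_sample_freq` as "sample once at the start" is an assumption meant to mirror what happens during training.

```python
def rollout_with_gsde_noise(model, vec_env, n_steps, sde_sample_freq):
    """Step a vectorized env while resampling the gSDE noise manually.

    Hypothetical helper, not a library function. Assumes:
      - model.policy.reset_noise(n_envs) exists (model trained with use_sde=True)
      - vec_env.reset() returns observations
      - vec_env.step(actions) returns (obs, rewards, dones, infos)
    """
    obs = vec_env.reset()
    total_reward = 0.0
    for step in range(n_steps):
        # Resample the noise at step 0, then every `sde_sample_freq` steps.
        # A non-positive frequency means "sample only once at the start".
        if step == 0 or (sde_sample_freq > 0 and step % sde_sample_freq == 0):
            model.policy.reset_noise(vec_env.num_envs)
        # deterministic=False so the sampled exploration noise is applied.
        action, _ = model.predict(obs, deterministic=False)
        obs, rewards, dones, infos = vec_env.step(action)
        total_reward += float(sum(rewards))
    return total_reward


# Example usage (assumes stable-baselines3 and gymnasium are installed):
# from stable_baselines3 import PPO
# model = PPO("MlpPolicy", "Pendulum-v1", use_sde=True, sde_sample_freq=4)
# model.learn(10_000)
# rollout_with_gsde_noise(model, model.get_env(), n_steps=200,
#                         sde_sample_freq=model.sde_sample_freq)
```

Without the `reset_noise()` calls, the noise sampled at the end of training is reused for every call to `predict()`, which is why repeated episodes look identical even with `deterministic=False`.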
If so, it's possibly something to mention in the documentation? (Happy to have a go at contributing with such edit(s) if appropriate.)
❓ Question
Many thanks for the great library!