Inferencing the learned Policies #233
Hi,
Firstly, thank you for your efforts; amazing work!
I am currently working with a custom environment and managed to train MARL policies. However, I am grappling with running inference on the learned policies.
I came across #69 and load_and_reference, but I noticed that while running inference this way, the actions received by the step function (where I decided to log the relevant metrics during inference, since render invokes the training loop, which in turn calls the step function) do not make sense: the results keep changing with the random seed set in the ray.yaml config. Based on these observations, I am inclined to believe it is the action noise added to the actions (I am running the PPO algorithm), but I may be wrong. (My environment does not have any randomness.)
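For reference, here is a minimal sketch of what I mean by running inference directly (not my exact script; checkpoint_path, make_env, policy_mapping_fn, and trainer_config are placeholders for my actual setup):

```python
# Minimal sketch (placeholders: checkpoint_path, make_env, policy_mapping_fn,
# trainer_config). Restore a PPO checkpoint and step the env with exploration
# disabled so no sampling noise is added to the actions.
from ray.rllib.agents.ppo import PPOTrainer

trainer = PPOTrainer(config=trainer_config, env="my_custom_env")
trainer.restore(checkpoint_path)

env = make_env()              # the same custom multi-agent env used in training
obs = env.reset()
done = {"__all__": False}
while not done["__all__"]:
    actions = {
        agent_id: trainer.compute_single_action(
            agent_obs,
            policy_id=policy_mapping_fn(agent_id),
            explore=False,    # deterministic action instead of a sample
        )
        for agent_id, agent_obs in obs.items()
    }
    obs, rewards, done, infos = env.step(actions)
```

With explore=False I would expect the PPO policy to return the deterministic (mean) action rather than a sample from the action distribution.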
May I please know the right way to run inference: through the MARLlib interface, or by loading the policies directly via RLlib? And if render is the way to run inference, how can I get consistent results that do not keep changing with the random seed?
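And for pinning down the randomness, I would expect something along these lines (a sketch assuming RLlib's top-level "seed" and "explore" config keys apply here; SEED is arbitrary and base_config is a placeholder for my training config):

```python
# Sketch: fix every seed source I can think of and disable exploration
# globally before evaluating.
import random

import numpy as np
import torch

SEED = 0
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

eval_config = {
    **base_config,
    "seed": SEED,        # RLlib seeds each rollout worker from this value
    "explore": False,    # no exploration noise for any policy
    "num_workers": 0,    # single process, easier to keep deterministic
}
```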
Thanks,
Arshad
Comments
I can't thank you enough for the script! I had to tweak a few things to make it work for the custom policies I had, but it works like a charm!