Inferencing the learned Policies #233

Open
arshad171 opened this issue Apr 7, 2024 · 2 comments

arshad171 commented Apr 7, 2024

Hi,

Firstly, thank you for your efforts, amazing work!

I am currently working with a custom environment and have managed to train MARL policies. However, I am struggling to run inference on the learned policies.

I came across #69 and load_and_reference, but I noticed that when running inference this way, the actions received by the step function (where I log the relevant metrics during inference, since render invokes the training loop, which in turn calls the step function) do not make sense, for the following reasons:

  1. The actions keep fluctuating given the same state, which I believe shouldn't be the case.
  2. The randomness associated with the actions changes when I update the seed in the ray.yaml config.

Based on these observations, I am inclined to believe that it is the action noise added to the actions (I am running the PPO algorithm), but I may be wrong.
(My environment does not have any randomness.)

May I please know the right way to run inference: through the MARLlib interface, or by loading the policies directly via RLlib? And if render is the way to run inference, how can I get consistent results that do not keep changing with the random seed?

Thanks,
Arshad

@Morphlng

MARLlib's render API is actually a one-episode training run, so the policy is not really doing inference. I would recommend only loading the model with MARLlib's API and then running inference through RLlib's compute_single_action API. For more detail, you can refer to my evaluation script.

In general, for a stochastic policy, you should set explore=True to achieve the best performance.
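
Roughly, the pattern looks like the sketch below. Treat it as an outline rather than a drop-in script: it assumes ray[rllib]==1.8.0 (the version MARLlib pins), and the paths, the policy_id, and the YourCustomEnv class are placeholders you need to replace with your own setup; the full restore logic is in the evaluation script.

```python
from ray.rllib.agents.ppo import PPOTrainer  # assumes a plain RLlib PPO checkpoint

# 1. Rebuild the same config you trained with (env, multiagent block, custom model, ...).
#    params.json in the experiment folder documents it, but callable fields such as
#    policy_mapping_fn have to be re-supplied by hand.
config = {
    "env": "my_custom_env",   # placeholder; must be registered via tune.register_env
    "framework": "torch",
    "num_workers": 0,         # no rollout workers needed for inference
    "explore": False,         # False = repeatable actions; True = sample the trained distribution
    # "multiagent": {...},    # same policies / policy_mapping_fn as during training
}

trainer = PPOTrainer(config=config)
trainer.restore("exp_results/.../checkpoint_000100/checkpoint-100")  # placeholder path

# 2. Drive your own environment loop and query the restored policy per agent.
env = YourCustomEnv()         # placeholder for your custom multi-agent env
obs = env.reset()
done = {"__all__": False}
while not done["__all__"]:
    actions = {
        agent_id: trainer.compute_single_action(
            agent_obs,
            policy_id="shared_policy",  # placeholder; use the policy name in your checkpoint
            explore=False,
        )
        for agent_id, agent_obs in obs.items()
    }
    obs, rewards, done, infos = env.step(actions)
```

With explore=False the policy returns its deterministic output (e.g. the mode of the action distribution), so the actions stop fluctuating for the same observation; with explore=True it samples from the learned distribution, which is usually what you want when reporting the policy's performance.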

@arshad171
Author

I can't thank you enough for the script! I had to tweak a few things to make it work for the custom policies I had, but it works like a charm!
I was grappling with running inference for quite some time.
