
PPO baseline does not work on Canyons map #49

Open
subhash opened this issue Dec 10, 2019 · 3 comments

Comments
@subhash

subhash commented Dec 10, 2019

With the latest code, when I run python main.py --ppo-baseline, the RL agent runs, but it starts training from scratch because it expects a checkpoint path to be set in the config.

So I downloaded the weights from here (is that the latest?) and set the path in the config. With this, the agent seems to be running with a pretrained model, but there are still many collisions and g-force exceptions. This is in contrast to the mnet2 baseline, which basically keeps ego within the lane for the whole episode.
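For reference, the change I made was roughly the following (the variable name and path below are just how I wired it up locally, not necessarily the canonical config option):

import os

# Rough sketch of my local edit (name and path are illustrative only):
# point the PPO baseline at the downloaded checkpoint so it doesn't
# start training from scratch.
PPO_CHECKPOINT_PATH = os.path.expanduser('~/deepdrive-weights/ppo_baseline')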

@crizCraig Please confirm if this is the latest baseline for PPO or if there's a better one that I am missing.

@subhash subhash closed this as completed Dec 10, 2019
@subhash subhash reopened this Dec 10, 2019
@subhash
Author

subhash commented Dec 10, 2019

Without the weights and config change:

[screenshot: see1]

With the weights and config change:

[screenshot: see2]

@crizCraig
Member

Hi @subhash - It looks like both are regressions on the model that was trained in June 2018; however, the top one is running the same model - it's just that the environment has changed.

For clarity, I get something similar to the top agent with

main.py --agent bootstrapped_ppo2 --ppo-baseline --experiment bootstrap --eval-only --sync

which downloads and runs the weights trained in June 2018.

https://twitter.com/crizcraig/status/1008957580054441984

https://www.youtube.com/watch?v=AG0EqPTjgVE

This is one of the reasons we built continuous integration that evaluates baseline agent performance and only merges if it stays within a confidence interval. Unfortunately, the PPO model was built before this test existed, so regressions were allowed to slip in. I think retraining should get you much better results, as things like the reflectivity of the road and the lighting have changed. Robustness to this type of change could also be trained in using our view modes for domain randomization.
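For context, that baseline gate boils down to something like the following sketch (illustrative only, not the actual CI code; names and numbers are made up): run the agent for a handful of evaluation episodes and only allow the merge if the mean score hasn't regressed below a confidence band around the recorded baseline.

import math

def passes_baseline_gate(episode_scores, baseline_mean, baseline_std, z=1.96):
    # Mean score over the new evaluation episodes
    n = len(episode_scores)
    new_mean = sum(episode_scores) / n
    # Allowable downward margin: z standard errors of the baseline estimate
    margin = z * baseline_std / math.sqrt(n)
    # Merge is allowed only if the new mean has not regressed below the band
    return new_mean >= baseline_mean - margin

# Example: five eval episodes against a recorded baseline of 1000 +/- 25
scores = [1020.0, 980.5, 1010.2, 995.7, 1003.1]
print(passes_baseline_gate(scores, baseline_mean=1000.0, baseline_std=25.0))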

@subhash
Author

subhash commented Dec 11, 2019

Thanks @crizCraig

  1. The demo videos look really good. Is there a way for me to run the simulator under the same night/rainy environment? If I could do that, I think I would be able to reproduce those results with the PPO baseline.

  2. I attempted retraining the PPO agent from scratch on the regular Canyons environment, but even after 12 hours, no luck: ego trundles along for a while before getting stuck, and the average episode reward hovers around -1200. Do you think the reward configuration might be overfit to the original environment?
