
PPO baseline does not work on Canyons map #49

Open
subhash opened this issue Dec 10, 2019 · 3 comments

Comments
@subhash

subhash commented Dec 10, 2019

With the latest code, when I run python main.py --ppo-baseline, the RL agent runs, but it starts training from scratch because it expects a checkpoint path to be set in the config.

So I downloaded the weights from here (is that the latest?) and set the path in the config. With this, the agent seems to be running with a pretrained model, but there are still many collisions and g-force exceptions. This is in contrast to the mnet2 baseline, which basically keeps ego within the lane for the whole episode.
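For reference, the change I made was roughly the following (the variable name and path below are just how I wired it up locally, not necessarily the canonical config option):

import os

# Rough sketch of my local edit (name and path are illustrative only):
# point the PPO baseline at the downloaded checkpoint so it doesn't
# start training from scratch.
PPO_CHECKPOINT_PATH = os.path.expanduser('~/deepdrive-weights/ppo_baseline')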

@crizCraig Please confirm if this is the latest baseline for PPO or if there's a better one that I am missing.

@subhash subhash closed this as completed Dec 10, 2019
@subhash subhash reopened this Dec 10, 2019
@subhash
Author

subhash commented Dec 10, 2019

Without the weights and config change:

[screenshot: see1]

With the weights and config change:

[screenshot: see2]

@crizCraig
Member

Hi @subhash - It looks like both are regressions on the model that was trained in June 2018; however, the top one is running the same model - it's just that the environment has changed.

For clarity, I get something similar to the top agent with

main.py --agent bootstrapped_ppo2 --ppo-baseline --experiment bootstrap --eval-only --sync

which downloads and runs the weights trained in June 2018.

https://twitter.com/crizcraig/status/1008957580054441984

https://www.youtube.com/watch?v=AG0EqPTjgVE

This is one of the reasons we built continuous integration that evaluates baseline agent performance and only merges if it stays within a confidence interval. Unfortunately, the PPO model was built before this test existed, so regressions were allowed to slip in. I think retraining should get you much better results, as things like the reflectivity of the road and the lighting have changed. Robustness to this type of change could also be trained in using our view modes for domain randomization.
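For context, that baseline gate boils down to something like the following sketch (illustrative only, not the actual CI code; names and numbers are made up): run the agent for a handful of evaluation episodes and only allow the merge if the mean score hasn't regressed below a confidence band around the recorded baseline.

import math

def passes_baseline_gate(episode_scores, baseline_mean, baseline_std, z=1.96):
    # Mean score over the new evaluation episodes
    n = len(episode_scores)
    new_mean = sum(episode_scores) / n
    # Allowable downward margin: z standard errors of the baseline estimate
    margin = z * baseline_std / math.sqrt(n)
    # Merge is allowed only if the new mean has not regressed below the band
    return new_mean >= baseline_mean - margin

# Example: five eval episodes against a recorded baseline of 1000 +/- 25
scores = [1020.0, 980.5, 1010.2, 995.7, 1003.1]
print(passes_baseline_gate(scores, baseline_mean=1000.0, baseline_std=25.0))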

@subhash
Author

subhash commented Dec 11, 2019

Thanks @crizCraig

  1. The demo videos look really good. Is there a way for me to run the simulator under the same night/rainy environment? If I could do that, I think I would be able to reproduce those results with the PPO baseline.

  2. I attempted retraining the PPO agent from scratch on the regular Canyons environment, but even after 12 hours, no luck: ego trundles along for a while before getting stuck, and the average episode reward hovers around -1200. Do you think the reward configuration might be overfit to the original environment?
