Question about the imitation learning strategy in the paper #6

Open
YESAndy opened this issue Mar 8, 2024 · 1 comment
YESAndy commented Mar 8, 2024

Hi Yicong,

I realized that the imitation learning loss you use in the code base is essentially the cross-entropy loss between the predicted action and the oracle action, where the oracle is obtained by selecting the waypoint closest to the goal. However, this oracle action might not be optimal, because the closest waypoint is sometimes not on the ground-truth path (the reference path in the dataset), as in the screenshot below:

[Screenshot 2024-03-07: an example where the waypoint closest to the goal is not on the reference path]

It is likely to cause the agent to loop around the area.
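To make sure I'm reading the code right, here is my understanding of the current scheme as a minimal sketch (all names and tensor shapes are my own, purely for illustration, not the actual code in the repo):

```python
import torch
import torch.nn.functional as F

def imitation_loss(action_logits, waypoints, goal):
    """Cross-entropy against an oracle action chosen as the waypoint
    closest (straight-line) to the goal -- the scheme described above.

    action_logits: (num_waypoints,) predicted scores over candidate waypoints
    waypoints:     (num_waypoints, 3) candidate waypoint positions
    goal:          (3,) goal position
    """
    # Oracle = waypoint with the smallest Euclidean distance to the goal.
    dists = torch.norm(waypoints - goal, dim=-1)   # (num_waypoints,)
    oracle = torch.argmin(dists)                   # scalar index of the oracle action
    return F.cross_entropy(action_logits.unsqueeze(0), oracle.unsqueeze(0))
```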

As the waypoint predictor shows very good results, I wonder if you can comment on how the waypoint predictor manages to avoid the above issue.

Many thanks!
Andy

@wz0919 (Collaborator) commented Mar 8, 2024

A quick answer for the example you show: we select the ground-truth action by the shortest geodesic distance (the length of the shortest navigable path from the waypoint to the goal), not the straight-line distance. Although the top waypoint has the shortest Euclidean distance to the goal, its geodesic distance to the goal is still larger than that of the right waypoint (it would have to trace back to S and follow the right path), so the right waypoint is still the GT.
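Conceptually, the selection looks like the sketch below (a minimal illustration only; `geodesic_distance` stands in for the simulator's shortest-path query, e.g. Habitat's pathfinder, and is not the actual function in the repo):

```python
import numpy as np

def select_gt_waypoint(waypoints, goal, geodesic_distance):
    """Pick the ground-truth (oracle) waypoint by geodesic, not Euclidean, distance.

    geodesic_distance(p, q) is assumed to return the length of the shortest
    navigable path from p to q (provided by the simulator); it is a placeholder here.
    """
    dists = [geodesic_distance(np.asarray(w), np.asarray(goal)) for w in waypoints]
    return int(np.argmin(dists))
```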

One very rare case is when the shortest path to the goal actually goes around the left, so the geodesic distance of the top waypoint is lower than that of the right one. In that case the problem you mention does occur: the waypoint with the shortest geodesic distance is not a good ground-truth action. But given the robustness of our predictor, it will usually also output a waypoint on the left in such a scene, and that waypoint then makes a good GT.

In our paper we reach 97% SR on R2R-CE when following the ground-truth actions, which shows that loop points are hard to hit. I have also visualized the failure cases: besides a few caused by bad simulation in Habitat (e.g., a place that looks open visually but cannot be traversed), only about two are caused by such a loop, namely when the predictor cannot predict a waypoint in the optimal direction (not even a waypoint in the optimal 180-degree half), so the agent has to go back and starts a loop (like between the top waypoint and S in your figure). But as mentioned, this is a very, very rare case.
