
DQN-Social Attention in Highway-env #79

Open
SimoMaestri opened this issue Apr 22, 2022 · 5 comments

@SimoMaestri

Hi,
I've trained a DQN model with social attention in HighwayEnv, using ego-attention with 2 heads, but I don't understand the results.
In the first video you can see the output when the model is trained for 1000 episodes. In the second video the model is trained for 3000 episodes.
I've noticed that, in the first case, one head attends to all the vehicles that are to the left of or in front of the ego-vehicle, while the other head attends to vehicles in the right lane.
In the second case it's all different. Both heads give more attention to vehicles that are behind the ego-vehicle, and I don't understand why this happens. It's also strange that both heads attend to the same vehicle. Can you give some explanation of this?

1000 episodes:

download.mp4

3000 episodes:
https://user-images.githubusercontent.com/32385644/164644814-43cdf4de-64c6-4b22-978b-116583bd56ad.mp4

@eleurent
Owner

Hi @SimoMaestri
Very interesting results, thank you for sharing them!

I think that although we can observe and describe what these models do, it is often quite difficult to explain why they do it.
What these attention heads end up doing is really a byproduct of training and representation learning, and they can fall into different local maxima under different conditions (network initialisation, random data generated through exploration, etc.).

I guess the bottom line is that the attention-based representation network has no incentive to do anything different (than having two heads looking at the same vehicle behind) as long as it works, i.e. induces high rewards. So the question is rather: why does this strategy work? Does the 2nd policy (3k episodes) indeed achieve similar performance to the 1st one (1k episodes)?
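
For a quick check of that, a minimal evaluation loop would do. The sketch below is just illustrative: it assumes the gymnasium-style highway-env API, and policy_1k / policy_3k are placeholders for the two trained agents wrapped as greedy policy(obs) -> action callables, not names from the repository:

import gymnasium as gym
import numpy as np
import highway_env  # noqa: F401  (importing it registers the highway-v0 environment)

def evaluate(policy, episodes=50, seed=0):
    """Return the mean and std of the undiscounted episode return of a greedy policy."""
    env = gym.make("highway-v0")
    returns = []
    for ep in range(episodes):
        obs, info = env.reset(seed=seed + ep)
        done, total = False, 0.0
        while not done:
            action = policy(obs)  # greedy action from the trained model
            obs, reward, terminated, truncated, info = env.step(action)
            total += reward
            done = terminated or truncated
        returns.append(total)
    env.close()
    return np.mean(returns), np.std(returns)

# e.g. compare the two checkpoints (placeholders):
# print(evaluate(policy_1k), evaluate(policy_3k))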

@eleurent
Owner

Actually, it looks like the 2nd policy does look at front vehicles, but only when they are very close:
[images]

So the information of a possibly imminent collision is present in the output, even if it's not weighted at 100%.

Also, after overtaking a vehicle, we can see that the model still attends to it for a little while:
[image]
but this makes sense, especially when a lane-change decision has to be taken, to avoid rear collisions when cutting in front of another vehicle.

However, that rear attention drops whenever the vehicle gets a bit further away and the risk of collision from a lane change disappears:
[image]

This happens three times: at 0:03, 0:06, and 0:09, so it looks like a consistent behaviour.

So the last mystery is why the attention focuses on the single vehicle in the back most of the time: I would guess it is a kind of default behaviour that emerged and occurs whenever there is no imminent danger and no decision to take. Then it does not really matter where you look anyway, as long as you can switch your attention back when anything comes up. In my own experiments, the default vehicle that is observed when nothing in particular is happening is often the ego-vehicle itself, see e.g. here: https://raw.githubusercontent.com/eleurent/social-attention/master/assets/straight.mp4

But that is not always the case, and you can see, e.g. here: https://raw.githubusercontent.com/eleurent/social-attention/master/assets/2_distance.mp4 that after the important decisions have been taken and the vehicle can safely proceed (at 0:06), both attention heads are also looking at a vehicle far behind, which seems irrelevant, just like in your 3k-episode example.

So although it is hard to justify exactly, my best guess would be that in these nominal situations, the (hypothetical) collision-detection neurons which normally drive the attention scores in dangerous states are inhibited, so the attention just focuses on anything, because it has to by the softmax formulation. It could also be uniform, which would probably feel better to us, but it doesn't have to be, as long as it works.
Again, that is just a guess, and I won't fight for it ^^
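
To make the softmax point above concrete, here is a minimal NumPy sketch of a single ego-attention step: standard scaled dot-product attention with the ego vehicle as the only query, not the repository's exact code, and with random matrices standing in for learned projection weights. Each head's weights sum to 1, so every head has to put its probability mass somewhere, even when no vehicle matters for the decision:

import numpy as np

def ego_attention(embeddings, w_q, w_k, heads=2):
    """Scaled dot-product attention with the ego vehicle (row 0) as the only query.

    embeddings: (n_vehicles, d) array of per-vehicle features.
    w_q, w_k:   (d, heads * d_k) projection matrices (toy stand-ins for learned weights).
    Returns attention weights of shape (heads, n_vehicles), each row summing to 1.
    """
    n, d = embeddings.shape
    d_k = w_q.shape[1] // heads
    q = (embeddings[0] @ w_q).reshape(heads, d_k)          # one query per head, from the ego vehicle
    k = (embeddings @ w_k).reshape(n, heads, d_k)          # keys for every vehicle
    scores = np.einsum("hd,nhd->hn", q, k) / np.sqrt(d_k)  # (heads, n_vehicles)
    scores -= scores.max(axis=1, keepdims=True)            # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)    # softmax: each head sums to 1

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))                              # ego + 4 other vehicles
attn = ego_attention(emb, rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))
print(attn.sum(axis=1))                                    # -> [1. 1.]: the mass must go somewhere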

@SimoMaestri
Author

Hi, thank you so much for this exhaustive and very clear explanation.

@Shuchengzhang888

Shuchengzhang888 commented Feb 28, 2023

I ran into the same problem when I trained an attention model in highway-env, but I think it's just a bug.

In the Kinematics observation, MAX_SPEED is used to normalise the observation. This is fine for the other vehicles, because their coordinates are relative, but for the ego vehicle, whose coordinates are absolute, it breaks once the driven distance exceeds the upper bound of the range: the normalised x is then always 1.

self.features_range = {
    "x": [-5.0 * Vehicle.MAX_SPEED, 5.0 * Vehicle.MAX_SPEED],
    "y": [-AbstractLane.DEFAULT_WIDTH * len(side_lanes), AbstractLane.DEFAULT_WIDTH * len(side_lanes)],
    "vx": [-2 * Vehicle.MAX_SPEED, 2 * Vehicle.MAX_SPEED],
    "vy": [-2 * Vehicle.MAX_SPEED, 2 * Vehicle.MAX_SPEED]
}
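
The saturation is easy to reproduce numerically. In the sketch below, remap is a stand-in for highway-env's linear-mapping utility, MAX_SPEED = 40 m/s is an assumed value, and observation clipping is assumed to be enabled: once the ego vehicle's absolute x exceeds 5 * MAX_SPEED = 200 m, its normalised x stays at 1, and denormalising it in the visualisation always yields x = 200 m, a point that falls further and further behind the real ego vehicle.

import numpy as np

MAX_SPEED = 40.0                                  # assumed value of Vehicle.MAX_SPEED, in m/s
X_RANGE = [-5.0 * MAX_SPEED, 5.0 * MAX_SPEED]     # i.e. [-200, 200] m, as in features_range

def remap(v, x_range, y_range):
    """Linear map of v from x_range to y_range (stand-in for highway-env's lmap utility)."""
    return y_range[0] + (v - x_range[0]) * (y_range[1] - y_range[0]) / (x_range[1] - x_range[0])

for ego_x in [50.0, 150.0, 250.0, 1000.0]:
    normalized = np.clip(remap(ego_x, X_RANGE, [-1.0, 1.0]), -1.0, 1.0)  # what the observation stores
    recovered = remap(normalized, [-1.0, 1.0], X_RANGE)                  # what the visualisation recovers
    print(f"ego x = {ego_x:6.1f} m -> obs = {normalized:+.2f} -> recovered x = {recovered:6.1f} m")

# Past x = 200 m the recovered position is stuck at 200 m, which drifts behind the real ego
# vehicle, so the closest-vehicle lookup matches a trailing vehicle instead of the ego.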

In this case, when the attention is visualised, the code treats the ego vehicle as another vehicle located behind all the others. That is why the visualisation always shows attention on the vehicle behind, even though that vehicle may not even be in the observation. I just added an "if" here, and it works well.

deep_q_network/graphics.py

if v_index == 0:
    # The ego-vehicle's absolute x saturates after normalisation, so denormalising it
    # gives a wrong position: just use the ego vehicle directly.
    vehicle = agent.env.unwrapped.vehicle
else:
    v_position = {}
    for feature in ["x", "y"]:
        v_feature = state[v_index, obs_type.features.index(feature)]
        v_feature = remap(v_feature, [-1, 1], obs_type.features_range[feature])
        v_position[feature] = v_feature
    v_position = np.array([v_position["x"], v_position["y"]])
    if not obs_type.absolute and v_index > 0:
        v_position += agent.env.unwrapped.vehicle.position
    # Map the observation row back to the closest vehicle on the road
    vehicle = min(agent.env.road.vehicles, key=lambda v: np.linalg.norm(v.position - v_position))
v_attention[vehicle] = attention[:, v_index]

@eleurent
Owner

eleurent commented Mar 4, 2023

Oooooh, that's a great catch @Shuchengzhang888! thanks :)
Yeah, this observation normalisation - denormalisation in the attention visualisation is really ugly. Maybe it would be better to directly include vehicle indices in the obs (and mask them from the model), instead of mapping to the closest vehicle based on position...
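
For what it's worth, a minimal sketch of that idea (purely illustrative, not an existing highway-env option): append each vehicle's index as an extra observation column for the renderer to read back, and slice it off before the tensor reaches the attention network.

import numpy as np

def add_index_feature(obs):
    """Append a vehicle-index column to a (n_vehicles, n_features) kinematics observation."""
    indices = np.arange(obs.shape[0], dtype=obs.dtype).reshape(-1, 1)
    return np.concatenate([obs, indices], axis=1)

def split_index_feature(obs_with_index):
    """Recover (model_input, vehicle_indices): the model never sees the index column."""
    return obs_with_index[:, :-1], obs_with_index[:, -1].astype(int)

obs = np.random.rand(5, 7).astype(np.float32)   # toy observation: 5 vehicles, 7 kinematic features
tagged = add_index_feature(obs)
model_input, vehicle_ids = split_index_feature(tagged)
# model_input goes to the attention network; vehicle_ids lets the visualiser map each
# attention column back to the right vehicle without any denormalisation.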
