
some implementation issues #92

Open
6Lackiu opened this issue Jan 11, 2023 · 14 comments

6Lackiu commented Jan 11, 2023

Hi! First of all, thank you for sharing such a great project! It has given me a lot of inspiration, I really appreciate it!
I have some questions I would like to ask you.

  1. I read your paper "Social Attention for Autonomous Decision-Making in Dense Traffic", in which you mainly propose an attention-based neural network architecture. But in this repo, what is the purpose of implementing so many agents (MCTS, DQN, ...)? Are they different ways of implementing this attention architecture?

  2. Where is the scripts/analyze.py file? Has it been superseded?

  3. As an RL rookie, I would like to ask whether the attention architecture you proposed can be used with other RL algorithms. For example, suppose I have trained an RL algorithm called 'ABC' to control all the autonomous vehicles in the scene.
    Now I want to add your proposed attention architecture to it, so that each vehicle knows which of the surrounding vehicles it should pay the most attention to. Finally, the 'ABC' algorithm is used to train the whole model.
    I want to know: is this possible? How should I integrate the attention architecture into 'ABC'?

Looking forward to your reply! Thanks!

eleurent (Owner) commented

Hi, thanks for the feedback!

  1. You're right, maybe it's a bit confusing. My intent was to implement a lightweight RL library with many (unrelated) agents and add my own contributions to it. But in retrospect that was probably not a great idea, as the code for each paper is not isolated... For this paper, only the DQN agent is relevant: it was the baseline that I used, just with the network architecture changed.
  2. Yes, I used that initially to generate reward plots, but you should now use TensorBoard, which is generally better and more widely used.
  3. Yes, the attention-based architecture can be used with any algorithm that trains a neural network to make decisions (DQN, PPO, A3C, MuZero, etc.). How you integrate it into your 'ABC' algorithm depends on the library/implementation you are using, but generally there will be a file where the network/model is defined; that is what you need to replace (see the sketch below).
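
To make point 3 concrete, here is a minimal, hypothetical sketch (plain PyTorch, not the actual EgoAttentionNetwork from this repo) of what "replacing the model" could look like for a DQN-style agent; the class name and dimensions are purely illustrative:

```python
import torch
import torch.nn as nn

class AttentionQNetwork(nn.Module):
    """Hypothetical drop-in Q-network: per-vehicle embedding + ego attention + Q-value head."""
    def __init__(self, n_features, n_actions, embed_dim=64):
        super().__init__()
        self.embed = nn.Linear(n_features, embed_dim)  # shared per-vehicle encoder
        self.attention = nn.MultiheadAttention(embed_dim, num_heads=2, batch_first=True)
        self.q_head = nn.Linear(embed_dim, n_actions)

    def forward(self, obs):
        # obs: (batch, n_vehicles, n_features), with row 0 being the ego-vehicle
        tokens = self.embed(obs)
        ego = tokens[:, :1]                               # ego token used as the query
        context, _ = self.attention(ego, tokens, tokens)  # attend over all vehicles
        return self.q_head(context.squeeze(1))            # one Q-value per action
```

Any DQN/PPO implementation that lets you pass a custom network module could then use a class like this in place of its default MLP, without changing the rest of the training loop.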

6Lackiu commented Feb 4, 2023

Hello @eleurent! Thank you so much for your reply, but I'm still a little confused.

  1. What are the inputs and outputs of EgoAttentionNetwork? Is the input the observation information of the vehicles? What conversion is needed? I can't seem to find the relevant code.

  2. My statement of question 3 may not have been clear last time, so please allow me to explain again.
    One step in the 'ABC' algorithm is to sum the rewards of all neighbouring vehicles, but I want to introduce the attention mechanism into it, i.e. to only count the rewards of nearby vehicles that NEED attention. In other words, I need to extract the ego-attention part of your project, but the modules seem to be highly coupled. How should I extract it? Is it rl_agents/agents/common/models.py?

  3. After extracting the attention mechanism, where should I put it in 'ABC'? According to your last answer, if it is placed in the network/model part, won't it break the original model/training process of 'ABC'?

  4. I am also curious: if the above is possible, what would the whole 'ABC' training process look like? Is the attention mechanism trained alongside the original decision-making algorithm of 'ABC'? Can I still train the 'ABC' algorithm the same way I did before, or do I need to make some changes?

I'm sorry if my statement is not clear; I haven't done this kind of work before and my thinking is a little confused, so I have come here again to ask for your advice.

Really looking forward to your reply, this is very helpful for me, thank you!

eleurent commented Feb 4, 2023

  1. The input is an array containing a set of per-vehicle features. For instance, I used the position x, y (absolute or relative to the ego-vehicle position), velocity vx, vy (absolute or relative), cos/sin of the heading angle, etc. The code only assumes that the observation provided to the agent is an array of shape (n_vehicles, n_features); see the small example after this list. The output is an embedding, obtained from the attention between the ego-vehicle's token and all the other vehicles' tokens.

  2. I don't know the details of the ABC algorithm, and I don't understand why you would want to use attention to aggregate rewards. In my understanding, the reward is the optimisation objective, so it has to be defined independently of the agent. Anyway, yes, if you want to reuse my code you can just copy the EgoAttention class from rl_agents/agents/common/models.py.

  3. I don't know the specifics of your ABC algorithm. You can probably put it wherever you would have put a network anyway.

  4. Again, I have not heard of this ABC algorithm you are referring to. Generally, you can change the architecture of a network trained by any RL algorithm without affecting how the RL algorithm itself works.
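
For illustration, an observation of shape (n_vehicles, n_features) could look like the array below (the feature set, ordering and normalisation are configurable; the values here are made up):

```python
import numpy as np

# 4 vehicles (row 0 = ego vehicle), 6 features per vehicle:
# [x, y, vx, vy, cos_h, sin_h] -- positions/velocities may be absolute or relative to the ego.
obs = np.array([
    [  0.0,  0.0, 25.0, 0.0, 1.00, 0.0],   # ego vehicle
    [ 30.0,  4.0, 22.0, 0.0, 1.00, 0.0],   # vehicle ahead, one lane to the right
    [-15.0,  0.0, 27.0, 0.0, 1.00, 0.0],   # vehicle behind in the same lane
    [ 60.0, -4.0, 20.0, 0.5, 0.99, 0.1],   # distant vehicle changing lanes
], dtype=np.float32)
print(obs.shape)  # (4, 6) == (n_vehicles, n_features)
```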

6Lackiu commented Feb 5, 2023

@eleurent
Sorry for the confusion caused by my unclear wording! But your answers have already helped me a lot! Thank you so much!
Wish you all the best!

6Lackiu closed this as completed Feb 5, 2023
eleurent commented Feb 5, 2023

No worries at all, glad you found this helpful!

6Lackiu reopened this Mar 13, 2023
6Lackiu commented Mar 13, 2023

Hello @eleurent! I would like to ask some questions, so I have reopened this issue.
I want to inspect the dimensions of the variables in the function compute_vehicles_attention in rl_agents/agents/deep_q_network/graphics.py, so I modified the main function in scripts/experiments.py and added a breakpoint.
However, when I run the program using PyCharm, it does not stop at the breakpoint.
Do you have any good solutions for this? Thanks!

eleurent (Owner) commented

Hi,
No, I have no idea why PyCharm would not stop at the breakpoint... I tried it and it worked fine.

Are you running the program in debug mode (Shift+F9), and not run mode (Shift+F10)?

(Also, I would advise putting the breakpoint directly in graphics.py if you want to analyse the variables there.)

6Lackiu commented Mar 19, 2023

It works now. I must have messed something up before...
Sorry for taking up your time with questions like this!
Thanks!!

6Lackiu closed this as completed Mar 19, 2023
6Lackiu commented Mar 23, 2023

Hi @eleurent! Sorry, I have another question.

  1. Don't the columns of the attention matrix already correspond to specific vehicles? Why do we need to assign attention values to vehicles based on their distances?
  2. If we do this, aren't we just looking for the nearest few cars? Where does the attention mechanism come into play?

Looking forward to your answer! Thank you!!


eleurent (Owner) commented

Don't the columns of the attention matrix already correspond to specific vehicles? Why do we need to assign attention values to vehicles based on their distances?

They do! It's just that we don't know the mapping between vehicle i in the attention matrix and the vehicles in the scene, since the vehicle ids are not provided in the observation. So in order to draw the attention edges, I map the rows of the observation to vehicles in the scene based on their x, y coordinate features (a rough sketch of this matching is shown at the end of this comment).

If we do this, aren't we just looking for the nearest few cars? Where does the attention mechanism come into play?

Yes, we typically limit the observation to the nearest few cars, e.g. 15 vehicles. We observe empirically that attention is useful because it enables the model to focus its computations on the 1-2 most relevant vehicles at any given time, which leads to better decisions. It is also invariant to permutations of the vehicle ordering, unlike other architectures such as MLPs.
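
A rough sketch of that coordinate-based matching (a hypothetical helper, not the actual code in graphics.py; it assumes each scene vehicle exposes a position attribute and that the first two observation features are x, y):

```python
import numpy as np

def match_obs_rows_to_vehicles(obs, scene_vehicles):
    """Map each observation row to the closest scene vehicle by (x, y) distance,
    since vehicle ids are not part of the observation."""
    mapping = {}
    for row, features in enumerate(obs):
        x, y = features[0], features[1]
        distances = [np.hypot(v.position[0] - x, v.position[1] - y) for v in scene_vehicles]
        mapping[row] = scene_vehicles[int(np.argmin(distances))]
    return mapping
```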

6Lackiu commented Mar 28, 2023

Sorry for the late reply! Thank you for clearing up my confusion!

6Lackiu commented Mar 31, 2023

Hi! @eleurent
I have a small question.
What role does the vehicle's attention to itself play in the whole process? Does the vehicle need to pay attention to itself? I don't quite understand.
Thanks for your answer!

6Lackiu reopened this Mar 31, 2023
eleurent commented Apr 1, 2023

I think there can be two roles:

  1. The decision may depend on the state of the ego-vehicle, e.g. its current position, speed or heading. So the attention focusing on the ego-vehicle helps propagate these features forward (even though they should still be there through the residual connection in the attention block).
  2. The attention layer typically converges to some filtering function which highlights only "dangerous vehicles" (those with a high risk of short-term collision) and sets the weights of other vehicles to 0 (they are irrelevant to the decision, so their information can be dropped). But in a situation where no vehicle is dangerous, they should all be filtered out and all their weights should be 0. However, since attention weights are normalised (they form a probability distribution), they have to sum up to 1, so the only way of keeping a weight of 0 for all the other vehicles is to put all the probability mass on the ego-vehicle (see the toy example below).

These are just hypotheses, of course; the function that the attention layer ends up implementing emerges through learning. You could very well run the experiment of removing the ego-vehicle from the available tokens and see whether this degrades performance (you'd probably need to keep the residual connection though, since we still want the ego-vehicle's features to be available for the final decision).
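
To illustrate the normalisation argument in point 2, a toy example with made-up attention scores:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Scores over [ego, car_1, car_2, car_3] when no other car is dangerous: since the weights
# must sum to 1, giving (near-)zero weight to every other vehicle means putting the mass on the ego.
weights = softmax(np.array([6.0, -2.0, -2.0, -2.0]))
print(weights, weights.sum())  # ~[0.999, 0.0003, 0.0003, 0.0003], 1.0
```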

6Lackiu commented Apr 2, 2023

Got it! Thank you!!
