
Question about detach #30

Open
zen-d opened this issue Aug 29, 2022 · 7 comments


zen-d commented Aug 29, 2022

Hi,

Thanks for the nice work. For calculating the IoU target, I think detach should be applied to the "predicted items". Specifically, in my view, for the two lines below where detach is applied, the detach should instead be moved to their previous lines, i.e., lines 408 and 424. Correct me if I'm missing something.
https://github.com/hyz-xmaster/VarifocalNet/blob/master/mmdet/models/dense_heads/vfnet_head.py#L409
https://github.com/hyz-xmaster/VarifocalNet/blob/master/mmdet/models/dense_heads/vfnet_head.py#L425

Could you please explain this in more detail? Thanks.
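
To make the question concrete, here is a rough sketch of the two placements I am comparing (paraphrased, not the repo's exact code; the dummy tensors and the mmdet 2.x bbox_overlaps import are just for illustration):

```python
import torch
from mmdet.core.bbox import bbox_overlaps  # mmdet's aligned-IoU helper

# Dummy decoded boxes standing in for the real ones in vfnet_head.py;
# cumsum along the last dim yields valid (x1, y1, x2, y2) boxes.
pos_decoded_bbox_preds = torch.rand(8, 4).cumsum(-1).requires_grad_()
pos_decoded_target_preds = torch.rand(8, 4).cumsum(-1)

# Current placement, as I read it: only the target side is detached, so
# gradients can flow into the predictions through the IoU.
iou_targets_ini = bbox_overlaps(
    pos_decoded_bbox_preds,
    pos_decoded_target_preds.detach(),
    is_aligned=True).clamp(min=1e-6)

# What I am suggesting: detach the predicted side instead (one line
# earlier), as QFL does, so the IoU target carries no gradient at all.
iou_targets_ini = bbox_overlaps(
    pos_decoded_bbox_preds.detach(),
    pos_decoded_target_preds,
    is_aligned=True).clamp(min=1e-6)
```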

hyz-xmaster (Owner) commented

Hi, we want the gradients of the losses to flow to pos_decoded_bbox_preds and pos_decoded_bbox_preds_refine so as to optimize the predictions, so we don't detach them. Conversely, we don't want gradients to back-propagate into pos_decoded_target_preds, which is why that tensor is detached.
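
A tiny self-contained illustration of that behavior (dummy tensors; a squared difference stands in for the real IoU computation):

```python
import torch

preds = torch.rand(4, 4, requires_grad=True)    # plays pos_decoded_bbox_preds
targets = torch.rand(4, 4, requires_grad=True)  # plays pos_decoded_target_preds

# Detach only the target side: the loss still depends on preds through a
# differentiable path, but the graph to targets is cut.
loss = (preds - targets.detach()).pow(2).sum()
loss.backward()

print(preds.grad is not None)  # True  -> the predictions get optimized
print(targets.grad)            # None  -> no gradient reaches the targets
```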

zen-d (Author) commented Aug 30, 2022

@hyz-xmaster Thanks for your prompt reply.
For pos_decoded_target_preds, I guess the detach may not strictly be necessary, since it seems standard in machine learning that gradients do not flow back into the "target".
For pos_decoded_bbox_preds and pos_decoded_bbox_preds_refine, I now understand that this was intentional, but could you please share more insight into this gradient behavior? As your paper also mentions, VFL is somewhat like GFL, whose QFL uses the detached localization prediction to calculate the IoU for the classification loss (see the mmdet implementation here). In my experience, if the detach is removed from that line in QFL, the final AP drops significantly. Could you please comment on the different design choices between VFL and QFL? Thanks.
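
For reference, the QFL computation I am referring to looks roughly like this (reconstructed from memory of the mmdet GFL head, so names are approximate):

```python
import torch
from mmdet.core.bbox import bbox_overlaps  # aligned-IoU helper in mmdet 2.x

num_pos, num_total = 8, 32
pos_inds = torch.arange(num_pos)
# cumsum along the last dim yields valid dummy (x1, y1, x2, y2) boxes
pos_decoded_bbox_pred = torch.rand(num_pos, 4).cumsum(-1).requires_grad_()
pos_decoded_target_preds = torch.rand(num_pos, 4).cumsum(-1)

# The prediction side is detached, so this soft classification target
# sends no gradient back into the regression branch.
score = torch.zeros(num_total)
score[pos_inds] = bbox_overlaps(
    pos_decoded_bbox_pred.detach(),
    pos_decoded_target_preds,
    is_aligned=True)
```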

zen-d (Author) commented Sep 15, 2022

@hyz-xmaster any update?

hyz-xmaster (Owner) commented

As far as I can see in the QFL implementation, it uses pos_decoded_bbox_preds to compute score and then uses that score here. So QFL uses the score and label to compute the target for cls_score, which serves a similar function to pos_decoded_target_preds. You should not propagate gradients to targets.
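
In other words, the score only parameterizes the soft target of the quality focal loss, roughly like this (a condensed paraphrase of QFL, assuming for simplicity that all N samples are positives with class indices in [0, C)):

```python
import torch
import torch.nn.functional as F

def quality_focal_loss(logits, labels, score, beta=2.0):
    # logits: (N, C) raw classification outputs
    # labels: (N,) positive class indices; score: (N,) detached IoU in [0, 1]
    target = torch.zeros_like(logits)
    target[torch.arange(len(labels)), labels] = score  # soft IoU target
    # |target - p|^beta down-weights well-estimated entries; for the
    # zero-target entries it reduces to the usual p^beta focal term.
    weight = (target - logits.sigmoid()).abs().pow(beta)
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction='none')
    return (bce * weight).sum()
```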

zen-d (Author) commented Sep 15, 2022

@hyz-xmaster Exactly, I see that QFL does not, and should not, propagate gradients to pos_decoded_bbox_preds. Then, back to my question above: why did VFL choose to propagate gradients to pos_decoded_bbox_preds, and why doesn't that hurt performance?

hyz-xmaster (Owner) commented

> @hyz-xmaster Exactly, I see that QFL does not, and should not, propagate gradients to pos_decoded_bbox_preds. Then, back to my question above: why did VFL choose to propagate gradients to pos_decoded_bbox_preds, and why doesn't that hurt performance?

See this line, where we detach the iou_targets_ini calculated from pos_decoded_bbox_preds, and likewise this line.

We only propagate gradients to pos_decoded_bbox_preds here.
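
Condensed, the flow is (a sketch of how I read vfnet_head.py, with dummy tensors and mmdet 2.x helpers; GIoULoss stands in for self.loss_bbox):

```python
import torch
from mmdet.core.bbox import bbox_overlaps
from mmdet.models.losses import GIoULoss

pos_decoded_bbox_preds = torch.rand(8, 4).cumsum(-1).requires_grad_()
pos_decoded_target_preds = torch.rand(8, 4).cumsum(-1)

iou_targets_ini = bbox_overlaps(
    pos_decoded_bbox_preds,
    pos_decoded_target_preds.detach(),
    is_aligned=True).clamp(min=1e-6)

# Detached copy: used only as a per-sample loss weight (and, likewise, as
# the classification target), so nothing back-propagates through the IoU
# computation itself.
bbox_weights_ini = iou_targets_ini.clone().detach()

# The bbox loss is the only path through which gradients reach
# pos_decoded_bbox_preds.
loss_bbox = GIoULoss()(
    pos_decoded_bbox_preds,
    pos_decoded_target_preds.detach(),
    weight=bbox_weights_ini)
loss_bbox.backward()
assert pos_decoded_bbox_preds.grad is not None
```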

jacksonsc007 commented

@zen-d
Though I've only taken a cursory look at your question, I think there is a misconception here. The line you quoted belongs to the calculation of the box regression loss, in which the gradient of the predicted box regression values should not be detached. What's more, varifocal loss does not take part in the regression loss at all; it serves as a classification loss. So the only thing we need to take care of is detaching the gradient of cls_target in this line.
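
For completeness, the varifocal loss consumes that detached target roughly as follows (condensed from my reading of mmdet's varifocal_loss, iou-weighted variant; not the exact code):

```python
import torch
import torch.nn.functional as F

def varifocal_loss(logits, target, alpha=0.75, gamma=2.0):
    # target is the IoU-aware classification score q (detached): q = IoU
    # for positives, 0 for negatives -- a target, never a gradient path.
    p = logits.sigmoid()
    focal_weight = (target * (target > 0.0).float()
                    + alpha * (p - target).abs().pow(gamma)
                    * (target <= 0.0).float())
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction='none')
    return (bce * focal_weight).sum()
```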
