It doesn't work for detecting 1600 objects with the function "torch.clamp(classification, min=1e-4, max=1.0 - 1e-4)" in focal_loss #131

Open
y78h11b09 opened this issue Mar 17, 2020 · 2 comments

y78h11b09 commented Mar 17, 2020

Recently, I found that EfficientDet-d0 did not work for detecting 1600 objects: cls_loss still did not decrease during training.

So I modified the function "torch.clamp(classification, min=1e-4, max=1.0 - 1e-4)"
to "torch.clamp(classification, min=1e-8, max=1.0 - 1e-8)" in focal_loss,
and then EfficientDet-d0 could be trained on 1600 objects.

Can anyone tell me what advantage the torch.clamp() call in focal_loss provides? I think it should be removed completely!
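
For context, a minimal sketch (hypothetical values, not the repo's exact code) of how the clamp interacts with the loss and its gradient. With min=1e-4, the per-anchor BCE term is capped, and torch.clamp additionally back-propagates a zero gradient wherever the input lies outside [min, max], which may explain why cls_loss plateaus with many anchors:

```python
import torch

# Hypothetical post-sigmoid probabilities for three anchors: one very
# confidently wrong, one uncertain, one very confidently right.
p = torch.tensor([1e-9, 0.5, 1.0 - 1e-9], requires_grad=True)

# The clamp under discussion: with min=1e-4, the BCE term -log(p) is
# capped at -log(1e-4) ≈ 9.21 for any anchor, however wrong it is.
clamped = torch.clamp(p, min=1e-4, max=1.0 - 1e-4)
loss = -torch.log(clamped).sum()
loss.backward()

# Anchors clamped at the floor or ceiling receive no learning signal:
print(p.grad)  # ≈ [0.0, -2.0, 0.0]
```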


rmcavoy commented Mar 25, 2020

The clamp function probably improves stability in some cases, but it is unnecessary: you can switch to the "with logits" version of the focal loss, as is used in the TensorFlow version of the official code (quoted below from the official code's comments).

# Below are comments/derivations for computing modulator.
# For brevity, let x = logits,  z = targets, r = gamma, and p_t = sigmoid(x)
# for positive samples and 1 - sigmoid(x) for negative examples.
#
# The modulator, defined as (1 - P_t)^r, is a critical part in focal loss
# computation. For r > 0, it puts more weight on hard examples, and less
# weight on easier ones. However if it is directly computed as (1 - P_t)^r,
# its back-propagation is not stable when r < 1. The implementation here
# resolves the issue.
#
# For positive samples (labels being 1),
#    (1 - p_t)^r
#  = (1 - sigmoid(x))^r
#  = (1 - (1 / (1 + exp(-x))))^r
#  = (exp(-x) / (1 + exp(-x)))^r
#  = exp(log((exp(-x) / (1 + exp(-x)))^r))
#  = exp(r * log(exp(-x)) - r * log(1 + exp(-x)))
#  = exp(- r * x - r * log(1 + exp(-x)))
#
# For negative samples (labels being 0),
#    (1 - p_t)^r
#  = (sigmoid(x))^r
#  = (1 / (1 + exp(-x)))^r
#  = exp(log((1 / (1 + exp(-x)))^r))
#  = exp(-r * log(1 + exp(-x)))
#
# Therefore one unified form for positive (z = 1) and negative (z = 0)
# samples is:
#      (1 - p_t)^r = exp(-r * z * x - r * log(1 + exp(-x))). 
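
Following that derivation, a minimal PyTorch sketch of the "with logits" focal loss (the function name is illustrative; alpha=0.25 and gamma=2.0 are the usual RetinaNet defaults, and log(1 + exp(-x)) is evaluated via softplus(-x) for stability):

```python
import torch
import torch.nn.functional as F

def focal_loss_with_logits(logits, targets, alpha=0.25, gamma=2.0):
    """Numerically stable sigmoid focal loss computed from raw logits.

    Uses the unified modulator from the derivation above:
        (1 - p_t)^gamma = exp(-gamma * z * x - gamma * log(1 + exp(-x)))
    """
    # Standard per-element sigmoid cross-entropy, computed from logits,
    # so no probabilities are materialized and no clamp is needed.
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")

    # Unified modulator for positive (z = 1) and negative (z = 0) samples;
    # softplus(-x) evaluates log(1 + exp(-x)) without overflow.
    modulator = torch.exp(-gamma * targets * logits - gamma * F.softplus(-logits))

    loss = modulator * ce
    # Alpha-balance positives vs. negatives, as in the official code.
    weighted = torch.where(targets == 1.0, alpha * loss, (1.0 - alpha) * loss)
    return weighted.sum()
```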

y78h11b09 (Author) commented Mar 25, 2020 via email
