
Potential bug in NegativeLogLikelihood impl #150

Open · hweom opened this issue Oct 25, 2021 · 9 comments · May be fixed by #151

Labels: documentation · question

hweom commented Oct 25, 2021

I just started exploring deep learning and chose Juice as my starting framework, since I want to stick with Rust. I'm pretty new to the domain, so this might just be my mistake.

I was looking at NegativeLogLikelihood::compute_output() and I think there is a bug. Instead of

        for &label_value in native_labels {
            // Always indexes into the first sample's probabilities:
            // `label_value` is never offset by the batch position.
            let probability_value = native_probabilities[label_value as usize];
            writable_loss.push(-probability_value);
        }

it should be

        let mut offset = 0;
        for &label_value in native_labels {
            // Index into the current sample's row of class probabilities.
            let probability_value = native_probabilities[offset + label_value as usize];
            writable_loss.push(-probability_value);
            // Advance to the next sample's row.
            offset += self.num_classes;
        }

Otherwise aren't we comparing every label in the batch against the first sample's output?
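
To make the indexing concrete, here is a minimal standalone sketch (my own illustration, not the actual Juice buffers; the row-major layout and num_classes = 3 are assumptions) of both variants:

    // A batch of 2 samples with 3 classes each, probabilities flattened
    // row-major: [sample 0 | sample 1].
    fn main() {
        let native_probabilities = [0.1f32, 0.7, 0.2, 0.6, 0.3, 0.1];
        let native_labels = [1u32, 0]; // correct class per sample
        let num_classes = 3;

        // Current code: every label indexes into sample 0's row.
        let buggy: Vec<f32> = native_labels
            .iter()
            .map(|&label| -native_probabilities[label as usize])
            .collect();
        assert_eq!(buggy, vec![-0.7, -0.1]); // second value read from sample 0!

        // Proposed fix: offset by num_classes for each sample.
        let fixed: Vec<f32> = native_labels
            .iter()
            .enumerate()
            .map(|(batch_n, &label)| -native_probabilities[batch_n * num_classes + label as usize])
            .collect();
        assert_eq!(fixed, vec![-0.7, -0.6]); // each sample reads its own row
    }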

Interestingly, I tried changing it and there was no noticeable effect on the MNIST example...

drahnr commented Oct 29, 2021

@hweom sorry for the long delay. I see your point and would appreciate a PR for this issue with a testcase; happy to review. Practically this is not destructive: all values are summed in the following few lines, so some values just get more weight while others are omitted.

hweom commented Oct 30, 2021

Thanks for the answer. I'm still trying to wrap my head around the code. I think I understand why changing this part had no effect on the MNIST example -- since the NLL layer is the last one, its output is not used anywhere.

Now I have a different question :) The definition of NLL that I found gives the formula

y = -ln(x)

(here y is the output of the NLL layer and x is the input for the correct class).

However, NegativeLogLikelihood::compute_output() does this:

y = -x
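
As a concrete worked example (my own numbers): for a correct-class input x = 0.7, the definition gives y = -ln(0.7) ≈ 0.357, while the implementation returns y = -0.7.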

And similarly, the input gradient in NegativeLogLikelihood::compute_input_gradient() is computed like this:

// For each sample, set the gradient of the correct class to the constant -1.
for (batch_n, &label_value) in native_labels.iter().enumerate() {
    let index = (num_classes * batch_n) + label_value as usize;
    writable_gradient[index] = -1f32;
}

which is essentially ∂y/∂x = -1. This matches the way the output is computed, but doesn't quite match the definition of NLL.
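
Writing out the true gradient for comparison (my own worked step): y = -ln(x) gives ∂y/∂x = -1/x, which agrees with the implemented constant -1 only at x = 1.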

FWIW, I tried changing compute_input_gradient() to compute the real NLL gradient with

writable_gradient[index] = -1.0 / native_probabilities[index].max(0.001); // true NLL gradient -1/x, clamped to avoid blow-up near zero

(I didn't change compute_output() since it's not used, as mentioned above), but it actually caused training to stop converging.

Should the name of the layer be changed?

drahnr commented Oct 31, 2021

That is correct. The implementation is pretty old and was done by the original authors of the predecessor of Juice.

The only explanation I have is when looking at the expansion of ln(x), which is

ln(x) ~= -x + (1/2)x^2 - (1/3)x^3 + (1/4)x^4 - ...

Now if you terminate after the first term, you get the above. This saves a bunch of computation -- ln is still rather slow. Two or three terms wouldn't hurt, but they would come at a quite significant additional cost: three more multiplications, a lock call, and an array lookup in the gradient compute.

I added a commit that would improve the approximation up to x^3 in #151

hweom commented Oct 31, 2021

Is this a Taylor expansion of ln(x)? Around which point? It doesn't look like one: example.

I'm probably missing something obvious...

drahnr commented Oct 31, 2021

https://math.stackexchange.com/questions/585154/taylor-series-for-logx says almost the above; an offset of -1 or +1, respectively, is missing, so I guess that's another issue.

hweom commented Oct 31, 2021

You mean the most upvoted answer (https://math.stackexchange.com/a/585158)? Note that it has a formula for -ln(1-x), not ln(x).

drahnr commented Oct 31, 2021

That's what I meant in my previous comment.

hweom commented Oct 31, 2021

Well, if we take the Taylor expansion suggested by Wolfram,

ln(x) = (x - 1) - (1/2)(x - 1)^2 + (1/3)(x - 1)^3 - ...

negate it, truncate after three terms, and simplify, we get this: https://www.wolframalpha.com/input/?i=expand+-%28%28x-1%29+-+1%2F2%28x-1%29%5E2+%2B+1%2F3%28x-1%29%5E3%29

Or we can just add the offset and keep the math cleaner.
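
For what it's worth, a quick standalone check of that three-term expansion against f32::ln (my own sketch; neg_ln_approx is a hypothetical helper, not code from #151):

    // Three-term Taylor approximation of -ln(x) around x = 1:
    // -((x - 1) - (1/2)(x - 1)^2 + (1/3)(x - 1)^3)
    fn neg_ln_approx(x: f32) -> f32 {
        let t = x - 1.0;
        -(t - 0.5 * t * t + t * t * t / 3.0)
    }

    fn main() {
        for &x in &[0.5f32, 0.9, 1.0, 1.1] {
            println!("x = {x:.1}: -ln(x) = {:+.4}, approx = {:+.4}", -x.ln(), neg_ln_approx(x));
        }
    }

The polynomial is tight near x = 1 but stays bounded as x approaches 0, while the true -ln(x) blows up there -- exactly the region where a confidently wrong prediction should be penalized.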

drahnr commented Nov 29, 2021

I think the offset is not really relevant for learning.
