Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question for Histogram.probabilityFor #85

Open
pipi32167 opened this issue Aug 31, 2022 · 2 comments
Open

Question for Histogram.probabilityFor #85

pipi32167 opened this issue Aug 31, 2022 · 2 comments

Comments

@pipi32167
Copy link

I just curious why use max of bins as divisor instead of sum of bins?

LocoKit/Timelines/ActivityTypes/Histogram.swift

    public func probabilityFor(_ value: Double) -> Double {
        guard let max = bins.max() else {
            return 0
        }
       
        // shouldn't be possible. but... 
        guard !binWidth.isNaN else {
            return 0
        }
        
        // single bin histograms result in binary 0 or 1 scores
        if bins.count == 1 {
            return value == range.min ? 1 : 0
        }
        
        let bin: Int
        if value == range.max {
            bin = bins.count - 1
        } else {
            let binDouble = floor((value - range.min) / binWidth)
            
            if binDouble > Double(bins.count - 1) {
                return 0
            }
            
            guard !binDouble.isNaN && binDouble > Double(Int.min) && binDouble < Double(Int.max) else {
                return 0
            }
            
            bin = binWidth > 0 ? Int(binDouble) : 0
        }
        
        guard bin >= 0 && bin < bins.count else {
            return 0
        }
        
        return (Double(bins[bin]) / Double(max)).clamped(min: 0, max: 1)
    }
@sobri909
Copy link
Owner

Good question! I'm actually not sure why. I wish I'd commented that code 😬 I agree, it does look wrong, and seems like surely it should be bins.sum.

But I just tried changing it to bins.sum and it produces erroneous classifier results. I suspect because the histograms don't have fixed bin counts (bin widths are calculated by FD rule). To use sum each ActivityType would need the same fixed bin counts (and maybe same fixed bin widths).

Hmm.

@sobri909
Copy link
Owner

Either way, that method name is clearly wrong. It's not returning the probability at all. So I'll at least fix that now.

I've now also got a strong urge to do a deep dive on fixing the classifiers to produce mathematically correct results, not just intuitively correct. But all this custom classifier code is already supposed to be deprecated and replaced by a Core ML based implementation, now that Core ML is mature enough to take over. I shouldn't be putting more time into it, when that time should be going to the Core ML based replacement. Sigh.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants