
Corner case where values are identical? #60

Open
microprediction opened this issue Oct 6, 2022 · 1 comment

Comments

@microprediction
Contributor

I'm interested in the case where a variable takes on discrete values. I created a tdigest notebook to illustrate what might be an interesting issue.

Suppose I have sampled many rolls of a die. If I add a tiny amount of noise, tdigest works just fine as a representation of the data, giving quite accurate CDF and percentile estimates.
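The noise hack can be sketched roughly as follows. This assumes the hack is a small uniform jitter; the notebook's actual noise distribution and scale aren't shown here, and the `jitter` helper and its `scale` value are hypothetical. The jittered values could then be fed to a digest, e.g. via `TDigest.batch_update` in the `tdigest` package.

```python
import random

def jitter(values, scale=1e-6, seed=None):
    """Hypothetical helper: add tiny uniform noise in [-scale, scale] to break
    ties between repeated discrete values. `scale` is an assumed choice; it
    should be far smaller than the gap between distinct values (1 for a die)
    so the faces stay in the same order."""
    rng = random.Random(seed)
    return [v + rng.uniform(-scale, scale) for v in values]

# Simulated rolls of a fair die (seeded so the example is reproducible).
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(10_000)]
noisy = jitter(rolls, seed=1)

# Each jittered value still rounds back to its original face...
assert all(round(x) == r for x, r in zip(noisy, rolls))
# ...but the ties are broken, so there are many more distinct values.
print(len(set(rolls)), len(set(noisy)))
```

Because the noise is much smaller than the spacing between faces, any estimate taken from the jittered data can be mapped back to a face by rounding.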

However, if you run the same notebook with HACK=False (no noise), only six centroids are created. This leads to gross inaccuracy in both the CDF and the percentiles.
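One plausible reason six centroids hurt so much: the CDF of discrete data is a step function, while t-digest style estimates interpolate between centroid means. The sketch below uses a simplified model and is not the `tdigest` package's exact code. The hypothetical `interpolated_cdf` assumes each centroid holds half its weight on either side of its mean (roughly the usual t-digest convention) and compares the result with the exact CDF of a fair die.

```python
from bisect import bisect_right

def exact_cdf(counts, x):
    """Exact P(X <= x) for discrete data given as {value: count}."""
    total = sum(counts.values())
    return sum(c for v, c in counts.items() if v <= x) / total

def interpolated_cdf(centroids, x):
    """Simplified t-digest-style CDF from sorted (mean, weight) centroids.

    Assumption for illustration: each centroid holds half its weight on each
    side of its mean, and the CDF is linear between adjacent means.
    """
    means = [m for m, _ in centroids]
    weights = [w for _, w in centroids]
    total = sum(weights)
    # Cumulative weight at each mean: all earlier weight plus half its own.
    cum, running = [], 0.0
    for w in weights:
        cum.append(running + w / 2)
        running += w
    if x < means[0]:
        return 0.0
    if x > means[-1]:
        return 1.0
    i = bisect_right(means, x) - 1  # index of the last mean <= x
    if i == len(means) - 1:
        return cum[i] / total
    frac = (x - means[i]) / (means[i + 1] - means[i])
    return (cum[i] + frac * (cum[i + 1] - cum[i])) / total

# Toy model: many rolls of a fair die collapse to six centroids,
# one per face value (mirroring the six centroids reported above).
counts = {face: 1000 for face in range(1, 7)}
centroids = sorted(counts.items())

for x in range(1, 7):
    print(x, round(exact_cdf(counts, x), 3), round(interpolated_cdf(centroids, x), 3))
```

Under this model the estimate at every face value is low by half that face's mass (1/12 for a fair die), e.g. 5/12 instead of 1/2 at x = 3, because the jump at each value is smeared linearly between neighbouring centroids.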

I am wondering whether there is a trick that would let tdigest handle cases like this without my hack.

@CamDavidsonPilon
Owner

Hi @microprediction, I'm surprised it fails so badly for discrete data. I don't know a solution immediately; this will take some thought...
