Add zoomable histograms #40

Open
spfrommer opened this issue Dec 23, 2023 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@spfrommer
Owner

Not sure what the best way to implement this is. There are a few difficult things to handle:

  • The binning: would probably have to do a super-fine binning (hundreds of bins), then aggregate dynamically in the vega interface using the bin transform. Might be slow. Could also do a tensorboard-style exponential binning. This could be a little tricky, since for things like grad norm it's not obvious that we want many more bins around zero... Maybe around the median tensor value? As an alternative, could just do the tensorboard histogram view, which isn't zoomable.
    On reflection, I think exponential binning of some kind would be necessary. Usually you want to "zoom in" when there's very tightly clustered values around zero with a few outliers, and just linear binning wouldn't cut it.
  • Could no longer use the string histogram min/max vals, would have to be dynamic. Uglier formatting.
  • Would have to give up the "correctness" of the binning interpolation when going from exponential to linear scale.
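
A rough sketch of what exponential binning around a center could look like, assuming NumPy. The function names and the `smallest_width` / `bins_per_side` parameters are made up for illustration and are not part of the existing code. The center can be zero or the median, whichever ends up making more sense:

```python
import numpy as np


def exponential_bin_edges(center, max_abs_dev, smallest_width=1e-6, bins_per_side=20):
    """Symmetric bin edges whose widths grow geometrically away from `center`.

    All parameter names here are hypothetical, chosen for this sketch only.
    """
    # Guard so that geomspace gets a strictly increasing positive range.
    max_abs_dev = max(float(max_abs_dev), smallest_width * 2.0)
    # Positive offsets from the center, log-spaced from tiny to the largest deviation.
    offsets = np.geomspace(smallest_width, max_abs_dev, bins_per_side)
    # Mirror the offsets; the middle bin spans center +/- smallest_width.
    return np.concatenate([center - offsets[::-1], center + offsets])


def exponential_histogram(values, center=0.0, **kwargs):
    """Histogram `values` on exponential bins; returns (counts, edges)."""
    values = np.asarray(values, dtype=float).ravel()
    max_abs_dev = np.max(np.abs(values - center)) if values.size else 0.0
    edges = exponential_bin_edges(center, max_abs_dev, **kwargs)
    # Clip so floating-point rounding at the extremes cannot drop a value.
    clipped = np.clip(values, edges[0], edges[-1])
    counts, _ = np.histogram(clipped, bins=edges)
    return counts, edges
```

Passing `center=float(np.median(values))` instead of the default gives the "around the median" variant. The edges are returned alongside the counts so they can be serialized for the frontend.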
@spfrommer spfrommer added the enhancement New feature or request label Dec 23, 2023
@spfrommer spfrommer self-assigned this Dec 23, 2023
@spfrommer
Owner Author

On further reflection, this is probably the ideal way:

  • Exponential binning in the backend around the median. No outlier rejection. Serialize the bin bounds as well for the frontend. -> Actually should maybe do around zero, since even nonzero things tend to go to zero pretty often?
  • In the front end, have a slider that controls the outlier rejection rate, and all the plots update dynamically.

Benefits:

  • Prevent user from "zooming in" on areas with little data

Issues:

  • Most users are going to do outlier rejection, resulting in ugly vega-formatted floats...
  • If something changes dramatically over training, the exponential binning is going to really hurt.
  • Not sure how to "rebin". I'm tempted to not do so at all, just keep adding bins as necessary. Shouldn't be too bad if exponentially scaled.
  • Not sure if we should linearize the bins in the vega interface...
  • We could totally solve this with a "smoothed" histogram, using something like: https://stackoverflow.com/questions/75885679/vega-lite-gradient-for-line-chart
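
For the "just keep adding bins as necessary" option, a minimal sketch, assuming NumPy and assuming the existing edges straddle the center. `grow_histogram` and its `growth` ratio are hypothetical names for illustration. Existing edges and counts are never touched, so no rebinning happens; new outer bins are appended with geometrically growing offsets until the new data fits:

```python
import numpy as np


def grow_histogram(counts, edges, new_values, center=0.0, growth=2.0):
    """Accumulate `new_values` into a histogram, appending outer bins if needed.

    `growth` (hypothetical) is the ratio between successive edge offsets
    from `center` for any newly added bins.
    """
    counts, edges = list(counts), list(edges)
    new_values = np.asarray(new_values, dtype=float).ravel()
    if new_values.size == 0:
        return np.asarray(counts), np.asarray(edges)
    if growth <= 1.0 or not (edges[0] < center < edges[-1]):
        raise ValueError("need growth > 1 and edges that straddle the center")

    # Extend to the right until the largest new value is covered.
    while edges[-1] < new_values.max():
        edges.append(center + (edges[-1] - center) * growth)
        counts.append(0)
    # Extend to the left until the smallest new value is covered.
    while edges[0] > new_values.min():
        edges.insert(0, center - (center - edges[0]) * growth)
        counts.insert(0, 0)

    new_counts, _ = np.histogram(new_values, bins=np.asarray(edges))
    return np.asarray(counts) + new_counts, np.asarray(edges)
```

Because the offsets grow geometrically, even a value many orders of magnitude outside the current range only adds a handful of bins.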

@spfrommer
Owner Author

spfrommer commented Dec 27, 2023

Unfortunately, "smooth" histograms ended up being too low-performance. So I think that any kind of client-side outlier rejection is out of the question.

I think now the only plausible way to do this is to pre-compute a few levels of outlier rejection in the backend, and add a drop-down box to select from these in the frontend. The rejection_outlier_proportion will take a Union of float and list of floats. For each float, a new histogram will be concatenated to the histogram strings, with added text containing the outlier rejection percentage. The data will first be split at a high level, then a new dataset will parse out all the valid outlier percentages.
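
A minimal sketch of the backend side, assuming NumPy. Only `rejection_outlier_proportion` comes from the plan above; the helper names are hypothetical. It also assumes "outlier rejection" means trimming half of the proportion from each tail, which is a guess, and it returns a dict keyed by proportion rather than the concatenated histogram strings:

```python
from typing import Dict, List, Tuple, Union

import numpy as np


def normalize_proportions(p: Union[float, List[float]]) -> List[float]:
    """Accept a single float or a list of floats; return a sorted, deduplicated list."""
    props = [p] if isinstance(p, (int, float)) else list(p)
    for x in props:
        if not 0.0 <= x < 1.0:
            raise ValueError(f"rejection proportion must be in [0, 1), got {x}")
    return sorted({float(x) for x in props})


def histograms_by_rejection(
    values,
    rejection_outlier_proportion: Union[float, List[float]],
    bins: int = 20,
) -> Dict[float, Tuple[np.ndarray, np.ndarray]]:
    """Compute one (counts, edges) histogram per outlier rejection level."""
    values = np.asarray(values, dtype=float).ravel()
    result = {}
    for prop in normalize_proportions(rejection_outlier_proportion):
        # Assumption: symmetric trimming, prop / 2 from each tail.
        lo, hi = np.quantile(values, [prop / 2.0, 1.0 - prop / 2.0])
        kept = values[(values >= lo) & (values <= hi)]
        result[prop] = np.histogram(kept, bins=bins, range=(lo, hi))
    return result
```

Each entry would then be serialized with its rejection percentage attached, so the frontend can tell the levels apart.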

The outlier percentages will be rendered at the bottom of the explorer panel, as pretty much a slider with a few discrete nodes. There will have to be a maximum of say four nodes in order for this to work since vega doesn't handle dynamic gui elements very well.

The resulting signal goes back to the data, where the histograms are filtered before further processing.
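
The selection step is done by a vega signal in the real thing, but the logic can be sketched in plain Python. The record layout (a dict with an `outlier_pct` key) and the function names are assumptions for illustration; the cap of four comes from the note above about discrete nodes:

```python
MAX_LEVELS = 4  # cap on the number of discrete selector nodes


def available_levels(records):
    """Collect the distinct outlier percentages present in the data."""
    levels = sorted({r["outlier_pct"] for r in records})
    if len(levels) > MAX_LEVELS:
        raise ValueError(f"at most {MAX_LEVELS} outlier levels are supported")
    return levels


def select_histograms(records, selected_pct):
    """Keep only the histograms computed at the selected rejection level."""
    return [r for r in records if r["outlier_pct"] == selected_pct]
```

Everything downstream of the filter then only ever sees a single rejection level at a time.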
