[WIP] Explained Variance #1037

Open: wants to merge 11 commits into develop


@bbengfort (Member) commented Feb 11, 2020

This PR fixes #316 and extends #954 to get it wrapped up.

I have made the following changes:

  1. Added documentation and tests
  2. Refactored the quick method according to the quick method sprint

Sample Code and Plot

If you are adding or modifying a visualizer, PLEASE include a sample plot here along with the code you used to generate it.
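As a hedged, standalone illustration of the quantity an explained variance plot shows (not the code used in this PR), the sketch below computes per-component explained variance ratios with NumPy's SVD. These should correspond to what scikit-learn's `PCA` reports as `explained_variance_ratio_` when all components are kept. The helper name `explained_variance_ratio` and the synthetic data are assumptions for illustration only.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component.

    Hypothetical helper for illustration; not part of Yellowbrick's API.
    """
    X = np.asarray(X, dtype=float)
    # Center each feature so the SVD describes variance around the mean.
    Xc = X - X.mean(axis=0)
    # NumPy returns singular values sorted in descending order.
    s = np.linalg.svd(Xc, compute_uv=False)
    # Variance along each component is proportional to its squared singular value.
    var = s ** 2
    return var / var.sum()


rng = np.random.default_rng(0)
# Hypothetical data: 200 samples of 10 features driven by 3 latent factors plus noise.
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(200, 10))

ratios = explained_variance_ratio(X)
cumulative = np.cumsum(ratios)  # the curve compared against a cutoff such as 0.85
print(np.round(cumulative, 3))
```

The `ratios` array corresponds to the per-component bars and `cumulative` to the cumulative line in a plot like the prototype.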

TODOs and questions

Still to do:

CHECKLIST

  • Is the commit message formatted correctly?
  • Have you noted the new functionality/bugfix in the release notes of the next release?
  • Have you included a sample plot that visually illustrates your changes?
  • Do all of your functions and methods have docstrings?
  • Have you added/updated unit tests where appropriate?
  • Have you updated the baseline images if necessary?
  • Have you run the unit tests using pytest?
  • Is your code style correct (are you using PEP8, pyflakes)?
  • Have you documented your new feature/functionality in the docs?
  • Have you built the docs using make html?

@bbengfort (Member, Author)

Prototype version by @naresh-bachwani:

[Figure: Explained Variance Prototype]

@naresh-bachwani (Contributor)

Hey there @bbengfort,
The plot above looks quite good! A few features we could add:

  1. We have a parameter cutoff, which takes a threshold value and returns the number of components needed to reach it. We could also support the reverse scenario: a parameter n_features that returns the explained variance captured by that number of features (components). However, I am not sure we have a use case for this.
  2. Also, in the example above we cannot read off the exact number of features (it is somewhere between 20 and 30). It would be better to add ticks where the cutoff line crosses the curve.
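Both directions described above can be sketched with NumPy, assuming the per-component explained variance ratios are already available (for example from scikit-learn's `PCA.explained_variance_ratio_`). The function names below are hypothetical and are not the PR's actual parameters; the integer returned by the first helper is what one might add as an explicit x-axis tick at the cutoff.

```python
import numpy as np


def n_components_for_cutoff(ratios, cutoff=0.85):
    """Smallest number of components whose cumulative explained variance reaches cutoff.

    Hypothetical helper for illustration; not part of Yellowbrick's API.
    """
    cumulative = np.cumsum(ratios)
    # searchsorted returns the first 0-based index where cumulative >= cutoff.
    idx = int(np.searchsorted(cumulative, cutoff, side="left"))
    # Clip in case floating-point rounding leaves the total just below the cutoff.
    return min(idx + 1, len(cumulative))


def variance_for_n_components(ratios, n):
    """Reverse direction: cumulative explained variance ratio of the first n components.

    Hypothetical helper for illustration; not part of Yellowbrick's API.
    """
    return float(np.sum(np.asarray(ratios)[:n]))


# Hypothetical ratios, already sorted in descending order.
ratios = np.array([0.4, 0.25, 0.15, 0.1, 0.05, 0.05])
n = n_components_for_cutoff(ratios, 0.85)
print(n)  # 4
print(variance_for_n_components(ratios, 2))
```

Using `searchsorted` on the cumulative sum relies on that sum being non-decreasing, which holds because explained variance ratios are non-negative.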

@BradKML commented Mar 12, 2022

Question: is 85% explained variance some kind of magic number, like p = 0.05 or the "rule of thumb" thresholds for correlation coefficients and effect sizes? Below is an example of the magic numbers commonly found in "rule of thumb" tables:

Strength      Correlation  Cohen's d  Angular  p-value  Odds Ratio
Very Strong   > 0.8        > 2        < 15°    < 0.001  > 10
Strong        > 0.6        > 1.2      < 30°    < 0.005  > 3
Moderate      > 0.4        > 0.8      < 45°    < 0.01   > 1.5
Weak          > 0.2        > 0.5      < 60°    < 0.05   > 1.2
Very Weak     > 0.1        > 0.2      < 75°    < 0.10   > 1.0
Negligible    > 0          > 0        = 90°    < 1      = 1

References: https://archive.fo/Bi6yF, https://archive.fo/ourlZ, https://archive.fo/hVKdt, https://archive.fo/jQOX8, https://archive.fo/Xh9xk

@bbengfort (Member, Author)

@BrandonKMLee, yes, exactly right: it's just a rule of thumb for how many components are needed to capture "enough" explained variance, analogous to the "very strong" correlation threshold in the table you provided or to a p-value cutoff of 0.05.

@gregparkes

@bbengfort Thanks for the issue mention.

Looking at the plot above (which is pretty good), I would mention a few things:

  1. I'm not convinced about plotting both the per-component explained variance and the cumulative variance. In my experience I have only ever plotted cumulative explained variance, because it is generally easier to gauge visually how quickly you approach 1: a steep curve is good, a flat curve is bad. It also opens up summary measures such as the area under the curve.
  2. Do we want this as a percentage (0-100) instead of a fraction (0-1)? People like percentages; failing that, we should make clear that the value is an explained variance ratio (proportion, or relative EV) and not just "explained variance". These values are normalized by the total variance (so they sum to 1 over all components) rather than being the raw scaled singular values.
  3. I'm happy with a default cutoff of 85% for cumulative variance, provided this parameter can be changed.
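The percentage display and the area-under-curve idea above could look roughly like the sketch below. The comment does not define the area under the curve, so the normalization here (mean of the cumulative ratios, treating the curve as a step function with one unit of width per component) is an assumption, and both helper names are hypothetical.

```python
import numpy as np


def cumulative_percent(ratios):
    """Cumulative explained variance expressed as a percentage (0-100).

    Hypothetical helper for illustration; not part of Yellowbrick's API.
    """
    return 100.0 * np.cumsum(ratios)


def cumulative_auc(ratios):
    """Normalized area under the cumulative explained variance curve.

    Assumed definition: the step-function area divided by the number of
    components, so the result lies in (0, 1]. A steep curve (variance
    concentrated in the first few components) scores closer to 1.
    """
    cumulative = np.cumsum(ratios)
    return float(cumulative.mean())


# Hypothetical ratios: a "steep" spectrum versus a completely "flat" one.
steep = np.array([0.7, 0.2, 0.05, 0.03, 0.02])
flat = np.full(5, 0.2)
print(cumulative_percent(steep))
print(cumulative_auc(steep), cumulative_auc(flat))
```

A trapezoidal rule over the same points would be an equally reasonable choice. Whichever definition is used, the score is only comparable between datasets that have the same number of components.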

@BradKML commented May 22, 2022

Some alternative ideas:

Successfully merging this pull request may close this issue: Finish decomposition explained variance visualizer.
4 participants