[WIP] Explained Variance #1037

bbengfort · 2020-02-11T00:46:39Z

This PR fixes #316 and extends #954 to get it wrapped up.

I have made the following changes:

Added documentation and tests
Refactored the quick method according to the quick method sprint

Sample Code and Plot

If you are adding or modifying a visualizer, PLEASE include a sample plot here along with the code you used to generate it.

TODOs and questions

Still to do:

Integrate [WIP] Added features to Decomposition.py #954
Update visualizer
Write documentation
Add better tests

CHECKLIST

Is the commit message formatted correctly?
Have you noted the new functionality/bugfix in the release notes of the next release?

Included a sample plot to visually illustrate your changes?
Do all of your functions and methods have docstrings?
Have you added/updated unit tests where appropriate?
Have you updated the baseline images if necessary?
Have you run the unit tests using pytest?
Is your code style correct (are you using PEP8, pyflakes)?
Have you documented your new feature/functionality in the docs?

Have you built the docs using make html?


bbengfort · 2020-02-25T23:25:36Z

Resources:

bbengfort · 2020-02-25T23:29:06Z

@naresh-bachwani prototype version:

naresh-bachwani · 2020-03-04T13:16:18Z

Hey there @bbengfort,
The above plot looks quite good! A few features we could add:

We have a parameter cutoff, which takes a threshold value and returns the number of components needed to reach it. We could also support the reverse: a parameter n_features that returns the variance explained by that number of features. I'm not sure we have a use case for this, though.
Also, in the above example we can't read off the exact number of features (it is somewhere between 20 and 30). It would be better to add a tick at the number of features corresponding to the cutoff line.
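To make the two directions concrete, here is a minimal sketch in plain Python. The function names `components_for_cutoff` and `variance_for_features` are hypothetical and are not part of the visualizer in this PR; `cutoff` and `n_features` follow the naming in the comment above, and the ratios are made-up numbers.

```python
from itertools import accumulate


def components_for_cutoff(ratios, cutoff=0.85):
    """Return the smallest number of components whose cumulative
    explained variance ratio reaches `cutoff`, or None if it never does."""
    for n, total in enumerate(accumulate(ratios), start=1):
        if total >= cutoff:
            return n
    return None


def variance_for_features(ratios, n_features):
    """Return the cumulative explained variance ratio of the first
    `n_features` components (the reverse lookup)."""
    return sum(ratios[:n_features])


# Made-up explained variance ratios, sorted in decreasing order.
ratios = [0.5, 0.25, 0.125, 0.0625, 0.0625]

n = components_for_cutoff(ratios, cutoff=0.85)
print(n)                                  # number of components at the cutoff
print(variance_for_features(ratios, 2))   # variance explained by 2 components
```

The value returned by `components_for_cutoff` is also the x position where an extra tick could be drawn for the cutoff line.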

BradKML · 2022-03-12T14:18:28Z

Question: is 85% variance some kind of magic number, similar to p = 0.05 or the "rule of thumb" thresholds for correlation coefficients and effect sizes? Below are some of the usual magic numbers from "rule of thumb" tables:

Strength	Correlation	Cohen's d	Angular	p-value	Odds Ratio
Very Strong	> 0.8	> 2	< 15°	< 0.001	> 10
Strong	> 0.6	> 1.2	< 30°	< 0.005	> 3
Moderate	> 0.4	> 0.8	< 45°	< 0.01	> 1.5
Weak	> 0.2	> 0.5	< 60°	< 0.05	> 1.2
Very Weak	> 0.1	> 0.2	< 75°	< 0.10	> 1.0
Negligible	> 0	> 0	= 90°	< 1	= 1

Reference: https://archive.fo/Bi6yF https://archive.fo/ourlZ https://archive.fo/hVKdt https://archive.fo/jQOX8 https://archive.fo/Xh9xk

bbengfort · 2022-05-21T18:33:37Z

@BrandonKMLee - yes, exactly right, it's just a rule of thumb about how many components are needed to have a "very strong" correlation or "enough explained variance" similar to the table you provided or a p value of 0.05.

gregparkes · 2022-05-21T21:51:15Z

@bbengfort Thanks for the issue mention.

Looking at the above plot (which is pretty good) I would mention a few things.

I'm not convinced about plotting both explained variance and cumulative variance. In my experience I have only ever plotted cumulative explained variance, as it's generally easier to visually gauge the rate at which you approach 1. A steep curve is good, a flat curve is bad. It also opens up things like area-under-curve.
Do we want this as a percentage (0-100) instead of (0-1)? People like percentages. Failing that, we should make clear that the value is an explained variance ratio (a proportion, or relative EV) and not just "explained variance". These values have been normalized to sum to 1 and are not just the scaled singular values.
I'm happy with a default of 85% for cumulative variance, assuming this parameter can be changed.
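A rough illustration of the ratio-versus-percentage point, in plain Python. It assumes centered data, so that each component's variance is proportional to its squared singular value. The helper names and the singular values are made up for the example and do not come from the PR.

```python
from itertools import accumulate


def explained_variance_ratio(singular_values):
    """Normalize squared singular values so that they sum to 1."""
    squared = [s * s for s in singular_values]
    total = sum(squared)
    return [v / total for v in squared]


def cumulative_percent(ratios):
    """Cumulative explained variance on a 0-100 scale."""
    return [100.0 * c for c in accumulate(ratios)]


# Made-up singular values, largest first.
singular_values = [4.0, 2.0, 2.0, 1.0]

ratios = explained_variance_ratio(singular_values)
for i, pct in enumerate(cumulative_percent(ratios), start=1):
    print(f"{i} component(s): {pct:.1f}% cumulative explained variance")
```

Plotting only the cumulative values keeps a single curve that rises towards 100%, which is the shape described above.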

BradKML · 2022-05-22T00:01:33Z

Some alternative ideas:

Visuals for K1 or Kaiser criterion https://en.wikipedia.org/wiki/Exploratory_factor_analysis#Kaiser_criterion
The "broken stick" method https://blogs.sas.com/content/iml/2017/08/02/retain-principal-components.html
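For reference, both criteria can be sketched in a few lines of plain Python. This assumes the eigenvalues come from a correlation matrix (so they sum to the number of variables) and uses the common form of the broken-stick rule, where component k of p is kept while its share of variance exceeds (1/p) * sum(1/i for i = k..p). The function names and eigenvalues are hypothetical.

```python
def kaiser_components(eigenvalues, threshold=1.0):
    """Kaiser (K1) criterion: count components with eigenvalue > threshold."""
    return sum(1 for ev in eigenvalues if ev > threshold)


def broken_stick_expectations(p):
    """Expected share of variance for each of p components if the total
    variance were split at random, like a stick broken into p pieces."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]


def broken_stick_components(ratios):
    """Count leading components whose observed ratio exceeds the
    broken-stick expectation, stopping at the first one that does not."""
    expected = broken_stick_expectations(len(ratios))
    kept = 0
    for observed, exp in zip(ratios, expected):
        if observed <= exp:
            break
        kept += 1
    return kept


# Made-up eigenvalues of a 4-variable correlation matrix (they sum to 4).
eigenvalues = [2.4, 1.1, 0.3, 0.2]
ratios = [ev / sum(eigenvalues) for ev in eigenvalues]

print(kaiser_components(eigenvalues))
print(broken_stick_components(ratios))
```

Either count could be drawn as a vertical line on the plot in the same way as the variance cutoff.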

naresh-bachwani and others added 4 commits August 22, 2019 23:34

Added a few features

3a9dece

Merge branch 'develop' into decomposition

e4a634b

explained variance stubs

7ae23fe

Merge branch 'decomposition' of git://github.com/naresh-bachwani/yell…

45c621f

…owbrick into naresh-bachwani-decomposition

bbengfort mentioned this pull request Feb 12, 2020

[WIP] Added features to Decomposition.py #954

Closed

13 tasks

Merge branch 'develop' into explained-variance

9603f1f

first prototype of explained variance

93730dd

Merge branch 'develop' into explained-variance

bfe6741

bbengfort added 4 commits March 5, 2020 18:57

typo fixes, changelog update

4d27883

remove old decomposition method

043f882

minor cleanup

6ec08c8

Merge branch 'develop' into explained-variance

188aff1

bbengfort mentioned this pull request Mar 12, 2022

Scree Plot for Pricipal Component Analysis #804

Closed

bbengfort mentioned this pull request May 21, 2022

Include explained variance in PCA component plot #1239

Closed

9 tasks
