RFC: Unified API #109

Open
wildart opened this issue Oct 10, 2019 · 7 comments

@wildart
Collaborator

wildart commented Oct 10, 2019

Following #95, I looked at the MV models/methods implemented in this package to work out a type hierarchy and the corresponding method interfaces.

Here is a table of the models and the function names each of them implements.

Function \ Model CCA WHT ICA LDA FA PPCA PCA KPCA MDS
fit x x x x x x x x x
transform x x x x x x x x x
predict x
indim x x x x x x x x
outdim x x x x x x x x x
mean x x x x x x x ?
var x x ? ? ?
cov x ?
cor x
projection x x x x x x
reconstruct x x x x
loadings ? ? x x ? ? ?
eigvals ? ? ? ? x
eigvecs ? ? ? ? ?
length
size

I put ? where an implementation is possible but is missing or named differently.

So, I propose the following type hierarchy:

  • StatsBase.RegressionModel
    • Methods: CCA, LDA
    • Functions: fit, transform, indim, outdim, mean
    • Subtypes:
      • AbstractDimensionalityReduction
      • Functions: projection, var, reconstruct, loadings
      • Subtypes:
        • LinearDimensionalityReduction
          • Methods: ICA, PCA
        • NonlinearDimensionalityReduction
          • Methods: KPCA, MDS
        • LatentVariableModel or LatentVariableDimensionalityReduction
          • Methods: FA, PPCA
          • Functions: cov
  • StatsBase.AbstractDataTransform
    • Whitening
    • Functions: fit, transform, indim, outdim, mean, size
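
The hierarchy above could be sketched in Julia roughly as follows. This is only an illustration under stated assumptions, not the package's implementation: `RegressionModel` and `AbstractDataTransform` are declared locally as stand-ins for the StatsBase types so the snippet runs without dependencies, `ToyLinearModel` is a hypothetical model, and the generic `transform`/`reconstruct` bodies are one plausible way to share code through the abstract type.

```julia
import Statistics: mean

# Local stand-ins for StatsBase.RegressionModel and StatsBase.AbstractDataTransform,
# declared here only so the sketch runs without StatsBase installed.
abstract type RegressionModel end
abstract type AbstractDataTransform end

# Proposed hierarchy for dimensionality-reduction models.
abstract type AbstractDimensionalityReduction <: RegressionModel end
abstract type LinearDimensionalityReduction <: AbstractDimensionalityReduction end
abstract type NonlinearDimensionalityReduction <: AbstractDimensionalityReduction end
abstract type LatentVariableDimensionalityReduction <: AbstractDimensionalityReduction end

# Hypothetical toy model: a fixed linear projection with a stored mean.
struct ToyLinearModel <: LinearDimensionalityReduction
    mean::Vector{Float64}
    proj::Matrix{Float64}   # d × p, columns are projection directions
end

# Per-model accessors named in the proposal.
mean(M::ToyLinearModel) = M.mean
projection(M::ToyLinearModel) = M.proj

# Generic methods written once against the abstract type (an assumption of this sketch).
transform(M::LinearDimensionalityReduction, x::AbstractVecOrMat) =
    projection(M)' * (x .- mean(M))
reconstruct(M::LinearDimensionalityReduction, y::AbstractVecOrMat) =
    projection(M) * y .+ mean(M)

M = ToyLinearModel([1.0, 2.0], [1.0 0.0; 0.0 1.0])
y = transform(M, [3.0, 5.0])   # centered and projected observation
x = reconstruct(M, y)          # mapped back to the input space
```

With this layout a new linear method only has to supply `mean` and `projection`; the shared methods come from dispatch on the abstract type.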

@nalimilan @ararslan Thoughts?

@ararslan
Member

That makes sense to me. Might be nice to have an abstract dimensionality reduction type in there that linear, nonlinear, and latent variable types can subtype.

@wildart
Collaborator Author

wildart commented Oct 10, 2019

> Might be nice to have an abstract dimensionality reduction type in there that linear, nonlinear, and latent variable types can subtype.

That would be AbstractDimensionalityReduction

@ararslan
Member

Whoops, don't know how I missed that...

@kescobo
Contributor

kescobo commented Oct 19, 2019

This seems great to me.

As my primary interest in this is for plotting, one thing I'd like to know is whether there's a common method for obtaining a vector that would be used in a plot. I'm not super knowledgeable about the terminology, but I think different things are commonly plotted for different dimensionality reductions. For MDS and PCA (I think), one is supposed to plot the eigenvectors scaled by the square root of the eigenvalue.

But finding information on this has been a bit challenging for me, not knowing all of the jargon.

@wildart
Collaborator Author

wildart commented Oct 20, 2019

Loadings are scaled eigenvectors. It will be easy to add them to every eigendecomposition-based method.
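
As a rough illustration of "scaled eigenvectors", the sketch below computes loadings from a covariance matrix by scaling each eigenvector by the square root of its eigenvalue. `loadings_from_cov` is a hypothetical helper written for this example, not a function of the package, and the input matrix is made up.

```julia
using LinearAlgebra

# Hypothetical helper: loadings of a symmetric covariance matrix C,
# i.e. eigenvectors scaled by the square roots of their eigenvalues.
function loadings_from_cov(C::AbstractMatrix)
    F = eigen(Symmetric(C))                # eigenvalues come back in ascending order
    order = sortperm(F.values; rev=true)   # largest eigenvalue first
    λ = F.values[order]
    V = F.vectors[:, order]
    # Scale column j of V by sqrt(λ[j]); clamp tiny negative values from round-off.
    return V .* sqrt.(max.(λ, 0))'
end

C = [2.0 0.8; 0.8 1.0]   # made-up covariance matrix
L = loadings_from_cov(C)
```

One way to check the scaling: for a positive semidefinite `C`, `L * L'` reproduces `C`, and the squared norm of each column of `L` equals the corresponding eigenvalue.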

@nalimilan
Member

Sounds like a good idea. Is the LinearDimensionalityReduction vs. NonlinearDimensionalityReduction distinction useful? I guess it doesn't hurt, but in your plan it doesn't really make a difference AFAICT.

Also, shouldn't PCA implement loadings?

@kescobo
Contributor

kescobo commented Oct 21, 2019

Fantastic. What about things like LDA and CCA? I've definitely seen those plotted, but your schema above doesn't have loadings for those, cf.

I know this is somewhat orthogonal, I can open a separate issue if that would be useful. In any case, having unified APIs for this stuff will be fantastic.

@wildart wildart added this to the v1.0.0 milestone Feb 25, 2021
wildart added a commit to wildart/MultivariateStats.jl that referenced this issue Feb 25, 2021
wildart added a commit to wildart/MultivariateStats.jl that referenced this issue Feb 25, 2021
wildart added a commit to wildart/MultivariateStats.jl that referenced this issue Feb 25, 2021
wildart added a commit to wildart/MultivariateStats.jl that referenced this issue Mar 11, 2021
wildart added a commit to wildart/MultivariateStats.jl that referenced this issue Mar 13, 2021
wildart added a commit to wildart/MultivariateStats.jl that referenced this issue Jun 1, 2021
wildart added a commit that referenced this issue Jun 1, 2021
* refactor whitening for closer integration with StatsBase types (part of #109)
* deprecate `indim` & `outdim`
@wildart wildart pinned this issue Oct 20, 2021
wildart added a commit to wildart/MultivariateStats.jl that referenced this issue Jan 14, 2022
wildart added a commit to wildart/MultivariateStats.jl that referenced this issue Jan 17, 2022
wildart added a commit to wildart/MultivariateStats.jl that referenced this issue Jan 17, 2022
wildart added a commit that referenced this issue Jan 17, 2022
* Refactor MDS code and docs (for #109)
* fix typos and update Documenter
* fix plotting in CI
wildart added a commit to wildart/MultivariateStats.jl that referenced this issue Jan 17, 2022
wildart added a commit that referenced this issue Jan 18, 2022
* refactored regression code and docs
* added int to float data conversion, and methods for vector-type regressors
* added isotonic regression
* added docs for regression