Computation of training set XTX and XTY in a cross-validation setting using the fast algorithms by Engstrøm (2024).

Sm00thix/CVMatrix

CVMatrix

The cvmatrix package implements the fast algorithms by Engstrøm [1] for computing the training set $\mathbf{X}^{\mathbf{T}}\mathbf{X}$ and $\mathbf{X}^{\mathbf{T}}\mathbf{Y}$ in a cross-validation setting. In addition to correctly handling arbitrary row-wise preprocessing, the algorithms efficiently and correctly handle any combination of column-wise centering and scaling of X and Y based on training set statistics.
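To make the idea concrete, here is a minimal NumPy sketch of the kind of identities such algorithms can exploit. It is not the package's implementation, and the helpers `precompute` and `fast_training_products` are hypothetical. It assumes one centering flag and one scaling flag shared by X and Y (the package itself allows any combination) and sample standard deviations with `ddof=1`. Column sums, sums of squares and the full products are computed once. For each fold, the validation rows' contribution is subtracted, and centering then uses $\mathbf{X}_t^\mathbf{T}\mathbf{X}_t - n\,\bar{\mathbf{x}}\bar{\mathbf{x}}^\mathbf{T}$, where $n$ is the number of training rows.

```python
import numpy as np


def precompute(X, Y):
    """Full-data quantities, computed once and reused for every fold (hypothetical helper)."""
    return {
        "XTX": X.T @ X,
        "XTY": X.T @ Y,
        "sum_X": X.sum(axis=0),
        "sum_Y": Y.sum(axis=0),
        "sumsq_X": (X**2).sum(axis=0),
        "sumsq_Y": (Y**2).sum(axis=0),
    }


def fast_training_products(X, Y, val_mask, pre, center=True, scale=True):
    """Training-set X^T X and X^T Y for one fold (hypothetical helper).

    Only the validation rows are touched; `center` and `scale` apply to both X and Y.
    """
    X_val, Y_val = X[val_mask], Y[val_mask]
    n = X.shape[0] - X_val.shape[0]  # number of training rows
    # Subtract the validation fold's contribution from the full-data products.
    XTX = pre["XTX"] - X_val.T @ X_val
    XTY = pre["XTY"] - X_val.T @ Y_val
    # Training-set column means, also obtained by subtraction.
    mx = (pre["sum_X"] - X_val.sum(axis=0)) / n
    my = (pre["sum_Y"] - Y_val.sum(axis=0)) / n
    if center:
        # (X - 1 mx^T)^T (X - 1 mx^T) = X^T X - n mx mx^T; X^T Y is analogous.
        XTX = XTX - n * np.outer(mx, mx)
        XTY = XTY - n * np.outer(mx, my)
    if scale:
        # Training-set sample std (ddof=1) from sums of squares: (S - n m^2) / (n - 1).
        sx = np.sqrt((pre["sumsq_X"] - (X_val**2).sum(axis=0) - n * mx**2) / (n - 1))
        sy = np.sqrt((pre["sumsq_Y"] - (Y_val**2).sum(axis=0) - n * my**2) / (n - 1))
        # Dividing columns by their std scales the products elementwise.
        XTX = XTX / np.outer(sx, sx)
        XTY = XTY / np.outer(sx, sy)
    return XTX, XTY


# Usage: 5-fold cross-validation with the same shapes as in Quick Start.
rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 50))
Y = rng.uniform(size=(100, 10))
folds = np.arange(100) % 5
pre = precompute(X, Y)
for fold in np.unique(folds):
    training_XTX, training_XTY = fast_training_products(X, Y, folds == fold, pre)
```

In this sketch the per-fold work scales with the size of the validation fold rather than the training set. Centering X, Y or both yields the same $\mathbf{X}^\mathbf{T}\mathbf{Y}$ correction, $n\,\bar{\mathbf{x}}\bar{\mathbf{y}}^\mathbf{T}$.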

For an implementation of the fast cross-validation algorithms combined with Improved Kernel Partial Least Squares [2], see the Python package ikpls.

Installation

  • Install the package for Python 3 using the following command:

    pip3 install cvmatrix
  • Now you can import the class implementing all the algorithms with:

    from cvmatrix.cvmatrix import CVMatrix

Quick Start

Use the cvmatrix package for fast computation of training set kernel matrices:

```python
import numpy as np
from cvmatrix.cvmatrix import CVMatrix

N = 100  # Number of samples.
K = 50  # Number of features.
M = 10  # Number of targets.

X = np.random.uniform(size=(N, K))  # Random X data.
Y = np.random.uniform(size=(N, M))  # Random Y data.
cv_splits = np.arange(N) % 5  # Fold label for each sample: 5-fold cross-validation.

# Instantiate CVMatrix.
cvm = CVMatrix(
    cv_splits=cv_splits,
    center_X=True,
    center_Y=True,
    scale_X=True,
    scale_Y=True,
)
# Fit on X and Y.
cvm.fit(X=X, Y=Y)
# Compute training set XTX and/or XTY for each fold.
for val_split in cvm.val_folds_dict.keys():
    # Get both XTX and XTY.
    training_XTX, training_XTY = cvm.training_XTX_XTY(val_split)
    # Get only XTX.
    training_XTX = cvm.training_XTX(val_split)
    # Get only XTY.
    training_XTY = cvm.training_XTY(val_split)
```

Examples

In examples, you will find:

Benchmarks

In benchmarks, we have benchmarked the fast algorithms in cvmatrix against the straightforward, naive algorithms implemented in NaiveCVMatrix.


Left: Benchmarking the CVMatrix implementation versus the straightforward, naive implementation (NaiveCVMatrix) using three common combinations of centering and scaling. Right: Benchmarking the CVMatrix implementation for all possible combinations of centering and scaling.
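For comparison, the straightforward approach recomputes the statistics and products from the training rows of every fold. The sketch below is a hypothetical standalone illustration of that idea, not the code of NaiveCVMatrix; the helper `naive_training_products` is an assumed name. It assumes one centering flag and one scaling flag shared by X and Y, and sample standard deviations with `ddof=1`.

```python
import numpy as np


def naive_training_products(X, Y, val_mask, center=True, scale=True):
    """Training-set X^T X and X^T Y recomputed from scratch (hypothetical helper)."""
    X_train, Y_train = X[~val_mask], Y[~val_mask]
    if scale:
        # Training-set sample standard deviations (ddof=1); centering leaves them unchanged.
        sx = X_train.std(axis=0, ddof=1)
        sy = Y_train.std(axis=0, ddof=1)
    if center:
        # Subtract training-set column means.
        X_train = X_train - X_train.mean(axis=0)
        Y_train = Y_train - Y_train.mean(axis=0)
    if scale:
        X_train = X_train / sx
        Y_train = Y_train / sy
    # Full products over all training rows, recomputed for every fold.
    return X_train.T @ X_train, X_train.T @ Y_train


# Usage: loop over 5 folds.
rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 50))
Y = rng.uniform(size=(100, 10))
folds = np.arange(100) % 5
for fold in np.unique(folds):
    training_XTX, training_XTY = naive_training_products(X, Y, folds == fold)
```

Each call forms the products over all training rows, so the work per fold grows with the training set size. That full recomputation of matrix products and statistical moments is what the fast algorithms are designed to avoid.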

Contribute

To contribute, please read the Contribution Guidelines.

References

  1. Engstrøm, O.-C. G. (2024). Shortcutting Cross-Validation: Efficiently Deriving Column-Wise Centered and Scaled Training Set $\mathbf{X}^\mathbf{T}\mathbf{X}$ and $\mathbf{X}^\mathbf{T}\mathbf{Y}$ Without Full Recomputation of Matrix Products or Statistical Moments.
  2. Dayal, B. S., & MacGregor, J. F. (1997). Improved PLS algorithms. Journal of Chemometrics, 11(1), 73-85.