
[WIP] Sparse cholesky #3069

Draft: dpsimpson wants to merge 2 commits into develop
Conversation

@dpsimpson (Contributor) commented May 15, 2024

Summary

Decided to dive in on the sparse Cholesky. This is a WIP.

  • Implement prim sparse cholesky
  • Implement prim sparse triangular solves
  • Implement prim sparse triangular multiplies
  • Other things that a sparse triangular matrix should be able to do?
  • rev mode all things
  • Tests
  • Docs

The interesting wrinkle here is that

Eigen::SimplicialLLT<SpMat> llt(A);

does not (without a lot of coercion) perform a Cholesky decomposition of A. It instead performs a Cholesky decomposition of A.twistedBy(perm), where perm is the fill-reducing permutation stored in the Eigen::SimplicialLLT<T> class. This means that it is not enough for cholesky_decompose to return

llt.matrixL()

as this will not be the matrix we are looking for. This is the problem with @SteveBronder's old branch.

There are two options here. The first is to compute perm explicitly somewhere else and carry it around (I do not like this option). The other is to treat Eigen::SimplicialLLT<T> as if it were the lower triangular matrix when performing triangular solves, multiplication, and anything else we might need (those operations require knowledge of perm).
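
To make the role of perm concrete, here is a minimal sketch (not code from this branch) of solving A x = b through the pieces of the factorization; it is equivalent to llt.solve(b), but makes explicit where the permutation enters:

#include <Eigen/SparseCholesky>
#include <Eigen/SparseCore>

using SpMat = Eigen::SparseMatrix<double>;

// Eigen factorizes the permuted matrix: P * A * P^T = L * L^T.
// Any solve against "the Cholesky of A" therefore has to route
// through the permutation stored inside the SimplicialLLT object.
Eigen::VectorXd solve_via_factor(const SpMat& A, const Eigen::VectorXd& b) {
  Eigen::SimplicialLLT<SpMat> llt(A);
  Eigen::VectorXd y = llt.permutationP() * b;  // apply perm
  y = llt.matrixL().solve(y);                  // forward solve with L
  y = llt.matrixU().solve(y);                  // backward solve with L^T
  return llt.permutationPinv() * y;            // undo perm
}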

I'm not quite sure how my preferred option will work with autodiff - we might need a light specialization.

Tests

Coming

Side Effects

Because this adds a new, non-matrix Eigen type, we need to be very careful that none of the other template patterns match it. It should be fine, but care is needed.
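
For illustration, one possible shape of such a guard (the trait name here is hypothetical, not an existing Stan Math trait): a small specialization that the generic matrix overloads can SFINAE away on.

#include <type_traits>
#include <Eigen/SparseCholesky>
#include <Eigen/SparseCore>

// Hypothetical trait: true only for the new solver-like type.
template <typename T>
struct is_sparse_cholesky : std::false_type {};

template <typename MatType, int UpLo, typename Ordering>
struct is_sparse_cholesky<Eigen::SimplicialLLT<MatType, UpLo, Ordering>>
    : std::true_type {};

// Generic matrix overloads could then be guarded with something like
//   std::enable_if_t<!is_sparse_cholesky<std::decay_t<T>>::value>
// so they never accidentally match the solver type.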

Release notes

Replace this text with a short note on what will change if this pull request is merged; in that case, this will be included in the release notes.

Checklist

  • Copyright holder: Daniel Simpson

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass (make test-headers)
    • dependencies checks pass (make test-math-dependencies)
    • docs build (make doxygen)
    • code passes the built-in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@dpsimpson (Contributor, Author)

Argh, sorry. Didn't expect the CI to flag with a draft WIP PR.

@spinkney (Collaborator)

If you figure out the permutation thing, I think that would also apply to the LDL decomposition, because it does pivoting. I've been wanting to get that into stan-lang, but I don't know how to differentiate it and handle that issue.

@dpsimpson (Contributor, Author)

Do you think it needs to be exposed at the language level? To me, things like pivoting are just implementation details that should be abstracted away. If this thing works with sparse matrices, it should be the same there too.

@spinkney (Collaborator)

I mean the derivatives are what's hard. At the language level, no one needs to know about the pivoting. I have an implementation of a Cholesky factor of correlation matrices, but it constructs the LD factor instead and removes the need for the square root. So users could have an ldl_factor_correlation that is a tuple object of a lower triangular matrix and a vector that holds the D diagonal. It makes sense to have the ability to construct an LDL tuple from a PD matrix too, which is why I want the LDL factorization.
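
As a rough sketch of the (L, D) tuple being described, using Eigen's dense LDLT purely for concreteness (this is not the proposed Stan interface), one could build it from a PD matrix like so:

#include <Eigen/Dense>
#include <utility>

// Build the (L, D) pair from a positive-definite matrix.
// Note that Eigen's LDLT also pivots (A = P^T L D L^T P), so the
// transpositions from transpositionsP() carry the same permutation
// wrinkle discussed above.
std::pair<Eigen::MatrixXd, Eigen::VectorXd> ldl_pair(const Eigen::MatrixXd& A) {
  Eigen::LDLT<Eigen::MatrixXd> ldlt(A);
  Eigen::MatrixXd L = ldlt.matrixL();  // unit lower-triangular factor
  Eigen::VectorXd D = ldlt.vectorD();  // diagonal of D
  return {L, D};
}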

@dpsimpson (Contributor, Author)

Permutation matrices are linear and have determinant ±1, so I don't think they affect the derivatives (it is 8am and I've not had coffee, so I might be wrong).
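
One way to see this (assuming the permutation is chosen from the sparsity pattern alone, so it is constant with respect to the matrix values): with B = P A Pᵀ,

$$ B = P A P^\top \quad\Rightarrow\quad dB = P\,(dA)\,P^\top \quad\Rightarrow\quad \bar{A} = P^\top \bar{B}\, P, $$

so the permutation only reindexes the adjoint; it contributes no derivative terms of its own.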

@dpsimpson (Contributor, Author)

In my mind, this would not be terribly difficult to implement, but it would add the wrinkle that the vari type would now have a solver type for val_ and a matrix type for adj_. I'm not deep enough into the autodiff system to know if that would be a disaster (maybe for higher order? afaik the only place val_ and adj_ meet is in the chain methods, which are hand-written).

@SteveBronder (Collaborator)

> In my mind, this would not be terribly difficult to implement, but it would add the wrinkle that the vari type would now have a solver type for val_ and a matrix type for adj_. I'm not deep enough into the autodiff system to know if that would be a disaster (maybe for higher order? afaik the only place val_ and adj_ meet is in the chain methods, which are hand-written).

The return type of the function has to be a matrix or a tuple of matrices for the Stan language, but you can store the actual solver to use in the reverse pass, like we do here for mdivide_left_ldlt:

https://github.com/stan-dev/math/blob/develop/stan/math/rev/fun/mdivide_left_ldlt.hpp#L42
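
As a sketch of that shape (this is not Stan Math's actual vari machinery, just the data layout the comments above describe): the value slot holds the solver, the adjoint stays an ordinary sparse matrix, and the two only meet in a hand-written chain().

#include <Eigen/SparseCholesky>
#include <Eigen/SparseCore>

using SpMat = Eigen::SparseMatrix<double>;

// Hypothetical sketch: the "value" is the whole solver (so perm travels
// with it), the adjoint is a plain sparse matrix, and only chain()
// touches both.
struct sparse_cholesky_vari_sketch {
  Eigen::SimplicialLLT<SpMat> val_;  // factorization, including perm
  SpMat adj_;                        // adjoint accumulator

  explicit sparse_cholesky_vari_sketch(const SpMat& A)
      : val_(A), adj_(A.rows(), A.cols()) {}

  void chain() {
    // A hand-written reverse-mode update would go here; it can use
    // val_.matrixL(), val_.permutationP(), etc., without carrying
    // perm around separately.
  }
};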

@SteveBronder (Collaborator)

Also, is the permutation matrix fixed for each iteration of the sampler? If so, would we want a cholesky signature where the user passes in the permutation matrix?
