Initialize independent and dependent caches separately in ARNN #1656
base: master
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:

@@           Coverage Diff           @@
##           master    #1656   +/-   ##
=======================================
  Coverage   82.24%   82.24%
=======================================
  Files         291      291
  Lines       17834    17842    +8
  Branches     3481     3482    +1
=======================================
+ Hits        14667    14675    +8
  Misses       2495     2495
  Partials      672      672

☔ View full report in Codecov by Sentry. |
@wdphy16 So this PR moves the responsibility of constructing the cache from the sampler itself to the model. I suppose this makes sense.
This also goes from having 1 'override point' to 2 different ones. Does this make sense? Can't we simply have one single init cache that does depend on the parameters, and expose a single 'extension point' to the user instead of 2?
Also, is there a reason for not following the standard flax philosophy of `model.apply(...)`? If I am not mistaken, this would make it easier to define and interweave the cache with the code?
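(For reference, the standard Flax convention mentioned here would look roughly like the sketch below, assuming a model instance `model` and inputs `x`; the `init_cache` name is the one this PR introduces, and the exact signature is an assumption, not the PR's actual code.)

```python
import jax

# Standard Flax style: everything goes through init()/apply(); extra
# variable collections such as "cache" are declared mutable when written.
variables = model.init(jax.random.PRNGKey(0), x)  # creates "params"
_, mutated = model.apply(
    variables, x, method=model.init_cache, mutable=["cache"]
)
cache = mutated["cache"]  # the freshly initialized cache collection
```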
Regardless, I would ask that with this PR we have a documentation page (could be an ipynb file plus a lot of discussion) explaining how to use this and giving at least one example implementation.
Otherwise this is a 'hidden feature' that benefits no-one but you, and if someone discovers it they will probably need to ask you anyway how to use it. I would like to avoid having lots of those 'hidden features'.
I wonder if the caching mechanism could be generalized to other samplers; for example, there are networks that benefit from precomputed quantities when you only do local updates... |
If there is no AR sampler, I guess the straightforward choice is to save those precomputed quantities in `SamplerState`, and actually I think it's also possible to move all the AR sampling caches to `SamplerState`.
A benefit of `SamplerState` is that it's not reset unless `MCState.variables` changes, so if we do sampling multiple times without changing variables, we don't need to initialize the cache multiple times.
But we still need an interface of the model to provide the initial cache to the sampler. |
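(A hypothetical sketch of that idea; the class and field names below are made up for illustration and are not NetKet's actual API. It only shows how precomputed quantities could ride along in the sampler state as a pytree.)

```python
import jax
import jax.numpy as jnp
from flax import struct


@struct.dataclass
class CachedSamplerState:
    """Illustrative sampler state carrying precomputed quantities."""

    rng: jax.Array            # PRNG key for the chain
    sigma: jnp.ndarray        # current configurations
    precomputed: jnp.ndarray  # e.g. partial contractions, rebuilt only
                              # when the variational parameters change
```

Since such a state would only be rebuilt when `MCState.variables` changes, the `precomputed` field would be initialized once and reused across sampling calls.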
It's not going to be easy and I would leave this effort aside for a future attempt. We would have to rewrite all layers to define a cache and how to use it. And while this works well for the first layer of a NN, I'm not sure this helps in the following layers?
|
Looking at it again after these months, now I think it's not very useful to define it this way. For a more general and trick-free way to initialize the cache, maybe we can discuss that when we have more use cases. |
As the PR for RNN is merged, now I can start upstreaming the changes in my branch for MPS-RNN into the master branch, and I'd like to split them into a few small and orthogonal PRs.
The motivation for this PR is that while the caches in ARNN are usually independent of the model parameters (such as the currently implemented fast AR sampling caches and RNN memories), some of them actually depend on the parameters (such as the `gamma` in MPS-RNN). The `gamma` is the partial contraction of the MPS, and it does not change during the AR sampling procedure, so we want to precompute it before the sampling, rather than recompute it in every AR sampling step.
The independent caches need to be initialized before providing the variables, while the dependent caches need to be initialized after `Module.setup()`. Now the user can separately override `AbstractARNN._init_independent_cache` and `_init_dependent_cache` for them. An example usage is in https://github.com/cqsl/mps-rnn/blob/master/models/mps.py .
Note that `model.init_cache` is called like `model.init`, rather than `model.apply(..., method=model.init_cache)`.
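To make the split concrete, here is a minimal Flax-only sketch of the pattern (the module body, shapes, and the `init_cache` wiring are illustrative assumptions, not NetKet's actual `AbstractARNN` implementation):

```python
import jax
import jax.numpy as jnp
from flax import linen as nn


class ToyARModel(nn.Module):
    """One parameter-independent and one parameter-dependent cache,
    both stored in the "cache" variable collection."""

    size: int = 8
    bond_dim: int = 4

    def setup(self):
        self.tensors = self.param(
            "tensors",
            nn.initializers.normal(stddev=0.1),
            (self.size, self.bond_dim, self.bond_dim),
        )

    def _init_independent_cache(self, inputs):
        # Never reads the parameters, so in principle it could be
        # initialized before the variables are provided.
        self.variable(
            "cache",
            "memory",
            lambda: jnp.zeros((inputs.shape[0], self.bond_dim)),
        )

    def _init_dependent_cache(self, inputs):
        # Reads self.tensors, so setup() must have run first: the analogue
        # of precomputing gamma as a partial contraction of the MPS.
        self.variable(
            "cache",
            "gamma",
            lambda: jnp.einsum("iab,iac->ibc", self.tensors, self.tensors),
        )

    def init_cache(self, inputs):
        self._init_independent_cache(inputs)
        self._init_dependent_cache(inputs)


model = ToyARModel()
x = jnp.zeros((2, 8))
# Called like model.init, as described above: the returned variables
# contain both the "params" and the "cache" collections.
variables = model.init(jax.random.PRNGKey(0), x, method=model.init_cache)
```

During sampling, a bound method could then read the precomputed value with `self.get_variable("cache", "gamma")`, and pass `mutable=["cache"]` to `apply` whenever a step needs to update the fast-sampling buffers.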