Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Council vote: unconstrained variables #2056

Closed
OriolAbril opened this issue Jun 21, 2022 · 19 comments
Closed

Council vote: unconstrained variables #2056

OriolAbril opened this issue Jun 21, 2022 · 19 comments

Comments

@OriolAbril
Copy link
Member

#230 has been open for too long without any clear consensus. Following https://www.arviz.org/en/latest/governance/processes.html#council-decision-making-process I am opening this issue with a proposal on how to solve this and calling a vote of the RV council. cc @AlexAndorra @sethaxen @ColCarroll @ahartikainen @canyon289 @aloctavodia.

Proposal

Add a new unconstrained_posterior group (and other unconstrained_<group> as needed). I propose that the unconstrained representation of variables with a transformation (no need to duplicate those that are the same) are added to this group with the exact same name as they have in the posterior. The transform can be added as metadata either at variable level or as metadata of the unconstrained_... group and is left to the PPLs to add.

The whole unconstrained posterior can therefore be obtained by taking the variables in unconstrained_posterior and adding those missing using their values form the posterior group.

@OriolAbril OriolAbril added Discussion Issue open for discussion, still not ready for a PR on it Governance labels Jun 21, 2022
@arviz-devs arviz-devs locked as resolved and limited conversation to collaborators Jun 21, 2022
@OriolAbril
Copy link
Member Author

OriolAbril commented Jun 21, 2022

Note: We now enter the process of asking for clarifications on the proposal's wording, I am locking to prevent people not familiar with the governance process to join the discussion. We should remove the discussion label in 3 days and vote.

Ref: https://www.arviz.org/en/latest/governance/processes.html#call-for-a-vote

Extra note: being this the first "council vote" issue, we should probably also take notes on how it goes and update the process if we think it can be improved now that we'll have tried it

@sethaxen
Copy link
Member

The transformation will be formulated as a transformation from the unconstrained to constrained variable, correct?

@OriolAbril
Copy link
Member Author

The proposal is to store transformed and untransformed samples and make this part of the inferencedata schema.

The formulation of the transformation and how to do it is left completely on the hands of the PPLs and the people writing the converters. It can be the actual function in either direction stored as metadata (probably won't work saving to netcdf or zarr then but metadata can be excluded from the writing process, and would work while using the results after sampling); it can be a keyword that identifies the transform in that PPL, it might also be possible to combine both and use a function like turing_transforms_to/from_metadata...

@sethaxen
Copy link
Member

Cool, then I vote yes on this proposal.

@ahartikainen
Copy link
Contributor

ahartikainen commented Jun 21, 2022

What groups might have this property?

  • posterior
  • posterior_predictive (??)
  • predictions (??)
  • prior
  • prior_predictive (??)

@ColCarroll
Copy link
Member

I think the idea is that this would be one or two new groups:

  • posterior_transformed
  • prior_transformed (maybe -- I don't know why I would want that, or to what end I would use it)

I agree that we should leave this mostly up to the PPLs to implement, but as a suggestion, we might have a posterior:

sd ~ HalfNormal(...)
mu ~ Normal(...)
likelihood ~ Normal(data | mu, sd)

then I would suggest that

  1. idata.posterior has variables sd and mu
  2. idata.posterior_transformed has variable sd
  3. idata.posterior_transformed has metadata including
"transforms": {"sd": "exp"}

@ColCarroll
Copy link
Member

Note that this path also means it'd be nice if we provide a utility function that "merges" posterior_transformed with posterior, which, if xr.DataSet was like a dictionary, would be just

def get_full_unconstrained_posterior(idata):
  return idata.posterior.update(idata.posterior_transformed)

@ahartikainen
Copy link
Contributor

Going to route of transforming only the subset of variables is ok.

We currently don't have functionality which could use this information, but in future we would need both posterior and transformed posterior for this functionality?

@ColCarroll
Copy link
Member

I think right now only #2040 would use this.

@OriolAbril
Copy link
Member Author

Please do not discuss alternatives to the proposal here. This issue is for voting the proposal only and it is locked to everyone but core contributors. Any discussion should happen (or should have already happened) in #230 which is open to all the ArviZ community.

I think right now only #2040 would use this.

Yes, this is currently blocking/dificulting #2040 which is why I called for a vote now. But why the vote was called doesn't really matter. From the governance:

Core Contributors can call for a vote to resolve a target issue they feel has been stale for too long and for which informal consensus appears unlikely. For a vote to be called, the target issue must be at least 2 months old.

The idea now is to make sure the proposal is clear and vote in order to unlock the stalemate.

What groups might have this property?

For now I only see uses to this for posterior and prior (assuming the priors have potentials or something that requires sampling them with hmc too in which case having unconstrained samples can help diagnose any issues that might arise). But the proposal is for the exact same pattern to apply to any other group if anyone finds a use for it.

I think the idea is that this would be one or two new groups:

Yes, there would be new groups called unconstrained_posterior, unconstrained_prior...

then I would suggest that

  • idata.posterior has variables sd and mu
  • idata.unconstrained_posterior has variable sd

This is what the proposal covers. All variables will have samples in the posterior group and only those with transformations will have the transformed samples also stored (optionally) in this new group. Yes, PPLs can add metadata as they want. But that won't be part of the inferencedata schema, only the storage of samples in both spaces when available.

Note that this path also means it'd be nice if we provide a utility function that "merges" posterior_transformed with posterior, which, if xr.DataSet was like a dictionary, would be just ...

This is already possible. Datasets have a dict like interface. But outside of the scope of the proposal. Once this is unlocked, the default will be to continue deciding things on a consensus basis in issues and PRs without the council voting.

@ahartikainen
Copy link
Contributor

I vote for yes

@canyon289
Copy link
Member

Quick question

Oriols comment above says

Note: We now enter the process of asking for clarifications on the proposal's wording, I am locking to prevent people not familiar with the governance process to join the discussion. We should remove the discussion label in 3 days and vote.

Is issue for rewording an accepted idea, or voting on the implementing the idea itself?

@OriolAbril
Copy link
Member Author

The main goal of the issue is voting on the proposal described. However, to make sure we are all on the same page and are voting on the same thing. There is a 3 day period where clarifications about what the proposal entails can (and should) be asked. We are currently in this part of the process:

Before voting starts, at least 3 days will be left for Core Contributors to raise doubts about the proposal’s phrasing, no extra discussion will take place in the proposal issue. Proposal issues should be locked from creation to prevent attracting discussion from people not familiar with the decision process.

The idea is definitely not accepted yet, but we are using the issue to "reword" (but not modify) the proposal until is is phrased in a clear and unambiguous manner for everyone

@canyon289
Copy link
Member

Ok I vote yes,

I dont see any downsides as it doesnt hurt to have extra groups

@OriolAbril OriolAbril removed the Discussion Issue open for discussion, still not ready for a PR on it label Jun 26, 2022
@aloctavodia
Copy link
Contributor

Yes

@OriolAbril
Copy link
Member Author

I also vote yes

@AlexAndorra
Copy link
Contributor

Yep, love this!

@ColCarroll
Copy link
Member

yes

@OriolAbril
Copy link
Member Author

The vote has passed 🎉 , I'll close the issue and send a PR with the changes to the schema soonish. Updating of the converters will come later

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants