Rigorous treatment effect estimation #59

Open · 2 of 11 tasks
RobertTLange opened this issue Apr 21, 2021 · 2 comments

RobertTLange commented Apr 21, 2021

The original motivation behind the toolbox was to put an emphasis on biological study design and to provide a new perspective on analysing neural networks (inspired by Jon Frankle). This means replicable experiment pipelines (git hashes, random seeds, configuration cloning, protocol logging, etc.). Most of this has been implemented by now.

But another super useful perspective comes from economics and the evaluation of treatments (thanks, Nandan!). In ML, most result presentations do not report any significance levels or test results; often the confidence intervals of the learning curves even overlap. The whole process is not very scientific and can benefit from tools from econometrics, such as diff-in-diff style experiments and simply being rigorous about the effect you are trying to sell. I want to incorporate this into the toolbox and make it part of my daily research routine for testing scientific ML hypotheses!

In the near future:

  • Add a set of effect estimators (frequentist for now): linear model, different standard errors, and p-value extraction (see the sketch after this list).
  • Incorporate them into the grid search/post-processing routines for a specific metric and test.
  • Add corrections for multiple testing (Bonferroni, false discovery rate; look at Omiros' assignment).
  • Add a difference-in-differences estimation setup: compute effects at different timepoints.
  • Add plot utilities for visualizing significance over time/overall.
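
A minimal sketch of what such a frequentist effect estimator could look like, built on statsmodels; the function name `estimate_effect` and its arguments are illustrative, not an existing toolbox API:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.multitest import multipletests


def estimate_effect(scores, treated, cov_type="HC1"):
    """Regress final scores on a binary treatment indicator and return the
    effect coefficient, its (robust) standard error and the p-value."""
    X = sm.add_constant(np.asarray(treated, dtype=float))
    result = sm.OLS(np.asarray(scores, dtype=float), X).fit(cov_type=cov_type)
    return result.params[1], result.bse[1], result.pvalues[1]


# Example: correct p-values from several metrics/tests for multiple testing.
p_values = [0.01, 0.04, 0.20]
reject, p_corrected, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
```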

Note I: I think it makes sense to add a causality subdirectory, which collects the different tests and can easily be extended. E.g., starting from a base class TreatmentTest, we can derive different model, standard-error, and correction formulations (a rough sketch follows below).
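
A rough sketch of how such a base class could be laid out; all names below are assumptions, not existing code:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EffectResult:
    effect: float
    std_error: float
    p_value: float


class TreatmentTest(ABC):
    """Base class for estimators living in the causality subdirectory."""

    @abstractmethod
    def fit(
        self,
        outcomes: Sequence[float],
        treatment: Sequence[int],
        covariates: Optional[Sequence[Sequence[float]]] = None,
    ) -> EffectResult:
        """Estimate the treatment effect for one metric."""
```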

Note II: Many of these features will also be needed in the population-based training pipeline, e.g. a Wald/t-test for deciding whether one sampled population member performs better or worse than another (a small sketch follows below).
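
As a stand-in for such a decision rule, a Welch's t-test on the per-seed evaluation scores of two population members could look like this (a sketch; the helper name and threshold are made up):

```python
from scipy import stats


def member_is_better(scores_a, scores_b, alpha=0.05):
    """One-sided check: does member A significantly outperform member B?
    Uses Welch's t-test (unequal variances) on per-seed evaluation scores."""
    t_stat, p_two_sided = stats.ttest_ind(scores_a, scores_b, equal_var=False)
    p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2
    return p_one_sided < alpha
```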

In the distant future:

  • Add an experiment type that performs "an intervention" (e.g. a learning rate change) at a specific timepoint and estimates the treatment effect over time.
  • Add support for Nandan's 'virtual labs': automatic spawning of more random seeds based on the variance estimate of the effect coefficient. Fix a budget of total jobs and jobs per iteration (see the sketch after this list).
  • Add Bayesian effect modelling with credible intervals: get the full posterior over the effect coefficient.
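
A toy sketch of such an adaptive seed-spawning loop; `run_experiment`, the thresholds, and the function name are all placeholders:

```python
import numpy as np


def spawn_seeds_adaptively(run_experiment, total_budget=50, jobs_per_iter=5,
                           se_target=0.05, min_seeds=4):
    """Keep launching additional random seeds until the standard error of the
    mean score falls below se_target or the total job budget is exhausted.
    run_experiment(seed) is assumed to return a scalar score/effect."""
    scores, seed = [], 0
    while len(scores) < total_budget:
        for _ in range(jobs_per_iter):
            scores.append(run_experiment(seed))
            seed += 1
        if len(scores) >= min_seeds:
            std_error = np.std(scores, ddof=1) / np.sqrt(len(scores))
            if std_error < se_target:
                break
    return np.asarray(scores)
```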

Things to Think About

  • Non-Gaussianity: bootstrap (parametric/non-parametric) standard errors, Fisher exact p-values, permutation tests (see the sketch after this list).
  • Instrumental variables.
  • Different timepoints of treatment and tracking their time series - e.g. when to adjust the learning rate?
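
For the non-Gaussian case, a simple permutation test on the difference in means could serve as a fallback; the following is a sketch (function name and defaults are assumptions):

```python
import numpy as np


def permutation_test_diff_in_means(treated, control, num_perms=10000, seed=0):
    """Two-sided permutation test on the difference in means between
    treated and control runs. Returns (observed difference, p-value)."""
    rng = np.random.default_rng(seed)
    treated, control = np.asarray(treated), np.asarray(control)
    observed = treated.mean() - control.mean()
    pooled = np.concatenate([treated, control])
    n_treated = len(treated)
    count = 0
    for _ in range(num_perms):
        perm = rng.permutation(pooled)
        diff = perm[:n_treated].mean() - perm[n_treated:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return observed, (count + 1) / (num_perms + 1)
```
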
RobertTLange self-assigned this Apr 21, 2021
RobertTLange added the core-func and enhancement labels Apr 21, 2021

RobertTLange commented:

I would love to add rliable here. Many of its comparison/testing techniques would be great to have for benchmarking. These include:

  • Stratified Bootstrap Confidence Intervals
  • Score Distributions
  • Interquartile Mean

I would recommend installing it and importing its metrics/bootstrap procedures. These could automatically be calculated on the hyper_log of a search or only be provided offline afterwards. See what works best with the benchmark project.
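
Roughly, the integration could look like the snippet below (based on the rliable README; the score dictionary is dummy data and the shapes are assumptions):

```python
import numpy as np
from rliable import library as rly
from rliable import metrics

# Scores per algorithm: array of shape (num_runs, num_tasks).
score_dict = {
    "baseline": np.random.rand(10, 5),
    "treatment": np.random.rand(10, 5),
}

# Interquartile mean with stratified bootstrap confidence intervals.
aggregate_func = lambda scores: np.array([metrics.aggregate_iqm(scores)])
point_estimates, interval_estimates = rly.get_interval_estimates(
    score_dict, aggregate_func, reps=2000)
```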
