Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create standard dgp for metric aggregation #705

Draft
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

fverac
Copy link
Collaborator

@fverac fverac commented Nov 21, 2022

No description provided.

Copy link
Collaborator

@kbattocchi kbattocchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good start but there are a few things that should be cleaned up.

@@ -93,3 +94,182 @@ def _process_ihdp_sim_data():
# Append a column of ones as intercept
X = np.insert(X, 0, np.ones(X.shape[0]), axis=1)
return T, X


class StandardDGP():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(very minor)

Suggested change
class StandardDGP():
class StandardDGP:

@@ -93,3 +94,182 @@ def _process_ihdp_sim_data():
# Append a column of ones as intercept
X = np.insert(X, 0, np.ones(X.shape[0]), axis=1)
return T, X


class StandardDGP():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add docstring, especially since some arguments can be a function or a dict.

else: # else must be dict
if nuisance_Y is None:
nuisance_Y = {'support': self.d_x, 'degree': 1}
nuisance_Y['k'] = self.d_x
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider asserting the expected type here, something like:

Suggested change
nuisance_Y['k'] = self.d_x
assert(isinstance(nuisance_Y, dict), f"nuisance_Y must be a callable or dict, but got {type(nuisance_Y)}")
nuisance_Y['k'] = self.d_x

(and likewise for the other similar arguments)


def gen_Y(self):
self.y_noise = np.random.normal(size=(self.n, self.d_y), scale=self.y_eps)
self.Y = self.theta(self.X) * self.y_of_t(self.T) + self.nuisance_Y(self.X) + self.y_noise
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like y_of_t is really a treatment featurizer, is that right? If so, change the name accordingly.

else: # else must be dict
if nuisance_Y is None:
nuisance_Y = {'support': self.d_x, 'degree': 1}
nuisance_Y['k'] = self.d_x
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting nuisance_Y['k'] is not strictly necessary since the default behavior of gen_nuisance will do this for you, but perhaps it's worth being explicit.

More importantly, making this assignment will mutate the dictionary that was passed in, which is possibly undesirable; probably better to do something like:

Suggested change
nuisance_Y['k'] = self.d_x
nuisance_Y = { **nuisance_Y, 'k':self.d_x }

which will create a new dictionary instead of altering the old one.

Furthermore, what if the dictionary already had a 'k' entry - do you really want to ignore and override it?

self.t_eps = t_eps

def gen_Y(self):
self.y_noise = np.random.normal(size=(self.n, self.d_y), scale=self.y_eps)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider allowing the user to specify a seed so that results are reproducible.

@@ -93,3 +94,182 @@ def _process_ihdp_sim_data():
# Append a column of ones as intercept
X = np.insert(X, 0, np.ones(X.shape[0]), axis=1)
return T, X


class StandardDGP():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably support generating and using W as well.

@@ -93,3 +94,182 @@ def _process_ihdp_sim_data():
# Append a column of ones as intercept
X = np.insert(X, 0, np.ones(X.shape[0]), axis=1)
return T, X


class StandardDGP():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably support using a user-specified featurizer on X.

@@ -93,3 +94,182 @@ def _process_ihdp_sim_data():
# Append a column of ones as intercept
X = np.insert(X, 0, np.ones(X.shape[0]), axis=1)
return T, X


class StandardDGP():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add tests to at least verify that the API can actually be used and that you get arrays of the right dimensions out, etc., even if there aren't good checks for the actual data (but perhaps there are useful checks there, too)

@@ -93,3 +94,182 @@ def _process_ihdp_sim_data():
# Append a column of ones as intercept
X = np.insert(X, 0, np.ones(X.shape[0]), axis=1)
return T, X


class StandardDGP():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect users to use this class to test things? If so, perhaps also add a DGP section to the documentation (and at least add it to the module docs). But if this is mainly for internal use, add a comment to the docstring indicating that instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants