Add support for generated columns when conditional sampling #1994

Open
srinify opened this issue May 7, 2024 · 0 comments
Labels
feature request (Request for a new feature), feature:sampling (Related to generating synthetic data after a model is built)


srinify commented May 7, 2024

Problem Description

Every column that the SDV synthesizes falls into one of two buckets:

  • Modeled Columns: The data in these columns is modeled, e.g. numerical, datetime, boolean, or categorical data
  • Generated Columns: The data in these columns is generated from scratch without modeling, e.g. primary keys, PII values

Currently, you can't conditionally sample using ID, primary key, or other generated columns.

Expected behavior

As a user, I expect to be able to conditionally sample on any column(s) I see fit.
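
To make the request concrete, here is a minimal, library-free sketch of the behavior described above. It is not SDV's implementation: the function `toy_sample_remaining_columns`, the `TRAINING_ROWS` table, and the `GENERATED_COLUMNS` set are all hypothetical stand-ins. The idea it illustrates is to split the known values into the two buckets, condition only on the modeled columns, and pass the generated values through unchanged.

```python
import random

# Hypothetical stand-in for a fitted model: a few rows of modeled columns only.
TRAINING_ROWS = [
    {"workclass": "Private", "age": 34},
    {"workclass": "Private", "age": 51},
    {"workclass": "State-gov", "age": 42},
]

# Assumption: the set of columns that are generated rather than modeled.
GENERATED_COLUMNS = {"id"}


def toy_sample_remaining_columns(known_rows, seed=0):
    """Toy conditional sampler that accepts generated columns as conditions."""
    rng = random.Random(seed)
    sampled = []
    for known in known_rows:
        # Split the requested conditions into the two buckets.
        generated = {k: v for k, v in known.items() if k in GENERATED_COLUMNS}
        modeled = {k: v for k, v in known.items() if k not in GENERATED_COLUMNS}

        # Condition only on the modeled columns.
        matches = [
            row for row in TRAINING_ROWS
            if all(row[k] == v for k, v in modeled.items())
        ]
        if not matches:
            raise ValueError(f"no rows satisfy conditions {modeled}")

        # Copy a matching row, then carry the generated values through as given.
        row = dict(rng.choice(matches))
        row.update(generated)
        sampled.append(row)
    return sampled


known = [
    {"id": 101, "workclass": "Private"},
    {"id": 102, "workclass": "State-gov"},
]
result = toy_sample_remaining_columns(known)
```

A real synthesizer would draw from a learned distribution rather than from stored rows, and would likely also need to decide how to validate user-supplied keys (for example, uniqueness of primary keys); the sketch leaves those questions open.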

Additional context

I expect the following code to work:

from sdv.single_table import GaussianCopulaSynthesizer
from sdv.datasets.demo import download_demo

# Load the demo dataset and its metadata
data, metadata = download_demo(
    modality='single_table',
    dataset_name='census_extended'
)

# Fit a synthesizer on the real data
synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.fit(data)

# Condition on a generated column ('id') together with a modeled column ('workclass')
synthesizer.sample_remaining_columns(data[['id', 'workclass']].head(10))

Related to this issue: #1096

srinify added the feature request and feature:sampling labels on May 7, 2024