Update code to remove pandas FutureWarning messages that are displayed for each row during conditional sampling #1996

Open
srinify opened this issue May 7, 2024 · 0 comments
Labels
feature:sampling: Related to generating synthetic data after a model is built
internal: The issue doesn't change the API or functionality

srinify commented May 7, 2024

Environment Details

SDV 1.12

Problem Description

In our current implementation, conditionally sampling thousands of rows emits thousands of pandas FutureWarning messages, one per row. In my experience, that volume of output can sometimes crash Jupyter Notebook / Lab pages.

FutureWarning: The behavior of Series.replace (and DataFrame.replace) with CategoricalDtype is deprecated. 
In a future version, replace will only be used for cases that preserve the categories. 
To change the categories, use ser.cat.rename_categories instead.

result = result.replace(nan_name, np.nan)

Expected behavior

We should update the pandas calls we're using so that these warnings are no longer emitted.
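One possible direction, sketched under assumptions: the warning text points at calling `replace` on a categorical Series, so a fix could drop the placeholder category instead of replacing it. The helper name `placeholder_to_nan` and the sample placeholder value are hypothetical and are not taken from the SDV codebase. `Series.cat.remove_categories` is a real pandas method that sets values from the removed categories to NaN.

```python
import numpy as np
import pandas as pd


def placeholder_to_nan(series, nan_name):
    """Turn a NaN placeholder back into real missing values.

    Hypothetical helper: for categorical data it removes the placeholder
    category (pandas sets the affected values to NaN) instead of calling
    ``replace``, which is the call named in the FutureWarning.
    """
    if isinstance(series.dtype, pd.CategoricalDtype):
        # remove_categories raises if the category is absent, so check first
        if nan_name in series.cat.categories:
            return series.cat.remove_categories([nan_name])
        return series

    # Non-categorical data: the original replace call is kept as-is
    return series.replace(nan_name, np.nan)


# Illustrative data only; '<missing>' is an assumed placeholder value
example = pd.Series(['a', '<missing>', 'b'], dtype='category')
cleaned = placeholder_to_nan(example, '<missing>')
```

Whether this matches the semantics the sampling code needs (for example, category ordering or non-string placeholders) would have to be checked against the actual implementation.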

Steps to Reproduce

from sdv.datasets.demo import download_demo
from sdv.single_table import GaussianCopulaSynthesizer

# Load the demo dataset and its metadata
data, metadata = download_demo(
    modality='single_table',
    dataset_name='census_extended'
)

# Fit the synthesizer on the real data
synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.fit(data)

# Conditionally sample the remaining columns; the FutureWarning appears for each row
synthetic_data = synthesizer.sample_remaining_columns(data[['sex', 'income']].head(10))

Workaround

As an SDV user, you can suppress these warnings for now while we resolve the underlying issue:

# Run before importing pandas
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import pandas as pd
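The filter above silences every FutureWarning for the whole session. A narrower variant is sketched here, assuming the warning message keeps the wording quoted in the problem description. The context manager name `suppress_categorical_replace_warning` is made up for illustration. It uses only the standard library `warnings` module and restores the previous filters on exit.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def suppress_categorical_replace_warning():
    """Ignore only the Series.replace / CategoricalDtype FutureWarning.

    The filter is scoped to the ``with`` block, so other FutureWarnings
    (and this one, outside the block) are still reported.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings(
            'ignore',
            # Regex matched against the start of the warning message
            message=r'The behavior of Series\.replace',
            category=FutureWarning,
        )
        yield
```

The conditional sampling call would then go inside `with suppress_categorical_replace_warning():`, leaving warnings from the rest of the notebook untouched.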