PAR DiagnosticReport not 1.0 with float categorical columns #1910
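Workaround
If anyone is running into this, here is a suggested workaround: cast the float categorical columns to a non-numeric (object) dtype before fitting, then optionally cast them back to floats after sampling.
The code snippet below does this. Replace the list `CAT_COLUMN_NAMES = ['ColA', 'ColB', ... ]` with the names of your own categorical columns.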
```python
from sdv.sequential import PARSynthesizer

# Replace with the names of your float-valued categorical columns
CAT_COLUMN_NAMES = ['ColA', 'ColB']  # ... add the rest of your columns

data = ...      # <your pandas DataFrame>
metadata = ...  # <your SingleTableMetadata object>

# cast the categorical columns to object (non-numeric) dtype
for col_name in CAT_COLUMN_NAMES:
    data[col_name] = data[col_name].astype('object')

# now proceed with modeling and sampling as usual
synthesizer = PARSynthesizer(metadata)
synthesizer.fit(data)
synthetic_data = synthesizer.sample(num_sequences=10)

# (optional) cast the categorical columns back to floats
for col_name in CAT_COLUMN_NAMES:
    try:
        synthetic_data[col_name] = synthetic_data[col_name].astype('float')
    except (ValueError, TypeError):
        print('Column name', col_name, 'could not be converted back to a float')
```
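The cast in the workaround only changes the column's dtype, not the category values themselves, so casting back to float restores the original values exactly. A pandas-only sketch with toy data illustrates the round trip. The column name `ColA` and the values are placeholders, and no SDV calls are involved:

```python
import pandas as pd

# Toy data: a categorical column whose category labels happen to be floats
real = pd.DataFrame({'ColA': [1.0, 2.5, 1.0, 4.0]})

# Step 1 of the workaround: cast to object dtype before fitting
as_object = real['ColA'].astype('object')
print(as_object.dtype)  # object

# Step 3 of the workaround: cast back to float after sampling
restored = as_object.astype('float')

# The round trip leaves every category value unchanged
print(restored.equals(real['ColA']))  # True
```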
Error Description
When running PAR with categorical columns that contain float values, PAR does not stick to the original categories when sampling. This leads to a very low diagnostic score for 'Data Validity' because the CategoryAdherence metric fails.
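To show why off-category samples drag down 'Data Validity', here is a minimal sketch of the idea behind a category-adherence check: the share of synthetic values that are one of the categories seen in the real data. The function `category_adherence` and the sample values are hypothetical; this is an illustration of the concept, not SDMetrics' actual `CategoryAdherence` implementation.

```python
import pandas as pd


def category_adherence(real: pd.Series, synthetic: pd.Series) -> float:
    """Hypothetical helper: fraction of synthetic values found among the real categories."""
    allowed = real.unique()
    return float(synthetic.isin(allowed).mean())


real = pd.Series([1.0, 2.5, 4.0, 1.0])

# Samples that drift off the original categories (1.7 and 3.9 were never categories)
synthetic = pd.Series([1.0, 2.5, 1.7, 3.9])
print(category_adherence(real, synthetic))  # 0.5

# Samples that stick to the original categories score a perfect 1.0
synthetic_ok = pd.Series([2.5, 1.0, 4.0, 4.0])
print(category_adherence(real, synthetic_ok))  # 1.0
```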