BUG: TypeError: ufunc 'isfinite' not supported for the input types #3656

Open
3 of 4 tasks
Jeremy98-alt opened this issue May 14, 2024 · 4 comments
Labels
awaiting feedback Indicates that further information is required from the issue creator

Comments

@Jeremy98-alt

Jeremy98-alt commented May 14, 2024

Issue Description

I parsed my dataset, converting all column values to float (as the library expects) through a sklearn pipeline, but when I passed the resulting dataframe to the explainer() it raised the error reported below. Here is the schema of the single row I passed:

Idx Column Non-Null Count Dtype
0 CreditScore 1 non-null float64
1 Geography 1 non-null float64
2 Gender 1 non-null float64
3 Age 1 non-null float64
4 Tenure 1 non-null float64
5 Balance 1 non-null float64
6 NumOfProducts 1 non-null float64
7 HasCrCard 1 non-null float64
8 IsActiveMember 1 non-null float64
9 EstimatedSalary 1 non-null float64
dtypes: float64(10)
memory usage: 208.0 bytes

Here is what I passed (a dataframe with one row):

CreditScore 0.0
Geography 2.0
Gender 0.348659
Age 1.5
Tenure 1.0
Balance 1.3
NumOfProducts 0.518667
HasCrCard 0.404255
IsActiveMember 0.0
EstimatedSalary 0.505253

Minimal Reproducible Example

import pandas as pd
import shap
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder

# categ_lst, numerical_cols and model_trained are defined elsewhere in the app
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', OrdinalEncoder(), categ_lst),
        ('num', MinMaxScaler(), numerical_cols)
    ]
)

df = ...  # read from a CSV and parsed (elided in the original report)
X, y = df.drop(columns=["Exited"]), df["Exited"]

explainer = shap.Explainer(model_trained.predict_proba, X)

# single_employer_processed starts out as a raw one-row dataframe (definition elided)
single_employer_processed = model_trained["preprocessor"].transform(single_employer_processed)
single_employer_processed = pd.DataFrame(single_employer_processed, columns=df.drop(columns=["Exited"]).columns)
shap_values = explainer(single_employer_processed)

Traceback

Traceback (most recent call last):
  File "/mnt/c/Users/j.sapienza/OneDrive/Desktop/demo-streamlit-xai/.venv/lib/python3.8/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 600, in _run_script
    exec(code, module.__dict__)
  File "/mnt/c/Users/j.sapienza/OneDrive/Desktop/demo-streamlit-xai/streamlit_app/app.py", line 81, in <module>
    shap_values = explainer(single_employer_processed)
  File "/mnt/c/Users/j.sapienza/OneDrive/Desktop/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/explainers/_exact.py", line 76, in __call__
    return super().__call__(
  File "/mnt/c/Users/j.sapienza/OneDrive/Desktop/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/explainers/_explainer.py", line 264, in __call__
    row_result = self.explain_row(
  File "/mnt/c/Users/j.sapienza/OneDrive/Desktop/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/explainers/_exact.py", line 91, in explain_row
    fm = MaskedModel(self.model, self.masker, self.link, self.linearize_link, *row_args)
  File "/mnt/c/Users/j.sapienza/OneDrive/Desktop/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/utils/_masked_model.py", line 30, in __init__
    self._variants = ~self.masker.invariants(*args)
  File "/mnt/c/Users/j.sapienza/OneDrive/Desktop/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/maskers/_tabular.py", line 152, in invariants
    return np.isclose(x, self.data)
  File "<__array_function__ internals>", line 200, in isclose
  File "/mnt/c/Users/j.sapienza/OneDrive/Desktop/demo-streamlit-xai/.venv/lib/python3.8/site-packages/numpy/core/numeric.py", line 2378, in isclose
    yfin = isfinite(y)
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types 
according to the casting rule ''safe''
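
For context on where this TypeError likely comes from: the masker's invariants() compares the explained row against the background data with np.isclose, which first runs np.isfinite on both sides. If the background X handed to shap.Explainer is the raw, unencoded frame (it still contains strings such as "Spain" and "Male", as shown in the follow-up comments), the object-dtype side has no isfinite loop. A minimal sketch of the same failure, with hypothetical values:

import numpy as np

encoded_row = np.array([0.0, 2.0, 0.348659])                       # already-encoded values
raw_background = np.array([600.0, "Spain", "Male"], dtype=object)  # raw, unencoded values

# np.isclose calls np.isfinite on both inputs; the object/string array has no
# isfinite loop, so this raises:
# TypeError: ufunc 'isfinite' not supported for the input types ...
np.isclose(encoded_row, raw_background)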

Expected Behavior

No response

Bug report checklist

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest release of shap.
  • I have confirmed this bug exists on the master branch of shap.
  • I'd be interested in making a PR to fix this bug

Installed Versions

0.42.0

@Jeremy98-alt Jeremy98-alt added the bug Indicates an unexpected problem or unintended behaviour label May 14, 2024
@Jeremy98-alt
Author

Jeremy98-alt commented May 14, 2024

I saw that X is not preprocessed, so I preprocessed it before calling shap.Explainer(). Now I get the problem below (which makes me think preprocessing X was not the right fix either):

Traceback (most recent call last):
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 600, in _run_script
    exec(code, module.__dict__)
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/streamlit_app/app.py", line 95, in <module>
    shap_values = explainer(single_employer_processed)
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/explainers/_exact.py", line 76, in __call__
    return super().__call__(
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/explainers/_explainer.py", line 264, in __call__
    row_result = self.explain_row(
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/explainers/_exact.py", line 120, in explain_row
    outputs = fm(extended_delta_indexes, zero_index=0, batch_size=batch_size)
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/utils/_masked_model.py", line 59, in __call__
    return self._delta_masking_call(masks, zero_index=zero_index, batch_size=batch_size)
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/utils/_masked_model.py", line 205, in _delta_masking_call
    outputs = self.model(*subset_masked_inputs)
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/models/_model.py", line 28, in __call__
    out = self.inner_model(*args)
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/pipeline.py", line 584, in predict_proba
    Xt = transform.transform(Xt)
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/compose/_column_transformer.py", line 827, in transform
    Xs = self._fit_transform(
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/compose/_column_transformer.py", line 681, in _fit_transform
    return Parallel(n_jobs=self.n_jobs)(
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/parallel.py", line 65, in __call__
    return super().__call__(iterable_with_config)
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/joblib/parallel.py", line 1918, in __call__
    return output if self.return_generator else list(output)
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/joblib/parallel.py", line 1847, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/parallel.py", line 127, in __call__
    return self.function(*args, **kwargs)
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/pipeline.py", line 940, in _transform_one
    res = transformer.transform(X)
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/preprocessing/_encoders.py", line 1586, in transform
    X_int, X_mask = self._transform(
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/preprocessing/_encoders.py", line 192, in _transform
    diff, valid_mask = _check_unknown(Xi, self.categories_[i], return_mask=True)
  File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/_encode.py", line 304, in _check_unknown
    if np.isnan(known_values).any():
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
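
If this reading of the traceback is right, the dtype mismatch has only moved: predict_proba of the full pipeline re-applies the ColumnTransformer to values that are already numerically encoded, so the OrdinalEncoder ends up comparing floats against its fitted string categories. A minimal sketch of one way to keep both sides consistent, assuming the step names 'preprocessor' and 'classifier' from the snippets in this thread, and with single_employer_raw standing in (hypothetically) for the raw, untransformed one-row dataframe: transform the background and the row once, and explain only the classifier.

import pandas as pd
import shap

# model_trained, X, categ_lst and numerical_cols are the names used in this thread.
preprocessor = model_trained["preprocessor"]
classifier = model_trained["classifier"]

# ColumnTransformer output order: the 'cat' block first, then the 'num' block.
transformed_cols = categ_lst + numerical_cols

X_transformed = pd.DataFrame(preprocessor.transform(X), columns=transformed_cols)
row_transformed = pd.DataFrame(preprocessor.transform(single_employer_raw), columns=transformed_cols)

# Background and explained row now live in the same numeric space, and the
# preprocessor is not applied a second time inside the explained function.
explainer = shap.Explainer(classifier.predict_proba, X_transformed)
shap_values = explainer(row_transformed)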

@CloseChoice
Collaborator

Hey, it would help a lot if you could provide a complete example that we can copy and paste to reproduce the issue. It would be amazing if you could at least provide a couple of sample rows that reproduce it.

@CloseChoice CloseChoice added awaiting feedback Indicates that further information is required from the issue creator and removed bug Indicates an unexpected problem or unintended behaviour labels May 14, 2024
@Jeremy98-alt
Author

Jeremy98-alt commented May 15, 2024

Thanks @CloseChoice,
I tried leaving the OrdinalEncoder() out of the Pipeline and everything runs correctly, but I don't want to settle for that workaround, so I hope to solve this problem.
Here is the sample code:

    import shap
    import pandas as pd
    from utils.model import ChurnModel 
    import matplotlib.pyplot as plt
    import numpy as np
    
    churn_model = ChurnModel()
    
    model_trained = churn_model.load_latest_model(artifacts_dir="./utils/model_artifact/")
    df = churn_model.get_dataset(size=200)
    X, y = df.drop(columns=["Exited"]), df["Exited"]
    
    print(X.info())
    print(X.head())
    print(X.isna().sum())
    
    data = {'CreditScore': ["43743"],
            'Geography': ["Spain"],
            'Gender': ["Male"],
            'Age': ["34"],
            'Tenure': ["13"],
            'Balance': ["342"],
            'NumOfProducts': ["4"],
            'HasCrCard': ["1"],
            'IsActiveMember': ["1"],
            'EstimatedSalary': ["384972.0"]
    }
    
    features = pd.DataFrame(data)
    categ_lst, numerical_cols = churn_model.get_categ_features(), churn_model.get_numerical_features()
    features[categ_lst] = features[categ_lst].astype("string")
    features[numerical_cols] = features[numerical_cols].astype("float")
    
    print(features.head())
    print(f"The prediction of this sample is: {model_trained.predict(features)}")
    
    explainer = shap.Explainer(model_trained.predict_proba, X)
    transformed = model_trained["preprocessor"].transform(features)
    transformed = pd.DataFrame(transformed, columns=df.drop(columns=["Exited"]).columns, dtype=float)
    
    print(transformed)
    shap_values = explainer(transformed)
    shap.plots.waterfall(shap_values[0,:, 1], max_display = 10)
    plt.show()

Now, the link for the dataset is: https://www.kaggle.com/datasets/shubhammeshram579/bank-customer-churn-prediction?resource=download

To read the dataframe:

df_ = pd.read_csv(self.dataset_path, sep=',', on_bad_lines='skip', index_col=False, dtype='unicode')
df = df_.drop(columns=["RowNumber", "CustomerId", "Surname"])

The sklearn pipeline I apply is:

preprocessor = ColumnTransformer(
    transformers=[
        ('cat', OrdinalEncoder(), categ_lst),
        ('num', StandardScaler(), numerical_cols)
    ]
)

self.model = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression(random_state=42))
])

The lists of categorical and numerical columns:

categ_lst = ["Gender", "Geography"]
numerical_cols = list(set(df.columns) - set(["Exited", "Gender", "Geography"]))
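
A side observation about the snippets above (not something reported in the thread): the ColumnTransformer outputs the 'cat' block first and the 'num' block second, so labelling the transformed array with df.drop(columns=["Exited"]).columns can silently mismatch names and values when the raw column order differs. A small sketch of deriving the output names from the fitted transformer instead (get_feature_names_out needs scikit-learn >= 1.0):

import pandas as pd

# Fitted ColumnTransformer from the snippets above (step name 'preprocessor' assumed).
preprocessor = model_trained["preprocessor"]

# Names come back as 'cat__Gender', 'cat__Geography', 'num__CreditScore', ...
transformed_cols = list(preprocessor.get_feature_names_out())

transformed = pd.DataFrame(preprocessor.transform(features), columns=transformed_cols)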

@CloseChoice
Collaborator

Sorry, but your example is still not reproducible. I tried the following but this throws a different error:

import shap
import pandas as pd
# from utils.model import ChurnModel 
import matplotlib.pyplot as plt
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

categ_lst = ["Geography", "Gender", "Age", "HasCrCard", "IsActiveMember"]

numerical_cols = ["CreditScore", "Age", "Tenure", "Balance", "NumOfProducts", "EstimatedSalary"]

preprocessor = ColumnTransformer(
    transformers=[
        ('cat', OrdinalEncoder(), categ_lst),
        ('num', StandardScaler(), numerical_cols)
    ]
)

model = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression(random_state=42))
])

df = pd.read_csv('bugs/data/Churn_Modelling.csv')

df = df.loc[df.notnull().all(1), :]
X, y = df.drop(columns=["Exited"]), df["Exited"]

model_trained = model.fit(X, y)

print(X.info())
print(X.head())
print(X.isna().sum())

# I ignore this for now since it does not work as expected: it always throws an error that an unexpected category value was found
data = {'CreditScore': [43743],
        'Geography': ["Spain"],
        'Gender': ["Male"],
        'Age': [34.],
        'Tenure': [13.],
        'Balance': [342.],
        'NumOfProducts': [4.],
        'HasCrCard': [1.],
        'IsActiveMember': [1.],
        'EstimatedSalary': [384972.0]
}

features = X.iloc[0, :] # pd.DataFrame(data)
features[categ_lst] = features[categ_lst].astype("string")
features[numerical_cols] = features[numerical_cols].astype("float")

print(features.head())
# print(f"The prediction of this sample is: {model_trained.predict(features)}")

explainer = shap.Explainer(model_trained.predict_proba, X)
transformed = model_trained["preprocessor"].transform(features)
transformed = pd.DataFrame(transformed, columns=df.drop(columns=["Exited"]).columns, dtype=float)

print(transformed)
shap_values = explainer(transformed)
shap.plots.waterfall(shap_values[0,:, 1], max_display = 10)
plt.show()

It would be great if you could help make this reproducible so that we can start working on a solution to the problem. Since you seem interested in fixing this yourself, we would need a reproducible example for the tests either way ;)
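
A sketch of what a fully self-contained reproducer might look like, with small synthetic stand-in rows instead of the Kaggle CSV (all values below are hypothetical). It follows the pattern from the original report: the raw, string-containing frame is handed to shap.Explainer as background, while the explained row is already numerically encoded, so the call is expected to hit the same np.isclose/isfinite path on shap 0.42.0:

import numpy as np
import pandas as pd
import shap
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Synthetic stand-in data with the same shape of problem: string categoricals, float numericals.
rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({
    "Geography": rng.choice(["France", "Spain", "Germany"], n),
    "Gender": rng.choice(["Male", "Female"], n),
    "CreditScore": rng.integers(300, 900, n).astype(float),
    "Age": rng.integers(18, 80, n).astype(float),
    "Balance": rng.uniform(0, 200000, n),
    "Exited": rng.integers(0, 2, n),
})
categ_lst = ["Geography", "Gender"]
numerical_cols = ["CreditScore", "Age", "Balance"]

X, y = df.drop(columns=["Exited"]), df["Exited"]

model = Pipeline([
    ("preprocessor", ColumnTransformer([
        ("cat", OrdinalEncoder(), categ_lst),
        ("num", StandardScaler(), numerical_cols),
    ])),
    ("classifier", LogisticRegression(random_state=42)),
])
model.fit(X, y)

# Raw (string-containing) background, already-encoded row: the mismatch from the report.
explainer = shap.Explainer(model.predict_proba, X)
row = pd.DataFrame(
    model["preprocessor"].transform(X.iloc[[0]]),
    columns=categ_lst + numerical_cols,
)
shap_values = explainer(row)  # expected to raise: TypeError: ufunc 'isfinite' not supported ...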
