BUG: TypeError: ufunc 'isfinite' not supported for the input types #3656
I noticed that X is not preprocessed, so I preprocessed it before calling shap.Explainer(), but now I get this error (I suspect preprocessing X was not the right thing to do): Traceback (most recent call last):
Hey, it would help a lot if you could provide a complete example that we can copy and paste to reproduce the issue. It would be great if you could at least provide a couple of sample rows that reproduce it.
Thanks @CloseChoice,
The dataset is available at: https://www.kaggle.com/datasets/shubhammeshram579/bank-customer-churn-prediction?resource=download
To read the dataframe:
The sklearn pipeline applied is:
The lists of string and numeric columns:
Sorry, but your example is still not reproducible. I tried the following, but it throws a different error:
import shap
import pandas as pd
# from utils.model import ChurnModel
import matplotlib.pyplot as plt
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
categ_lst = ["Geography", "Gender", "Age", "HasCrCard", "IsActiveMember"]
numerical_cols = ["CreditScore", "Age", "Tenure", "Balance", "NumOfProducts", "EstimatedSalary"]
preprocessor = ColumnTransformer(
transformers=[
('cat', OrdinalEncoder(), categ_lst),
('num', StandardScaler(), numerical_cols)
]
)
model = Pipeline([
('preprocessor', preprocessor),
('classifier', LogisticRegression(random_state=42))
])
df = pd.read_csv('bugs/data/Churn_Modelling.csv')
df = df.loc[df.notnull().all(1), :]
X, y = df.drop(columns=["Exited"]), df["Exited"]
model_trained = model.fit(X, y)
X.info()  # info() prints the summary itself and returns None
print(X.head())
print(X.isna().sum())
# I ignore this for now since it does not work as expected: it always throws an error saying that an unexpected category value was found
data = {'CreditScore': [43743],
'Geography': ["Spain"],
'Gender': ["Male"],
'Age': [34.],
'Tenure': [13.],
'Balance': [342.],
'NumOfProducts': [4.],
'HasCrCard': [1.],
'IsActiveMember': [1.],
'EstimatedSalary': [384972.0]
}
features = X.iloc[0, :] # pd.DataFrame(data)
features[categ_lst] = features[categ_lst].astype("string")
features[numerical_cols] = features[numerical_cols].astype("float")
print(features.head())
# print(f"The prediction of this sample is: {model_trained.predict(features)}")
explainer = shap.Explainer(model_trained.predict_proba, X)
transformed = model_trained["preprocessor"].transform(features)
transformed = pd.DataFrame(transformed, columns=df.drop(columns=["Exited"]).columns, dtype=float)
print(transformed)
shap_values = explainer(transformed)
shap.plots.waterfall(shap_values[0,:, 1], max_display = 10)
plt.show()
It would be great if you could help make this reproducible so that we can start working on a solution. Since you seem interested in fixing this, note that we would need a reproducible example for the tests either way ;)
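For context on the error message itself: NumPy raises `TypeError: ufunc 'isfinite' not supported for the input types` when `np.isfinite` receives an object-dtype array, and a pandas DataFrame that mixes string and numeric columns converts to exactly such an array. In the attempt above, the background `X` handed to `shap.Explainer` still holds raw string columns such as `Geography` and `Gender`. The sketch below uses a hypothetical two-column frame (not the real dataset) and a hypothetical helper `isfinite_raises` to show the failure and that a float64 array passes; it only reproduces the NumPy message and makes no claim about the exact code path inside shap.

```python
import numpy as np
import pandas as pd


def isfinite_raises(arr: np.ndarray) -> bool:
    """Hypothetical helper: return True if np.isfinite rejects `arr` with a TypeError."""
    try:
        np.isfinite(arr)
    except TypeError:
        return True
    return False


# Hypothetical stand-in for the raw churn data: one string column, one numeric column.
raw = pd.DataFrame({"Geography": ["Spain", "France"], "Age": [34.0, 41.0]})

# A DataFrame mixing strings and numbers converts to an object-dtype array.
mixed = raw.to_numpy()
mixed_raised = isfinite_raises(mixed)

# Even purely numeric values fail while they are stored with object dtype...
numeric_obj = raw[["Age"]].to_numpy(dtype=object)
obj_raised = isfinite_raises(numeric_obj)

# ...and pass once cast to float64, which np.isfinite supports.
numeric = numeric_obj.astype(np.float64)
float_raised = isfinite_raises(numeric)

print(mixed.dtype, mixed_raised, obj_raised, float_raised)  # object True True False
```

If this turns out to be the cause, one untested option is to give `shap.Explainer` a float background (for example the preprocessor's output) together with a prediction function that expects transformed input, such as the pipeline's `classifier` step, so that the background and the explained rows live in the same space.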
Issue Description
I parsed my dataset through a sklearn pipeline, converting all column values to float as required by the library, but when I passed the entire DataFrame to explainer() it returned the reported error. Here is the schema of the single row I passed:
Idx  Column           Non-Null Count  Dtype
0    CreditScore      1 non-null      float64
1    Geography        1 non-null      float64
2    Gender           1 non-null      float64
3    Age              1 non-null      float64
4    Tenure           1 non-null      float64
5    Balance          1 non-null      float64
6    NumOfProducts    1 non-null      float64
7    HasCrCard        1 non-null      float64
8    IsActiveMember   1 non-null      float64
9    EstimatedSalary  1 non-null      float64
dtypes: float64(10)
memory usage: 208.0 bytes
Here is what I passed (a DataFrame with one row):
CreditScore 0.0
Geography 2.0
Gender 0.348659
Age 1.5
Tenure 1.0
Balance 1.3
NumOfProducts 0.518667
HasCrCard 0.404255
IsActiveMember 0.0
EstimatedSalary 0.505253
Minimal Reproducible Example
Traceback
Expected Behavior
No response
Bug report checklist
Installed Versions
0.42.0