Models with NaN mode Max are incorrectly exported to code #2104

Open
pkhokhlov opened this issue May 29, 2022 · 0 comments

pkhokhlov commented May 29, 2022

Problem: Models with NaN mode Max are incorrectly exported to Python and C++ code
catboost version: 1.0.6
Operating System: Linux

Reproducible example:

from sklearn.datasets import make_classification
from math import isnan
import numpy as np
import catboost

# create dataset
np.random.seed(1)
n = 10000
p = 20
n_nan = int(0.1 * n)

X, y = make_classification(random_state=1, n_samples=n, n_features=p, n_informative=p, n_redundant=0)
X[np.argpartition(X[:, 0], -n_nan)[-n_nan:], 0] = np.nan
pool = catboost.Pool(X, y)

# train model
params = {'n_estimators': 10, 'loss_function': 'Logloss', 'nan_mode': 'Max', 'random_state': 10}
clf = catboost.CatBoostClassifier(**params)
clf.fit(pool)

# export to python code
clf.save_model('catboost_saved.py', format='python', pool=pool)

# import python code just saved
import catboost_saved

# evaluate original and exported model
exported_model_preds = [catboost_saved.apply_catboost_model(x) for x in X]
orig_model_preds = clf.predict(X, prediction_type='RawFormulaVal')

# fails but shouldn't
assert np.allclose(exported_model_preds, orig_model_preds)

I am preparing a PR to fix this issue. The fix will make the exported model check for NaN values explicitly, so that its behavior matches NaN mode Max.
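To illustrate the kind of check I have in mind, here is a minimal sketch. It is not the actual exported code: `count_borders_passed` and the `borders` list are hypothetical names, and I am assuming the exported model binarizes a float feature by counting how many borders the value exceeds. Under that assumption a plain `>` comparison sends NaN to the lowest bin, because every comparison with NaN is false, while NaN mode Max should place it above all borders.

```python
import math


def count_borders_passed(value, borders, nan_mode="Max"):
    """Return the bin index of a float feature: how many borders it exceeds.

    Hypothetical helper for illustration only, not catboost's exported code.
    """
    if math.isnan(value):
        # Assumption: "Max" treats a missing value as larger than every border,
        # any other mode here treats it as smaller than every border.
        return len(borders) if nan_mode == "Max" else 0
    return sum(1 for border in borders if value > border)


borders = [-0.5, 0.0, 1.25]  # made-up borders for one feature
nan = float("nan")

# Naive binarization: every `nan > border` comparison is False.
naive_bin = sum(1 for border in borders if nan > border)

# Binarization with an explicit NaN check.
checked_bin = count_borders_passed(nan, borders, nan_mode="Max")
```

Here `naive_bin` comes out as the lowest bin and `checked_bin` as the highest, which is the kind of mismatch that would explain the failing assertion above. Non-NaN values go through the same comparison as before, so only rows with missing values should change.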
