Models with NaN mode Max are incorrectly exported to code #2104

pkhokhlov · 2022-05-29T02:27:53Z

Problem: Models with NaN mode Max are incorrectly exported to Python and C++ code
catboost version: 1.0.6
Operating System: Linux

Reproducible example:

from sklearn.datasets import make_classification
from math import isnan
import numpy as np
import catboost

# create dataset
np.random.seed(1)
n = 10000
p = 20
n_nan = int(0.1 * n)

X, y = make_classification(random_state=1, n_samples=n, n_features=p, n_informative=p, n_redundant=0)
X[np.argpartition(X[:, 0], -n_nan)[-n_nan:], 0] = np.nan
pool = catboost.Pool(X, y)

# train model
params = {'n_estimators': 10, 'loss_function': 'Logloss', 'nan_mode': 'Max', 'random_state': 10}
clf = catboost.CatBoostClassifier(**params)
clf.fit(pool)

# export to python code
clf.save_model('catboost_saved.py', format='python', pool=pool)

# import python code just saved
import catboost_saved

# evaluate original and exported model
exported_model_preds = [catboost_saved.apply_catboost_model(x) for x in X]
orig_model_preds = clf.predict(X, prediction_type='RawFormulaVal')

# fails but shouldn't
assert np.allclose(exported_model_preds, orig_model_preds)
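To check whether the mismatch is confined to rows that contain NaN, the two prediction lists can be compared row by row. The helper below is a minimal sketch, not part of catboost: `split_mismatches`, its parameters, and the toy values are hypothetical and only illustrate the idea.

```python
import math

def split_mismatches(rows, preds_a, preds_b, feature_idx=0, tol=1e-6):
    """Count prediction mismatches separately for rows with and without
    NaN in column `feature_idx` (hypothetical diagnostic helper)."""
    counts = {"nan": 0, "finite": 0}
    for row, a, b in zip(rows, preds_a, preds_b):
        if not math.isclose(a, b, rel_tol=tol, abs_tol=tol):
            key = "nan" if math.isnan(row[feature_idx]) else "finite"
            counts[key] += 1
    return counts

# Toy data (made up for illustration, not output of the script above)
toy_rows = [[float("nan"), 1.0], [0.2, 3.0], [float("nan"), -1.0]]
toy_a = [0.5, 0.1, -0.3]
toy_b = [0.7, 0.1, -0.3]
toy_counts = split_mismatches(toy_rows, toy_a, toy_b)
print(toy_counts)
```

With the script above, the call would be `split_mismatches(X, exported_model_preds, orig_model_preds)`. If the NaN handling is the cause, the "finite" count should stay at zero.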

I am preparing a PR to fix this issue. The fix will make the exported model check for NaN values so that its behavior matches NaN mode Max.
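The kind of check described above can be sketched as follows. This is an illustration only, under two assumptions: that the exported code picks a bin for a float feature by counting how many borders the value exceeds, and that NaN mode Max should rank NaN above every border. The function names are hypothetical and are not taken from the generated code or the PR.

```python
import math

def naive_count(value, borders):
    # No NaN check: every comparison with NaN is False,
    # so a NaN value exceeds no borders.
    return sum(value > b for b in borders)

def count_borders_exceeded(value, borders, nan_mode="Min"):
    """Count the borders that `value` exceeds, with explicit NaN handling
    (hypothetical helper; nan_mode semantics are an assumption)."""
    if math.isnan(value):
        # Assumed: 'Max' places NaN above all borders, 'Min' below all borders.
        return len(borders) if nan_mode == "Max" else 0
    return sum(value > b for b in borders)

borders = [-1.0, 0.0, 1.5]
nan = float("nan")
print(naive_count(nan, borders))                    # 0
print(count_borders_exceeded(nan, borders, "Max"))  # 3
```

For finite values the two functions agree, so an explicit check of this kind only changes the result for NaN inputs.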

pkhokhlov added a commit to pkhokhlov/catboost that referenced this issue May 29, 2022

fix: NaN mode max model export to code catboost#2104

3c1f79b

pkhokhlov mentioned this issue May 29, 2022

fix: NaN mode max model export to code catboost#2104 #2105

Open

andrey-khropov added bug python labels May 20, 2023
