Support for sklearn Pipelines #171

MyNameIsFu · 2024-02-05T13:42:32Z

MCA currently can't be used in a sklearn Pipeline that has any preceding steps.
In my case, I need an imputer to fill in NaN values first.

Working Example:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from prince.mca import MCA

test_data = pd.DataFrame(data=np.random.random((10, 5)))
test = Pipeline(steps=[
    ("mca", MCA()),
])
test.fit_transform(test_data)

But adding a SimpleImputer in front of MCA breaks it, because the imputer outputs a NumPy array, and that array is what gets passed to MCA:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from prince.mca import MCA

test_data = pd.DataFrame(data=np.random.random((10, 5)))
test = Pipeline(steps=[
    ("impute", SimpleImputer()),  # This breaks the pipeline, since SimpleImputer returns an ndarray
    ("mca", MCA()),
])
test.fit_transform(test_data)
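For context, a minimal check of what `SimpleImputer` hands to the next step by default. This assumes scikit-learn 1.2 or later with the default output configuration; the small `df` frame is made up for illustration, and prince isn't involved:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Small made-up frame with one missing value per column
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# Default strategy is "mean": NaN in "a" becomes 2.0, NaN in "b" becomes 4.5
imputed = SimpleImputer().fit_transform(df)

# The result is a plain ndarray, so the column labels are gone and the
# next pipeline step receives an array rather than a DataFrame
print(type(imputed))  # <class 'numpy.ndarray'>
```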

I've tried adding a dummy transformer step between the imputer and MCA that turns the array back into a DataFrame with generic index and column labels, but that results in a KeyError, with unrecognized index labels being looked up in the column list:

KeyError: "None of [Index(['Col_0_0.0', 'Col_0_1.0', 'Col_0_2.0', 'Col_0_3.0', 'Col_0_4.0',\n       'Col_0_5.0', 'Col_1_0.0', 'Col_1_1.0', 'Col_1_2.0', 'Col_2_0.0',\n       'Col_2_1.0', 'Col_3_0.0', 'Col_3_1.0'],\n      dtype='object')] are in the [columns]"
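For reference, the kind of intermediate step described above can be sketched with sklearn's `FunctionTransformer`. This is only a sketch under stated assumptions: it assumes the input's column labels are known in advance (the hypothetical `COLUMNS` list and `to_frame` helper) and reuses them instead of generic labels. It has not been checked against prince's MCA, so it may still run into the same KeyError:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical: the column labels of the input frame, known up front
COLUMNS = ["a", "b"]


def to_frame(X):
    # Hypothetical helper: wrap the imputer's ndarray back into a DataFrame
    # that carries the original column labels
    return pd.DataFrame(X, columns=COLUMNS)


pipe = Pipeline(steps=[
    ("impute", SimpleImputer()),
    ("to_frame", FunctionTransformer(to_frame)),
])

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})
out = pipe.fit_transform(df)
print(out.columns.tolist())  # ['a', 'b']
```

Note that `SimpleImputer` drops columns that are entirely NaN by default, in which case a fixed `COLUMNS` list would no longer match the array's width.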

Any suggestions?


MaxHalford · 2024-02-11T12:44:32Z

Hey there @MyNameIsFu!

I believe you can make this work using sklearn's set_output API:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from prince.mca import MCA

test_data = pd.DataFrame(data=np.random.random((10, 5)))
test = Pipeline(steps=[
    ("impute", SimpleImputer()),  # Returns an ndarray by default; set_output below changes that
    ("mca", MCA()),
])
# Make the imputer (step 0) output a DataFrame so MCA receives one
test[0].set_output(transform="pandas")
test.fit_transform(test_data)

I hope this works for you!
