How to map the features at the end of the pipeline back to the initial features #1328
The stacking estimator is defined here: https://github.com/EpistasisLab/tpot/blob/master/tpot/builtins/stacking_estimator.py. Effectively, it takes the predictions of the model and appends them to the left of the input data X. If it's a classifier with predict_proba, all class probabilities are also included. For a binary classification problem, that means two additional columns, one for each class. So in your case trans_x_t is [model 1 predicted labels, model 1 probability for class 0, model 1 probability for class 1, &lt;x_t&gt;], and similarly trans_x_t1 is [model 2 predicted labels, model 2 probability for class 0, model 2 probability for class 1, &lt;trans_x_t&gt;].
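The column layout described above can be sketched with plain scikit-learn. This is a minimal stand-in for what StackingEstimator.transform produces, using a toy dataset and LogisticRegression as the inner model (both are illustrative choices, not part of the pipeline in this issue):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy binary-classification data with 5 original features
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# Mimic StackingEstimator.transform for a classifier with predict_proba:
# prepend [predicted labels, class probabilities] to the left of X.
est = LogisticRegression().fit(X, y)
X_stacked = np.hstack((
    est.predict(X).reshape(-1, 1),  # column 0: predicted labels
    est.predict_proba(X),           # columns 1-2: P(class 0), P(class 1)
    X,                              # columns 3-7: the original features
))

print(X_stacked.shape)  # (100, 8): 5 original features + 3 meta-features

# The trailing columns are the unchanged original features:
assert np.array_equal(X_stacked[:, 3:], X)
```

Each StackingEstimator step therefore adds 3 columns on the left for a binary problem, which matches the 581 -> 584 -> 587 growth reported below.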
The initial number of features is 581, but the feature importances of the final pipeline have 587 entries.
It looks like at each of the 3 steps of the pipeline, the number of features increased: 581 -> 584 -> 587.
Is there a way to map the 587 features at the end of the pipeline back to the original 581 features?
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline, make_union
from tpot.builtins import StackingEstimator
from xgboost import XGBClassifier
exported_pipeline = make_pipeline(
StackingEstimator(estimator=XGBClassifier(learning_rate=0.01, max_depth=4, min_child_weight=6, n_estimators=100, n_jobs=1, subsample=0.15000000000000002, verbosity=0)),
StackingEstimator(estimator=GaussianNB()),
XGBClassifier(learning_rate=0.5, max_depth=2, min_child_weight=20, n_estimators=100, n_jobs=1, subsample=0.9000000000000001, verbosity=0)
)
exported_pipeline.fit(x_v, y_v)
trans_x_t = exported_pipeline[0].transform(x_t)
trans_x_t1 = exported_pipeline[1].transform(trans_x_t)
print(x_t.shape)
(677279, 581)
print(trans_x_t.shape)
(677279, 584)
print(trans_x_t1.shape)
(677279, 587)
exported_pipeline[-1].feature_importances_.shape
(587,)
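Given the layout above, the mapping follows directly: each of the two StackingEstimator steps prepended 3 meta-feature columns (label + 2 class probabilities), so the last 581 entries of `feature_importances_` line up one-to-one with the original 581 columns of x_t, and the first 6 entries belong to the meta-features. A sketch, using a random stand-in array for `exported_pipeline[-1].feature_importances_`:

```python
import numpy as np

n_original = 581      # number of columns in x_t
n_meta_per_step = 3   # predicted label + 2 class probabilities (binary problem)
n_steps = 2           # two StackingEstimator steps in the pipeline

# Stand-in for exported_pipeline[-1].feature_importances_
importances = np.random.default_rng(0).random(n_original + n_steps * n_meta_per_step)

# Column order after both transforms (leftmost first):
meta_names = [
    "model2_label", "model2_proba_0", "model2_proba_1",
    "model1_label", "model1_proba_0", "model1_proba_1",
]
meta_importances = importances[: n_steps * n_meta_per_step]

# The remaining importances map directly onto the original feature columns:
original_importances = importances[n_steps * n_meta_per_step:]
assert original_importances.shape == (n_original,)
```

Here `original_importances[i]` is the importance the final XGBClassifier assigned to column i of the original x_t.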