Prediction failing with 1 row of test data #150

Open
meetu30 opened this issue Sep 1, 2023 · 0 comments

meetu30 commented Sep 1, 2023

Hi,
I am creating 100 rows of data, passing 99 of them for training and only 1 as test data. Prediction then fails with this error:
ValueError: Number of splits 10 is greater than the number of samples: 1.
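The wording of the message suggests that a 10-fold partition is being attempted on the single test row. Whether mlens really re-partitions the input at predict time is an assumption on my part, not something I have confirmed. The sketch below only illustrates that kind of check in plain Python; `check_fold_count` and `fold_sizes` are hypothetical helpers, not mlens code.

```python
def check_fold_count(n_samples, folds):
    """Raise ValueError when there are fewer samples than folds.

    Hypothetical helper for illustration only; it is not taken from mlens.
    """
    if folds > n_samples:
        raise ValueError(
            "Number of splits %d is greater than the number of samples: %d."
            % (folds, n_samples)
        )


def fold_sizes(n_samples, folds):
    """Return the size of each fold when n_samples rows are split into `folds` parts."""
    check_fold_count(n_samples, folds)
    base, extra = divmod(n_samples, folds)
    # The first `extra` folds receive one additional row.
    return [base + 1 if i < extra else base for i in range(folds)]


# 99 training rows can be split into 10 folds.
print(fold_sizes(99, 10))

# A single test row cannot, so the check raises.
try:
    fold_sizes(1, 10)
except ValueError as exc:
    print(exc)
```

With 99 rows the partition is valid (nine folds of 10 rows and one of 9), while a single row fails the check with the same wording as the error above.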

Below is the code snippet:

from math import sqrt

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression, ElasticNet
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from mlens.ensemble import SuperLearner
from mlens.visualization import corr_X_y

# create a list of base-models
def get_models():
    models = list()
    models.append(LinearRegression())
    models.append(ElasticNet())
    models.append(SVR(gamma='scale'))
    models.append(DecisionTreeRegressor())
    models.append(KNeighborsRegressor())
    models.append(AdaBoostRegressor())
    models.append(BaggingRegressor(n_estimators=10))
    models.append(RandomForestRegressor(n_estimators=10))
    models.append(ExtraTreesRegressor(n_estimators=10))
    return models

# cost function for base models
def rmse(yreal, yhat):
    return sqrt(mean_squared_error(yreal, yhat))

# create the super learner
def get_super_learner(X):
    ensemble = SuperLearner(scorer=rmse, folds=10, shuffle=True, sample_size=len(X), random_state=42)
    # add base models
    models = get_models()
    ensemble.add(models)
    # add the meta model
    ensemble.add_meta(LinearRegression())
    return ensemble

# create the inputs and outputs
X, y = make_regression(n_samples=100, n_features=4, noise=0.5)

# split: 99 rows for training, 1 row for testing
X, X_val, y, y_val = train_test_split(X, y, test_size=1, random_state=42)
print('Train', X.shape, y.shape, 'Test', X_val.shape, y_val.shape)

# create the super learner
ensemble = get_super_learner(X)

# fit the super learner
ensemble.fit(X, y)

# summarize base learners
print(ensemble.data)

# evaluate meta model (this is the call that fails)
yhat = ensemble.predict(X_val)
print('Super Learner: RMSE %.3f' % (rmse(y_val, yhat)))

Output is:
Train (99, 4) (99,) Test (1, 4) (1,)
                                score-m  score-s  ft-m  ft-s  pt-m  pt-s
layer-1  adaboostregressor        67.84     8.31  1.31  0.02  0.02  0.01
layer-1  baggingregressor         65.24     7.93  0.34  0.01  0.00  0.00
layer-1  decisiontreeregressor    80.64    16.22  0.11  0.01  0.00  0.00
layer-1  elasticnet               46.53     8.68  0.08  0.00  0.00  0.00
layer-1  extratreesregressor      56.78    10.63  0.79  0.04  0.00  0.00
layer-1  kneighborsregressor      51.99    13.06  0.00  0.00  0.00  0.00
layer-1  linearregression          0.53     0.07  0.00  0.00  0.00  0.00
layer-1  randomforestregressor    66.39     7.15  0.75  0.03  0.00  0.00
layer-1  svr                     125.71    19.63  0.07  0.00  0.00  0.00
followed by the ValueError.
When I implement the super learner manually, as described here:
https://machinelearningmastery.com/super-learner-ensemble-in-python/
it works, but the same approach does not work with mlens.

  1. Kindly help me fix this.
  2. Also, how can I use a random seed to get the same results on every run? I am setting it in the train-test split and inside the super learner, but it is not working.
  3. How did it pick linear regression in the ensemble.add_meta(LinearRegression()) line?

Kindly guide.
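On question 2, one thing visible in the snippet is that make_regression is called without random_state, and several base models (the decision tree, AdaBoost, bagging, random forest and extra trees) are also created without one, so both the data and those models can change between runs. A minimal scikit-learn-only sketch of seeding every random component follows; `SEED` and `run_once` are hypothetical names, and whether this alone makes the mlens ensemble fully reproducible is an assumption.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

SEED = 42  # hypothetical constant, not part of the original snippet


def run_once(seed):
    """Generate data, split it, fit one randomized model, and predict."""
    # Seed the data generator; without this, X and y differ on every run.
    X, y = make_regression(n_samples=100, n_features=4, noise=0.5, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=1, random_state=seed
    )
    # Seed the estimator as well; tree ensembles draw random samples and features.
    model = RandomForestRegressor(n_estimators=10, random_state=seed)
    model.fit(X_train, y_train)
    return model.predict(X_val)


first = run_once(SEED)
second = run_once(SEED)
# Compare the predictions of two identically seeded runs.
print(np.allclose(first, second))
```

The same idea would carry over to the snippet above by passing random_state to make_regression and to each base model that accepts it.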