Function: extract_relevant_features: throws AssertionError: X and y must contain the same number of samples. #945

Open
lthiess8 opened this issue May 21, 2022 · 2 comments
@lthiess8

Hi, I get an assertion error when using the function extract_relevant_features().
When I print len(X) and len(y), I get the same values.

  • Python version: 3.8.5
  • tsfresh version: 0.19.0
  • Install method (conda, pip, source): pip

Thanks in advance!

36965
36965
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-5-59dfec12df74> in <module>
     24     print(len(df))
     25     print(len(target))
---> 26     extracted_relevant_features = extract_relevant_features(df, target, column_id='abgang', column_sort='time',  column_value = 'values', default_fc_parameters=EfficientFCParameters(), ml_task='classification')
     27     extracted_features = extract_features(df, column_id='abgang', column_sort='time',  column_value = 'values', default_fc_parameters=EfficientFCParameters(),n_jobs=8, disable_progressbar=True)
     28 

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tsfresh/convenience/relevant_extraction.py in extract_relevant_features(timeseries_container, y, X, default_fc_parameters, kind_to_fc_parameters, column_id, column_sort, column_kind, column_value, show_warnings, disable_progressbar, profile, profiling_filename, profiling_sorting, test_for_binary_target_binary_feature, test_for_binary_target_real_feature, test_for_real_target_binary_feature, test_for_real_target_real_feature, fdr_level, hypotheses_independent, n_jobs, distributor, chunksize, ml_task)
    198     )
    199 
--> 200     X_sel = select_features(
    201         X_ext,
    202         y,

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tsfresh/feature_selection/selection.py in select_features(X, y, test_for_binary_target_binary_feature, test_for_binary_target_real_feature, test_for_real_target_binary_feature, test_for_real_target_real_feature, fdr_level, hypotheses_independent, n_jobs, show_warnings, chunksize, ml_task, multiclass, n_significant)
    152     )
    153     assert len(y) > 1, "y must contain at least two samples."
--> 154     assert len(X) == len(y), "X and y must contain the same number of samples."
    155     assert (
    156         len(set(y)) > 1

AssertionError: X and y must contain the same number of samples.
@lthiess8 lthiess8 added the bug label May 21, 2022
@CelieDs

CelieDs commented Jun 20, 2022

Hello! I encountered the same issue. Did you manage to find a solution?
Thanks in advance!

@lthiess8
Author

Hello @CelieDs,

For some reason the indices of X and y did not match.
This notebook helped me find the solution:
https://github.com/blue-yonder/tsfresh/blob/main/notebooks/advanced/05%20Timeseries%20Forecasting%20(multiple%20ids).ipynb

When I changed the code to the following, it worked for me:

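from tsfresh import select_features

# Index the labels by the time column and sort them by that index
target = df_melted.set_index("time").sort_index().label

# Keep only the entries whose index appears in both the target and the feature matrix
target = target[target.index.isin(extracted_features.index)]
extracted_features = extracted_features[extracted_features.index.isin(target.index)]

features_selected = select_features(extracted_features, target, ml_task='classification')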
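
For anyone landing here with the same error, one possible cause (not confirmed in this thread) is that the long-format input and the target have the same length, while the extracted feature matrix has only one row per id. The sketch below uses plain pandas to illustrate the index alignment described above. The column names `id`, `time`, `value`, and `label` are hypothetical, and the `groupby` aggregation is only a stand-in for the feature matrix that feature extraction would return.

```python
import pandas as pd

# Hypothetical long-format data: 3 ids with 2 rows each (all names are illustrative).
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3],
    "time": [0, 1, 0, 1, 0, 1],
    "value": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "label": [0, 0, 1, 1, 0, 0],
})

# Stand-in for an extracted feature matrix: one row per id.
features = df.groupby("id")["value"].agg(["mean", "max"])

# A target taken straight from the long-format frame has one entry per row,
# so it matches len(df) but not len(features).
long_target = df["label"]
print(len(df), len(long_target), len(features))

# Build one label per id, indexed by id, so it lines up with the feature matrix.
target = df.groupby("id")["label"].first()

# Restrict both objects to their shared index before feature selection.
common = features.index.intersection(target.index)
features = features.loc[common]
target = target.loc[common]

assert len(features) == len(target)
assert features.index.equals(target.index)
```

After this alignment, `features` and `target` have the same length and the same index, which is the condition the failing assertion in select_features checks for.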
