
extract_features with kind_to_fc_parameters unable to produce the same features as extract_relevant_features #954

Open
ognjenantonijevic opened this issue Jul 19, 2022 · 2 comments

@ognjenantonijevic

Long story short, extract_features is not working as expected: it produces many invalid values (np.nan, np.inf, -np.inf).

Steps to reproduce the issue:

  1. Start with an input DataFrame in the format used by tsfresh, containing multiple time series of different variables, plus a second DataFrame with the class for each ID in the input (I'm working on a multivariate time series classification problem). Call them X_train_ts and y_train_ts.
  2. Use tsfresh to extract the features relevant for classification:
    features_filtered_direct = extract_relevant_features(X_train_ts, y_train_ts, column_id='ID', column_sort='week')
  3. Build the settings object from the resulting relevant features:
    chosen_features = tsfresh.feature_extraction.settings.from_columns(features_filtered_direct)
  4. Use these extracted settings on the same input X_train_ts to try to reproduce features_filtered_direct:
    features = extract_features(X_train_ts, column_id='ID', column_sort='week', kind_to_fc_parameters=chosen_features)
  5. The command above produces a different DataFrame with many invalid values, counted per column with:
    features.isin([np.nan, np.inf, -np.inf]).sum().sort_values()
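The per-column count in step 5 can also be expressed with np.isfinite, which flags NaN and both infinities in one pass. This is only a minimal sketch: count_non_finite is a hypothetical helper (not part of tsfresh), and it assumes the feature columns are numeric.

```python
import numpy as np
import pandas as pd


def count_non_finite(features: pd.DataFrame) -> pd.Series:
    """Count NaN, +inf and -inf entries per numeric column, sorted ascending."""
    # Hypothetical helper for illustration; non-numeric columns are ignored.
    numeric = features.select_dtypes(include="number")
    mask = ~np.isfinite(numeric.to_numpy(dtype=float))
    return pd.Series(mask.sum(axis=0), index=numeric.columns).sort_values()
```

Calling count_non_finite(features) on the output of step 4 should list the columns with the most invalid values last.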
@ognjenantonijevic
Author

tsfresh==0.19.0

@gkumarg

gkumarg commented Aug 14, 2022

@ognjenantonijevic I noticed that extract_relevant_features calls extract_features with impute_function=impute. This may be the difference you are observing. Try adding that argument to the extract_features call in step 4 and see whether the result changes. I got the same mismatch when following your steps on the robot_execution_failures dataset, and it went away once I added impute_function=impute.
