Formatting time series data with multiple labels on the same ID to use tsfresh #958

Open
AmirSahil opened this issue Jul 28, 2022 Discussed in #957 · 1 comment

@AmirSahil

Discussed in #957

Originally posted by AmirSahil July 28, 2022
Thank you for this amazing library for extracting features from time series data.

I have studied the documentation on structuring data in a format that tsfresh can understand, and I believe I have formatted my data correctly. I understand that the rows for each ID are stacked on top of one another, one row per time step. In the robot failure example, each robot (IDs 1 to 88) represents either a successful execution, labelled 'True', or a failed execution, labelled 'False', but never both. In other words, each robot has a single label saying whether its execution succeeded or failed.

[Screenshot 2022-07-28 at 06:31:54]
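For reference, here is a minimal pandas sketch of the long format described above. The column names `F_x` and `F_y` are modelled on the robot failure example, but the two robots and all values are made up for illustration. The key point is that the target `y` holds exactly one entry per id.

```python
import pandas as pd

# Long ("stacked") format: one row per (id, time) pair, with one column per
# measured signal. Values are made up for illustration.
timeseries = pd.DataFrame(
    {
        "id": [1, 1, 1, 2, 2, 2],
        "time": [0, 1, 2, 0, 1, 2],
        "F_x": [-1, 0, -1, 5, 4, 6],
        "F_y": [-1, 0, 0, 2, 3, 1],
    }
)

# Target: exactly one label per id, indexed by the same ids as `timeseries`.
y = pd.Series({1: True, 2: False}, name="y")

# Sanity checks: every id has a label, and no id is labelled twice.
assert set(timeseries["id"]) == set(y.index)
assert y.index.is_unique
```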

In my dataset, the same robot (for example, ID 1) has a successful execution (True) with one set of feature values and a failed execution (False) with another. How do I supply such a dataset to tsfresh for feature extraction in order to do multi-label classification?

[Screenshot 2022-07-28 at 06:32:40]
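A small check makes the situation in the question concrete. This is a hedged sketch on made-up data with a hypothetical per-row `label` column. It counts the distinct labels per id and reports every id with more than one, because such an id cannot be described by a target `y` that holds a single value per id.

```python
import pandas as pd

# Hypothetical data shaped like the question: the label is stored per row,
# and robot 1 has both a successful (True) and a failed (False) stretch.
df = pd.DataFrame(
    {
        "id": [1, 1, 1, 1, 2, 2],
        "time": [0, 1, 2, 3, 0, 1],
        "F_x": [-1, 0, 7, 8, 5, 4],
        "label": [True, True, False, False, True, True],
    }
)

# Count distinct labels per id; more than one means the id cannot be
# summarised by a single target value.
labels_per_id = df.groupby("id")["label"].nunique()
conflicting_ids = labels_per_id[labels_per_id > 1].index.tolist()
print(conflicting_ids)  # [1]
```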

Could you please help me out with this?

@nils-braun
Collaborator

Hi @AmirSahil !
Really sorry for the very delayed answer! Thank you for your question.
I am not 100% sure I understood your question. I guess you want to predict the label column from the features? But in your example, you only have a single "time-series" row per label? Or should the time series up to a specific row be used to predict the label? For example, should id 1 with times 1 + 2 have the label FALSE, whereas id 1 with times 1 + 2 + 3 has TRUE?
If so, I would recommend first "rolling" the time series (see the documentation), which creates multiple time series out of a single one. Basically, id 1 with times 1, 2, 3 will be turned into three time series:

  • id 1, time 1
  • id 1, time 1 + 2
  • id 1, time 1 + 2 + 3

This will allow you to use tsfresh "as normal", because now you have a single label per time series.
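The rolling step can be sketched in plain pandas. The helper `roll_expanding` below is a hypothetical, hand-rolled illustration of expanding windows (every window starts at the first time step), not tsfresh's implementation. The per-row `label` column and the data values are assumptions that follow the example above (FALSE up to time 2, TRUE at time 3). Each window gets the id `(id, last time)` and inherits the label of its last row.

```python
import pandas as pd


def roll_expanding(df: pd.DataFrame, column_id: str = "id", column_sort: str = "time") -> pd.DataFrame:
    """Hypothetical helper: build expanding windows per id (not tsfresh's code).

    For every original id and every time step t, copy all rows of that id with
    time <= t and give them the new id (original id, t).
    """
    pieces = []
    for orig_id, group in df.sort_values([column_id, column_sort]).groupby(column_id):
        for t in group[column_sort].tolist():
            window = group[group[column_sort] <= t].copy()
            # One tuple per row; pandas stores the tuples in an object column.
            window[column_id] = [(orig_id, t)] * len(window)
            pieces.append(window)
    return pd.concat(pieces, ignore_index=True)


# Made-up data following the example: id 1 is FALSE up to time 2, TRUE at time 3.
df = pd.DataFrame(
    {
        "id": [1, 1, 1],
        "time": [1, 2, 3],
        "F_x": [-1.0, 0.0, 7.0],
        "label": [False, False, True],
    }
)

# Drop the label so it is not rolled (or later featurised) as a time series.
rolled = roll_expanding(df.drop(columns="label"))

# One label per window: the label of the row at the window's last time step,
# indexed by (id, time), i.e. the same pairs used as window ids above.
y = df.set_index(["id", "time"])["label"]

print(rolled)
print(y)
```

With tsfresh installed, the library's own helper for this is `roll_time_series` in `tsfresh.utilities.dataframe_functions` (for example `roll_time_series(df, column_id="id", column_sort="time")`), and its output can then be passed to `extract_features` with `column_id="id"`. The exact id format and window options may differ between versions, so the documentation for your installed version is the authority here.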
