Tsfresh: How can I manually select features from a dataframe of extracted features? (looking for a settings.to_columns)

Created on 7 Aug 2018 · 7 comments · Source: blue-yonder/tsfresh

I'm dealing with a ton of data and am trying to limit the number of times I have to extract features with tsfresh.

I'm storing extracted features as CSV files in a database. I would like to read these files back, see what's already in there with settings.from_columns, and figure out which features I don't have yet and still need to compute (my users request different levels of extraction).
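A minimal sketch of the "what is still missing" step. It assumes both sides are plain fc-parameter dicts for a single kind, i.e. a calculator name mapped to None or to a list of parameter dicts, which is the shape of tsfresh's settings objects and of the per-kind values returned by settings.from_columns. `missing_fc_parameters` is a hypothetical helper, not part of tsfresh, and it assumes parameter values parsed back from column names compare equal to the requested ones.

```python
def missing_fc_parameters(requested, available):
    """Hypothetical helper: the part of `requested` not covered by `available`.

    Both arguments are fc-parameter dicts for one kind:
    {calculator_name: None or [param_dict, ...]}.
    """
    missing = {}
    for name, params in requested.items():
        if name not in available:
            # Calculator never computed: request all of its parameter sets
            missing[name] = params
        elif params is not None:
            have = available[name] or []
            # Keep only the parameter dicts that are not stored yet
            todo = [p for p in params if p not in have]
            if todo:
                missing[name] = todo
    return missing


requested = {"mean": None, "maximum": None, "quantile": [{"q": 0.1}, {"q": 0.9}]}
available = {"mean": None, "quantile": [{"q": 0.1}]}
print(missing_fc_parameters(requested, available))
# {'maximum': None, 'quantile': [{'q': 0.9}]}
```

The resulting dict has the same shape as a settings dict, so it could in principle be passed to extract_features to compute only the missing features.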

I'm a bit confused about the difference between default_fc_parameters and kind_fc_parameters. My users give default_fc_parameters as input, whereas settings.from_columns returns kind_fc_parameters. Is there an easy way to transform one into the other?
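In tsfresh, default_fc_parameters is one settings dict applied to every kind (value column), while the per-kind form (the extract_features argument is named kind_to_fc_parameters) maps each kind to its own settings dict. A plain-Python sketch of converting between the two, with hypothetical helper names:

```python
import copy


def default_to_kind(default_fc_parameters, kinds):
    """Give every kind (value column) its own copy of the same settings."""
    return {kind: copy.deepcopy(default_fc_parameters) for kind in kinds}


def kind_to_default(kind_to_fc_parameters, kind):
    """Take the settings stored for one kind, e.g. from settings.from_columns(...)."""
    return copy.deepcopy(kind_to_fc_parameters[kind])


settings = {"mean": None, "quantile": [{"q": 0.1}, {"q": 0.9}]}
per_kind = default_to_kind(settings, ["temperature", "pressure"])
print(sorted(per_kind))  # ['pressure', 'temperature']
print(kind_to_default(per_kind, "pressure") == settings)  # True
```

The deep copies keep the kinds independent, so editing the settings of one kind does not silently change another.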

However, once I have figured out which features I need to compute, I also need to return every feature the user requested (and only those). So I need to join what I just extracted with some columns from the precomputed CSV file. Does something like settings.to_columns exist?
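The join step can be sketched with pandas, assuming both feature matrices are indexed by time-series id (as extract_features returns them) and that the stored CSV was read back with that id as index, e.g. `pd.read_csv(path, index_col=0)`. `assemble_features` is a hypothetical helper:

```python
import pandas as pd


def assemble_features(stored, extracted, requested_columns):
    """Hypothetical helper: return exactly `requested_columns` from either frame."""
    # Align both feature matrices on their index (the time-series ids)
    combined = pd.concat([stored, extracted], axis=1)
    # If a column exists in both frames, keep the first copy (the stored one)
    combined = combined.loc[:, ~combined.columns.duplicated()]
    # .loc raises KeyError if a requested column is in neither frame
    return combined.loc[:, requested_columns]


stored = pd.DataFrame({"value__mean": [2.0, 5.0], "value__maximum": [3.0, 6.0]}, index=[1, 2])
extracted = pd.DataFrame({"value__minimum": [1.0, 4.0]}, index=[1, 2])
result = assemble_features(stored, extracted, ["value__minimum", "value__mean"])
print(result)
```

Selecting with `.loc` rather than `reindex` is deliberate: a misspelled or never-computed column fails loudly instead of turning into a column of NaN.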

EDIT: e.g. given a time series and a dataframe with its extracted "minimal" features, how can I reuse them to return the "efficient" features?


sheepNo

Most helpful comment

> The parameter widths appears before coeff of the string so I end up with a KeyError when trying to obtain the column value__cwt_coefficients__coeff_10__w_20__widths_(2, 5, 10, 20).

I'm closing this because I could not reproduce the error. We have a unit test that extracts features for comprehensive fc parameters and then parses them again. I don't know how you ended up with the string value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_10__w_20.

> If you have any idea how I could achieve this simply that would be of great help. Without being able to do this I can't retrieve e.g. the columns corresponding to EfficientFCParameters from a CSV containing ComprehensiveFCParameters. Meaning I either have to save them as two separate CSV files or recompute Efficient even though I already have Comprehensive

You can just extract features for a mini dataset and use the columns of the resulting dataframe to select the same columns from your stored feature matrix X:

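```python
import pandas as pd
from tsfresh import extract_features
from tsfresh.feature_extraction import ComprehensiveFCParameters

X = pd.read_csv("features.csv", index_col=0)  # hypothetical path: your stored feature matrix
fset = ComprehensiveFCParameters()
# Extract the same settings on a tiny dummy series only to obtain the column names
X_org = extract_features(pd.DataFrame({"value": [1, 2, 3], "id": [1, 1, 1]}),
                         default_fc_parameters=fset,
                         column_id="id", column_value="value",
                         n_jobs=0)
X_selected = X.loc[:, X_org.columns]  # select those columns (not rows) from X
```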

So here we used the columns of X_org to select the ComprehensiveFCParameters features from X.

MaxBenChrist on 29 Aug 2018


All 7 comments

Please read the documentation before asking questions: http://tsfresh.readthedocs.io/en/latest/text/feature_extraction_settings.html

MaxBenChrist on 7 Aug 2018

My question was a bit confused. I have already read the docs multiple times and couldn't find a way to do the reverse of settings.from_columns. Should I simply copy the part of extract_features that roughly does this, or do you already have a function for it?

My confusion between kind_fc_parameters and default_fc_parameters was not really the issue; I understood the difference after writing this post.

sheepNo on 7 Aug 2018

Finally found convert_to_output_format. That's what I was looking for. It's not at all obvious from the docs; you might consider mentioning it alongside from_columns.

sheepNo on 8 Aug 2018


Alright. If you want, you can submit a PR with an improved docstring.

MaxBenChrist on 8 Aug 2018

I am a bit confused by this sentence in the docstring of convert_to_output_format:

The parameters are sorted by their name and written out in the form (link)
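A plain-Python sketch of the rule in that sentence, used to build a hypothetical `to_columns` (the inverse of settings.from_columns). This is not tsfresh's own convert_to_output_format; it assumes non-string values are written with `str()`, string values are wrapped in double quotes, and each parameter dict yields exactly one column. Sorting by name puts coeff before widths, so these names only match columns written by a tsfresh version that actually serialises in sorted order:

```python
def param_to_string(param):
    """Serialise one parameter dict: keys sorted by name, '<name>_<value>' joined by '__'."""
    parts = []
    for key in sorted(param):
        value = param[key]
        # Assumption: string values are wrapped in double quotes, other values use str()
        text = f'"{value}"' if isinstance(value, str) else str(value)
        parts.append(f"{key}_{text}")
    return "__".join(parts)


def to_columns(kind_to_fc_parameters):
    """Hypothetical inverse of settings.from_columns: settings -> column names."""
    columns = []
    for kind, fc_parameters in kind_to_fc_parameters.items():
        for name, params in fc_parameters.items():
            if params is None:
                columns.append(f"{kind}__{name}")
            else:
                # Assumption: each parameter dict yields exactly one column
                columns.extend(f"{kind}__{name}__{param_to_string(p)}" for p in params)
    return columns


settings = {"value": {"mean": None,
                      "cwt_coefficients": [{"widths": (2, 5, 10, 20), "coeff": 10, "w": 20}]}}
print(to_columns(settings))
# ['value__mean', 'value__cwt_coefficients__coeff_10__w_20__widths_(2, 5, 10, 20)']
```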

I glanced at one of the columns in the output of extract_features (with EfficientFCParameters) and this is what I found:
value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_10__w_20

The parameter widths appears before coeff in the string, so I end up with a KeyError when I try to access the column value__cwt_coefficients__coeff_10__w_20__widths_(2, 5, 10, 20).

I tried building this string by appending, prepending and sorting by name, but I always end up with some column names raising a KeyError.
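One way around the ordering problem, sketched under stated assumptions: compare column names in a canonical form that sorts the parameter segments, so a hand-built name matches a stored column whatever order its parameters were written in. `normalize_column` and `find_columns` are hypothetical helpers; they assume names of the form `<kind>__<calculator>[__<param>_<value>...]` with no `__` inside a kind name or a parameter value.

```python
def normalize_column(column):
    """Hypothetical helper: canonical column name that ignores parameter order.

    Assumes '<kind>__<calculator>[__<param>_<value>...]' and that '__'
    never occurs inside a kind name or a parameter value.
    """
    kind, calculator, *params = column.split("__")
    return "__".join([kind, calculator] + sorted(params))


def find_columns(requested, available):
    """Map each requested column name to the matching stored column name."""
    lookup = {normalize_column(c): c for c in available}
    # Raises KeyError if a requested feature is not stored at all
    return [lookup[normalize_column(c)] for c in requested]


stored = ["value__mean", "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_10__w_20"]
wanted = ["value__cwt_coefficients__coeff_10__w_20__widths_(2, 5, 10, 20)"]
print(find_columns(wanted, stored))
# ['value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_10__w_20']
```

With a pandas feature matrix `X`, the result can be used directly as `X[find_columns(wanted, X.columns)]`.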

If you have any idea how I could achieve this simply, that would be of great help. Without it, I can't retrieve e.g. the columns corresponding to EfficientFCParameters from a CSV containing ComprehensiveFCParameters. This means I either have to save them as two separate CSV files or recompute the Efficient features even though I already have the Comprehensive ones.

sheepNo on 8 Aug 2018

Oh, that could be a bug. Maybe we are missing a sort before serialising the parameters into the column string.

I am quite busy for the next two days, but I will try to look at it this weekend.

MaxBenChrist on 16 Aug 2018
