I'm dealing with a ton of data and am trying to limit the number of times I have to extract features with tsfresh.
I'm storing the extracted features as CSV files in a database. I would like to read such a file, see what's already in there with settings.from_columns, and figure out which features I don't have yet and need to compute. (My users are requesting different levels of extraction.)
I'm a bit confused between default_fc_parameters and kind_fc_parameters. My users give a default_fc_parameters as input whereas settings.from_columns returns a kind_fc_parameters. Is there an easy way to transform one into the other?
However, once I have figured out which features I need to compute, I also need to return every feature they requested (and only those). So I need to join what I just extracted with some columns from the precomputed CSV file. Does something like settings.to_columns exist?
EDIT: e.g. given a time series and a dataframe with its extracted "minimal" features, how can I use them to return the "efficient" features?
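To make the "which features am I missing" step concrete, here is a minimal sketch in plain Python. It assumes the settings are ordinary dicts of the shape described in the tsfresh settings documentation: the user's request maps a feature name to None or a list of parameter dicts, and the result parsed from the stored columns maps each kind (e.g. "value") to such a dict. The helper name missing_settings and the example data are hypothetical, not part of tsfresh.

```python
def missing_settings(requested, existing_for_kind):
    """Return the part of `requested` that `existing_for_kind` does not cover.

    Both arguments are plain dicts: {feature_name: None or [param_dict, ...]}.
    """
    missing = {}
    for name, params in requested.items():
        if name not in existing_for_kind:
            # Feature was never computed: request it with all its parameters.
            missing[name] = params
            continue
        if params is None:
            # Parameter-free feature that is already stored.
            continue
        have = existing_for_kind[name] or []
        todo = [p for p in params if p not in have]
        if todo:
            missing[name] = todo
    return missing


# Hypothetical example: what the user asks for vs. what the CSV already holds.
requested = {
    "mean": None,
    "maximum": None,
    "autocorrelation": [{"lag": 1}, {"lag": 2}],
}
existing = {"value": {"mean": None, "autocorrelation": [{"lag": 1}]}}

to_compute = missing_settings(requested, existing.get("value", {}))
print(to_compute)
```

The resulting dict could then be passed to extract_features, either as default_fc_parameters or wrapped per kind as kind_to_fc_parameters={"value": to_compute}, so that only the missing features are computed.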
Please read the documentation before asking questions: http://tsfresh.readthedocs.io/en/latest/text/feature_extraction_settings.html
My question was a bit confused. I have already read the docs multiple times and couldn't find a way to do the reverse of settings.from_columns. Should I simply copy the part of extract_features that more or less does this, or do you already have a function for it?
My confusion between kind_fc_parameters and default_fc_parameters was not really a concern and I understood the difference after writing this post.
Finally found convert_to_output_format. That's what I was looking for. It's not at all obvious from the docs, so you might consider mentioning it alongside from_columns.
Alright. If you want, you can submit a PR with an improved docstring.
I am a bit confused with this sentence in the docstring of convert_to_output_format:
The parameters are sorted by their name and written out in the form (link)
I glanced at one of the columns in the output of extract_features (with EfficientFCParameters) and this is what I found:
value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_10__w_20
The parameter widths appears before coeff in the string, so I end up with a KeyError when trying to obtain the column value__cwt_coefficients__coeff_10__w_20__widths_(2, 5, 10, 20).
I tried creating this string by appending, prepending and sorting by name, but I always end up with some column names raising a KeyError.
If you have any idea how I could achieve this simply that would be of great help. Without being able to do this I can't retrieve e.g. the columns corresponding to EfficientFCParameters from a CSV containing ComprehensiveFCParameters. Meaning I either have to save them as two separate CSV files or recompute Efficient even though I already have Comprehensive.
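One possible workaround for the ordering mismatch described above is to compare column names without regard to the order of their parameter parts. The sketch below assumes column names of the form kind__feature__param_value__..., joined by a double underscore as in the strings quoted in this thread, and that neither the kind nor the feature name contains "__". The helpers column_key and find_column are hypothetical and not part of tsfresh.

```python
def column_key(column):
    """Build an order-insensitive key for a feature column name."""
    parts = column.split("__")
    # First two parts are kind and feature name; the rest are parameters.
    return tuple(parts[:2]), frozenset(parts[2:])


def find_column(wanted, available):
    """Return the stored column matching `wanted`, or None if there is none."""
    lookup = {column_key(c): c for c in available}
    return lookup.get(column_key(wanted))


stored_columns = [
    "value__mean",
    "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_10__w_20",
]
wanted = "value__cwt_coefficients__coeff_10__w_20__widths_(2, 5, 10, 20)"

match = find_column(wanted, stored_columns)
print(match)
```

The returned name is the one actually present in the stored data, so it can be used to index the dataframe without a KeyError.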
Oh, that could be a bug. Maybe we are missing a sort before serialising the parameters into the column string.
I am quite busy for the next two days, but I will try to look into it this weekend.
The parameter widths appears before coeff of the string so I end up with a KeyError when trying to obtain the column value__cwt_coefficients__coeff_10__w_20__widths_(2, 5, 10, 20).
I am closing this because I could not reproduce the error. We have a unit test that extracts features for the comprehensive fc parameters and then parses them again. I don't know how you ended up with the string value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_10__w_20.
If you have any idea how I could achieve this simply that would be of great help. Without being able to do this I can't retrieve e.g. the columns corresponding to EfficientFCParameters from a CSV containing ComprehensiveFCParameters. Meaning I either have to save them as two separate CSV files or recompute Efficient even though I already have Comprehensive
You can just extract features for a mini dataset and use the columns of the resulting dataframe to access them:
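import pandas as pd
from tsfresh import extract_features
from tsfresh.feature_extraction import ComprehensiveFCParameters

# Extract on a tiny dummy series just to get the column names for this setting.
fset = ComprehensiveFCParameters()
X_org = extract_features(pd.DataFrame({"value": [1, 2, 3], "id": [1, 1, 1]}),
                         default_fc_parameters=fset,
                         column_id="id", column_value="value",
                         n_jobs=0)
# X is the previously stored feature dataframe; select columns, not rows.
X[X_org.columns]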
So here we used the columns of X_org to access the comprehensive fc parameters from X.
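If some requested columns might not be in the stored CSV, it can help to split the request into present and missing columns before selecting, so that no KeyError is raised. The following is a minimal sketch using only the standard library; the CSV content, the helper names split_columns and read_columns, and the column names are hypothetical.

```python
import csv
import io


def split_columns(header, wanted):
    """Partition `wanted` into columns present in `header` and missing ones."""
    available = set(header)
    present = [c for c in wanted if c in available]
    missing = [c for c in wanted if c not in available]
    return present, missing


def read_columns(csv_file, columns):
    """Read only the given columns from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return [{c: row[c] for c in columns} for row in reader]


# Hypothetical stored feature file.
stored = io.StringIO("id,value__mean,value__maximum\n1,2.0,3.0\n")
header = next(csv.reader(stored))
present, missing = split_columns(header, ["value__mean", "value__minimum"])

stored.seek(0)  # rewind before reading the rows
rows = read_columns(stored, ["id"] + present)
print(present, missing, rows)
```

With pandas, passing the present columns as usecols to pd.read_csv should achieve the same selection; the missing list is what still has to be extracted.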