Tsfresh: Support for NVIDIA RAPIDS

Created on 12 Oct 2018 · 9 Comments · Source: blue-yonder/tsfresh

Could we have an estimate of the execution time for data consisting of 16000 instances, each 6000 samples wide? The algorithm has now been running for nearly 2 days on a 6-core Intel i7 machine (n_jobs=4) and has completed only 40% of the work.

enhancement help wanted

All 9 comments

This depends heavily on your type of data and the extraction settings. If you extract more features, it will take longer. Further, if the features are more complex, it will also take longer.
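Since the run time depends on the data and the settings, one practical way to get a number is to extrapolate from the progress so far (or from a timed run on a small subset). The sketch below assumes that run time grows roughly linearly with the number of time series, which is an assumption and not something tsfresh guarantees. `extrapolate_runtime` is a hypothetical helper, not part of tsfresh. The numbers are taken from the question: about 2 days for 40% of 16000 instances.

```python
def extrapolate_runtime(elapsed_seconds, items_done, items_total):
    """Linearly extrapolate total and remaining run time from partial progress.

    Assumes every item (time series) costs about the same, which is only
    an approximation for real feature extraction workloads.
    """
    if items_done <= 0 or items_done > items_total:
        raise ValueError("items_done must be in the range 1..items_total")
    total_seconds = elapsed_seconds * items_total / items_done
    return total_seconds, total_seconds - elapsed_seconds


SECONDS_PER_DAY = 24 * 3600

# Figures from the question: ~2 days elapsed, 40% of 16000 instances done.
total, remaining = extrapolate_runtime(
    elapsed_seconds=2 * SECONDS_PER_DAY,
    items_done=6400,
    items_total=16000,
)
print(f"estimated total: {total / SECONDS_PER_DAY:.1f} days, "
      f"remaining: {remaining / SECONDS_PER_DAY:.1f} days")
# prints "estimated total: 5.0 days, remaining: 3.0 days"
```

The same function works for a dry run: time the extraction on, say, 100 instances with the intended settings and pass those figures in instead.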

Does it support GPUs? I mean, is there a way for tsfresh to make Python use the GPU to process the data?

No, we don't have GPU support (I don't think the calculation that tsfresh is doing would actually profit from a GPU...)

Given this is built on Dask, RAPIDS integration "could" be somewhat straightforward to try, to see if acceleration is of value.

Hello guys,
Is there any feedback on adding NVIDIA RAPIDS support to the tsfresh dev roadmap?
It would be very nice to accelerate the feature extraction using cuDF.
Today, when I pass a cuDF dataframe instead of a Pandas dataframe, I get the following error:
AttributeError: 'DataFrame' object has no attribute 'values'
This is expected, because .values does not exist on cuDF. Many Pandas functions do not exist yet on cuDF.
Thanks!

full log:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<timed exec> in <module>

~/anaconda3/lib/python3.7/site-packages/tsfresh/feature_extraction/extraction.py in extract_features(timeseries_container, default_fc_parameters, kind_to_fc_parameters, column_id, column_sort, column_kind, column_value, chunksize, n_jobs, show_warnings, disable_progressbar, impute_function, profile, profiling_filename, profiling_sorting, distributor)
    152             column_id=column_id, column_kind=column_kind,
    153             column_sort=column_sort,
--> 154             column_value=column_value)
    155     # Use the standard setting if the user did not supply ones himself.
    156     if default_fc_parameters is None and kind_to_fc_parameters is None:

~/anaconda3/lib/python3.7/site-packages/tsfresh/utilities/dataframe_functions.py in _normalize_input_to_internal_representation(timeseries_container, column_id, column_sort, column_kind, column_value)
    323             sort = range(len(timeseries_container))
    324             timeseries_container = pd.melt(timeseries_container, id_vars=[column_id],
--> 325                                            value_name=column_value, var_name=column_kind)
    326             timeseries_container[column_sort] = np.repeat(sort, (len(timeseries_container) // len(sort)))
    327 

~/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/melt.py in melt(frame, id_vars, value_vars, var_name, value_name, col_level)
     82     mcolumns = id_vars + var_name + [value_name]
     83 
---> 84     mdata[value_name] = frame.values.ravel('F')
     85     for i, col in enumerate(var_name):
     86         # asanyarray will keep the columns as an Index

~/anaconda3/lib/python3.7/site-packages/cudf/dataframe/dataframe.py in __getattr__(self, key)
    288             return self[key]
    289 
--> 290         raise AttributeError("'DataFrame' object has no attribute %r" % key)
    291 
    292     def __getitem__(self, arg):

AttributeError: 'DataFrame' object has no attribute 'values'

Hey @andrewssobral, this was added as of the latest cuDF, 0.11, where calling .values returns a cupy array (as opposed to a numpy array).

That being said, it looks like you're calling Pandas functions directly here, and those don't have a dispatch mechanism similar to numpy's, so you'll keep running into issues unless that changes.
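Until such dispatch exists, one workaround sketch is to copy the data back to host memory before handing it to tsfresh. This assumes the data fits in host memory, and it does not move any of the feature computation onto the GPU. It relies on cuDF's `DataFrame.to_pandas()`; the helper name `to_host_frame` and the stand-in class are hypothetical and exist only so the sketch runs without a GPU.

```python
def to_host_frame(frame):
    """Return frame.to_pandas() if the object offers it, else the object itself.

    cuDF DataFrames expose to_pandas(); pandas DataFrames do not, so they
    pass through unchanged.
    """
    convert = getattr(frame, "to_pandas", None)
    return convert() if callable(convert) else frame


class FakeGpuFrame:
    """Stand-in for a cuDF DataFrame (hypothetical, for illustration only)."""

    def __init__(self, rows):
        self.rows = rows

    def to_pandas(self):
        # A real cuDF frame would return a pandas DataFrame here.
        return list(self.rows)


host = to_host_frame(FakeGpuFrame([1, 2, 3]))   # converted via to_pandas()
unchanged = to_host_frame("already on host")    # passed through as-is
```

With real data the converted frame would then be passed to `extract_features` as usual, so the pandas-only code paths shown in the traceback see an ordinary pandas DataFrame.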

Thank you @kkraus14 for the update!

So just to be clear here: currently no one is working on this, and I do not think anyone will in the future either, as none of us has any experience with it. We are very happy to receive PRs on this subject :-)

I do have a small update on this: since version 0.16 we have additional dask bindings. You pass a dask dataframe in and get a dask dataframe back. You will find them here: https://github.com/blue-yonder/tsfresh/blob/master/tsfresh/convenience/bindings.py#L36 and in a recent blog entry here.

That being said, it will still do all the computations of the feature extraction in pandas/numpy and not use the GPU for that. As Max pointed out, I actually think you will not gain much unless each time series is very long, whereas in most use cases you have many time series instead. However, with the bindings it might at least be possible to feed in a dask dataframe and get one out (which might interact better with RAPIDS - I do not know :-)).
