Hi All,
I've got the following problem:
from tsfresh import extract_features
import pandas as pd
df = pd.read_csv('CV_50_100.csv')
feat = extract_features(df, column_id='T1')
Also breaks with:
from tsfresh import extract_features
import pandas as pd
df = pd.read_csv('CV_50_100.csv')
feat = extract_features(df, column_id='T1', column_sort='Timestamp')
I've spoken to @MaxBenChrist on Gitter, and he suggested opening this issue.
Edit: Typo in tsfresh version.
As it's a very long error, I decided to post it in a separate comment (so you can delete it if not needed):
Feature Extraction: 0%| | 0/6 [00:00, ?it/s]Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
Traceback (most recent call last):
File "<string>", line 1, in <module>
run_name="__mp_main__")
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 263, in run_path
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
pkg_name=pkg_name, script_name=fname)
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 96, in _run_module_code
exitcode = _main(fd)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
mod_name, mod_spec, pkg_name, script_name)
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 85, in _run_code
prepare(preparation_data)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
exec(code, run_globals)
File "C:\Shaun CSC\evertbase2\tstest.py", line 6, in <module>
_fixup_main_from_path(data['init_main_from_path'])
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
feat = extract_features(df, column_id='T1', column_sort='Timestamp')
File "C:\ProgramData\Anaconda3\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 115, in extract_features
run_name="__mp_main__")
column_id, column_value)
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 263, in run_path
File "C:\ProgramData\Anaconda3\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 152, in _extract_features_parallel_per_kind
pool = Pool(settings.n_processes)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 119, in Pool
pkg_name=pkg_name, script_name=fname)
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 96, in _run_module_code
context=self.get_context())
File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 168, in __init__
mod_name, mod_spec, pkg_name, script_name)
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 85, in _run_code
self._repopulate_pool()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 233, in _repopulate_pool
exec(code, run_globals)
File "C:\Shaun CSC\evertbase2\tstest.py", line 6, in <module>
feat = extract_features(df, column_id='T1', column_sort='Timestamp')
File "C:\ProgramData\Anaconda3\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 115, in extract_features
w.start()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 105, in start
column_id, column_value)
self._popen = self._Popen(self)
File "C:\ProgramData\Anaconda3\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 152, in _extract_features_parallel_per_kind
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
pool = Pool(settings.n_processes)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 119, in Pool
return Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
context=self.get_context())
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 168, in __init__
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:

    if __name__ == '__main__':
        freeze_support()
        ...

The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
self._repopulate_pool()
This looks like a Windows error related to the parallelization. Can you try running the same snippet on a Linux or macOS machine?
I do not have access to a Windows machine, so I cannot debug this.
Your first snippet causes an error because tsfresh treats the timestamp column as a time series column and expects floats instead of timestamps.
The second one, however, passes.
I don't know if @jneuff or @nils-braun have a Windows machine at hand, but I doubt it :D :D
I have tried this on a Windows 10 machine with the same results.
Thanks in advance,
Shaun
Hi @ShahuN-107,
I finally succeeded in setting up a Windows environment. :D
The solution for your problem seems rather simple as explained here.
Just change your script to:
from tsfresh import extract_features
import pandas as pd
if __name__ == '__main__':
    df = pd.read_csv('CV_50_100.csv')
    feat = extract_features(df, column_id='T1')
There is still a failure when converting a string to float, but that is unrelated to this issue.
Cheers,
Moritz
Thanks @moritzgelb, so you are now the tsfresh expert for Windows? :D
Yes, seems so. :D
I think we should fix that globally.
See these threads:
http://stackoverflow.com/questions/39468658/figure-out-if-called-from-function-without-main-guard
So, the multiprocessing library spawns child processes in an endless loop on Windows. We should be able to catch that with an if __name__ == '__main__' guard somewhere. However, I still have to think about where to put that guard. Maybe you have some ideas, @moritzgelb @jneuff @nils-braun?
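To make the loop concrete: the traceback above shows the spawned child calling runpy.run_path with run_name="__mp_main__", i.e. it re-executes the user's script under a different module name. The following is a minimal sketch of that effect using only the standard library. It does not start any processes; SCRIPT and run_as are hypothetical names made up for this illustration, and the temporary file merely stands in for a script like tstest.py.

```python
import os
import runpy
import tempfile

# Stand-in for a user script: one unguarded statement and one guarded statement.
SCRIPT = '''
events = ["top-level code ran"]
if __name__ == "__main__":
    events.append("guarded code ran")
'''


def run_as(run_name):
    """Execute SCRIPT from a temporary file under the given module name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_script.py")
        with open(path, "w") as handle:
            handle.write(SCRIPT)
        # run_path returns the globals of the executed script.
        return runpy.run_path(path, run_name=run_name)["events"]


# What the user sees when running the script directly.
as_main = run_as("__main__")
# What a spawned child does while bootstrapping (same call as in the traceback).
as_child = run_as("__mp_main__")

print(as_main)
print(as_child)
```

Unguarded top-level code runs in both cases, while the guarded block only runs under "__main__". An unguarded extract_features call would therefore be executed again in every child, which tries to create its own pool in turn.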
@MaxBenChrist
I'm not sure if we should take care of this. As stated in the links you quoted, the multiprocessing failure on Windows can be avoided by using if __name__ == '__main__' in the script importing the tsfresh functions.
The FAQ now also mentions how to fix this.
I think the user experience suffers if one has to wrap the tsfresh calls in the if __name__ == '__main__' guard. We should try to handle it internally in tsfresh.
I totally agree, Max, that the user experience suffers, but as far as I understand, it is simply not technically possible to do this at the library level. The script that calls extract must handle this, but this script is written by the user, not by us.
@MaxBenChrist
I suggest closing this issue, since the user should take care of this, as pointed out by Nils.
Okay, I understand that an if __name__ == '__main__' guard needs to be placed in the top-level script, so the user has to add it.
Maybe we can inspect the trace inside extract_features to prevent a flood of jobs from spawning? I will read up on that.
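One way such a check might look, as a rough sketch only and not something tsfresh is known to do: instead of inspecting the stack trace, ask multiprocessing whether the current process is a worker. The helper names is_worker_process and effective_n_processes are hypothetical, and the check assumes the main process still carries multiprocessing's default name "MainProcess".

```python
import multiprocessing
from typing import Optional


def is_worker_process(process_name: Optional[str] = None) -> bool:
    """Return True if the given (or current) process is not the main process.

    Assumption: the main process keeps the default name "MainProcess".
    """
    if process_name is None:
        process_name = multiprocessing.current_process().name
    return process_name != "MainProcess"


def effective_n_processes(requested: int, process_name: Optional[str] = None) -> int:
    """Return 0 (meaning: do not create a pool) inside a worker, else the request."""
    return 0 if is_worker_process(process_name) else requested


# In the main process this normally prints the requested number of processes.
print(effective_n_processes(4))
```

This would only stop the flood of new pools. An unguarded script would still be re-executed by each child while it bootstraps, so the guard in the top-level script would remain the cleaner fix.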
So let us keep this issue open until we have a technical argument for why it is impossible to replace the guard in the top-level script.
Guys, what do you think of adding a check when tsfresh is imported that triggers a warning if Windows is detected?
In this warning we can recommend the main guard.
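A minimal sketch of what such an import-time check could look like, using only the standard library. The function name warn_if_windows and the wording of the message are made up for illustration; nothing here is taken from the tsfresh code base.

```python
import platform
import warnings
from typing import Optional

# Hypothetical wording; the real message would be up to the maintainers.
MAIN_GUARD_HINT = (
    "Windows detected: wrap calls that use multiprocessing in "
    "\"if __name__ == '__main__':\" to avoid a RuntimeError when "
    "worker processes are started."
)


def warn_if_windows(system: Optional[str] = None) -> bool:
    """Emit a warning recommending the main guard when running on Windows.

    `system` defaults to platform.system(); it is a parameter only so the
    behaviour can be checked on any machine.
    """
    if system is None:
        system = platform.system()
    if system == "Windows":
        warnings.warn(MAIN_GUARD_HINT, RuntimeWarning, stacklevel=2)
        return True
    return False
```

Calling warn_if_windows() once from the package's __init__.py would be the natural place for it. Checking only for Windows is a simplification, since the same error can presumably occur on any platform where child processes are started with spawn rather than fork.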
Hey there,
just to let you know: I just spent half a day trying to fix this in my case.
Although this is not an issue with this package, I think it's important to mention it in the documentation.
My solution: put everything in the file you're running (including all imports) inside an if __name__ == '__main__': check.
Maybe also add a call to multiprocessing.freeze_support() right after the check (whether you need it seems to depend on your machine).
This worked for me, although only from the command line, not from the IPython console.
It is written in the FAQs. If this is still a problem for users and we need to make it clearer, feel free to reopen.
Got this error on macOS, conda, Python 3.8 (with or without the main guard); it works in Python 3.6, though.