Tsfresh: Missing __name__ == '__main__' guard for multiprocessing on Windows

Created on 2 Apr 2017 · 18 Comments · Source: blue-yonder/tsfresh

Hi All,
I've got the following problem:

  1. Windows 7: Ultimate
  2. tsfresh==0.7.0
  3. The data on which the problem occurred: CV_50_100.csv
    (have many more similar, but just uploading one)
    CV_50_100.zip
  4. Code snippet that triggers the problem:
from tsfresh import extract_features
import pandas as pd

df = pd.read_csv('CV_50_100.csv')

feat = extract_features(df, column_id='T1')

Also breaks with:

from tsfresh import extract_features
import pandas as pd

df = pd.read_csv('CV_50_100.csv')

feat = extract_features(df, column_id='T1', column_sort='Timestamp')

I've spoken to @MaxBenChrist on Gitter; he suggested opening this issue.

Edit: Typo in tsfresh version.

bug

All 18 comments

As it's a very long error, I decided to post it in a separate comment (so you can delete it if not needed):

Feature Extraction: 0%| | 0/6 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Shaun CSC\evertbase2\tstest.py", line 6, in <module>
    feat = extract_features(df, column_id='T1', column_sort='Timestamp')
  File "C:\ProgramData\Anaconda3\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 115, in extract_features
    column_id, column_value)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 152, in _extract_features_parallel_per_kind
    pool = Pool(settings.n_processes)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 168, in __init__
    self._repopulate_pool()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 233, in _repopulate_pool
    w.start()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

This looks like a Windows error related to the parallelization. Can you try running the same snippet on a Linux or macOS machine?

I do not have access to a Windows machine, so I cannot debug this.

Your first snippet causes an error because tsfresh treats the timestamp column as a time series column and expects floats instead of timestamps.

However, the second one passes.

I don't know whether @jneuff or @nils-braun have a Windows machine at hand, but I doubt it :D :D

I have tried this on a Windows 10 machine with the same results.

Thanks in advance,
Shaun

Hi @ShahuN-107,

Finally, I succeeded in setting up a Windows environment. :D

The solution for your problem seems rather simple as explained here.
Just change your script to:

from tsfresh import extract_features
import pandas as pd

if __name__ == '__main__':
    df = pd.read_csv('CV_50_100.csv')
    feat = extract_features(df, column_id='T1')

Nevertheless, there is a failure when converting string to float, but this is not related to this issue.

Cheers,
Moritz

Thanks, @moritzgelb. So you are now the tsfresh expert for Windows? :D

Yes, seems so. :D

I think we should fix that globally:

See these threads:

http://stackoverflow.com/questions/29690091/python2-7-exception-the-freeze-support-line-can-be-omitted-if-the-program

http://stackoverflow.com/questions/39468658/figure-out-if-called-from-function-without-main-guard

So, the multiprocessing library spawns child processes in an endless loop on Windows. We should be able to catch that with an if __name__ == '__main__' guard somewhere. However, I still have to think about where to put that guard. Maybe you have some ideas, @moritzgelb @jneuff @nils-braun?
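
To see why the guard helps: on Windows, each child process re-runs the main script under the module name __mp_main__ (visible as run_name="__mp_main__" in the traceback above), so guarded code is skipped in the children. The sketch below imitates that re-import with runpy instead of starting real processes; the script contents and the run_as helper are made up for illustration.

```python
import os
import runpy
import tempfile

# A stand-in for the user's script (hypothetical contents).
SCRIPT = '''
events = ["module level"]           # runs on every import of the script
if __name__ == "__main__":
    events.append("guarded block")  # runs only when executed as the main script
'''


def run_as(run_name):
    """Execute SCRIPT under the given module name and report what ran."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_script.py")
        with open(path, "w") as handle:
            handle.write(SCRIPT)
        # run_path returns the module globals after execution.
        return runpy.run_path(path, run_name=run_name)["events"]


parent_view = run_as("__main__")     # how the user starts the script
child_view = run_as("__mp_main__")   # how a spawned child re-imports it
print(parent_view)
print(child_view)
```

Anything left at module level, such as an unguarded extract_features call, would show up in both lists, which is the situation the RuntimeError complains about.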

@MaxBenChrist

I'm not sure we should take care of this. As stated in the links you quoted, the multiprocessing failure on Windows can be avoided by using if __name__ == '__main__' in the script that imports the tsfresh functions.
The FAQ now also explains how to fix this.

I think the user experience suffers if one has to wrap the tsfresh calls in the if __name__ == '__main__' guard. We should try to do it internally in tsfresh.

I totally agree, Max, that the user experience suffers, but as far as I understand, it is technically not possible to do this at the library level. The script that calls extract_features must handle this, and that script is written by the user, not by us.

@MaxBenChrist
I suggest closing this issue, since the user has to take care of this, as Nils pointed out.

Okay, I understand that an if __name__ == '__main__' guard needs to be placed in the top-level script, so the user has to add it.

Maybe we can inspect the stack trace inside extract_features to prevent a flood of jobs from spawning? I will read up on that.
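
One way such a check might look, as a rough sketch rather than anything tsfresh actually does: multiprocessing names the top-level process "MainProcess" by default, so a library could look at the current process name before creating a pool. The helper names is_worker_process and safe_n_processes are hypothetical, and the check assumes nobody has renamed the main process.

```python
import multiprocessing


def is_worker_process(process_name=None):
    """Return True if the given (or current) process looks like a multiprocessing child."""
    if process_name is None:
        process_name = multiprocessing.current_process().name
    # The top-level process is called "MainProcess" unless someone renamed it.
    return process_name != "MainProcess"


def safe_n_processes(requested, process_name=None):
    """Hypothetical helper: refuse to fan out again from inside a worker."""
    return 1 if is_worker_process(process_name) else requested


print(is_worker_process())
```

This would only stop further processes from being started. Each child would still re-run the unguarded script, so the guard in the user's script remains the real fix.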

So let us keep this issue open until we have a technical argument for why the guard in the top-level script cannot be replaced.

Guys, what do you think of adding a check when tsfresh is imported that triggers a warning if Windows is detected?

In this warning we can recommend the __main__ guard.
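
A minimal sketch of what such an import-time check could look like, assuming it would live in the package's __init__ module. The function name warn_if_guard_needed and the message text are invented for this example; nothing here is taken from the tsfresh code base.

```python
import sys
import warnings

# Hypothetical wording for the hint.
GUARD_HINT = (
    "On Windows, wrap calls that use multiprocessing in "
    "`if __name__ == '__main__':` in your top-level script."
)


def warn_if_guard_needed(platform=None):
    """Emit a warning on Windows and report whether one was issued."""
    platform = sys.platform if platform is None else platform
    if platform == "win32":  # sys.platform is "win32" on all Windows builds
        warnings.warn(GUARD_HINT, RuntimeWarning, stacklevel=2)
        return True
    return False


warn_if_guard_needed()
```

The platform argument exists only so the behaviour can be exercised on any machine; a real check would just call the function once at import time.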

Hey there,

just to let you know: I just spent half a day trying to fix this for my case.

Although this is not an issue of this package, I think it's important to mention it in the documentation.

My solution: put everything in the file you're running (including all imports) inside an if __name__ == '__main__': check.
And maybe add a call to multiprocessing.freeze_support() right after the check, too (it seems to depend on your actual machine whether you need this or not).

This worked for me, although not via IPython console, only via command line.
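
For reference, a minimal sketch of that layout using only the standard library, with multiprocessing.Pool and math.sqrt standing in for the tsfresh call (both stand-ins are assumptions made for the example, not what the commenter ran). Top-level imports are normally harmless; what has to sit behind the guard is the code that starts processes.

```python
import math
import multiprocessing

INPUTS = [1.0, 4.0, 9.0]  # placeholder data for the example


def main():
    # Everything that starts processes lives here, never at import time.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(math.sqrt, INPUTS)


if __name__ == "__main__":
    # Only matters for frozen Windows executables; harmless otherwise.
    multiprocessing.freeze_support()
    print(main())
```

With tsfresh, the read_csv and extract_features calls from the snippets above would take the place of the body of main().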

It is written in the FAQ. If this is still a problem for users and we need to make it clearer, feel free to reopen.

I got this error on macOS with conda and Python 3.8 (with or without the __main__ guard); it works in Python 3.6, though.
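
A possible explanation for the macOS report, offered as a guess: starting with Python 3.8, multiprocessing on macOS defaults to the spawn start method (as on Windows) instead of fork, so the same guard becomes necessary there. This does not explain a failure that persists with the guard in place. The snippet below only shows how to check which start method is in use; needs_main_guard is a made-up helper name.

```python
import multiprocessing


def needs_main_guard(start_method):
    """Return True if children started this way re-import the main script."""
    # "fork" copies the running parent; the other methods start from a
    # fresh interpreter state and import the main module again.
    return start_method != "fork"


current = multiprocessing.get_start_method()
print(current, needs_main_guard(current))
```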
