Since PR https://github.com/pymc-devs/pymc3/pull/3011 I have been having trouble sampling multiple chains with multiple cores. In Jupyter notebooks I get _random_ kernel shutdowns, so I haven't managed to pinpoint the problem (it seems that the more complicated the model is, the higher the crash rate). However, I found a systematic issue when using the plain Python interpreter (not the Jupyter kernel): if I sample more than one chain using more than one core (say, 2 chains and 2 cores), Python crashes. Sampling multiple chains with 1 core, or 1 chain with multiple cores, is fine. In a Jupyter notebook this particular example does not cause any problems.
The minimal example is attached (please run it as a script, not in a Jupyter kernel):
import numpy as np
import pandas as pd
import theano
import pymc3 as pm
print('*** Start script ***')
print(f'{pm.__name__}: v. {pm.__version__}')
print(f'{theano.__name__}: v. {theano.__version__}')
SEED = 20180730
np.random.seed(SEED)
# Generate data
mu_real = 0
sd_real = 1
n_samples = 1000
y = np.random.normal(loc=mu_real, scale=sd_real, size=n_samples)
# Bayesian modelling
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    sd = pm.HalfNormal('sd', sd=10)
    # Likelihood
    likelihood = pm.Normal('likelihood', mu=mu, sd=sd, observed=y)
    trace = pm.sample(chains=2, cores=2, random_seed=SEED)
print('Done!')
Running with chains=2 and cores=2 throws the error:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
Traceback (most recent call last):
File "test_multicore_multichain.py", line 28, in <module>
run_name="__mp_main__")
trace = pm.sample(chains=2, cores=2, random_seed=SEED) File "C:\Miniconda3\envs\bayes\lib\runpy.py", line 263, in run_path
File "d:\dev\pymc3\pymc3\sampling.py", line 451, in sample
pkg_name=pkg_name, script_name=fname)
File "C:\Miniconda3\envs\bayes\lib\runpy.py", line 96, in _run_module_code
trace = _mp_sample(**sample_args)
File "d:\dev\pymc3\pymc3\sampling.py", line 998, in _mp_sample
mod_name, mod_spec, pkg_name, script_name)
File "C:\Miniconda3\envs\bayes\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\moran\Desktop\test_multicore_multichain.py", line 28, in <module>
chain, progressbar)
trace = pm.sample(chains=2, cores=2, random_seed=SEED) File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in __init__
File "d:\dev\pymc3\pymc3\sampling.py", line 451, in sample
for chain, seed, start in zip(range(chains), seeds, start_points)
File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in <listcomp>
trace = _mp_sample(**sample_args)
for chain, seed, start in zip(range(chains), seeds, start_points) File "d:\dev\pymc3\pymc3\sampling.py", line 998, in _mp_sample
File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 182, in __init__
self._process.start()
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\process.py", line 105, in start
chain, progressbar)
self._popen = self._Popen(self) File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in __init__
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 223, in _Popen
for chain, seed, start in zip(range(chains), seeds, start_points)
File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in <listcomp>
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 322, in _Popen
for chain, seed, start in zip(range(chains), seeds, start_points)
File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 182, in __init__
return Popen(process_obj)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
self._process.start()
reduction.dump(process_obj, to_child) File "C:\Miniconda3\envs\bayes\lib\multiprocessing\process.py", line 105, in start
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\reduction.py", line 60, in dump
self._popen = self._Popen(self)
ForkingPickler(file, protocol).dump(obj) File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 223, in _Popen
BrokenPipeError: [Errno 32] Broken pipereturn _default_context.get_context().Process._Popen(process_obj)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
The interesting thing is that the print statements in the script are duplicated (which does not happen when chains=2 and cores=1, or chains=1 and cores=2):
*** Start script ***
pymc3: v. 3.5
theano: v. 1.0.2
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, mu]
*** Start script ***
pymc3: v. 3.5
theano: v. 1.0.2
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, mu]
I am on master on both PyMC3 and Theano.
Possibly Windows related... @aseyboldt
Yes, this looks like an issue with multiprocessing on Windows.
Can you try this:
import numpy as np
import pandas as pd
import theano
import pymc3 as pm
print('*** Start script ***')
print(f'{pm.__name__}: v. {pm.__version__}')
print(f'{theano.__name__}: v. {theano.__version__}')
if __name__ == '__main__':
    SEED = 20180730
    np.random.seed(SEED)
    # Generate data
    mu_real = 0
    sd_real = 1
    n_samples = 1000
    y = np.random.normal(loc=mu_real, scale=sd_real, size=n_samples)
    # Bayesian modelling
    with pm.Model() as model:
        mu = pm.Normal('mu', mu=0, sd=10)
        sd = pm.HalfNormal('sd', sd=10)
        # Likelihood
        likelihood = pm.Normal('likelihood', mu=mu, sd=sd, observed=y)
        trace = pm.sample(chains=2, cores=2, random_seed=SEED)
    print('Done!')
But I don't really understand why it has trouble in the notebook. Can you post the versions of pyzmq, jupyter and ipython?
If I use the _if_ statement then the sampling works. Still, the print statements are executed multiple times:
*** Start script ***
pymc3: v. 3.5
theano: v. 1.0.2
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, mu]
*** Start script ***
pymc3: v. 3.5
theano: v. 1.0.2
*** Start script ***
pymc3: v. 3.5
theano: v. 1.0.2
Sampling 2 chains: 100%|██████████| 2000/2000 [00:02<00:00, 724.48draws/s]
Done!
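The repeated output is what the spawn start method produces: Windows has no fork, so multiprocessing starts each worker as a fresh interpreter that imports the main script again, and every statement outside the guard runs once per process. The sketch below reproduces this with the standard library only; run_workers is a made-up name, and a pool running the builtin abs stands in for the sampler's worker processes.

```python
import multiprocessing as mp

# Module-level code like this print runs again in every spawned child,
# because "spawn" starts a fresh interpreter that re-imports the main script.
print(f"importing module as {__name__!r}")


def run_workers():
    """Run a trivial job in two spawned processes (a stand-in for two chains)."""
    ctx = mp.get_context("spawn")  # use the Windows-style start method on any OS
    with ctx.Pool(processes=2) as pool:
        # abs is a builtin, so the children can unpickle it without
        # needing anything defined in this script.
        return pool.map(abs, [-1, -2])


if __name__ == "__main__":
    # Only the parent process gets here. Without the guard, each child would
    # try to create its own pool while it is still being bootstrapped.
    print(run_workers())
```

Saved to a file and run as a script, this should print the "importing" line once for the parent and once for each worker, while the pool itself is created only in the parent.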
This particular script runs fine in a Jupyter notebook (it crashed only once in several attempts).
In general, however, sampling with multiple cores has become very unreliable. I have some more complicated models that won't run with multiple cores (in a freshly installed environment). For example, one notebook I am working on now (a softmax regression) crashes continuously when using multiple cores:
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [beta_]
Sampling 2 chains: 0%| | 0/8000 [00:00<?, ?draws/s]
forrtl: error (200): program aborting due to control-C event
Image PC Routine Line Source
libifcoremd.dll 00007FFC9C7F94C4 Unknown Unknown Unknown
KERNELBASE.dll 00007FFCD18B56FD Unknown Unknown Unknown
KERNEL32.DLL 00007FFCD38E3034 Unknown Unknown Unknown
ntdll.dll 00007FFCD4A11431 Unknown Unknown Unknown
forrtl: error (200): program aborting due to control-C event
Image PC Routine Line Source
libifcoremd.dll 00007FFC9C7F94C4 Unknown Unknown Unknown
KERNELBASE.dll 00007FFCD18B56FD Unknown Unknown Unknown
KERNEL32.DLL 00007FFCD38E3034 Unknown Unknown Unknown
ntdll.dll 00007FFCD4A11431 Unknown Unknown Unknown
forrtl: error (200): program aborting due to control-C event
Image PC Routine Line Source
libifcoremd.dll 00007FFC9C7F94C4 Unknown Unknown Unknown
KERNELBASE.dll 00007FFCD18B56FD Unknown Unknown Unknown
KERNEL32.DLL 00007FFCD38E3034 Unknown Unknown Unknown
ntdll.dll 00007FFCD4A11431 Unknown Unknown Unknown
forrtl: error (200): program aborting due to control-C event
Image PC Routine Line Source
libifcoremd.dll 00007FFC9C7F94C4 Unknown Unknown Unknown
KERNELBASE.dll 00007FFCD18B56FD Unknown Unknown Unknown
KERNEL32.DLL 00007FFCD38E3034 Unknown Unknown Unknown
ntdll.dll 00007FFCD4A11431 Unknown Unknown Unknown
[I 14:26:40.033 NotebookApp] Interrupted...
[I 14:26:40.033 NotebookApp] Shutting down 2 kernels
[I 14:26:40.135 NotebookApp] Kernel shutdown: eaa60eb4-6bae-4c91-82bf-6bd5648ddf35
[I 14:26:40.135 NotebookApp] Kernel shutdown: e41f13f3-e731-4812-8130-97a7a6220fd7
If I run the softmax regression as a Python script (without the if __name__ == '__main__': guard), I get the error:
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [beta_]
3.5
1.0.2
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [beta_]
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
Traceback (most recent call last):
File "test_softmax_multicore.py", line 38, in <module>
run_name="__mp_main__")
File "C:\Miniconda3\envs\bayes\lib\runpy.py", line 263, in run_path
trace = pm.sample(draws=3000, tune=1000, chains=2, cores=4, random_seed=SEED)
File "d:\dev\pymc3\pymc3\sampling.py", line 451, in sample
pkg_name=pkg_name, script_name=fname)
File "C:\Miniconda3\envs\bayes\lib\runpy.py", line 96, in _run_module_code
trace = _mp_sample(**sample_args)
File "d:\dev\pymc3\pymc3\sampling.py", line 998, in _mp_sample
mod_name, mod_spec, pkg_name, script_name)
File "C:\Miniconda3\envs\bayes\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)chain, progressbar)
File "D:\dev\GLM_with_PyMC3\notebooks\test_softmax_multicore.py", line 38, in <module>
File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in __init__
trace = pm.sample(draws=3000, tune=1000, chains=2, cores=4, random_seed=SEED)
File "d:\dev\pymc3\pymc3\sampling.py", line 451, in sample
for chain, seed, start in zip(range(chains), seeds, start_points)
File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in <listcomp>
trace = _mp_sample(**sample_args)
File "d:\dev\pymc3\pymc3\sampling.py", line 998, in _mp_sample
for chain, seed, start in zip(range(chains), seeds, start_points)
File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 182, in __init__
self._process.start()chain, progressbar)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\process.py", line 105, in start
File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in __init__
self._popen = self._Popen(self)
for chain, seed, start in zip(range(chains), seeds, start_points) File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 223, in _Popen
File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in <listcomp>
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 322, in _Popen
for chain, seed, start in zip(range(chains), seeds, start_points)
File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 182, in __init__
return Popen(process_obj)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
self._process.start()
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\process.py", line 105, in start
reduction.dump(process_obj, to_child)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\reduction.py", line 60, in dump
self._popen = self._Popen(self)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 223, in _Popen
ForkingPickler(file, protocol).dump(obj)
BrokenPipeErrorreturn _default_context.get_context().Process._Popen(process_obj):
[Errno 32] Broken pipe File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
If I wrap the script in if __name__ == '__main__':, I get the error:
Sampling 2 chains: 0%| | 0/8000 [00:00<?, ?draws/s]
You can find the C code in this temporary file: C:\Users\moran\AppData\Local\Temp\theano_compilation_error__a0g2s_m
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1082, in _constructor_Function
f = maker.create(input_storage, trustme=True)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1715, in create
input_storage=input_storage_lists, storage_map=storage_map)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\link.py", line 699, in make_thunk
storage_map=storage_map)[:3]
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\vm.py", line 1091, in make_all
impl=impl))
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 955, in make_thunk
no_recycling)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 858, in make_c_thunk
output_storage=node_output_storage)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1217, in make_thunk
keep_lock=keep_lock)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1157, in __compile__
keep_lock=keep_lock)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1620, in cthunk_factory
key=key, lnk=self, keep_lock=keep_lock)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cmodule.py", line 1181, in module_from_key
module = lnk.compile_cmodule(location)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1523, in compile_cmodule
preargs=preargs)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cmodule.py", line 2388, in compile_str
(status, compile_stderr.replace('\n', '. ')))
Exception: ('The following error happened while compiling the node', Softmax(Dot22.0), '\n', 'Compilation failed (return status=3): ', '[Softmax(<TensorType(float64, matrix)>)]')
forrtl: error (200): program aborting due to control-C event
Image PC Routine Line Source
libifcoremd.dll 00007FFC98B294C4 Unknown Unknown Unknown
KERNELBASE.dll 00007FFCD18B56FD Unknown Unknown Unknown
KERNEL32.DLL 00007FFCD38E3034 Unknown Unknown Unknown
ntdll.dll 00007FFCD4A11431 Unknown Unknown Unknown
forrtl: error (200): program aborting due to control-C event
Image PC Routine Line Source
libifcoremd.dll 00007FFC98B294C4 Unknown Unknown Unknown
KERNELBASE.dll 00007FFCD18B56FD Unknown Unknown Unknown
KERNEL32.DLL 00007FFCD38E3034 Unknown Unknown Unknown
ntdll.dll 00007FFCD4A11431 Unknown Unknown Unknown
So it seems that there are two issues here:
- The need for if __name__ == '__main__' in stand-alone scripts. I somehow managed to miss the fact that this impacts users when writing the new backend. I'm not really sure how bad this is right now. It is a backward-incompatible change, and at the very least it should be mentioned in the documentation. Maybe we should even disable the new backend on Windows for now. But on the upside, at least I understand why this is happening.
- The second error. I'm trying to reproduce this locally; can you send me an example that fails with it?
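As a rough illustration of the Windows fallback floated above, a script could choose the number of cores by platform. This is only a sketch: effective_cores is a hypothetical helper, not part of PyMC3, and it assumes that passing cores=1 to pm.sample avoids the multiprocessing path, as the single-core runs reported earlier in this thread suggest.

```python
import sys


def effective_cores(requested, platform=sys.platform):
    """Hypothetical helper: fall back to a single core on Windows."""
    if platform.startswith("win"):
        return 1
    # Never return fewer than one core on other platforms.
    return max(1, requested)


# Intended use (not run here, since it needs pymc3 and a model context):
#     trace = pm.sample(chains=2, cores=effective_cores(2), random_seed=SEED)
print(effective_cores(2))
```

This trades speed for reliability on Windows and leaves other platforms unchanged.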
What is the output of np.__config__.show()?
I have some vague ideas where this might be coming from, and if my hunch is right, setting one of OMP_NUM_THREADS=1, MKL_THREADING_LAYER=sequential or MKL_THREADING_LAYER=GNU might help. To do that, execute
import os
# Set only one of the following; try them one at a time.
os.environ['MKL_THREADING_LAYER'] = 'sequential'
# os.environ['OMP_NUM_THREADS'] = '1'
# os.environ['MKL_THREADING_LAYER'] = 'GNU'
before you import anything else.
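Because the ordering matters here, it can help to check whether the numerical libraries were already imported when the variable is set. The following is a minimal sketch under that assumption; set_threading_env and its parameters are hypothetical names used only for illustration.

```python
import os
import sys


def set_threading_env(key, value,
                      heavy_modules=("numpy", "theano", "pymc3"),
                      loaded=None, environ=None):
    """Hypothetical helper: set an environment variable and return the names
    of any listed modules that were already imported (i.e. set too late)."""
    loaded = sys.modules if loaded is None else loaded
    environ = os.environ if environ is None else environ
    too_late = [name for name in heavy_modules if name in loaded]
    environ[key] = value
    return too_late


already_imported = set_threading_env("OMP_NUM_THREADS", "1")
if already_imported:
    print("Variable was set after importing:", ", ".join(already_imported))
# Import numpy, theano and pymc3 only after this point.
```

An empty return value means none of the listed modules had been imported yet when the variable was set.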
And thank you for reporting this :-)
The np.__config__.show() outputs:
mkl_info:
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:/Miniconda3/envs/bayes\\Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Miniconda3/envs/bayes\\Library\\include']
blas_mkl_info:
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:/Miniconda3/envs/bayes\\Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Miniconda3/envs/bayes\\Library\\include']
blas_opt_info:
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:/Miniconda3/envs/bayes\\Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Miniconda3/envs/bayes\\Library\\include']
lapack_mkl_info:
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:/Miniconda3/envs/bayes\\Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Miniconda3/envs/bayes\\Library\\include']
lapack_opt_info:
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:/Miniconda3/envs/bayes\\Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Miniconda3/envs/bayes\\Library\\include']
I tried to set the environment variables but it does not solve the issue, unfortunately.
I have attached the Jupyter notebook (+data) that keeps crashing on my side. It is based on the softmax regression in the DBDA2 book.
test_softmax_multicore.zip
Thank _you_ for looking into this.
It works for me, but I have a different BLAS installed. How did you install Python/numpy/pymc3?
Can you maybe also post the output of pip freeze and conda list?
I installed numpy (and scipy and all the PyMC3 dependencies) via conda (because it links the packages to the MKL library). Then I installed Theano and PyMC3 via pip from git.
Conda list
conda list
# packages in environment at C:\Miniconda3\envs\bayes:
#
# Name Version Build Channel
backcall 0.1.0 py36_0
blas 1.0 mkl
bleach 2.1.3 py36_0
ca-certificates 2018.03.07 0
certifi 2018.4.16 py36_0
colorama 0.3.9 py36h029ae33_0
cycler 0.10.0 py36h009560c_0
cython 0.28.3 py36hfa6e2cd_0
decorator 4.3.0 py36_0
entrypoints 0.2.3 py36hfd66bb0_2
freetype 2.8 h51f8f2c_1
h5py 2.8.0 <pip>
html5lib 1.0.1 py36h047fa9f_0
icc_rt 2017.0.4 h97af966_0
icu 58.2 ha66f8fd_1
intel-openmp 2018.0.3 0
ipykernel 4.8.2 py36_0
ipython 6.4.0 py36_0
ipython_genutils 0.2.0 py36h3c5d0ee_0
ipywidgets 7.2.1 py36_0
jedi 0.12.0 py36_1
jinja2 2.10 py36h292fed1_0
joblib 0.12.0 <pip>
jpeg 9b hb83a4c4_2
jsonschema 2.6.0 py36h7636477_0
jupyter 1.0.0 py36_4
jupyter_client 5.2.3 py36_0
jupyter_console 5.2.0 py36h6d89b47_1
jupyter_core 4.4.0 py36h56e9d50_0
kiwisolver 1.0.1 py36h12c3424_0
libpng 1.6.34 h79bbb47_0
libpython 2.1 py36_0
libsodium 1.0.16 h9d3ae62_0
m2w64-binutils 2.25.1 5 msys2
m2w64-bzip2 1.0.6 6 msys2
m2w64-crt-git 5.0.0.4636.2595836 2 msys2
m2w64-gcc 5.3.0 6 msys2
m2w64-gcc-ada 5.3.0 6 msys2
m2w64-gcc-fortran 5.3.0 6 msys2
m2w64-gcc-libgfortran 5.3.0 6 msys2
m2w64-gcc-libs 5.3.0 7 msys2
m2w64-gcc-libs-core 5.3.0 7 msys2
m2w64-gcc-objc 5.3.0 6 msys2
m2w64-gmp 6.1.0 2 msys2
m2w64-headers-git 5.0.0.4636.c0ad18a 2 msys2
m2w64-isl 0.16.1 2 msys2
m2w64-libiconv 1.14 6 msys2
m2w64-libmangle-git 5.0.0.4509.2e5a9a2 2 msys2
m2w64-libwinpthread-git 5.0.0.4634.697f757 2 msys2
m2w64-make 4.1.2351.a80a8b8 2 msys2
m2w64-mpc 1.0.3 3 msys2
m2w64-mpfr 3.1.4 4 msys2
m2w64-pkg-config 0.29.1 2 msys2
m2w64-toolchain 5.3.0 7 msys2
m2w64-tools-git 5.0.0.4592.90b8472 2 msys2
m2w64-windows-default-manifest 6.4 3 msys2
m2w64-winpthreads-git 5.0.0.4634.697f757 2 msys2
m2w64-zlib 1.2.8 10 msys2
markupsafe 1.0 py36h0e26971_1
matplotlib 2.2.2 py36h153e9ff_1
mistune 0.8.3 py36hfa6e2cd_1
mkl 2018.0.3 1
mkl-service 1.1.2 py36h57e144c_4
mkl_fft 1.0.1 py36h452e1ab_0
mkl_random 1.0.1 py36h9258bd6_0
msys2-conda-epoch 20160418 1 msys2
nbconvert 5.3.1 py36h8dc0fde_0
nbformat 4.4.0 py36h3a5bc1b_0
notebook 5.5.0 py36_0
numpy 1.14.3 py36h9fa60d3_2
numpy-base 1.14.3 py36h5c71026_2
openssl 1.0.2o h8ea7d77_0
pandas 0.23.1 py36h830ac7b_0
pandoc 2.2.1 h1a437c5_0
pandocfilters 1.4.2 py36h3ef6317_1
parso 0.2.1 py36_0
patsy 0.5.0 py36_0
pickleshare 0.7.4 py36h9de030f_0
pip 10.0.1 py36_0
prompt_toolkit 1.0.15 py36h60b8f86_0
pygments 2.2.0 py36hb010967_0
pymc3 3.4.1 <pip>
pyparsing 2.2.0 py36h785a196_1
pyqt 5.9.2 py36h1aa27d4_0
python 3.6.6 hea74fb7_0
python-dateutil 2.7.3 py36_0
pytz 2018.4 py36_0
pywinpty 0.5.4 py36_0
pyzmq 17.0.0 py36hfa6e2cd_1
qt 5.9.6 vc14h62aca36_0 [vc14]
qtconsole 4.3.1 py36h99a29a9_0
scipy 1.1.0 py36h672f292_0
seaborn 0.8.1 py36h9b69545_0
send2trash 1.5.0 py36_0
setuptools 39.2.0 py36_0
simplegeneric 0.8.1 py36_2
sip 4.19.8 py36h6538335_0
six 1.11.0 py36h4db2310_1
sqlite 3.24.0 h7602738_0
statsmodels 0.9.0 py36h452e1ab_0
terminado 0.8.1 py36_1
testpath 0.3.1 py36h2698cfe_0
Theano 1.0.2+26.gd0420e3d9 <pip>
tornado 5.0.2 py36_0
tqdm 4.23.4 <pip>
traitlets 4.3.2 py36h096827d_0
vc 14 h0510ff6_3
vs2015_runtime 14.0.25123 3
wcwidth 0.1.7 py36h3d5aa90_0
webencodings 0.5.1 py36h67c50ae_1
wheel 0.31.1 py36_0
widgetsnbextension 3.2.1 py36_0
wincertstore 0.2 py36h7fe50ca_0
winpty 0.4.3 4
zeromq 4.2.5 hc6251cf_0
zlib 1.2.11 h8395fce_2
pip freeze
backcall==0.1.0
bleach==2.1.3
certifi==2018.4.16
colorama==0.3.9
cycler==0.10.0
Cython==0.28.3
decorator==4.3.0
entrypoints==0.2.3
h5py==2.8.0
html5lib==1.0.1
ipykernel==4.8.2
ipython==6.4.0
ipython-genutils==0.2.0
ipywidgets==7.2.1
jedi==0.12.0
Jinja2==2.10
joblib==0.12.0
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.3
jupyter-console==5.2.0
jupyter-core==4.4.0
kiwisolver==1.0.1
MarkupSafe==1.0
matplotlib==2.2.2
mistune==0.8.3
mkl-fft==1.0.0
mkl-random==1.0.1
nbconvert==5.3.1
nbformat==4.4.0
notebook==5.5.0
numpy==1.14.3
pandas==0.23.1
pandocfilters==1.4.2
parso==0.2.1
patsy==0.5.0
pickleshare==0.7.4
prompt-toolkit==1.0.15
Pygments==2.2.0
-e git+https://github.com/JackCaster/pymc3.git@98545be7ddad700b5fb02be2893d2fedae22c110#egg=pymc3
pyparsing==2.2.0
python-dateutil==2.7.3
pytz==2018.4
pywinpty==0.5.4
pyzmq==17.0.0
qtconsole==4.3.1
scipy==1.1.0
seaborn==0.8.1
Send2Trash==1.5.0
simplegeneric==0.8.1
six==1.11.0
statsmodels==0.9.0
terminado==0.8.1
testpath==0.3.1
Theano==1.0.2+26.gd0420e3d9
tornado==5.0.2
tqdm==4.23.4
traitlets==4.3.2
wcwidth==0.1.7
webencodings==0.5.1
widgetsnbextension==3.2.1
wincertstore==0.2
I did some digging. I found out that the error forrtl: error (200): program aborting due to control-C event, which makes the kernel crash, is not unusual (see here). In the comments, they suggest setting the environment variable FOR_DISABLE_CONSOLE_CTRL_HANDLER to "1" or "T". I did so, and when the notebook crashes (because it still does ;( ), the traceback is:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1082, in _constructor_Function
f = maker.create(input_storage, trustme=True)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1715, in create
Traceback (most recent call last):
File "<string>", line 1, in <module>
input_storage=input_storage_lists, storage_map=storage_map)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\link.py", line 699, in make_thunk
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 115, in _main
storage_map=storage_map)[:3] self = reduction.pickle.load(from_parent)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\vm.py", line 1091, in make_all
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1082, in _constructor_Function
impl=impl))
f = maker.create(input_storage, trustme=True) File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 955, in make_thunk
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1715, in create
no_recycling)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 858, in make_c_thunk
input_storage=input_storage_lists, storage_map=storage_map)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\link.py", line 699, in make_thunk
output_storage=node_output_storage)storage_map=storage_map)[:3]
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1217, in make_thunk
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\vm.py", line 1091, in make_all
impl=impl))
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 955, in make_thunk
keep_lock=keep_lock)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1157, in __compile__
no_recycling)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 858, in make_c_thunk
keep_lock=keep_lock)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1620, in cthunk_factory
output_storage=node_output_storage)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1217, in make_thunk
key=key, lnk=self, keep_lock=keep_lock)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cmodule.py", line 1151, in module_from_key
keep_lock=keep_lock)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1157, in __compile__
with compilelock.lock_ctx(keep_lock=keep_lock):
File "C:\Miniconda3\envs\bayes\lib\contextlib.py", line 81, in __enter__
keep_lock=keep_lock)return next(self.gen)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1620, in cthunk_factory
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\compilelock.py", line 40, in lock_ctx
get_lock(lock_dir=lock_dir, **kw)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\compilelock.py", line 86, in _get_lock
lock(get_lock.lock_dir, **kw)
key=key, lnk=self, keep_lock=keep_lock) File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\compilelock.py", line 273, in lock
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cmodule.py", line 1181, in module_from_key
time.sleep(random.uniform(min_wait, max_wait))
KeyboardInterrupt
module = lnk.compile_cmodule(location)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1523, in compile_cmodule
preargs=preargs)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cmodule.py", line 2343, in compile_str
p_out = output_subprocess_Popen(cmd)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\misc\windows.py", line 80, in output_subprocess_Popen
out = p.communicate()
File "C:\Miniconda3\envs\bayes\lib\subprocess.py", line 843, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "C:\Miniconda3\envs\bayes\lib\subprocess.py", line 1092, in _communicate
self.stdout_thread.join(self._remaining_time(endtime))
File "C:\Miniconda3\envs\bayes\lib\threading.py", line 1056, in join
self._wait_for_tstate_lock()
File "C:\Miniconda3\envs\bayes\lib\threading.py", line 1072, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
KeyboardInterrupt
[I 11:43:03.371 NotebookApp] Interrupted...
[I 11:43:03.371 NotebookApp] Shutting down 1 kernel
[I 11:43:08.431 NotebookApp] Kernel shutdown: cb25a99e-15f2-4f7f-b3c0-9706ab711a70
I hope this helps to shed light on the issue.
I have a similar error (Windows 2012 + pymc3 3.5 (master) + theano 1.0.3 (master)).
Here are the ways I can work around this situation:
- n_jobs=1
- the init='advi' option
- mode = FAST_COMPILE in .theanorc
I also have a short program that blows up (with a "broken pipe" message) as soon as I set chains > 1. I have a multicore machine (but then who doesn't). The code:
from theano import shared
from numpy import ones, array
from pymc3 import Model, Normal, Deterministic, Binomial, Metropolis, sample
from pymc3.math import invlogit
log_dosage = shared(array([-.86, -.3, -.05, .73]))
sample_size = shared(5 * ones(4, dtype=int))
deaths = array([0, 1, 3, 5])
with Model() as bioassay_model:
    alpha = Normal('alpha', 0, sd=100)
    beta = Normal('beta', 0, sd=100)
    theta = Deterministic("theta", invlogit(alpha + beta * log_dosage))
    observed_deaths = Binomial('observed_deaths', n=sample_size, p=theta, observed=deaths)
    trace = sample(draws=10000, start={"alpha":0.5}, step=Metropolis(), chains=2)
I have a GeForce GTX 1050 GPU running CUDA 8.0, CUDNN 7.1.3, theano 1.0.3, pymc3 3.5, python 3.6.6
my theano.rc:
[global]
device = cuda
force_device=True
optimizer = fast_run
optimizer_including=cudnn
mode=FAST_RUN
[nvcc]
fastmath = True
allow_gc=True
[lib]
cnmem = 0.8
[gpuarray]
preallocate=0.7
[scan]
allow_gc=True
allow_output_prealloc=True
BrokenPipeError Traceback (most recent call last)
<ipython-input-1-fb96fbe5f1ac> in <module>
13 theta = Deterministic("theta", invlogit(alpha + beta * log_dosage))
14 observed_deaths = Binomial('observed_deaths', n=sample_size, p=theta, observed=deaths)
---> 15 trace = sample(draws=10000, start={"alpha":0.5}, step=Metropolis(), chains=2)
~\AppData\Local\conda\conda\envs\pymc3\lib\site-packages\pymc3\sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, nuts_kwargs, step_kwargs, progressbar, model, random_seed, live_plot, discard_tuned_samples, live_plot_kwargs, compute_convergence_checks, use_mmap, **kwargs)
447 _print_step_hierarchy(step)
448 try:
--> 449 trace = _mp_sample(**sample_args)
450 except pickle.PickleError:
451 _log.warning("Could not pickle model, sampling singlethreaded.")
~\AppData\Local\conda\conda\envs\pymc3\lib\site-packages\pymc3\sampling.py in _mp_sample(draws, tune, step, chains, cores, chain, random_seed, start, progressbar, trace, model, use_mmap, **kwargs)
994 sampler = ps.ParallelSampler(
995 draws, tune, chains, cores, random_seed, start, step,
--> 996 chain, progressbar)
997 try:
998 with sampler:
~\AppData\Local\conda\conda\envs\pymc3\lib\site-packages\pymc3\parallel_sampling.py in __init__(self, draws, tune, chains, cores, seeds, start_points, step_method, start_chain_num, progressbar)
273 ProcessAdapter(draws, tune, step_method,
274 chain + start_chain_num, seed, start)
--> 275 for chain, seed, start in zip(range(chains), seeds, start_points)
276 ]
277
~\AppData\Local\conda\conda\envs\pymc3\lib\site-packages\pymc3\parallel_sampling.py in <listcomp>(.0)
273 ProcessAdapter(draws, tune, step_method,
274 chain + start_chain_num, seed, start)
--> 275 for chain, seed, start in zip(range(chains), seeds, start_points)
276 ]
277
~\AppData\Local\conda\conda\envs\pymc3\lib\site-packages\pymc3\parallel_sampling.py in __init__(self, draws, tune, step_method, chain, seed, start)
180 draws, tune, seed)
181 # We fork right away, so that the main process can start tqdm threads
--> 182 self._process.start()
183
184 @property
~\AppData\Local\conda\conda\envs\pymc3\lib\multiprocessing\process.py in start(self)
103 'daemonic processes are not allowed to have children'
104 _cleanup()
--> 105 self._popen = self._Popen(self)
106 self._sentinel = self._popen.sentinel
107 # Avoid a refcycle if the target function holds an indirect
~\AppData\Local\conda\conda\envs\pymc3\lib\multiprocessing\context.py in _Popen(process_obj)
221 @staticmethod
222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
224
225 class DefaultContext(BaseContext):
~\AppData\Local\conda\conda\envs\pymc3\lib\multiprocessing\context.py in _Popen(process_obj)
320 def _Popen(process_obj):
321 from .popen_spawn_win32 import Popen
--> 322 return Popen(process_obj)
323
324 class SpawnContext(BaseContext):
~\AppData\Local\conda\conda\envs\pymc3\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
63 try:
64 reduction.dump(prep_data, to_child)
---> 65 reduction.dump(process_obj, to_child)
66 finally:
67 set_spawning_popen(None)
~\AppData\Local\conda\conda\envs\pymc3\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
61
62 #
BrokenPipeError: [Errno 32] Broken pipe
I'm pretty sure that is not the same issue as the original (which is windows related). About the original bug: I've been trying to reproduce this on my own machine for some time, but so far I haven't managed to do that. This makes it rather hard to fix.
@Jeff-Winchell About your problem:
I'd guess that this might be gpu related. Does it also happen if you use the cpu? Using a gpu for a problem like that doesn't make the slightest bit of sense by the way.
As a general note: Before starting to write strange emails when you don't get a reply to a bug report, it could help to do a bit more work yourself first:
Have you run the very short code example I gave and replicated the bug? If you have, it's not clear why most of your post was written. If you haven't run it, it's unclear why any of your post was written.
I was frankly taken aback by your post, but maybe you don't see why. I'm a software engineer, not a hacker. My teachers (and LinkedIn connections) include Ward Cunningham, Bertrand Meyer, Meilir Page-Jones, Gerry Weinberg, James Bach, Andy Hunt. I don't ship code to production with known bugs in it. Ever.
If you can't replicate the bug I'd be happy to help come up with ideas why not. Otherwise, it's unproductive.
@Jeff-Winchell have you tried running the suggestions by @aseyboldt? These are all valid suggestions, and the productive thing would be to follow them first. Also, name-dropping is not a valid way to have a productive conversation.
We do not appreciate these hostile attitudes towards our developers/users, if you keep doing this (either privately or publicly) I will have to block and report you according to our community guidelines.
The first message to me was more hostile than my response was. Different people have different ideas about name dropping. So I guess you can ban me for saying my address is [email protected].
What else was hostile besides making it clear that I know a lot more than the first poster assumed I did when asking me to do a bunch of things that aren't useful?
FYI, none of those names I mentioned would even DREAM of banning someone for posting the message I did.
So go ahead and block me. The mere threat you made about doing so, so frivolously makes me want to challenge bullies publicly, just like they challenge me.
Related discourse thread
I looked at that thread. If I move ONLY the pymc3.sample function into a if __name__=='__main__' block AND I make sure my GPU is globally turned off, then it won't crash. As I ran into the same problem with some other code that uses the NUTS sampler, I saw that the same workaround corrects that.
However, disabling the GPU globally is not a great solution, so the GPU problem needs to be fixed, and I don't know how more complex code can be managed with the if __name__ workaround. The real solution is to change the pymc3/theano/whatever code so that it executes under both Linux and Windows instead of only worrying about Linux and ignoring the most widely used OS from the company with the largest market capitalization in the world.
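For scripts larger than a single call, one common way to apply the if __name__ workaround is to keep every process-starting call inside a function and invoke that function from a guarded entry point. The sketch below uses only the standard library; `run_chain` and `main` are hypothetical stand-ins for the sampling work and are not pymc3 functions. It assumes the file is run as a script (for example `python script.py`), not pasted into an interactive session.

```python
import multiprocessing as mp


def run_chain(chain_id, queue):
    """Hypothetical stand-in for one sampling chain.

    It is defined at module level so that a spawned child can import it.
    """
    queue.put((chain_id, sum(range(chain_id + 1))))


def main(n_chains=2):
    # "spawn" is the only start method available on Windows; requesting it
    # explicitly reproduces the Windows behaviour on Linux and macOS too.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    workers = [ctx.Process(target=run_chain, args=(i, queue)) for i in range(n_chains)]
    for worker in workers:
        worker.start()
    # A timeout keeps the parent from waiting forever if a child died early.
    results = sorted(queue.get(timeout=30) for _ in workers)
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    # Each spawned child re-imports this file. Without the guard, the import
    # would reach this call again and try to start children of its own.
    print(main())
```

With this layout, data loading and model construction can live in ordinary functions, and only the final call needs the guard.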
The main problem is that the broken pipe error is not helpful for debugging. The broken pipe is raised by the main process: when it tries to spawn the worker processes that should do the sampling, the workers raise exceptions before they are fully set up, so they cannot communicate their failure to the main process. When the main process then tries to communicate with the workers, it finds the pipe broken. The first thing we are focusing on is capturing the exceptions raised while the workers are being spawned, because these exceptions are the key to debugging the failures. Some of them were caused by the lack of the if __name__ == '__main__' block, and others by functions that were not pickleable. Once we sort that out, we will be able to help better with whatever is happening because of the GPU.
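The idea of turning the bare broken pipe into something more informative can be sketched as follows. This is an illustration of the approach described above, not the actual pymc3 code; `start_with_diagnostics` is a hypothetical helper, and it only assumes that the object passed in has a `start()` method, as `multiprocessing.Process` does.

```python
import errno


def start_with_diagnostics(process):
    """Start a process-like object; replace a broken-pipe error with a clearer one."""
    try:
        process.start()
    except OSError as exc:
        # Only the broken-pipe case is rewritten; other OS errors pass through.
        if exc.errno != errno.EPIPE:
            raise
        raise RuntimeError(
            "The pipe to the child process broke while it was being spawned. "
            "The child most likely raised an exception during start-up. "
            "Run the code as a script from a terminal to see the child's "
            "own traceback on stderr."
        ) from exc
```

The original `BrokenPipeError` stays attached as `__cause__`, so no information from the parent side is lost.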
Following commit 98fd63e18179ffb28734c73c459ccdaf04121b92, I ran again the script that kept failing under Windows. The script under test is:
import pymc3 as pm
print(pm.__version__)
import theano.tensor as tt
import theano
print(theano.__version__)
import patsy
import pandas as pd
import numpy as np
SEED = 20180727
df = pd.read_csv(r'https://gist.githubusercontent.com/JackCaster/d74b36a66c172e80d1bdcee61d6975bf/raw/a2aab8690af7cebbe39ec5e5b425fe9a9b9a674d/data.csv',
dtype={'Y':'category'})
_, X = patsy.dmatrices('Y ~ 1 + X1 + X2', data=df)
# Number of categories
n_cat = df.Y.cat.categories.size
# Number of predictors
n_pred = X.shape[1]
with pm.Model() as model:
    ## `p`--quantity that I want to model--needs to have size (n_obs, n_cat).
    ## Because `X` has size (n_obs, n_pred), then `beta` needs to have size (n_pred, n_cat)
    # priors for categories 1-2, excluding reference category 0 which is set to zero below (see DBDA2 p. 651 for explanation).
    beta_ = pm.Normal('beta_', mu=0, sd=50, shape=(n_pred, n_cat-1))
    # add prior values zero for reference category 0. (add a column)
    beta = pm.Deterministic('beta', tt.concatenate([tt.zeros((n_pred, 1)), beta_], axis=1))
    # The softmax function will squash the values in the range 0-1
    p = tt.nnet.softmax(tt.dot(np.asarray(X), beta))
    likelihood = pm.Categorical('likelihood', p=p, observed=df.Y.cat.codes.values)
    trace = pm.sample(chains=2, cores=2)
print('DONE')
Unfortunately, the sampling still fails with cores > 1 (pymc3 v. 3.6, theano v. 1.0.3). The jupyter kernel shuts down as soon as the sampling begins:
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [beta_]
Sampling 2 chains: 0%| | 0/2000 [00:00<?, ?draws/s]
The traceback, which points to a compilation error, was:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Miniconda3\envs\intro_to_pymc3\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Miniconda3\envs\intro_to_pymc3\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\compile\function_module.py", line 1082, in _constructor_Function
f = maker.create(input_storage, trustme=True)
File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\compile\function_module.py", line 1715, in create
input_storage=input_storage_lists, storage_map=storage_map)
File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\link.py", line 699, in make_thunk
storage_map=storage_map)[:3]
File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\vm.py", line 1091, in make_all
impl=impl))
File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\op.py", line 955, in make_thunk
no_recycling)
File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\op.py", line 858, in make_c_thunk
output_storage=node_output_storage)
File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cc.py", line 1217, in make_thunk
keep_lock=keep_lock)
File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cc.py", line 1157, in __compile__
keep_lock=keep_lock)
File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cc.py", line 1620, in cthunk_factory
key=key, lnk=self, keep_lock=keep_lock)
File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cmodule.py", line 1181, in module_from_key
module = lnk.compile_cmodule(location)
File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cc.py", line 1523, in compile_cmodule
preargs=preargs)
File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cmodule.py", line 2391, in compile_str
(status, compile_stderr.replace('\n', '. ')))
Exception: ('The following error happened while compiling the node', InplaceDimShuffle{1,0}(Softmax.0), '\n', 'Compilation failed (return status=3): ', '[InplaceDimShuffle{1,0}(<TensorType(float64, matrix)>)]')
forrtl: error (200): program aborting due to control-C event
Image PC Routine Line Source
libifcoremd.dll 00007FFE414794C4 Unknown Unknown Unknown
KERNELBASE.dll 00007FFE79672763 Unknown Unknown Unknown
KERNEL32.DLL 00007FFE7ABD7E94 Unknown Unknown Unknown
ntdll.dll 00007FFE7D2CA251 Unknown Unknown Unknown
forrtl: error (200): program aborting due to control-C event
Image PC Routine Line Source
libifcoremd.dll 00007FFE414794C4 Unknown Unknown Unknown
KERNELBASE.dll 00007FFE79672763 Unknown Unknown Unknown
KERNEL32.DLL 00007FFE7ABD7E94 Unknown Unknown Unknown
ntdll.dll 00007FFE7D2CA251 Unknown Unknown Unknown
forrtl: error (200): program aborting due to control-C event
Image PC Routine Line Source
libifcoremd.dll 00007FFE414794C4 Unknown Unknown Unknown
KERNELBASE.dll 00007FFE79672763 Unknown Unknown Unknown
KERNEL32.DLL 00007FFE7ABD7E94 Unknown Unknown Unknown
ntdll.dll 00007FFE7D2CA251 Unknown Unknown Unknown
[I 18:43:13.302 NotebookApp] Interrupted...
[I 18:43:13.303 NotebookApp] Shutting down 1 kernel
[I 18:43:13.403 NotebookApp] Kernel shutdown: f6d274f4-ffbf-428a-a996-751cd821bd4a
The temporary, compiled C code reports in the last line
Problem occurred during compilation with the command line below:
"C:\Miniconda3\envs\intro_to_pymc3\Library\mingw-w64\bin\g++.exe" -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -march=haswell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-clwb -mno-pcommit -mno-mwaitx --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=haswell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -DMS_WIN64 -I"C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\numpy\core\include" -I"C:\Miniconda3\envs\intro_to_pymc3\include" -I"C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\c_code" -L"C:\Miniconda3\envs\intro_to_pymc3\libs" -L"C:\Miniconda3\envs\intro_to_pymc3" -o "C:\Users\moran\AppData\Local\Theano\compiledir_Windows-10-10.0.17763-SP0-Intel64_Family_6_Model_69_Stepping_1_GenuineIntel-3.6.6-64\tmpujapb2d5\m885ff006a95d626dac547a7bdfdb471bbf058622ece2b4435e42316c4012ea56.pyd" "C:\Users\moran\AppData\Local\Theano\compiledir_Windows-10-10.0.17763-SP0-Intel64_Family_6_Model_69_Stepping_1_GenuineIntel-3.6.6-64\tmpujapb2d5\mod.cpp" -lpython36
Does this shed more light on this matter?
EDIT: I also confirmed (as suggested by @elfwired) that setting theano.config.mode = 'FAST_COMPILE' allows to run the sampler successfully---but the sampling becomes very slow. I tried to fiddle with theano.config.mode, theano.config.optimizer, and theano.config.linker without much success.
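For anyone who wants to try the FAST_COMPILE workaround without editing .theanorc, Theano also reads its configuration from the THEANO_FLAGS environment variable at import time. The sketch below is one way to set it from Python; `merge_theano_flags` is a hypothetical helper, and it assumes the usual comma-separated `key=value` format with no commas inside values.

```python
import os


def merge_theano_flags(existing, **overrides):
    """Return a THEANO_FLAGS string with `overrides` applied on top of `existing`."""
    flags = {}
    for item in filter(None, (existing or "").split(",")):
        key, _, value = item.partition("=")
        flags[key.strip()] = value.strip()
    flags.update(overrides)
    return ",".join(f"{key}={value}" for key, value in flags.items())


# This must run before `import theano` (and therefore before `import pymc3`),
# because the flags are read when Theano is first imported.
os.environ["THEANO_FLAGS"] = merge_theano_flags(
    os.environ.get("THEANO_FLAGS"), mode="FAST_COMPILE"
)
```

As noted above, FAST_COMPILE trades run-time speed for fewer compilation steps, so sampling is expected to be slower.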
This looks like a Theano problem, can you open an issue there? It looks very archaic to me.
Done, let's see.
EDIT: Just a note. When there is a compilation error, the traceback points to the temporary C code. At the end of that code, there is a line saying:
Problem occurred during compilation with the command line below:
"C:\Miniconda3\envs\intro_to_pymc3\Library\mingw-w64\bin\g++.exe" -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -march=haswell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-clwb -mno-pcommit -mno-mwaitx --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=haswell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -DMS_WIN64 -I"C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\numpy\core\include" -I"C:\Miniconda3\envs\intro_to_pymc3\include" -I"C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\c_code" -L"C:\Miniconda3\envs\intro_to_pymc3\libs" -L"C:\Miniconda3\envs\intro_to_pymc3" -o "C:\Users\moran\AppData\Local\Theano\compiledir_Windows-10-10.0.17763-SP0-Intel64_Family_6_Model_69_Stepping_1_GenuineIntel-3.6.6-64\tmpujapb2d5\m885ff006a95d626dac547a7bdfdb471bbf058622ece2b4435e42316c4012ea56.pyd" "C:\Users\moran\AppData\Local\Theano\compiledir_Windows-10-10.0.17763-SP0-Intel64_Family_6_Model_69_Stepping_1_GenuineIntel-3.6.6-64\tmpujapb2d5\mod.cpp" -lpython36
I tried to run the command post-mortem, but the temp folder ...\tmpujapb2d5\... does not exist (although a bunch of others do). I am wondering if there is a problem with how the multiprocessing pool is instantiated.
I got a similar error for this snippet in an MCMC application:
with pm.Model() as sleep_model:
    # Create the alpha and beta parameters
    alpha = pm.Normal('alpha', mu=0.0, tau=0.01, testval=0.0)
    beta = pm.Normal('beta', mu=0.0, tau=0.01, testval=0.0)
    # Create the probability from the logistic function
    p = pm.Deterministic('p', 1. / (1. + tt.exp(beta * time + alpha)))
    # Create the bernoulli parameter which uses the observed data
    observed = pm.Bernoulli('obs', p, observed=sleep_obs)
    # Starting values are found through Maximum A Posterior estimation
    # start = pm.find_MAP()
    # Using Metropolis Hastings Sampling
    step = pm.Metropolis()
    # Sample from the posterior using the sampling method
    #sleep_trace = pm.sample(N_SAMPLES, step=step, njobs=2);
    sleep_trace = pm.sample(N_SAMPLES, step=step);
Error message:
Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
>Metropolis: [beta]
>Metropolis: [alpha]
BrokenPipeError Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py in __init__(self, draws, tune, step_method, chain, seed, start)
241 try:
--> 242 self._process.start()
243 except IOError as e:
C:\ProgramData\Anaconda3\lib\multiprocessing\process.py in start(self)
111 _cleanup()
--> 112 self._popen = self._Popen(self)
113 self._sentinel = self._popen.sentinel
C:\ProgramData\Anaconda3\lib\multiprocessing\context.py in _Popen(process_obj)
222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
224
C:\ProgramData\Anaconda3\lib\multiprocessing\context.py in _Popen(process_obj)
321 from .popen_spawn_win32 import Popen
--> 322 return Popen(process_obj)
323
C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
88 reduction.dump(prep_data, to_child)
---> 89 reduction.dump(process_obj, to_child)
90 finally:
C:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
61
BrokenPipeError: [Errno 32] Broken pipe
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
<ipython-input-26-4ad3b5446758> in <module>
18 # Sample from the posterior using the sampling method
19 #sleep_trace = pm.sample(N_SAMPLES, step=step, njobs=2);
---> 20 sleep_trace = pm.sample(N_SAMPLES, step=step);
C:\ProgramData\Anaconda3\lib\site-packages\pymc3\sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, **kwargs)
435 _print_step_hierarchy(step)
436 try:
--> 437 trace = _mp_sample(**sample_args)
438 except pickle.PickleError:
439 _log.warning("Could not pickle model, sampling singlethreaded.")
C:\ProgramData\Anaconda3\lib\site-packages\pymc3\sampling.py in _mp_sample(draws, tune, step, chains, cores, chain, random_seed, start, progressbar, trace, model, **kwargs)
963 sampler = ps.ParallelSampler(
964 draws, tune, chains, cores, random_seed, start, step,
--> 965 chain, progressbar)
966 try:
967 try:
C:\ProgramData\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py in __init__(self, draws, tune, chains, cores, seeds, start_points, step_method, start_chain_num, progressbar)
359 draws, tune, step_method, chain + start_chain_num, seed, start
360 )
--> 361 for chain, seed, start in zip(range(chains), seeds, start_points)
362 ]
363
C:\ProgramData\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py in <listcomp>(.0)
359 draws, tune, step_method, chain + start_chain_num, seed, start
360 )
--> 361 for chain, seed, start in zip(range(chains), seeds, start_points)
362 ]
363
C:\ProgramData\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py in __init__(self, draws, tune, step_method, chain, seed, start)
249 # all its error message
250 time.sleep(0.2)
--> 251 raise exc
252 raise
253
RuntimeError: The communication pipe between the main process and its spawned children is broken.
In Windows OS, this usually means that the child process raised an exception while it was being spawned, before it was setup to communicate to the main process.
The exceptions raised by the child process while spawning cannot be caught or handled from the main process, and when running from an IPython or jupyter notebook interactive kernel, the child's exception and traceback appears to be lost.
A known way to see the child's error, and try to fix or handle it, is to run the problematic code as a batch script from a system's Command Prompt. The child's exception will be printed to the Command Promt's stderr, and it should be visible above this error and traceback.
Note that if running a jupyter notebook that was invoked from a Command Prompt, the child's exception should have been printed to the Command Prompt on which the notebook is running.
Running on Windows 10 with latest packages of everything.
Same thing for me (Windows 10, Spyder, installed through Anaconda).
Setting cores=1 in pm.sample() runs fine.
Multiprocess sampling (4 chains in 4 jobs)
BinaryGibbsMetropolis: [rain, sprinkler]
Traceback (most recent call last):
File "
trace = pm.sample(20000, step=[pm.BinaryGibbsMetropolis([rain, sprinkler])], tune=tune, random_seed=124)
File "C:\Users\butle\Anaconda3\lib\site-packages\pymc3\sampling.py", line 437, in sample
trace = _mp_sample(**sample_args)
File "C:\Users\butle\Anaconda3\lib\site-packages\pymc3\sampling.py", line 965, in _mp_sample
chain, progressbar)
File "C:\Users\butle\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py", line 361, in __init__
for chain, seed, start in zip(range(chains), seeds, start_points)
File "C:\Users\butle\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py", line 361, in
for chain, seed, start in zip(range(chains), seeds, start_points)
File "C:\Users\butle\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py", line 251, in __init__
raise exc
RuntimeError: The communication pipe between the main process and its spawned children is broken.
In Windows OS, this usually means that the child process raised an exception while it was being spawned, before it was setup to communicate to the main process.
The exceptions raised by the child process while spawning cannot be caught or handled from the main process, and when running from an IPython or jupyter notebook interactive kernel, the child's exception and traceback appears to be lost.
A known way to see the child's error, and try to fix or handle it, is to run the problematic code as a batch script from a system's Command Prompt. The child's exception will be printed to the Command Promt's stderr, and it should be visible above this error and traceback.
Note that if running a jupyter notebook that was invoked from a Command Prompt, the child's exception should have been printed to the Command Prompt on which the notebook is running.
Same for me. Windows 10. cores=1 works fine. Theano with cuda.
I used vscode to write the code, but ran it via cmd.
I am just getting into pymc and was following along with the code in Osvaldo Martin's book.
This was the code I tried.
import numpy as np
from scipy import stats
import pymc3 as pm
np.random.seed(123)
if __name__ == "__main__":
    trials = 4
    theta_real = 0.35
    data = stats.bernoulli.rvs(p=theta_real, size=trials)
    with pm.Model() as our_first_model:
        theta = pm.Beta("theta", alpha=1., beta=1.)
        y = pm.Bernoulli("y", p=theta, observed=data)
        trace = pm.sample(1000, random_seed=123)
The following is the trace
Traceback (most recent call last):
File "test.py", line 16, in <module>
trace = pm.sample(1000, random_seed=123)
File "C:\Anaconda\lib\site-packages\pymc3\sampling.py", line 437, in sample
trace = _mp_sample(**sample_args)
File "C:\Anaconda\lib\site-packages\pymc3\sampling.py", line 965, in _mp_sample
chain, progressbar)
File "C:\Anaconda\lib\site-packages\pymc3\parallel_sampling.py", line 361, in __init__
for chain, seed, start in zip(range(chains), seeds, start_points)
File "C:\Anaconda\lib\site-packages\pymc3\parallel_sampling.py", line 361, in <listcomp>
for chain, seed, start in zip(range(chains), seeds, start_points)
File "C:\Anaconda\lib\site-packages\pymc3\parallel_sampling.py", line 251, in __init__
raise exc
RuntimeError: The communication pipe between the main process and its spawned children is broken.
In Windows OS, this usually means that the child process raised an exception while it was being spawned, before it was setup to communicate to the main process.
The exceptions raised by the child process while spawning cannot be caught or handled from the main process, and when running from an IPython or jupyter notebook interactive kernel, the child's exception and traceback appears to be lost.
A known way to see the child's error, and try to fix or handle it, is to run the problematic code as a batch script from a system's Command Prompt. The child's exception will be printed to the Command Promt's stderr, and it should be visible above this error and traceback.
Note that if running a jupyter notebook that was invoked from a Command Prompt, the child's exception should have been printed to the Command Prompt on which the notebook is running.
I am facing the same issue on a Debian machine, in particular the default image on Google Dataproc (https://cloud.google.com/compute/docs/images#debian, 1.5-debian).
Setting one of:
os.environ['MKL_THREADING_LAYER'] = 'sequential'
os.environ['OMP_NUM_THREADS'] = '1'
allowed me to get it running, but I suspect this is preventing me from scaling things up. Indeed, I noticed that each chain appears to use just one CPU. Is this a known issue for certain Linux distributions? Is there a Linux distro where multiprocessing is known to work well?
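A minimal sketch of applying these settings from inside a script, assuming they need to be in place before NumPy, Theano or pymc3 are imported, since threading libraries typically read them when they are first loaded. `limit_blas_threads` is a hypothetical helper name, and the values are the ones quoted above.

```python
import os

# Values quoted in the comment above.
THREAD_LIMITS = {
    "MKL_THREADING_LAYER": "sequential",
    "OMP_NUM_THREADS": "1",
}


def limit_blas_threads(env=os.environ, limits=THREAD_LIMITS):
    """Write the thread-limiting variables into `env` and return what was set."""
    for name, value in limits.items():
        env[name] = value
    return {name: env[name] for name in limits}


# Call this at the very top of the script, before the numeric imports:
# limit_blas_threads()
# import numpy as np
# import pymc3 as pm
```

With these limits each worker process runs its linear algebra single-threaded, which would be consistent with every chain using one CPU; parallelism then comes from running several chains at once, not from threads inside a chain.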