Stockfish 11 64 (installed from official packages on openSUSE Tumbleweed) crashes after these commands:
setoption name Threads value 1
eval
Note: value can by any number, it still crashes. Other commands, like "d" or "go" doesn't cause SIGSEGV.
confirmed on master...
Thread 1 "stockfish" received signal SIGSEGV, Segmentation fault.
0x000055555556b9d9 in Eval::trace[abi:cxx11](Position const&) (pos=...) at evaluate.cpp:859
859 pos.this_thread()->contempt = SCORE_ZERO; // Reset any dynamic contempt
(gdb) bt
#0 0x000055555556b9d9 in Eval::trace[abi:cxx11](Position const&) (pos=...) at evaluate.cpp:859
#1 0x00005555555a92dd in UCI::loop (argc=1, argv=0x7fffffffde18) at uci.cpp:241
#2 0x00005555555773f2 in main (argc=1, argv=0x7fffffffde18) at main.cpp:49
as a workaround, you need to specify a position after setting the number of threads:
setoption name Threads value 1
position startpos
eval
This is triggering a corner case on how eval is working.
@vondele This happens because we're destroying and recreating new threads in ThreadPool::set(), which invalidates all pointers to the just destroyed threads (also the MainThread), right?
While this seems no problem when using a GUI, a GUI will probably always send a new position command anyways before proceeding, it doesn't look like desired from a programmer's point-of-view.
So it looks like uiThread, which was removed in 12d58adc68b1aa084d383d06bc47abbb3495ce3e (#2299), did not just exist to work around data races.
I hope we can find away to fix this that does not involve reverting to create a dummy thread.
Well, I'm no expert, but we only destroy the main thread in ThreadPool::set() because we want to get rid of the binding.
If I understand correctly, there is a simple method in Texel in numa.cpp to do this.
Numa::disable() {
threadToNode.clear();
}
Maybe we can do something similar for Stockfish which would make deleting the main thread obsolete.
I think part of the issue is that we call eval from the ui thread, which we should not do, it should be done from a search thread IMO. Similarly, the pos should be setup cleanly, so that it is bound the an existing thread. This is all not so trivial, so if anybody feels like trying to come up with a patch ....
Meanwhile, I don't think this is a thing that needs urgent fixing, it really is a corner case.
Maybe something like this https://github.com/official-stockfish/Stockfish/pull/540 might be helpful?
Unfortunately, last time this was discussed it seemed to be a noticeable slowdown for some ...
that's definitely a useful pointer. However, I also recall that thread local storage might be slow on windows... might be worth retesting. It think it would really require to run eval on a search thread however (but I also need to look a little more carefully).
If this only happens for "eval", which is not a UCI command, is there a problem? If it really needs to be fixed, let eval execute a "position" command?
(Sorry if I am missing something obvious.)
will be fixed once https://github.com/official-stockfish/Stockfish/pull/2877 is merged into master
No crash after "Add NNUE evaluation", looks fixed.
it is, thanks for reminding.
Most helpful comment
I think part of the issue is that we call eval from the ui thread, which we should not do, it should be done from a search thread IMO. Similarly, the pos should be setup cleanly, so that it is bound the an existing thread. This is all not so trivial, so if anybody feels like trying to come up with a patch ....
Meanwhile, I don't think this is a thing that needs urgent fixing, it really is a corner case.