Stockfish hang in TCEC S16 Division P Game 51

Created on 9 Sep 2019  ·  234 Comments  ·  Source: official-stockfish/Stockfish

In the TCEC Archive here

Reproduces with a few seconds to move for me:

Engine master(1) failed to respond to ping
Terminating process of engine master(1)
Warning: Unexpected result claim from master: 1/2-1/2 {Draw by stalled connection}
Engine master(0) failed to respond to ping
Terminating process of engine master(0)

Most helpful comment

I also support that No, because it was not necessary.
I reported the problem on Jul 10th and got very little / no support.

All 234 comments

I used this as the opening book:

[FEN "8/4kp2/2Bp1p2/b1pP1P1p/P1P4P/6P1/8/6K1 b - - 19 57"]
[Result "1/2-1/2"]

1/2-1/2

Can you give the precise sequence of UCI commands to reproduce the hang, plus info on the system you run on?

I entered that FEN (but without any "Result") and SF_19082608 is up to d=82, 9.5 billion nodes, and no crash yet.

Apologies, stall was due to bad memory setting by me :(

Right now, I see no issues with this particular FEN... We will need more info to see whether this is an issue and what it is.

When it happened at TCEC I would guess it was a hang, there was a msg from cutechess terminating the engine process. It had just under 17 minutes to use, but the longest recent thinks were around 400 and 300 seconds, depths 86/94 and 90/106, 33bn and 22bn nodes.

Well, recent multi-threaded tests are all crash-free over 100000s of games. So if it is a real issue, it will be hard to find.

Sure. Let's wait to see what TCEC say ...

Perhaps an issue linked to low available memory in comparison with the high needs of many threads?

Edit: especially breadcrumbs? I don't know how much memory they need.

Breadcrumbs use 16K of memory, independent of the number of threads... Generally threads don't need much memory (mostly the history tables). Memory use is also no reason for a hang.

Just curious: how did TCEC know it was hung? The result appears to have been adjudicated by TCEC. Did TCEC stop play before the time on the clock was used up? If it was hung, it would eventually have lost by time forfeit (I would think), and there would have been no need to adjudicate the result. So I am curious why the result was adjudicated, and whether it was adjudicated before time expired.

It ran out of time.

Edit: and was then unresponsive to cutechess so cutechess killed it.

That was unfortunate as it was a dead draw at that point.

No issue here on my machine (slightly modified)

dep score   nodes   time    (not shown:  tbhits knps    seldep)
100  -0.26  21.7G   11:25.26    Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Bc6 Kh6 Kf3 Ba5 g4 hxg4+ Kxg4 Kg7 Kg3 Be1+ Kg2 Bc3 Bb5 Bd2 Kf3 Bc3 Bd7 Kf8 Ke2 Kg7 Bc8 Kh6 Kf3 Kh5 Kg3 Be1+ Kh3 Kh6 Ba6 Ba5 Kg4 Kg7 Kg3 Be1+ Kf3 Bc3 B 
 99  -0.26  19.6G   10:16.29    Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Bc6 Kh6 Kf3 Ba5 g4 hxg4+ Kxg4 Kg7 Kg3 Be1+ Kg2 Bc3 Bb5 Bd2 Kf3 Bc3 Bd7 Kf8 Ke2 Kg7 Bc8 Kh6 Kf3 Kh5 Kg3 Be1+ Kh3 Kh6 Ba6 Ba5 Kg4 Kg7 Kg3 Be1+ Kf3 Bc3 B 
 98  -0.26  19.5G   10:11.33    Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Bc6 Kh6 Kf3 Ba5 g4 hxg4+ Kxg4 Kg7 Kg3 Be1+ Kg2 Bc3 Bb5 Bd2 Kf3 Bc3 Bd7 Kf8 Bc8 Bd2 Kg4 Ke7 Kh5 Kf8 Bd7 Kg7 Bc6 Bc3 Be8 Be1 Kg4 Bd2 Kh3 Be1 Bd7 Kh6 Be6 
 97  -0.26  18.8G   9:50.48 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Bc6 Kh6 Kf3 Ba5 g4 hxg4+ Kxg4 Kg7 Kg3 Be1+ Kg2 Bc3 Bb5 Bd2 Kf3 Bc3 Bd7 Kf8 Bc8 Bd2 Kg4 Ke7 Kh5 Kf8 Bd7 Kg7 Bc6 Bc3 Be8 Be1 Kg4 Bd2 Kh3 Be1 Bd7 Kh6 Be6  
 96  -0.26  17.9G   9:23.04 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Bc6 Kh6 Kf3 Ba5 g4 hxg4+ Kxg4 Kg7 Kg3 Be1+ Kg2 Bc3 Bb5 Bd2 Kf3 Bc3 Bd7 Kf8 Bc8 Bd2 Kg4 Ke7 Kh5 Kf8 Bd7 Kg7 Be8 Kf8 Bc6 Kg8 Bb7 Be1 Bc8 Kf8 Bd7 Ke7 Bb5  
 95  -0.26  16.7G   8:44.15 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bd7 Ke7 Bb5 Kf8 Ke3 Be1 Kf3 Kg7 g4 hxg4+ Kxg4 Bc3 Kh3 Be1 Bd7 Kh6 Be8 Kg7 Kg2 Kf8 Bb5 Kg7 Bd7 Bd2 Bc8 Be1 Bb7 Ba5 Kf3 Bd2 Kg4 Kh6 Bc8 B 
 94  -0.26  8.93G   4:34.86 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bd7 Ke7 Bb5 Kf8 Kf3 Ke7 g4 hxg4+ Kxg4 Bc3 Kf4 Be1 Bc6 Kf8 Kg4 Ke7 Kf3 Bc3 Kf4 Be1 Kg4 Bd2 Kh5 Kf8 Bb5 Ke7 a5 Bxa5 Kh6 Be1 h5 Kf8 Kh7 Bd 
 93  -0.26  6.47G   3:17.75 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bd7 Ke7 Bb5 Kf8 Kf3 Ke7 g4 hxg4+ Kxg4 Bc3 Kf4 Be1 Bc6 Kf8 Kg4 Ke7 Kf3 Bc3 Kf4 Be1 Kg4 Bd2 Kh5 Kf8 Bb5 Be1 Bd7 Kg7 Bc6 Kf8 Bb7 Kg7 Kg4 B 
 92  -0.26  5.96G   3:02.11 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bd7 Ke7 Bb5 Kf8 Kf3 Ke7 g4 hxg4+ Kxg4 Bc3 Kf4 Be1 Bc6 Kf8 Kg4 Ke7 Kh5 Kf8 Kh6 Kg8 Be8 Bd2+ Kh5 Kg7 Kg4 Bc3 Kg3 Be1+ Kh3 Ba5 Bd7 Bd2 Kg4 
 91  -0.26  5.05G   2:34.12 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bd7 Ke7 Bb5 Kf8 Kf3 Ke7 g4 hxg4+ Kxg4 Bc3 Kf4 Be1 Bc6 Kf8 Kg4 Ke7 Kf3 Kf8 Bd7 Kg7 Kf4 Bd2+ Kg4 Kf8 Kf3 Be1 Kg2 Kg7 Bb5 Bc3 Kf2 Bd2 Ke2  
 90  -0.26  3.83G   1:56.73 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bd7 Ke7 Bb5 Kf8 Kf3 Ke7 g4 hxg4+ Kxg4 Bc3 Kf4 Be1 Bc6 Kf8 Kg4 Ke7 Kh5 Kf8 Bb5 Bd2 Kg4 Ke7 Kg3 Be1+ Kh3 Kd8 Bc6 Ke7 Kg4 Bc3 Kf3 Be1 Ke2  
 89  -0.26  3.69G   1:52.41 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bd7 Ke7 Bb5 Kf8 Kf3 Ke7 g4 hxg4+ Kxg4 Bc3 Kf4 Be1 Bc6 Kf8 Kg4 Ke7 Kh5 Kf8 Kh6 Kg8 Bd7 Bb4 Kh5 Kg7 Kg4 Bc3 Be8 Bd2 Bc6 Be1 Kf3 Bc3 Ke4 B 
 88  -0.26  3.33G   1:41.81 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bd7 Ke7 Bb5 Kf8 Kf3 Ke7 g4 hxg4+ Kxg4 Bc3 Kf4 Be1 Bc6 Kf8 Kg4 Ke7 Kh5 Kf8 Kh6 Kg8 Bd7 Ba5 Kh5 Kg7 Bb5 Kh7 Kg4 Kg7 Kf3 Be1 Ke2 Bb4 Kf2 K 
 87  -0.26  3.15G   1:36.58 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bd7 Ke7 Bb5 Kf8 Kf3 Ke7 g4 hxg4+ Kxg4 Bc3 Kf4 Be1 Bc6 Kf8 Kg4 Ke7 Kh5 Kf8 Kh6 Kg8 Bd7 Bd2+ Kh5 Kg7 Bc8 Kf8 Kg4 Ke7 Kf3 Kd8 Bb7 Ke7 Bc6  
 86  -0.26  2.85G   1:27.58 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bd7 Ke7 Bb5 Kf8 Kf3 Ke7 g4 hxg4+ Kxg4 Bc3 Kf4 Be1 Bc6 Kf8 Kg4 Ke7 Kh5 Kf8 Kh6 Kg8 Bd7 Bd2+ Kh5 Kg7 Bc8 Kf8 Kg4 Ke7 Kf3 Kd8 Bb7 Ke7 Ke2  
 85  -0.26  2.81G   1:26.25 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bd7 Ke7 Bb5 Kf8 Kf3 Ke7 g4 hxg4+ Kxg4 Bc3 Kf4 Be1 Bc6 Kf8 Kg4 Ke7 Kh5 Kf8 Kh6 Kg8 Bd7 Bd2+ Kh5 Kg7 Bc8 Kf8 Kg4 Ke7 Kf3 Kd8 Bb7 Ke7 Ke2  
 84  -0.26  2.79G   1:25.71 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bd7 Ke7 Bb5 Kf8 Kf3 Ke7 g4 hxg4+ Kxg4 Bc3 Kf4 Be1 Bc6 Kf8 Kg4 Ke7 Kh5 Kf8 Kh6 Kg8 Bd7 Bd2+ Kh5 Kg7 Bc8 Kf8 Kg4 Ke7 Kf3 Kd8 Bb7 Ke7 Ke2  
 83  -0.26  2.79G   1:25.57 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bd7 Ke7 Bb5 Kf8 Kf3 Ke7 g4 hxg4+ Kxg4 Bc3 Kf4 Be1 Bc6 Kf8 Kg4 Ke7 Kh5 Kf8 Kh6 Kg8 Bd7 Bd2+ Kh5 Kg7 Bc8 Kf8 Kg4 Ke7 Kf3 Kd8 Bb7 Ke7 Ke2  
 82  -0.26  2.78G   1:25.34 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bd7 Ke7 Bb5 Kf8 Kf3 Ke7 g4 hxg4+ Kxg4 Bc3 Kf4 Be1 Bc6 Kf8 Kg4 Ke7 Kh5 Kf8 Kh6 Kg8 Bd7 Bd2+ Kh5 Kg7 Bc8 Kf8 Kg4 Ke7 Kf3 Kd8 Bb7 Ke7 Ke2  
 81  -0.26  2.77G   1:25.18 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bb5 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Bd7 Bc3 Kh5 Kf8 Bc6 Kg7 Bb5 Bd2 Ba6 Be1 Bc8 Bd2 Bb7 Bb4 Ba8 Bd2 Kg4 Kh6 Bb7 Kg7 Kf3 Ba5 Kg3 Be1+ Kg2  
 80  -0.26  1.17G   0:36.15 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bb5 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Bd7 Bc3 Bc6 Ba5 Kg3 Be1+ Kf3 Ba5 Bb5 Be1 Bd7 Bd2 Ke4 Ba5 Kf4 Be1 Be6 Bb4 Kg4 Be1 Bc8 Bd2 Bb7 Bc3 Kf3  
 79  -0.26  1.08G   0:33.38 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bb5 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Bd7 Bc3 Bc6 Ba5 Kh3 Bc3 Bb5 Ba5 Kg3 Be1+ Kf3 Bc3 Ke4 Bd2 Kd3 Be1 Ke2 Bc3 Kf3 Be1 Ba6 Ba5 Kg4 Be1 Bc8  
 78  -0.26  1.02G   0:31.62 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bb5 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Bd7 Bc3 Bc6 Ba5 Kh3 Bc3 Bb5 Ba5 Bd7 Bd2 Be8 Be1 Kg2 Ba5 Kf2 Bc3 Ke3 Be1 Kd3 Ba5 Kc2 Kf8 Bc6 Be1 Bd7 B 
 77  -0.26  994.9M  0:30.78 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bb5 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Be8 Bc3 Bd7 Bd2 Kh5 Be1 Be6 Bd2 Bc8 Be1 Bb7 Bc3 Kg4 Be1 Kf3 Bd2 Bc6 Bc3 Ke3 Be1 Ke2 Ba5 Be8 Bc3 Bb5  
 76  -0.26  725.4M  0:22.10 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bb5 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Kf3 Be1 Ke3 Ba5 Bd7 Bc3 Bc6 Be1 Kd3 Kh6 Ke2 Bc3 Kf3 Kh5 Kg3 Be1+ Kh3 Kh6 Be8 Kg7 Kg4 Bc3 Kh5 Bb4 Bb5 
 75  -0.26  687.3M  0:20.81 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bb5 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Kf3 Be1 Ke3 Ba5 Bd7 Bc3 Bc6 Be1 Be8 Bc3 Ke2 Bb4 Bb5 Ba5 Ke3 Be1 Kd3 Ba5 Be8 Be1 Bd7 Ba5 Be6 Kf8 Ke2  
 74  -0.26  654.8M  0:19.77 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bb5 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Kf3 Be1 Ke3 Ba5 Bd7 Bc3 Bc6 Be1 Be8 Bc3 Ke2 Bb4 Bb5 Ba5 Ke3 Be1 Kd3 Ba5 Be8 Be1 Bd7 Ba5 Be6 Kf8 Ke2  
 73  -0.26  645.2M  0:19.46 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bb5 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Be8 Be1 Kf4 Bd2+ Ke4 Be1 Ke3 Bc3 Ke2 Bb4 Bb5 Ba5 Ke3 Be1 Kd3 Ba5 Ke2 Bb4 Kf3 Bc3 Bc6 Bb4 Kg4 Be1 Be8 
 72  -0.26  617.1M  0:18.59 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bb5 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Kf3 Kf8 Ke2 Ba5 Ke3 Kg7 Bd7 Kh6 Kf2 Kg7 Bc6 Bd2 Bb7 Kf8 Kf3 Bc3 Bc6 Kg7 Kg4 Be1 Bb7 Bb4 Bc8 Be1 Kh5  
 71  -0.26  396.3M  0:11.38 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bb5 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Kh5 Ba5 Bc6 Bb4 Bb7 Bd2 Kg4 Be1 Kf3 Bb4 Ba6 Be1 Bc8 Ba5 Ke4 Bd2 Kd3 Be1 Ke2 Bc3 Bd7 Ba5 Bb5 Kf8 Ke3  
 70  -0.26  353.8M  0:10.12 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Be8 Be1 Kd3 Kf8 Ke2 Bb4 Bb5 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Kh5 Ba5 Bc6 Bb4 Bb7 Bd2 Kg4 Be1 Kf3 Bb4 Ba6 Be1 Bc8 Kh6 Ke2 Bb4 Kf2 Bd2 Bd7 Kh5 Kg3 Be1+ Kh3 Kh6 Be8 
 69  -0.26  345.1M  0:09.88 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke3 Be1 Bd7 Bb4 Kf3 Bd2 g4 hxg4+ Kxg4 Bc3 Kg3 Be1+ Kf3 Ba5 Bc6 Be1 Ke3 Kf8 Kd3 Kg7 Be8 Ba5 Ke4 Be1 Ke3 Kf8 Bc6 Ke7 Kf3 Ba5 Ke4 Bc3 Kf4 Ba5 Kf3 Be1 Kg4 Bd2 Kh3 Be1 Bb7 
 68  -0.26  263.7M  0:07.53 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Be8 Be1 Ke2 Bc3 Kf3 Bb4 g4 Kf8 Bb5 hxg4+ Kxg4 Kg7 Kf3 Bd2 Kg3 Be1+ Kh3 Ba5 Ba6 Be1 Kg4 Bd2 Kh5 Ba5 Bc8 Bc3 Kg4 Kh6 Bb7 Ba5 Kg3 Kh5 Kh3 Kh6 Bc8 Bd2 Bd7 Ba5 Kg4 Be1 Be8 
 67  -0.26  216.3M  0:06.12 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Be8 Be1 Ke2 Bc3 Kf3 Bb4 g4 Kf8 Bb5 hxg4+ Kxg4 Kg7 Kf3 Bd2 Ke2 Ba5 Bc6 Kh6 Kf2 Kh5 Kg3 Be1+ Kh3 Kh6 Bd7 Kg7 Kg4 Bc3 Kf3 Bd2 Ke2 Ba5 Ke3 Kh6 Kf2 Kh5 Kg3 Be1+ Kh3 Kh6 Bc 
 66  -0.26  212.0M  0:05.99 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Be8 Be1 Ke2 Bc3 Kf3 Bb4 Bc6 Bd2 g4 hxg4+ Kxg4 Kh6 Bd7 Ba5 Kf4 Kh5 Kg3 Be1+ Kh3 Kh6 Bb5 Kg7 Kg4 Bd2 Kf3 Be1 Ke2 Bc3 Bd7 Ba5 Bc6 Kh6 Kf2 Kh5 Kg3 Be1+ Kh3 Kh6 Bd7 Bd2 Be 
 65  -0.26  205.2M  0:05.79 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Be8 Be1 Ke2 Bc3 Kf3 Bb4 Bc6 Bd2 g4 hxg4+ Kxg4 Kh6 Bd7 Ba5 Be8 Kg7 Kf4 Bb4 Kf3 Bd2 Bb5 Kh6 Ke2 Bb4 Kf2 Kh5 Kg3 Be1+ Kh3 Kh6 Kg4 Ba5 Bd7 Bd2 Kg3 Be1+ Kg2 Kh5 Kh3 Kh6 Bb 
 64  -0.26  144.0M  0:04.07 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Bc6 Be1 Kf3 Kh6 g4 Ba5 Bb5 hxg4+ Kxg4 Be1 Bd7 Ba5 Be8 Kg7 Kf4 Bb4 Kf3 Bd2 Bd7 Be1 Kg4 Bc3 Bc6 Kh6 Kf4 Bd2+ Kf3 Kh5 Kg3 Be1+ Kh3 Kh6 Be8 Kg7 Bb5 Bc3 Kg3 Be1+ K 
 63  -0.26  117.5M  0:03.35 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Ke4 Ba5 Bc6 Be1 Kf3 Kh6 g4 Ba5 Bb5 hxg4+ Kxg4 Be1 Bd7 Ba5 Be8 Kg7 Kf4 Bc3 Ke4 Bb4 Bc6 Ba5 Kf3 Be1 Kg4 Kf8 Kh3 Kg7 Be8 Ba5 Bb5 Bd2 Kg4 Kf8 Kf3 Be1 Ke2 Bc3 Kd3 Be1 Bc6  
 62  -0.26  103.2M  0:02.96 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Kc1 Bc3 Be8 Kf8 Kc2 Be1 Kd1 Bb4 Bb5 Ba5 Bc6 Bb4 Kc2 Be1 Kd3 Ke7 Ke2 Ba5 Kf2 Kf8 Bb5 Bc3 Bd7 Bd2 Ke2 Bc3 Kf3 Kg7 g4 hxg4+ Kxg4 Kh6 Kf3 Kh5 Kg3 Be1+ Kh3 
 61  -0.26  88.5M   0:02.53 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Kc1 Bc3 Be8 Kf8 Kc2 Be1 Kd1 Bb4 Bb5 Ba5 Bc6 Bb4 Kc2 Be1 Kd3 Ke7 Ke2 Ba5 Kf3 Kf8 g4 hxg4+ Kxg4 Kg7 Kf4 Bc3 Ke4 Kh6 Kf3 Kh5 Kg3 Be1+ Kh3 Kh6 Be8 Kg7 Bd 
 60  -0.26  80.7M   0:02.31 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Kc1 Bc3 Be8 Kf8 Kc2 Be1 Kd1 Bb4 Bb5 Ba5 Bc6 Bb4 Kc2 Be1 Kd3 Ke7 Ke2 Ba5 Kf3 Kf8 Bb5 Be1 g4 hxg4+ Kxg4 Bd2 Kf3 Kg7 Bc6 Bc3 Kg4 Bd2 Kg3 Be1+ Kh3 Bd2 Kg 
 59  -0.26  67.6M   0:01.93 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Kc1 Bc3 Be8 Kf8 Kc2 Be1 Kd1 Bb4 Bb5 Ba5 Bc6 Bb4 Kc2 Be1 Kd3 Ke7 Ke2 Ba5 Kf3 Kf8 Bb5 Kg7 g4 hxg4+ Kxg4 Bd2 Be8 Be1 Kh5 Bd2 Bc6 Kf8 Kg4 Be1 Kf3 Bc3 Bd7 
 58  -0.26  63.6M   0:01.81 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Kc1 Bc3 Be8 Kf8 Kc2 Be1 Kd1 Bb4 Bb5 Ba5 Bc6 Bb4 Kc2 Be1 Kd3 Ke7 Ke2 Ba5 Kf3 Kf8 Bb5 Kg7 g4 hxg4+ Kxg4 Bd2 Kf3 Be1 Ke3 Bc3 Bc6 Kh6 Kf3 Kg7 Kg4 Ba5 Kf4 
 57  -0.26  55.7M   0:01.56 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Kc1 Bc3 Be8 Kf8 Kc2 Be1 Kd1 Bb4 Bb5 Ba5 Bc6 Bb4 Kc2 Be1 Kd3 Ke7 Ke2 Ba5 Kf3 Kf8 Bb5 Be1 g4 hxg4+ Kxg4 Bb4 Kf3 Be1 Bd7 Bb4 Ke2 Bc3 Kd3 Be1 Bc6 Bb4 Bb5 
 56  -0.26  48.0M   0:01.33 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Kc1 Bc3 Be8 Kf8 Kc2 Be1 Kd1 Bb4 Bb5 Ba5 Bc6 Bb4 Kc2 Be1 Kd3 Ke7 Ke2 Bb4 Kf3 Kf8 Bb5 Be1 g4 hxg4+ Kxg4 Bb4 Bc6 Kg7 Kf3 Be1 Bb5 Kf8 Ke2 Ba5 Ke3 Kg7 Ke4 
 55  -0.26  45.9M   0:01.27 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Kc1 Bc3 Be8 Kf8 Kc2 Be1 Kd1 Bb4 Bb5 Ba5 Bc6 Bb4 Kc2 Be1 Kd3 Ke7 Ke2 Bb4 Kf3 Kf8 Bb5 Be1 g4 hxg4+ Kxg4 Bb4 Bc6 Ba5 Kg3 Be1+ Kf3 Kg7 Kg4 Bb4 Be8 Kf8 Bb 
 54  -0.26  45.3M   0:01.25 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Kc1 Bc3 Be8 Kf8 Kc2 Be1 Kd1 Bb4 Bb5 Ba5 Bc6 Bb4 Kc2 Be1 Kd3 Ke7 Ke2 Bb4 Kf3 Kf8 Bb5 Be1 g4 hxg4+ Kxg4 Kg7 Be8 Bb4 Kf3 Kf8 Bb5 Kg7 Kf2 Bd2 Bd7 Bc3 Bc6 
 53  -0.26  41.4M   0:01.14 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Kc1 Bc3 Be8 Kf8 Kc2 Be1 Kd1 Bb4 Bb5 Ba5 Bc6 Bb4 Kc2 Be1 Kd3 Kg7 Bd7 Ba5 Ke4 Kf8 g4 hxg4 Kf4 Be1 Kxg4 Kg7 Be8 Ba5 Kf4 Kg8 Bc6 Kg7 Kf3 Be1 Ke2 
 52  -0.26  31.5M   0:00.86 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Kc1 Bc3 Be8 Kf8 Kc2 Be1 Kd1 Bb4 Bb5 Ba5 Bc6 Bb4 Bb7 Kg7 Ke2 Bc3 Kf2 Bb4 Kf3 Bd2 g4 hxg4+ Kxg4 Be1 Bc6 Kh6 Bb5 Ba5 Kf3 Bc3 Be8 Kg7 Kf4 
 51  -0.26  24.7M   0:00.66 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Kc1 Bc3 Be8 Kf8 Kc2 Be1 Kd1 Bb4 Bb5 Ba5 Kc2 Ke7 Bc6 Bb4 Kd3 Kf8 Ke4 Kg7 g4 hxg4 Kf4 Bd2+ Kxg4 Ba5 Kf4 Bd2+ Ke4 Kh6 Kf3 Be1 Ke2 Ba5 Kf2 Kh5 Kg3 Be1+ K 
 50  -0.26  21.6M   0:00.58 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Kc1 Bc3 Be8 Kf8 Kc2 Be1 Kd1 Bb4 Bb5 Ba5 Kc2 Ke7 Bc6 Bb4 Kd3 Ba5 Ke2 Bb4 Kf3 Kf8 g4 hxg4+ Kxg4 Bd2 Kf3 Be1 Ke2 Ba5 Kd3 Be1 Bb5 Ke7 h5 Kf8 Bc6 Kg7 
 49  -0.26  18.8M   0:00.51 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Kc1 Bc3 Be8 Kf8 Kc2 Be1 Kd1 Bb4 Bb5 Ba5 Kc2 Ke7 Bc6 Bb4 Kd3 Ba5 Ke2 Kf8 g4 hxg4 Kf2 Bd2 Kg3 Kg7 Kxg4 Bc3 Bb5 Bd2 Kf3 Kh6 Be8 Kg7 
 48  -0.26  15.8M   0:00.43 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Bb5 Bb4 Kd3 Be1 Ke4 Kf8 Bc6 Ba5 Kf4 Bd2+ Kf3 Kg7 g4 hxg4+ Kxg4 Bc3 Bb5 Be1 Be8 Bc3 Kf3 Kf8 Bd7 Bd2 Ke2 Bc3 Kf2 Bb4 Kg3 Be1+ Kg4 Kg7 Bb5 Bc3 
 47  -0.26  13.9M   0:00.38 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Kd1 Ba5 Bb5 Bc3 Ke2 Ba5 Be8 Kf8 Bd7 Kg7 Kf3 Bb4 g4 hxg4+ Kxg4 Ba5 Kf3 Kf8 Bb5 Bb4 Kg4 Ba5 Kh3 Bd2 Kg3 Be1+ Kf4 Kg7 Bd7 Bb4 Kg3 
 46  -0.26  11.8M   0:00.32 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Kd1 Ba5 Bb5 Bc3 Ke2 Ba5 Be8 Bb4 Kf2 Bc3 Kf3 Bb4 g4 Kf8 Bb5 hxg4+ Kxg4 Bd2 Kf3 Ba5 Ke2 Kg7 Bd7 Kh6 Be8 Kh5 Kf3 Kxh4 
 45  -0.26  10.5M   0:00.29 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Kd1 Ba5 Bb5 Bc3 Ke2 Ba5 Be8 Bb4 Kf2 Ba5 Bc6 Bd2 Ke2 Ba5 Kf3 Kh6 Bd7 Kg7 g4 hxg4+ Kxg4 Bb4 Kf4 Bd2+ Kg3 Ba5 Kf3 Kh6 Bc6 
 44  -0.26  9.27M   0:00.25 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Bb5 Bb4 Kd3 Be1 Bd7 Bb4 Ke4 Kf8 Kf3 Ba5 g4 hxg4+ Kxg4 Kg7 Bc6 Be1 Be8 Bc3 Kf3 Ba5 Kf2 Bd2 Ke2 Ba5 Bc6 Kh6 
 43  -0.26  7.59M   0:00.21 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Bb5 Bb4 Kd3 Be1 Bd7 Bb4 Ke4 Kf8 Kf3 Ba5 g4 hxg4+ Kxg4 Kg7 Bc6 Be1 Kf3 Kh6 Ke2 Ba5 Kf2 Kh5 Kg3 Be1+ Kh3 Kh6 Be8 
 42  -0.26  7.12M   0:00.20 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Bb5 Bb4 Kd3 Be1 Bd7 Bb4 Ke4 Kf8 Kf3 Ba5 g4 hxg4+ Kxg4 Kg7 Kf4 Bd2+ Kf3 Bb4 Bb5 Bd2 Ke2 Bb4 Kf2 Ba5 Be8 
 41  -0.26  6.06M   0:00.16 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Bb5 Bb4 Kd3 Be1 Bd7 Bb4 Ke4 Bd2 Bc6 Be1 Kf3 Kh6 Bd7 Bc3 g4 hxg4+ Kxg4 Kg7 Kf3 Kf8 Bb5 Bb4 Bc6 Kg7 Ke2 
 40  -0.26  5.44M   0:00.15 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Bb5 Bb4 Kd3 Be1 Bd7 Ba5 Bc6 Bb4 Ke2 Bc3 Kf2 Bb4 Kf3 Kh6 g4 hxg4+ Kxg4 Bd2 Bb5 Ba5 Be8 Kg7 Kf3 
 39  -0.26  4.89M   0:00.13 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Bb5 Bb4 Kd3 Be1 Bd7 Ba5 Bc6 Bb4 Ke4 Be1 Kf3 Kh6 Bd7 Kg7 g4 hxg4+ Kxg4 Bd2 Be8 Be1 Kf3 Kf8 
 38  -0.26  4.29M   0:00.12 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Kc2 Be1 Bb5 Bb4 Kd3 Be1 Bd7 Ba5 Bc6 Bb4 Ke4 Be1 Kf3 Bd2 g4 hxg4+ Kxg4 Be1 Bd7 Ba5 Kf3 Bd2 Ke2 
 37  -0.26  3.91M   0:00.11 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Ke4 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Bb4 Kf3 Ba5 Be8 Bd2 Ke2 Bb4 Bc6 Bc3 Kf2 Kh6 Be8 Kg7 Bd7 Kf8 Kf3 
 36  -0.26  3.28M   0:00.09 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Ke4 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Bc3 Kf3 Bb4 Ke2 Kh6 Kf2 Kh5 Kg3 Be1+ Kh3 Kh6 Be8 Kg7 Bb5 Ba5 Kg2 Bd2 
 35  -0.26  2.94M   0:00.08 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Ke4 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Bc3 Kf3 Be1 Be8 Bd2 Kf2 Ba5 Ke2 Bb4 Kd1 Kf8 Bd7 Kg7 Kc2 
 34  -0.26  2.62M   0:00.07 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Ke4 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Be1 Be8 Bd2 Bb5 Bc3 Kg3 Be1+ Kf3 Ba5 Bc6 Bd2 Ke2 
 33  -0.26  2.37M   0:00.07 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Ke4 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Be1 Be8 Bc3 Bd7 Bd2 Kf3 Ba5 Bc6 Bb4 Bb5 Bd2 
 32  -0.26  2.17M   0:00.06 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Ke4 Bc3 Kf3 Be1 g4 hxg4+ Kxg4 Bb4 Bd7 Bd2 Kf3 Bc3 Bb5 Ba5 Bc6 Bd2 
 31  -0.26  2.02M   0:00.06 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Ke4 Bc3 Kf3 Be1 g4 hxg4+ Kxg4 Bb4 Bd7 Kf8 Bb5 Kg7 Kf4 Bd2+ Kf3 Ba5 Bc6 
 30  -0.26  1.88M   0:00.05 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Ke4 Bc3 Kf3 Be1 g4 hxg4+ Kxg4 Bd2 Be8 Bb4 Kf4 Bd2+ Kf3 Ba5 Ke4 Kf8 
 29  -0.26  1.74M   0:00.05 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Ke4 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Ba5 Bb5 Be1 Be8 Ba5 Kf3 
 28  -0.26  1.66M   0:00.05 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Ke4 Bc3 Kf3 Be1 g4 hxg4+ Kxg4 Ba5 Bb5 Be1 Be8 Bd2 Bd7 Be1 Kf3 Bd2 
 27  -0.26  1.60M   0:00.05 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Ke4 Bc3 Kf3 Be1 g4 hxg4+ Kxg4 Bb4 Bb5 Ba5 Be8 Be1 Kf4 
 26  -0.26  1.52M   0:00.04 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Ke4 Bc3 Kf3 Be1 g4 hxg4+ Kxg4 Kf8 Bb5 Kg7 Be8 Bd2 Kf3 
 25  -0.26  1.42M   0:00.04 Kf8 Kf2 Bb4 Kf3 Bd2 Bb5 Kg7 Ke2 Bc3 Kd1 Ba5 Kc2 Be1 Kd3 Bb4 Bc6 Ba5 Be8 Kf8 Bb5 Bb4 Ke2 Bc3 Bc6 Bb4 Kf2 Kg7 Kf3 Bc3 Be8 Ba5 Bd7 Bc3 g4 hxg4+ Kxg4 Bd2 
 24  -0.27  1.24M   0:00.04 Kf8 Kf2 Bb4 Kf3 Bd2 g4 hxg4+ Kxg4 Be1 Kf3 Kg7 Be8 Ba5 Kf4 Bd2+ Kg3 Ba5 Bb5 Bd2 Kf3 Be1 h5 Ba5 Be8 Bd2 Kg3 Kf8 Bb5 
 23  -0.27  1.00M   0:00.03 Kf8 Kf2 Bb4 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Be8 Be1 Kf3 Ba5 Ke4 Kf8 Bc6 Kg7 Bb5 Kh6 Kf4 Bd2+ Kf3 Kg7 Ke2 Ba5 Kf2 Bc3 
 22  -0.27  926537  0:00.03 Kf8 Kf2 Bb4 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Be8 Be1 Kf3 Ba5 Ke4 Kf8 Bb5 Kg7 Kf4 Be1 Kf3 Bd2 Ke2 Ba5 Bc6 Kh6 Bd7 Kh5 
 21  -0.27  897020  0:00.03 Kf8 Kf2 Bb4 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Bd7 Be1 Kf3 Bd2 Bb5 Be1 Bc6 Kh6 Kg4 Kg7 Bd7 Kf8 Kf4 Bd2+ Kg3 Kg7 Kf3 Be1 
 20  -0.27  867981  0:00.03 Kf8 Kf2 Bb4 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Be8 Be1 Kf3 Ba5 Ke4 Kf8 Bb5 Kg7 Kf4 Be1 Kf3 Bd2 
 19  -0.28  846102  0:00.03 Kf8 Kf2 Bd2 Kf3 Kg7 g4 hxg4+ Kxg4 Be1 Be8 Ba5 Kf4 Be1 Ke4 Ba5 h5 Bd2 Kf3 Be1 Bc6 Ba5 
 18  -0.31  610642  0:00.02 Kf8 Kf2 Bd2 Kf3 Be1 g4 hxg4+ Kxg4 Kg7 Bd7 Kf8 Kf3 Ba5 Bb5 Be1 Bc6 Kg7 Be8 Ba5 Kf4 Be1 Ke4 
 17  -0.31  586951  0:00.02 Kf8 Kf2 Bb4 Bb5 Kg7 Kf3 Ba5 Be8 Kf8 Bc6 Be1 g4 hxg4+ Kxg4 Kg7 Bd7 Bd2 Be8 Ba5 Kf4 Be1 Kf3 Bc3 
 16  -0.27  378013  0:00.01 Kf8 Kf2 Bb4 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Be8 Kf8 Bb5 Kg7 Bc6 Ba5 Kf4 Kh6 Be8 Kh5 Bxf7+ Kxh4 
 15  -0.30  315089  0:00.01 Kf8 Kf2 Bb4 Kf3 Bd2 g4 hxg4+ Kxg4 Kg7 Be8 Kf8 Bb5 Be1 Kf4 Kg7 Bc6 Kh6 Kg4 Kg7 Be8 
 14  -0.31  264021  0:00.01 Kf8 Kf1 Kg7 Kg2 Bb4 Kf3 Be1 g4 hxg4+ Kxg4 Bd2 h5 Be1 Be8 Bd2 Kg3 Kg8 Bb5 Kg7 
 13  -0.31  241285  0:00.01 Kf8 Kg2 Bd2 Kf3 Be1 Bd7 Kg7 Be8 Kf8 Bc6 Kg7 Bb7 Bd2 g4 hxg4+ Kxg4 Bc3 
 12  -0.32  171209  0:00.01 Kf8 Kf2 Kg7 Kg2 Bc3 Kf3 Be1 g4 hxg4+ Kxg4 Bd2 h5 Bc3 Kf4 
 11  -0.33  140075  0:00.01 Kf8 Kf2 Kg7 Kf3 Bd2 g4 hxg4+ Kxg4 Kh6 Be8 Kg7 h5 
 10  -0.36  88917   0:00.00 Kf8 Kf1 Kg7 Kf2 Bb4 Kf3 Bc3 g4 hxg4+ Kxg4 Kh6 Bd7 
  9  -0.32  66143   0:00.00 Kf8 Kf1 Kg7 Ke2 Bc3 Kf3 Bd2 g4 hxg4+ Kxg4 Kh6 
  8  -0.44  52361   0:00.00 Kf8 Kf2 Kg8 Kf3 Bd2 g4 hxg4+ Kg3 Kh7 
  7  -0.42  30033   0:00.00 Kf8 g4 hxg4 Kf2 g3+ Kxg3 Kg7 
  6  -0.38  14349   0:00.00 Kd8 Kf2 Kc7 Be8 Bc3 Bxf7 
  5  -0.31  7651        0:00.00 Be1 Kg2 Bd2 Kf3 Be1 
  4  -0.27  6157        0:00.00 Kd8 Kf2 Kc7 
  3  -0.37  4237        0:00.00 Be1 Kh2 Bd2 
  2  -0.20  2508        0:00.00 Kd8 Kg2 
  1  -0.08  951         0:00.00 Be1 
  0 # 

In the absence of other evidence, TCEC have ruled it a Stockfish crash (well, a hang really).

A viewer said this was the last output, posted after about 22s of thought:

time 21992 pv e7f8 g1f2 a5c7 f2f3 f8e7 c6b7 c7a5 g3g4 h5g4 f3g3 a5c3 g3g4 c3a5 b7a6 a5c3 a6b5 c3d2 b5c6 d2e1 c6a8 e1a5 g4h5 a5d2 a8c6 d2e1 h5g4 e1d2 c6b5 d2a5 g4h3 a5d2 b5a6 d2c3 a6c8 c3e1 c8b7 e1d2 h3g2 d2c3 b7c6 c3d2 g2f1 d2c3 c6a8 c3d2 f1f2 e7f8 f2e2 d2b4 e2f3 f8e7 f3f2 b4d2 a8b7 e7f8 f2f3 d2e1 b7c6 f8e7 f3e2 e1b4 c6b5 ............

Roughly 16 mins later, after running out of time:

Stockfish 190826(13): stop Terminating process of engine Stockfish 190826(13)

So it sounds like something happened while it was in its normal window of thinking time that made it not send any more output to cutechess. That could have been when it tried to send its move, of course, or it could be some kind of internal infinite loop, or ...

In fact previous moves took 23/22/36/22/21 seconds, so 22s sounds like when it tried to send a move back to cutechess?

> In fact previous moves took 23/22/36/22/21 seconds, so 22s sounds like when it tried to send a move back to cutechess?

This explanation seems the most logical, especially as the position has no reason to lead to infinite loops. But is it an SF issue or a cutechess issue? I don't know... Perhaps SF sent the move but cutechess didn't receive it.

Whole log for this game is at https://tcec-chess.com/crash/s16divp_51.7z (= http://tinyurl.com/y597lyyp listed in Crash info)

> Especially as the position has no reason to lead to infinite loops.

Maybe not infinite loops, but search explosion?

The last output received from Stockfish was at depth 83 and selective depth 114 after (almost) 22 seconds, and then nothing happened for about 15 minutes before Stockfish was flagged:

82128711 <Stockfish 190826(13): info depth 83 seldepth 114 multipv 1 score cp -41 nodes 2661493724 nps 121020995 hashfull 97 tbhits 1495869 time 21992 pv e7f8 g1f2 a5c7 f2f3 f8e7 c6b7 c7a5 g3g4 h5g4 f3g3 a5c3 g3g4 c3a5 b7a6 a5c3 a6b5 c3d2 b5c6 d2e1 c6a8 e1a5 g4h5 a5d2 a8c6 d2e1 h5g4 e1d2 c6b5 d2a5 g4h3 a5d2 b5a6 d2c3 a6c8 c3e1 c8b7 e1d2 h3g2 d2c3 b7c6 c3d2 g2f1 d2c3 c6a8 c3d2 f1f2 e7f8 f2e2 d2b4 e2f3 f8e7 f3f2 b4d2 a8b7 e7f8 f2f3 d2e1 b7c6 f8e7 f3e2 e1b4 c6b5 b4a5 b5a6 a5c3 a6c8 c3b4 e2f2 e7f8 f2f3 b4c3 c8e6 f8g7 f3g4 c3d2 g4g3 d2c3 e6c8 c3e1 g3f3 g7h6 f3g4 e1a5 c8b7 a5e1 b7c6 h6g7 c6b5 e1a5 g4f4 g7f8 b5d7 a5d2 f4f3 d2a5 h4h5 f8g7 h5h6 g7h6

But already at depth 29 we were reaching selective depth 113:

82106994 <Stockfish 190826(13): info depth 29 seldepth 113 multipv 1 score cp -41 nodes 852215 nps 85221500 tbhits 0 time 10 pv e7f8 g1f2 a5c7 f2f3 f8e7 c6b7 c7a5 g3g4 h5g4 f3g3 a5c3 g3g4 c3e1 g4h5 e1b4 h5h6 e7f8 b7c8 b4e1 c8d7 e1d2 h6h5 d2e1 d7c6 e1d2 c6b7 f8e7 b7a6 d2b4 a6b5 b4e1 h5g4 e1a5 b5c6 a5d2 g4f3 d2e1 f3e2 e1b4 e2d1 b4a5 c6b7 a5c3 b7c8 e7f8 d1e2 f8e7 c8a6 c3b4 h4h5 b4a5 e2f2
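To compare nominal and selective depth across log lines like the two above, the numeric fields of a UCI info line can be pulled out with a small tokenizer. This is only a sketch: `parseInfo` is a hypothetical helper, not part of Stockfish or cutechess, and it assumes the field names seen in the quoted log (depth, seldepth, nodes, nps, hashfull, tbhits, time), each followed by one integer.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: collect the integer fields of a UCI "info" line
// that appear before the "pv" token. Unknown tokens are skipped.
std::map<std::string, long long> parseInfo(const std::string& line) {
    std::map<std::string, long long> fields;
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        if (token == "pv")
            break;  // the move list follows; nothing numeric after this
        if (token == "depth" || token == "seldepth" || token == "nodes" ||
            token == "nps" || token == "hashfull" || token == "tbhits" ||
            token == "time") {
            long long value;
            if (in >> value)
                fields[token] = value;
        }
    }
    return fields;
}
```

Fed the depth 29 line, this yields depth 29 and seldepth 113, so the selective depth is almost four times the nominal depth.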

Maybe I am wrong, but the sort of position in which this happened (shuffling in a drawn opposite-colored-bishops endgame with score = -0.41, Stockfish not yet having shown 0.00) seems to hint that we observed a search explosion problem somewhere, maybe related to the shuffling patch?

I wasn't really active when the shuffling patch was developed, but did we have problems like that during its development or tuning?

It went fine through depth 85. Then I ran it again and I got some sort of memory leak at depth 79. Speed went down to a crawl. Third time seems to be just fine again. I am using Fritz 16 interface.

Unlikely that shuffle extensions are related. Threads will quit search if they observe the signal stop, i.e. when they pass line 1199 in search. Similarly, stop will be signaled by the main thread if it reaches line 1828. It isn't very easy to prove this always happens, but maybe there is some very rare case (that I can't see, but why is the cycle detection code before the time check)?

Memory corruption could of course cause anything, e.g. if some random write causes ponder to be true, this would happen.

@Mindbreaker1 with or without TB ?

With 6-man and less

I had the hash set at 12+ GB
Tried a few more times. Haven't got it to do it again.

> Threads will quit search if they observe the signal stop

Hmmm, maybe our code is not robust enough for time emergency, by the way. It is true that we exit the search in line 1199 in any thread if the main thread has signaled in check_time(), but line 1199 is after closing a subtree, so there could still be an (infinite) chain of subtree openings before we stop.

I would prefer to add these lines around line 1199 when entering the search function, to be honest:

if (Threads.stop.load(std::memory_order_relaxed)) return VALUE_ZERO;

Of course that would just be to be 100% sure that our time emergency procedure is correct in case of explosion problem, but that doesn't fix the underlying problem.
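The difference between checking the stop flag after a subtree closes and checking it on entry can be shown on a toy recursive search. This is a sketch under loud assumptions, not Stockfish's code: `stopFlag`, `nodesVisited` and this `search` are hypothetical stand-ins, and `VALUE_ZERO` is redefined locally as a plain constant.

```cpp
#include <atomic>

// Hypothetical stand-ins for illustration only.
std::atomic<bool> stopFlag{false};
constexpr int VALUE_ZERO = 0;
long nodesVisited = 0;

// Toy recursive search over a uniform tree with `branching` children per node.
int search(int depth, int branching) {
    // Guard on entry: never open a new subtree once stop has been signaled.
    if (stopFlag.load(std::memory_order_relaxed))
        return VALUE_ZERO;

    ++nodesVisited;
    if (depth == 0)
        return VALUE_ZERO;

    int best = VALUE_ZERO;
    for (int i = 0; i < branching; ++i) {
        int v = -search(depth - 1, branching);
        if (v > best)
            best = v;
        // Check after a child subtree has been closed.
        if (stopFlag.load(std::memory_order_relaxed))
            return VALUE_ZERO;
    }
    return best;
}
```

With the flag clear, `search(3, 2)` visits all 15 nodes of the toy tree; with the flag set beforehand, the entry guard returns before a single node is counted.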

I really think this wasn't a time emergency. max_time (which is checked by check_time) must have been much shorter than the available time on the clock (I would need to verify, but I'm rather sure; additionally we have the 1 s move overhead).

I agree we should be defensive in the code, but before we rush to a solution we should understand what is going on.

Well, I got another unusual result. It stayed on 79 much longer, but everything else looked fine.

@vondele What makes me really suspicious is the depth 29 / selective depth 113 reported in the logs

I had a 10 minute gap in writes to the log while testing (11 slow cores, but still):

Note the ^C because nothing seemed to be happening and the date at that point compared to output of ls:

$ tail -f sfmaster.log
<< info depth 87 currmove a5d2 currmovenumber 2
<< info depth 87 currmove a5b4 currmovenumber 3
<< info depth 87 currmove a5c3 currmovenumber 4
<< info depth 87 currmove a5e1 currmovenumber 5
<< info depth 87 currmove e7d8 currmovenumber 6
<< info depth 87 currmove a5d8 currmovenumber 7
<< info depth 87 currmove a5c7 currmovenumber 8
<< info depth 87 currmove a5b6 currmovenumber 9
<< info depth 87 seldepth 140 multipv 1 score cp -42 nodes 18627970197 nps 17266666 hashfull 1000 tbhits 0 time 1078840 pv e7f8 g1f2 f8g7 c6d7 g7f8 f2e3 f8e7 d7b5 a5c3 e3e2 e7f8 b5a6 f8e7 e2f2 c3a5 f2e3 a5e1 e3f3 e7d8 a6b7 e1a5 f3g2 d8c7 b7c6 c7d8 c6b5 d8e7 g2h3 e7d8 b5a6 d8e7 a6b7 e7d7 h3h2 d7e7 h2g2 a5e1 b7c6 e1a5 g2f3 a5d2 f3e4 e7d8 c6b5 d8e7 b5a6 d2c3 a6b7 c3a5 b7c6 a5e1 e4e3 e1c3 e3d3 c3e1 d3e2 e1c3 c6a8 c3a5 e2f1 e7d8 f1g1 d8e7 g1h1 e7d8 a8b7 d8c7 b7a6 c7d8 h1g2 d8c7 g2f2 c7d7 f2f3 a5e1 a6b5 d7d8 a4a5 e1a5 f3f2 a5b4 b5a4 b4c3 f2g2 d8e7 a4d1 e7d7 d1h5
<< info depth 88 currmove e7f8 currmovenumber 1
^C
$ ls -lt
total 6814112
-rw------- 1 sf sf 48842 Sep 9 16:19 sfmaster.log
-rw------- 1 sf sf 297 Sep 9 16:01 sfnew.log
-rw------- 1 sf sf 656 Sep 9 16:01 cute_2h_master.out
drwx------ 3 sf sf 24576 Sep 9 16:01 ./
-rwx------ 1 sf sf 1620 Sep 9 16:01 cute_2h*
...
$ date
Mon 9 Sep 16:29:44 BST 2019

Output continued very shortly after:

$ tail -f sfmaster.log
<< info depth 87 currmove a5d2 currmovenumber 2
<< info depth 87 currmove a5b4 currmovenumber 3
<< info depth 87 currmove a5c3 currmovenumber 4
<< info depth 87 currmove a5e1 currmovenumber 5
<< info depth 87 currmove e7d8 currmovenumber 6
<< info depth 87 currmove a5d8 currmovenumber 7
<< info depth 87 currmove a5c7 currmovenumber 8
<< info depth 87 currmove a5b6 currmovenumber 9
<< info depth 87 seldepth 140 multipv 1 score cp -42 nodes 18627970197 nps 17266666 hashfull 1000 tbhits 0 time 1078840 pv e7f8 g1f2 f8g7 c6d7 g7f8 f2e3 f8e7 d7b5 a5c3 e3e2 e7f8 b5a6 f8e7 e2f2 c3a5 f2e3 a5e1 e3f3 e7d8 a6b7 e1a5 f3g2 d8c7 b7c6 c7d8 c6b5 d8e7 g2h3 e7d8 b5a6 d8e7 a6b7 e7d7 h3h2 d7e7 h2g2 a5e1 b7c6 e1a5 g2f3 a5d2 f3e4 e7d8 c6b5 d8e7 b5a6 d2c3 a6b7 c3a5 b7c6 a5e1 e4e3 e1c3 e3d3 c3e1 d3e2 e1c3 c6a8 c3a5 e2f1 e7d8 f1g1 d8e7 g1h1 e7d8 a8b7 d8c7 b7a6 c7d8 h1g2 d8c7 g2f2 c7d7 f2f3 a5e1 a6b5 d7d8 a4a5 e1a5 f3f2 a5b4 b5a4 b4c3 f2g2 d8e7 a4d1 e7d7 d1h5
<< info depth 88 currmove e7f8 currmovenumber 1
<< info depth 88 currmove a5d2 currmovenumber 2
<< info depth 88 currmove a5e1 currmovenumber 3
<< info depth 88 currmove a5b4 currmovenumber 4
<< info depth 88 currmove a5c3 currmovenumber 5
<< info depth 88 currmove e7d8 currmovenumber 6
<< info depth 88 currmove a5d8 currmovenumber 7
<< info depth 88 currmove a5b6 currmovenumber 8
<< info depth 88 currmove a5c7 currmovenumber 9
<< info depth 88 seldepth 96 multipv 1 score cp -42 nodes 32787387402 nps 16933010 hashfull 1000 tbhits 0 time 1936300 pv e7f8 g1f2 f8g7 c6d7 g7f8 f2e3 f8e7 d7b5 a5c3 e3e2 e7f8 b5a6 f8e7 e2f2 c3b4 f2g2 b4a5 a6b5 e7f8 g2f3 a5c3 f3e4 c3a5 e4e3 f8e7 e3f3 a5d2 b5c6 d2c3 f3e2 e7f8 e2d3 c3e1 d3e3 f8g7 c6b7 g7h6 e3d3 e1a5 b7c6 h6g7 c6d7 a5e1 d3e4 g7f8 e4e3 f8e7 d7c6 e1c3 e3f3 c3e1 f3e4 e1a5 c6b7 a5e1 e4f4 e1d2 f4f3 d2c3 f3f2 c3a5 b7c6 e7d8 f2e3 d8e7 c6a8 a5e1 e3f4 e1d2 f4f3 d2c3 f3f2 c3a5 a8b7 e7d8 b7a6 d8c7 f2e3 a5e1 g3g4 h5g4 h4h5 g4g3 e3f3 e1d2 f3g3 c7d7 g3f2 d7e7 f2e2 d2b4
<< info depth 89 currmove e7f8 currmovenumber 1

Does hashfull 1000 indicate a full hash? I was using 8GB (option.Hash=8192)

Yes, hashfull 1000 is 100% utilization of the hash: the value is reported in permill, so hashfull 123 is 12.3% full.

I put some debug output in and maximumTime looks to be around, say, 15-20% of total time, so if the check is made, it should stop us in plenty of time.
This matches my experience watching TCEC, where I would say SF takes a max of about 33% of available time. (Some engines occasionally get up towards 50%.)

@vondele What makes me really suspicious is the depth 29 / selective depth 119 reported in the logs

that's quite usual with the shuffle extensions, and was discussed in depth. This should still not cause the time_check mechanism to fail (if that's what is going on).

OK, the memory leak happened again. But it happened at depth 90.
I doubt the stuff I have pasted below is useful, but what do I know:

Stockfish 030919 64 POPCNT
8/4kp2/2Bp1p2/b1pP1P1p/P1P4P/6P1/8/6K1 b - - 0 1

Analysis by Stockfish 030919 64 POPCNT:

57...Kf8 58.Kg2 Be1 59.Kf3 Kg7 60.Bd7 Bc3 61.Ke2 Kh7 62.Kd3 Be1 63.Bc6 Kg7 64.Bb7 Bb4 65.Bc8 Be1 66.Ke2 Bc3 67.Ba6 Bb4 68.Kf3 Bc3 69.Ke4 Be1 70.Bb7 Kh6 71.Kd3 Kg7 72.Kc2 Kh6 73.Bc6 Kg7 74.Bb5 Kh6 75.Be8 Kg7 76.Kd1 Bc3 77.Ke2 Kf8 78.Bb5 Ke7 79.Bc6 Ba5 80.Ke3 Be1 81.Kf3 Kf8 82.Ke2 Ba5 83.Kd3 Kg7 84.Bb5 Be1 85.Ke2 Bb4 86.Ba6 Bc3 87.Kf3 Ba5 88.Bb7 Bb4 89.Kf2 Bc3 90.Bc6 Bb4 91.Be8 Bc3 92.Bb5 Kf8 93.Kf3 Kg7 94.Bd7 Bb4 95.Be8 Kf8 96.Bb5 Kg7 97.Kg2 Bc3 98.Bc6 Ba5 99.Kh2 Be1 100.Bb5 Bb4 101.Bd7 Kf8 102.Kh3 Kg7 103.Be8 Bc3 104.Bc6 Kh6 105.g4 hxg4+ 106.Kxg4 Kg7
White is slightly better: +/= (0.45) Depth: 89/99 00:22:37 25076MN, tb=68103924
(, 09.09.2019)

Things started to go wonky at about 5 minutes past this output.

@Mindbreaker1 ... if you say memory leak, how do you measure / observe that ?

What is sometimes seen is that the OS caches access to TB files, but that's more of an OS 'feature'. Do you see the same behavior without TB ?

As a test of the timing mechanism, I run the position on a similar setup (~threads, hash, fen, but no TB), and do a go movetime 25000. This uses exactly the same mechanism to quit search as maximumTime. With 10 tests, search finishes within 3ms of the time limit, which is OK:

time 25001 
time 25003 
time 25001 
time 25003 
time 25001 
time 25003 
time 25002 
time 25001 
time 25001 
time 25001 

Edit: similar statistics for ~100 runs:

     76 time 25001 
     18 time 25002 
      2 time 25003 
      2 time 25004 
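
For anyone who wants to summarise such a run, a small sketch: it takes the counts listed above (assumed to be the complete tally) and reports how far search overshoots the 25000 ms limit. The variable names are just for illustration.

```python
from collections import Counter

MOVETIME_MS = 25000  # the limit given to "go movetime"

# Reported "time" values and how often each one occurred (from the tally above).
counts = Counter({25001: 76, 25002: 18, 25003: 2, 25004: 2})

runs = sum(counts.values())
# Overshoot in ms -> number of runs with that overshoot.
overshoots = {t - MOVETIME_MS: n for t, n in counts.items()}
mean_overshoot = sum(o * n for o, n in overshoots.items()) / runs
worst_overshoot = max(overshoots)

print("runs:", runs)
print("mean overshoot (ms): %.2f" % mean_overshoot)
print("worst overshoot (ms):", worst_overshoot)
```

The tally adds up to 98 runs, and the worst case is 4 ms past the limit.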

The Windows Task Manager showed far less memory in the "details" tab for Stockfish than it was showing earlier and the memory for everything seemed to be reducing. Everything was glitching. I noticed and investigated after the music I was listening to started to sputter. The machine has 16 GB of RAM. Fritz allowed a maximum of 12288 MB and that is what I chose.

This is still with the 6-man tables.

It looks like I allowed the GUI table access too.

I am using 8 of 16 threads.

I have had this Fritz 16 version for less than a week, so it could just be glitches in their program. I did sit on some positions for hours though with no issues.

Log not updated for 22+ mins (16GB hash now, still waiting as I type):

<< info depth 93 currmove e7d8 currmovenumber 9
<< info depth 93 seldepth 128 multipv 1 score cp -42 nodes 34207263908 nps 16336010 hashfull 1000 tbhits 0 time 2093979 pv e7f8 g1f1 f8g8 f1f2 a5d2 f2e2 d2a5 e2f3 g8f8 f3e4 f8g7 e4d3 g7h6 d3c2 a5b4 c6b5 b4a5 b5e8 h6g7 e8d7 g7f8 c2d1 a5c3 d7c8 f8e8 d1e2 e8f8 e2d3 c3b4 c8b7 b4e1 d3e2 e1c3 e2f3 f8g7 f3e4 c3e1 e4e3 g7f8 b7c6 f8e7 e3e4 e1b4 c6b7 e7d8 e4e3 b4e1 e3f3 e1c3 b7c6 d8e7 f3e2 c3a5 e2f2 e7f8 c6b5 f8g7 b5d7 a5c3 f2f3 g7h6 f3g2 h6g7 d7b5 g7f8 b5a6 c3a5 g2f1 a5c3 f1f2 f8g8 f2e2 g8g7 a6b5 g7h6 e2f2 c3d2 f2f3 h6h7 g3g4 h5g4 f3g4 h7h6 b5c6 d2c3 c6e8 h6g7 g4f3 g7f8 e8b5 f8g8 b5c6 g8h7 c6e8
<< info depth 94 currmove e7f8 currmovenumber 1
^C
$
$ lh -2
total 6814120
-rw------- 1 sf sf 52673 Sep 9 20:37 sfmaster.log
$
$ date
Mon 9 Sep 20:59:45 BST 2019

@xoto10 if this is happening could you see if SF is responsive to issuing a stop ? It should react instantly with a bestmove output.

No, I'm running via cutechess :( It is still using cpu, and experience suggests it is still working, it just takes 2 or 3 minutes in between iterations sometimes (and then occasionally much longer). I have wondered if there is buffering of the output, but this file is named directly to cutechess so I don't think so.
Edit: this is using tc=inf
cutechess-cli -openings file=crash1_20190909.pgn -resign movecount=3 score=400 -draw movenumber=40 movecount=5 score=5 -games 1 -rounds 1 -pgnout cute_2h_master.pgn -ratinginterval 100
-engine cmd=stockfish_master name=master 'option.Debug Log File=sfnew.log' option.Hash=16
-engine cmd=stockfish_master name=master 'option.Debug Log File=sfmaster.log' option.Hash=16384
-each tc=inf proto=uci option.Threads=11 'option.Move Overhead=50' 'option.Minimum Thinking Time=5'
-concurrency 1

I think I set up Arena to use the server remotely once, let me look ...

@xoto10 : did you set a maximum time + increment ?

@snicolet : I don't think it is related to shuffle extension. In general these hangs can be reproduced easily for the same position and can be explained. In all cases I have seen, it never exceeds maximum time as @vondele explained. I think that SF didn't send the bestmove information for a reason that I can't understand.
sync_cout << "bestmove " << UCI::move(bestThread->rootMoves[0].pv[0], rootPos.is_chess960());

Or it is another problem in cutechess or in the interface with cutechess. I also don't know whether different compiles can have a "special" wrong behavior.

It's moved on now, started depth 96 at 21:11

@xoto10 .... you should try the command line :-) just paste something like:

setoption name Hash value 16384
setoption name Move Overhead value 1000
setoption name Threads value 31
isready
ucinewgame
setoption name Ponder value false
position startpos moves c2c4 e7e5 g2g3 g8f6 b1c3 f8c5 f1g2 b8c6 e2e3 e8g8 g1e2 d7d6 d2d4 c5b6 e1g1 c8f5 h2h3 f8e8 a2a3 a7a5 b2b3 h7h6 a1a2 f5h7 a2d2 d8d7 c1b2 b6a7 f1e1 g8h8 d1c1 h8g8 c3d5 f6e4 d2d1 e5d4 e3d4 c6e7 e2c3 e7d5 c3e4 h7e4 g2e4 d5f6 e4b7 a8b8 e1e8 f6e8 b7g2 b8b3 c1c2 a5a4 d1d3 b3b8 h3h4 d7f5 d3d2 f5c2 d2c2 c7c5 d4d5 a7b6 g2h3 e8f6 b2f6 g7f6 c2b2 b6c7 b2b8 c7b8 h3d7 g8g7 f2f4 b8c7 f4f5 c7a5 d7a4 a5c3 a4c6 g7f8 g1g2 f8g7 c6b5 g7f8 g2f3 f8g7 a3a4 c3d2 b5e8 d2e1 f3e2 e1c3 e8d7 h6h5 d7e6 c3a5 e2f3 a5d2 e6d7 d2e1 d7e8 e1c3 e8c6 c3a5 f3e4 a5e1 e4e3 g7f8 e3e2 e1a5 e2f2 f8e7 f2g1
isready
go movetime 25000

however, I think the long time between output is reasonable at depth 93... roughly the same time as the time used so far would be OK IMO (so about 2093979ms in your example).

Seems to respond to stop - came back with the pv for the next depth and all cpu activity stopped. Will look at it some more tomorrow ...

Yeah, I did think of running sf directly, wasn't sure I would get the list of commands right :)

Looking at the so-called "Livelog" tab of TCEC during the game Stockfish-KomodoMCTS, I saw the following example:

23:39:01
12981570 Stockfish 190826(2): info depth 45 seldepth 75 multipv 1 score cp 113 lowerbound nodes 29825896348 nps 58630104 hashfull 997 tbhits 1454396 time 508713 pv h7h1
12981570 Stockfish 190826(2): info depth 43 currmove h7h1 currmovenumber 1

23:39:07
12988418 Stockfish 190826(2): info depth 45 seldepth 75 multipv 1 score cp 129 lowerbound nodes 30230948523 nps 58636882 hashfull 997 tbhits 1533367 time 515562 pv h7h1
12988419 Stockfish 190826(2): info depth 42 currmove h7h1 currmovenumber 1

23:40:31
13072130 Stockfish 190826(2): info depth 45 seldepth 75 multipv 1 score cp 150 lowerbound nodes 35120560008 nps 58605276 hashfull 997 tbhits 2314258 time 599273 pv h7h1
13072130 Stockfish 190826(2): info depth 41 currmove h7h1 currmovenumber 1

Why are the "currmove" lines reporting depths less than the previous "lowerbound" lines?

Edit: info depth 43 currmove h7h1 currmovenumber 1 is the main thread, while info depth 45 seldepth 75 multipv 1 score cp 113 lowerbound [...] is from the best auxiliary thread. Doesn't explain why Stockfish would skip line 399 in MainThread::search().

https://github.com/official-stockfish/Stockfish/blob/8fec8834715a440ac18e24e130888c2c60bab352/src/search.cpp#L1254-L1260

In case of a singular extension it may have searched nothing, if a fail low/high is returned does this not cause a not mainThread to infinite loop?

@snicolet currmove uses the actual depth (rootDepth - failedHighCnt) not rootDepth

Why are the "currmove" lines reporting depths less than the previous "lowerbound" lines?

Normal behaviour. Committed in SF on 25th october 2018.

See this code in search.cpp :
Depth adjustedDepth = std::max(ONE_PLY, rootDepth - failedHighCnt * ONE_PLY);
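
A quick sketch of that formula in Python, to show how it lines up with the Livelog lines quoted above. `ONE_PLY = 1` and the `failedHighCnt` values 2, 3 and 4 are assumptions chosen to match the log; they are not taken from the engine.

```python
ONE_PLY = 1  # assumption: one ply counts as one depth unit

def adjusted_depth(root_depth, failed_high_cnt):
    """Mirror of: std::max(ONE_PLY, rootDepth - failedHighCnt * ONE_PLY)."""
    return max(ONE_PLY, root_depth - failed_high_cnt * ONE_PLY)

# rootDepth stays at 45 while the fail-high counter grows (assumed values).
for cnt in (2, 3, 4):
    print("failedHighCnt=%d -> currmove depth %d" % (cnt, adjusted_depth(45, cnt)))
```

With those assumed counters the currmove depth comes out as 43, 42 and 41 while the lowerbound lines stay at depth 45, which is the pattern in the log.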

BTW, given the issue we have open here: https://github.com/official-stockfish/Stockfish/issues/2229 and the reply given here https://github.com/msys2/MINGW-packages/issues/5610#issuecomment-511189219 (including issues mentioned there) pointing at potential deadlocks in libwinpthread-1.dll do we know which version of libwinpthread is being used at TCEC, and if any of these bugs in mingw packages could affect us?

@vondele Does sf at TCEC run with pthreads for Windows (not a -static pgo build with native threads)???

It's dated July 5th. I attached it to this message (just rename the .txt to .dll after retrieving it) so it can be tested/compared.

This seems by far the most likely explanation.

libwinpthread-1.txt

@d3vv I don't know how it is produced, and I don't run on windows, so I'll need to pass. The build I think is provided in the forum https://groups.google.com/d/msg/fishcooking/PF3dcA8aPh8/Kahr3LufBAAJ but it does include libwinpthread, which is what triggered my remark above.

@vondele The quick and simple way is to ask the tcec-team about all binaries that were provided to them by the sf-team.

Those binaries are needed for a "true" test. It is very strange that you are trying to reproduce an issue via your own builds

It doesn't seem very likely to be the same issue as https://github.com/official-stockfish/Stockfish/issues/2229? That issue seems to break sf completely, being reproducible even in the bench, while this issue is extremely rare.

@vondele

" _The build I think is provided in the forum https://groups.google.com/d/msg/fishcooking/PF3dcA8aPh8/Kahr3LufBAAJ but it does include libwinpthread,_"

All the libs including libwinpthread-1.dll have been provided. Please check again. Also Sf wont run if any of the libs is missing ... one can verify anytime ...

P.S. - it's compiled using gcc version 9.2.0 posix

@Krgp sorry for the confusion, I meant, 'and it does include libwinpthread'.

Just for completeness and my understanding, can you provide the precise toolchain info (i.e. more precise versions: is this mingw-gcc-xxx, which versions of the dlls do we link to, etc.) and the compilation options used, maybe using pacman -Qe if that's how you have set up the system?

Also, would it be possible to run a multithreaded match using cutechess on your machine with this binary (for example 4 threads at short TC, e.g. 5+0.05), and see if cutechess observes any hangs/crashes ? [This can be done by anybody with windows, and would be useful info to have]

@vondele
(Msys2)
asciidoc 8.6.10-2
autoconf 2.69-5
autoconf2.13 2.13-2
autogen 5.18.16-1
automake-wrapper 11-1
automake1.10 1.10.3-3
automake1.11 1.11.6-3
automake1.12 1.12.6-3
automake1.13 1.13.4-4
automake1.14 1.14.1-3
automake1.15 1.15.1-1
automake1.6 1.6.3-2
automake1.7 1.7.9-2
automake1.8 1.8.5-3
automake1.9 1.9.6-2
bash 4.4.023-1
bash-completion 2.9-1
bison 3.4.1-1
bsdcpio 3.4.0-4
bsdtar 3.4.0-4
bzip2 1.0.8-1
coreutils 8.31-1
crypt 1.3-1
curl 7.65.3-1
dash 0.5.10.2-1
diffstat 1.62-1
diffutils 3.7-1
dos2unix 7.4.0-1
file 5.37-1
filesystem 2018.12-1
findutils 4.6.0-1
flex 2.6.4-1
gawk 5.0.1-1
gcc-libs 9.1.0-2
gdb 8.2.1-3
gettext-devel 0.19.8.1-1
git 2.22.0-1
gperf 3.1-1
grep 3.0-2
groff 1.22.4-1
gzip 1.10-1
help2man 1.47.10-1
inetutils 1.9.4-2
info 6.6-1
intltool 0.51.0-2
lemon 3.21.0-1
less 551-1
libtool 2.4.6-7
libunrar 5.7.5-1
libunrar-devel 5.7.5-1
lndir 1.0.3-1
make 4.2.1-1
man-db 2.8.6.1-1
mercurial 5.1-1
mingw-w64-i686-binutils 2.32-3
mingw-w64-i686-cmake 3.15.2-1
mingw-w64-i686-crt-git 7.0.0.5491.fe45801e-1
mingw-w64-i686-gcc 9.2.0-1
mingw-w64-i686-gcc-fortran 9.2.0-1
mingw-w64-i686-gcc-libgfortran 9.2.0-1
mingw-w64-i686-gcc-libs 9.2.0-1
mingw-w64-i686-gdb 8.3-9
mingw-w64-i686-headers-git 7.0.0.5490.9ec54ed1-1
mingw-w64-i686-libmangle-git 7.0.0.5230.69c8fad6-1
mingw-w64-i686-libwinpthread-git 7.0.0.5480.e14d23be-1
mingw-w64-i686-make 4.2.1-4
mingw-w64-i686-pkg-config 0.29.2-1
mingw-w64-i686-tools-git 7.0.0.5479.8db8dd5a-1
mingw-w64-i686-winpthreads-git 7.0.0.5480.e14d23be-1
mingw-w64-i686-winstorecompat-git 7.0.0.5479.8db8dd5a-1
mingw-w64-x86_64-binutils 2.32-3
mingw-w64-x86_64-clang 8.0.1-3
mingw-w64-x86_64-cmake 3.15.2-1
mingw-w64-x86_64-crt-git 7.0.0.5491.fe45801e-1
mingw-w64-x86_64-gcc 9.2.0-1
mingw-w64-x86_64-gcc-fortran 9.2.0-1
mingw-w64-x86_64-gcc-libgfortran 9.2.0-1
mingw-w64-x86_64-gcc-libs 9.2.0-1
mingw-w64-x86_64-gdb 8.3-9
mingw-w64-x86_64-headers-git 7.0.0.5490.9ec54ed1-1
mingw-w64-x86_64-libmangle-git 7.0.0.5230.69c8fad6-1
mingw-w64-x86_64-libwinpthread-git 7.0.0.5480.e14d23be-1
mingw-w64-x86_64-make 4.2.1-4
mingw-w64-x86_64-pkg-config 0.29.2-1
mingw-w64-x86_64-tools-git 7.0.0.5479.8db8dd5a-1
mingw-w64-x86_64-winpthreads-git 7.0.0.5480.e14d23be-1
mingw-w64-x86_64-winstorecompat-git 7.0.0.5479.8db8dd5a-1
mintty 1~3.0.2-1
msys2-keyring r9.397a52e-1
msys2-launcher-git 0.3.32.56c2ba7-2
msys2-runtime 3.0.7-6
ncurses 6.1.20190615-1
pacman 5.1.3-3
pacman-mirrors 20180604-2
pactoys-git r2.07ca37f-1
patch 2.7.6-1
patchutils 0.3.4-1
pax-git 20161104.2-1
perl 5.30.0-1
pkg-config 0.29.2-1
pkgfile 19-1
quilt 0.66-2
rcs 5.9.4-2
rebase 4.4.4-1
scons 3.1.1-1
sed 4.7-1
subversion 1.12.2-1
swig 4.0.0-1
texinfo 6.6-1
texinfo-tex 6.6-1
tftp-hpa 5.2-3
time 1.9-1
ttyrec 1.0.8-2
tzcode 2019.a-1
unrar 5.7.5-1
util-linux 2.34-1
which 2.21-2
xmlto 0.0.28-2

Hope this helps.

P.S. Default make file with only one change -msse3 is replaced by -msse4
make profile-build ARCH=x86-64-bmi2
Compiled on Windows 10 Pro 64 Bit
Haswell 4790k, Asus Hero VI Motherboard, GSkill Trident RAM @ 2400, 10-12-12-31

@Krgp Could you explain why the TCEC build needs a non-static build for Windows?

against:
make profile-build ARCH=x86-64-bmi2 COMP=mingw

This has nothing to do with static or dynamic linked builds.

From disassembling I can see the libwinpthread-1.dll library provided above is after the https://github.com/msys2/MINGW-packages/issues/5610#issuecomment-511189219 fix.

So perhaps that change introduced more problems than it was trying to fix?
Reading the message board of theirs, it seems not many people understand how synchronization code works internally. From SF's side it seems that linking with the new version of the library had issues even during bench runs.

I guess the first mystery to solve is how this build doesn't break like the others with the new library changes.

@d3vv A static build on Windows doesn't get LTO (unless cross-compiled on linux with modified make file as done in case of 'abrok' builds) ... and Lto gives a considerable speed up.

Now a crash of another engine against Houdini:

https://www.tcec-chess.com/archive.html?season=16&div=p&game=61

So Houdini might have consumed too much resources (eg EGTB caching) which caused an issue for its opponents?

Now a crash of another engine against Houdini:

ScorpioNN runs on the remote GPU server, so the crashes are almost certainly unrelated. Stalls due to network issues are also common for these matches.

@noobpwnftw

In case of a singular extension it may have searched nothing, if a fail low/high is returned does this not cause a not mainThread to infinite loop?

Can you elaborate a bit about this? Or better, modify the master singular code to always perform your suspected bad behavior?

@snicolet
My observations are essentially one thing for both:

  1. The most likely case is the winpthread library, if you look at the changes they introduced as a "fix" to the deadlocks: by removing the offending locks! How is that supposed to work, and if those are indeed redundant, why did people write them in the first place?
  2. In the Stockfish code I mentioned, obviously the code was once there and is now commented out to end the function when there is a stop signal. The effect of it, aside from not writing what has been searched into TT, is to force a VALUE_DRAW return value. Since, as I understand it, we only check for the stop signal after a search iteration, and in qsearch there is no such check, could the search get stuck somewhere due to fail high/lows with a higher depth limit? And the same question: why did it use to have such code?

Maybe now somebody is motivated to analyse
https://github.com/official-stockfish/Stockfish/issues/2229 ?
Until we know if this is relevant or not it would be better to only use builds made with the downgraded libwinpthread (when builds are made with the MSYS2 environment).

@CoffeeOne #2229 has been analyzed and reported upstream as https://github.com/msys2/MINGW-packages/issues/5610 (by you, thanks!). It really is up to the package devs of MINGW to address this, even though one can try to help them.

However, we should still establish firmly that this is the cause for the observed tcec hang, which will mean carefully testing the binary with the employed dll, ideally on the same version of the OS. Since the hang still seems very rare (contrary to your report in #2229, for unknown reasons), this might take a while. If we confirm that this is the root cause (right now it is speculation), we should fall back to a working version of the toolchain targeting windows.

Finally, this highlights the risk of using latest binaries in a process that involves little integrated testing. If we want to support windows, we should have more CI/CD for this platform, i.e. test the whole toolchain and the resulting binaries.

Actually just noticed that https://github.com/ianfab/Fairy-Stockfish/issues/29 provides a more in-depth analysis of crashing on windows... seems at least that this has been analyzed by @ianfab and @ppigazzini so next to the threading, also the default stack size has been increased (I'm not sure that's the right fix in the threaded case, BTW).

Edit: the stacksize issue was with a chess variant, but it might be that we're nevertheless close to or exceeding the limit.

In fact previous moves took 23/22/36/22/21 seconds, so 22s sounds like when it tried to send a move back to cutechess?

Looking at the code, if we assume mainthread finished processing there are 2 places where it could loop before sending bestmove:

  while (!Threads.stop && (ponder || Limits.infinite))
  {} // Busy wait for a stop or a ponder reset

and

  // Wait until all threads have finished
  for (Thread* th : Threads)
      if (th != this)
          th->wait_for_search_finished();

Would it be prudent to output some messages before these loops so that if this happens again we have an idea if these loops were reached? My guess is that threads.stop was probably set, so the second loop is the main candidate ... and that's threading, like the bug report people have been talking about. Or alternatively, maybe one thread got stuck in some kind of search explosion / infinite loop / other bug and mainthread waited forever for it. Either way that th->wait_for_search_finished() seems to be a candidate.
Perhaps we could put a test for Time.maximum() in that thread wait somehow?
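
To illustrate the idea of a bounded wait (not how Stockfish does it; the engine's `wait_for_search_finished()` is C++ and is not reproduced here), a rough Python sketch: the main thread joins its helpers against one shared deadline and reports which ones never came back. `wait_for_workers` and the thread names are invented for the example.

```python
import threading
import time

def wait_for_workers(workers, deadline_s):
    """Join every worker, but stop waiting once the shared deadline passes.

    Returns the names of the workers that were still running at the deadline.
    """
    end = time.monotonic() + deadline_s
    stuck = []
    for worker in workers:
        remaining = max(0.0, end - time.monotonic())
        worker.join(timeout=remaining)
        if worker.is_alive():
            stuck.append(worker.name)
    return stuck

# One helper finishes at once, the other blocks until it is released.
release = threading.Event()
fast = threading.Thread(target=lambda: None, name="fast")
slow = threading.Thread(target=release.wait, name="slow", daemon=True)
fast.start()
slow.start()

stuck = wait_for_workers([fast, slow], deadline_s=0.2)
print("still running at deadline:", stuck)

release.set()  # let the blocked helper finish so the demo exits cleanly
slow.join()
```

Something like this would only tell us which thread is stuck; if the hang is inside the threading library itself, the stuck thread still has to be dealt with somehow.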

Edit: the stacksize issue was with a chess variant, but it might be that we're nevertheless close to or exceeding the limit.

Perhaps some windows users could try a long running test on some endgame positions that reach high depths? It would be nice to confirm that seldepths up to 200+ can be handled ok. I presume the recursive search() calls use a lot of stack at high depths?

There were several Stockfish changes dealing with crashes that could happen if something went wrong with the stack, during initialization, or on stack overflow. Marco made several changes, for instance limiting the maximum number of moves. Maybe some interesting reading material, all references to 'stack' in Fishcooking:
https://groups.google.com/forum/#!searchin/fishcooking/stack;context-place=forum/fishcooking
One was about a crash that only happened in SMP, for instance 'postFut crashes in SMP'

The adjudication says "Black's connection stalls".
And in the log that Aloril linked to above Stockfish actually seems to send a 'stop' which I think is an UCI message, if I read it right? So it did not crash or get into an infinite loop or search explosion? If Stockfish thought it was low on time, it had 21 minutes left on the clock, and it seems to have used up about 21 seconds in this search. But would it have sent a 'stop' even if it had thought it should stop the search because it was low on time, or because one of the threads thought it had to stop?

82128710 82128710 82128711 82128711 83107715 >Stockfish 190826(13): stop
Terminating process of engine Stockfish 190826(13)
83202319 >Houdini 6.03(12): quit
Finished game 51 (Houdini 6.03 vs Stockfish 190826): 1-0 {Black's connection stalls}

And in the log that Aloril linked to above Stockfish actually seems to send a 'stop' which I think is an UCI message, if I read it right?

The stop message was sent by cutechess, notice it says >Stockfish instead of <Stockfish. The output after 22s was the last sign of life, then nothing happened for 16 minutes until it ran out of time and was killed by cutechess.
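
A small sketch of how to read the direction marker mechanically. The line layout (timestamp, `>` or `<`, engine name with an id in parentheses, colon, text) is inferred from the log excerpt above, and the `<` sample line is hypothetical.

```python
import re

# timestamp, direction marker, "Engine name(id)", then the UCI text
LOG_LINE = re.compile(r"(\d+) ([<>])(.+?\(\d+\)): (.*)$")

def parse_log_line(line):
    """Return (timestamp, sender, engine, text), or None if the line does not match.

    '>' means the GUI (cutechess) wrote to the engine, '<' means the engine replied.
    """
    match = LOG_LINE.search(line)
    if match is None:
        return None
    timestamp, marker, engine, text = match.groups()
    sender = "gui" if marker == ">" else "engine"
    return int(timestamp), sender, engine, text

print(parse_log_line("83107715 >Stockfish 190826(13): stop"))
# Hypothetical engine reply, only to show the other direction:
print(parse_log_line("83107716 <Stockfish 190826(13): bestmove e7f8"))
```

Run on the `stop` line from the log, the sender comes out as the GUI, i.e. cutechess asked Stockfish to stop and not the other way round.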

Ah, thanks Morten! I did not read it right... I did notice the time had gone up though. So we know Cutechess did the right thing here, and Stockfish did not send any more output after that last PV. A changed version of Crystal I think does this too, I don't think it really crashes, just no new output to my Shredder GUI at a certain depth. I'm not 100% sure there is no crash of the engine but I think I see the processor still working then. To me that looks like an exploding search but I don't know if that is what happened to Stockfish.

I think we should try to stay on track here... I think most needed is still somebody who spends time and effort in trying to reproduce the problem with the binary package that has been sent.

To me (knowing this part of the code rather well), it seems very unlikely it is related to basic properties of search (like extensions). For me, the prime candidate is still an issue internal to libwinpthread.

Additionally, I would like to know what the stack size is on this platform (I see 1Mb mentioned online, but I don't know windows+mingw enough to check if this is true).

@Krgp this script hung 5/5 using your binary (and hangs w/ any binary built w/ recent gcc by msys2) and works fine w/ binaries built using the msys2 toolchain and the gcc by MinGW-W64 project (check the wiki instruction).
I wrote this script some weeks ago to investigate the strange hangs of the windows workers during my tests w/ fishtest. I thought that the culprit was the python worker code before finding the great bug analysis by @CoffeeOne

Set the max number of threads for your CPU to have fast hangs.

#!/usr/bin/python
from __future__ import print_function

import datetime
import json
import os
import glob
import stat
import subprocess
import shutil
import sys
import tempfile
import threading
import time
import traceback
import platform
import struct


def enc(s):
  return s.encode('utf-8')

def verify_signature():
  engine='stockfish_250819.exe'
  concurrency = 48
  if concurrency > 1:
    with open(os.devnull, 'wb') as f:
      busy_process = subprocess.Popen([engine], stdin=subprocess.PIPE, stdout=f, stderr=subprocess.STDOUT)
      busy_process.stdin.write(enc('setoption name Threads value %d\n' % (concurrency-1)))
      busy_process.stdin.write(enc('go infinite\n'))
      busy_process.stdin.flush()

  try:
    bench_sig = ''
    print('Verifying signature of %s ...' % (os.path.basename(engine)))

    with open(os.devnull, 'wb') as f:
      p = subprocess.Popen([engine, 'bench'], stderr=subprocess.PIPE, stdout=f, universal_newlines=True)
      p_out, p_err = p.communicate()

    for line in p_err.splitlines():
      if 'Nodes searched' in line:
        bench_sig = line.split(': ')[1].strip()
      if 'Nodes/second' in line:
        bench_nps = float(line.split(': ')[1].strip())

    if p.returncode != 0:
      raise Exception('Bench exited with non-zero code %d' % (p.returncode))

  finally:
    if concurrency > 1:
      busy_process.communicate(enc('quit\n'))
      busy_process.stdin.close()

  return bench_nps

def main():
  print(platform.python_version())
  n_iter=100
  for i in range(n_iter):
    print('Iteration %d of %d' % (i+1, n_iter))
    bench_nps=verify_signature()
    print('bench_nps=%d' % bench_nps)
  return 0

if __name__ == '__main__':
  main()

output:

bench_nps=935310
Iteration 21 of 100
Verifying signature of stockfish_250819.exe ...
bench_nps=877788
Iteration 22 of 100
Verifying signature of stockfish_250819.exe ...
bench_nps=874132
Iteration 23 of 100
Verifying signature of stockfish_250819.exe ...
Traceback (most recent call last):
  File "test.py", line 67, in <module>
    main()
  File "test.py", line 62, in main
    bench_nps=verify_signature()
  File "test.py", line 39, in verify_signature
    p_out, p_err = p.communicate()
  File "c:\Python27\lib\subprocess.py", line 794, in communicate
    stderr = _eintr_retry_call(self.stderr.read)
  File "c:\Python27\lib\subprocess.py", line 476, in _eintr_retry_call
    return func(*args)
KeyboardInterrupt

Do you guys think that the Stockfish binary that we have sent to the TCEC team would work, if we asked TCEC to use an older version of the DLL (keeping the same Stockfish binary)?

@snicolet the python script worked fine using the Kiran binary and the libwinpthread 7.0.0.5325 suggested by @CoffeeOne . Started a new run w/ 1000 iterations.

Iteration 98 of 100
Verifying signature of stockfish_250819.exe ...
bench_nps=888941
Iteration 99 of 100
Verifying signature of stockfish_250819.exe ...
bench_nps=886732
Iteration 100 of 100
Verifying signature of stockfish_250819.exe ...
bench_nps=843149

D:\___test\Sf Tcec 16>c:\Python27\python.exe test.py
2.7.9
Iteration 1 of 1000
Verifying signature of stockfish_250819.exe ...
bench_nps=851195
Iteration 2 of 1000
Verifying signature of stockfish_250819.exe ...

Replacing just the dll libwinpthread-1.dll works for me, too. I cannot test with the original TCEC 16 build, because I don't have bmi2 hardware available, so I tested with self-made builds.
libwinpthread-1.txt

I attached the working dll (renamed from .dll to .txt)

But we should wait for the test of @ppigazzini to be finished of course.

@CoffeeOne The binary submitted to TCEC is accessible in this forum thread: https://groups.google.com/d/msg/fishcooking/PF3dcA8aPh8/Kahr3LufBAAJ

@snicolet finished

Iteration 996 of 1000
Verifying signature of stockfish_250819.exe ...
bench_nps=877572
Iteration 997 of 1000
Verifying signature of stockfish_250819.exe ...
bench_nps=835646
Iteration 998 of 1000
Verifying signature of stockfish_250819.exe ...
bench_nps=916806
Iteration 999 of 1000
Verifying signature of stockfish_250819.exe ...
bench_nps=865440
Iteration 1000 of 1000
Verifying signature of stockfish_250819.exe ...
bench_nps=874990

D:\___test\Sf Tcec 16>

The bench signature is the same w/ both libwinpthread-1.dll (Nodes searched : 3568210)

@snicolet match vs stockfish popcnt

D:\___test\Sf Tcec 16> .\cutechess-cli.exe -repeat -rounds 100 -tournament gauntlet -resign movecount=3 score=400 -draw movenumber=34 movecount=8 score=20 -concurrency 24 -engine cmd=stockfish_250819.exe option.Hash=64 -engine cmd=stockfish_120919_w64.exe option.Hash=64 -each proto=uci tc=10+0.1 -openings file=2moves_v1.pgn format=pgn order=random plies=16
Indexing opening suite...
Warning: 2 opening repetitions vs 1 games per encounter
Started game 1 of 100 (Stockfish 260819 64 BMI2 vs Stockfish 110919 64 POPCNT)
Started game 2 of 100 (Stockfish 110919 64 POPCNT vs Stockfish 260819 64 BMI2)
Started game 3 of 100 (Stockfish 260819 64 BMI2 vs Stockfish 110919 64 POPCNT)

...

Finished game 100 (Stockfish 110919 64 POPCNT vs Stockfish 260819 64 BMI2): 1/2-1/2 {Draw by adjudication}
Score of Stockfish 260819 64 BMI2 vs Stockfish 110919 64 POPCNT: 26 - 29 - 44  [0.485] 99
Finished game 99 (Stockfish 260819 64 BMI2 vs Stockfish 110919 64 POPCNT): 1/2-1/2 {Draw by adjudication}
Score of Stockfish 260819 64 BMI2 vs Stockfish 110919 64 POPCNT: 26 - 29 - 45  [0.485] 100
Elo difference: -10.4 +/- 50.8, LOS: 34.3 %, DrawRatio: 45.0 %
Finished match
D:\___test\Sf Tcec 16> 
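
As a cross-check of the cutechess summary line, a minimal sketch that recomputes the score, Elo difference and draw ratio from the final 26 - 29 - 45 result using the usual logistic Elo formula. The error margin and LOS are not recomputed here, and the function name is made up.

```python
import math

def elo_from_result(wins, losses, draws):
    """Return (score, elo_difference, draw_ratio) for a match result."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Logistic Elo model: score = 1 / (1 + 10 ** (-elo / 400))
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return score, elo, draws / games

score, elo, draw_ratio = elo_from_result(26, 29, 45)
print("score=%.3f elo=%+.1f draw_ratio=%.1f%%" % (score, elo, 100 * draw_ratio))
```

This gives a score of 0.485 and an Elo difference of about -10.4, in line with the cutechess output.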

@ppigazzini Thanks for the tests!

OK, same thing happened in game Alliestein-Stockfish, see the crash info and logs in the "crash-info" tab on http://tcec-chess.com

Crash dump generated manually while CPU usage was at 0%, attached with log

This time Aloril42 was online and used procdump to generate some dump files before Stockfish ran out of time (thanks!) : https://docs.microsoft.com/en-us/sysinternals/downloads/procdump
It was exciting because we had only a couple of seconds left on the clock :-)

@snicolet crash using the original version of libwinpthread ?

@ppigazzini
Yes. Nothing is changed yet on the TCEC machine, compared to the start of the division.

They are very strict with the crashing rules: we would have to make an official request, and then there would be a rule committee decision. It is not clear to me at the moment if they would accept to downgrade the libwinpthread library.

Some quick napkin math: Sf has now stalled twice in 19 games. With 23 games remaining, that gives us a 92% chance to stall at least one more time, and be kicked out of TCEC season 16.
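
The napkin math written out, as a sketch: it assumes every game stalls independently with the observed rate of 2 in 19, which is a crude model and not something we know to be true.

```python
def prob_at_least_one_stall(stalls, games_played, games_remaining):
    """Chance of one or more stalls in the remaining games.

    Assumes each game stalls independently with probability stalls / games_played.
    """
    per_game = stalls / games_played
    return 1.0 - (1.0 - per_game) ** games_remaining

p = prob_at_least_one_stall(stalls=2, games_played=19, games_remaining=23)
print("chance of at least one more stall: %.0f%%" % (100 * p))
```

With 2 stalls in 19 games and 23 games to go, this lands at roughly 92%.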

At this point, requesting that TCEC downgrades libwinpthread seems to be the best course of action to try to salvage the tournament, even if we are not 100% certain that's the issue. It can't really get any worse at this point.

Not to create a panic or anything, buut:
Game #79 "Stockfish 190826" vs "Houdini 6.03" is estimated to start on Thu, 12 Sep 2019 22:43:06 GMT

There were 10 test games at 30+5 and no crashes in them. Maybe that alters the odds of a crash, or adds info on the possible cause.
If we could play remaining games at 30+5 maybe that would be more reliable?

Requesting a dll downgrade is probably the best course of action, since we seem to be reasonably sure that's what's causing the crashes. Hopefully someone is getting in contact with TCEC.

There were 10 test games at 30+5 and no crashes in them.

Are you sure they were 30+5? I could only find 9+1.5 games in the archive, which is too short TC to change crash frequency very much. Expected crash rate at TCEC would be 2/20 instead of 2/19 I guess.

If the cause is indeed the external library, then:
It would affect every run of search; the chance that a hang happens is decided by the number of moves, not the time control, and it can potentially hang even with one search thread.

For one thing, the quality of some open source software seems to be degrading. Given that, we may want to refrain from using any unreliable compilers/toolchains in an attempt to squeeze out a little-to-no performance increase in releases and other submissions (working on compiler flags only is fine).

I could only find 9+1.5 games in the archive,

Check carefully - Leela was playing with 30% of the time of the Div P contestants - you've seen Leela's time (it threw me at first!), sf had 30+5.

A question for people who know Windows, are the crash dumps useful for finding out the cause of this problem?

@ppigazzini thanks for the careful testing. So, it is now rather clear the bug is in a mingw provided threading library and not the chess engine. Changing TC or number of threads makes no sense, only replacing the library is a proper fix (or as TCEC handles the crashes caused by network software, just resume games).

This is TCEC's decision :

Official statement by the TD of TCEC: after a second Stockfish crash (hang) there is a high possibility that the engine gets a third strike and is disqualified. The rules do not allow any update to the code submitted, no matter if that is the actual binary or a third party .dll as is the current case. However, TCEC also realizes the hard effort put into every line of code by every participant. As the rules do not allow us to accept a change, we will put the decision in the hands of the participants themselves.

A poll will be created with the following question, "Should Stockfish team be allowed to replace libwinpthread-1.dll with non-buggy version?". Each engine author participant in Div P will have the right to vote with the right to veto. If one or more votes come as "no" the question is dismissed.

If there are zero "no" votes then Stockfish team can submit a new .dll, but no other changes to the engine. Then a second poll will be created, where the majority will be counted, with the question, "Should crashed games up to that point be resumed at point of crash?"

Note: if Stockfish crashes even after .dll update, that will count as third crash and the engine will be disqualified

It is better than I expected. This whole episode is very unfortunate and no one here is at fault. Simply bad luck with a faulty third party dll. Perhaps there is a way we can avoid using 3rd party dlls, even if we sacrifice some speed, since they will almost always be impossible to test beforehand. I'd rather be safe and a tad slower than sorry.

Allie and Scorpio devs voted yes. There is an ongoing vote at Leela's discord, and the result seems very uncertain. The "no" voters mostly argue that this should have been tested better.

Well yea ofc the Leela people might want to vote no. After all SF is Leela's biggest competitor.

Yes, SF is LC0's biggest competitor, but I think they would rather LC0 win by beating SF than through SF crashing out. And I think they have good sportsmanship as well.

It’s their choice, they can choose the high road or they can choose the low road. We’ll see what type of character they are made of regardless.

Just for info about the Leela vote: it will be decided by the Leela community (approx. 3000 members on Discord), not by the Leela developers. Traditionally, Leela's decisions (network choice for tournaments etc.) have been made by community vote, and they are keeping the same tradition for this vote. (Current status: 61 yes to the update, 59 no.)

TCEC happens every few months now, so why not just pull out? I don't really see the point of going through this, given the situation. With all due respect.

> It is better than I expected. This whole episode is very unfortunate and no one here is at fault. Simply bad luck with a faulty third party dll. Perhaps there is a way we can avoid using 3rd party dlls even if we sacrifice some speed since that will almost always be impossible to test beforehand. I rather be safe and a tad slower than sorry.

@MichaelB7
For clarification:
The given libs are used anyway. The only difference is whether they are linked statically or dynamically. By default SF uses static linking in its Makefile. For a faster executable LTO can be used (which is what Kirian does), but the mingw compiler can't combine this option with static linking, so dynamic linking has to be used, and the DLLs are then separate files from the SF executable. With static linking the library code would be combined with the SF code in a single executable. So using these libs changes nothing about the speed, but using LTO does. And which version of the lib is used depends on your mingw installation.

Thanks @locutus2, apparently there are times where it’s better to be on macOS. Of course, we have different issues, but nothing like the one you described. Static linking is not an endorsed option for macOS.

If one of the competitors votes “no”, I would support pulling out. I believe we would be duty bound to stay in if they all vote “yes”, for better or for worse.

I’m actually a member of the Leela Discord, but it’s not apparent where to vote. Not 100% sure I will vote, but I would like to see where it’s at.

I thought Leela has itself intelligence and can vote without help :-)

Anyway, it would be better if we could test the submitted files for the next competitions, e.g. 1000 games with a TC and settings close to the competition ones.

@MichaelB7 , go to "dev-log" channel

Do we have a link to the lib we want to use if allowed? Has it already been sent to TCEC to avoid any delays if they allow us to change it?

Aloril asked me this :

In case vote is all yes, do you have replacement DLL?

@ppigazzini @CoffeeOne Do you confirm that the .dll version (attached as .txt) earlier in this thread by CoffeeOne is not buggy ?

Can anybody else who can test on windows confirm it is not buggy ?

If we get "yes" answers from other engine authors, I'd really want to make sure we don't mess up by sending a wrong replacement .dll !

I confirm that I attached the right version, maybe @ppigazzini can download the txt file from here and make a binary comparison, too. Of course I do not confirm that the dll is not buggy, nobody can do that.

I have extracted the library from his link (tar.xz) and it is identical to yours :)

Edit:

$ cmp libwinpthread-1.txt mingw64/bin/libwinpthread-1.dll
$
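For anyone without cmp at hand (e.g. on plain Windows), the same byte-for-byte check can be sketched in Python. This is only an illustration: the helper names sha256_of and same_bytes are made up here, and comparing SHA-256 digests is a stand-in for what cmp does directly.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b):
    """True if both files hash to the same digest (hypothetical helper)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Calling same_bytes("libwinpthread-1.txt", "mingw64/bin/libwinpthread-1.dll") would correspond to the cmp call above; the printed digest also gives a value that can be passed along with the file.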

Wow someone actually voted no. I guess now we hope SF can survive till the end. Edit: Houdart probly voted no. He said no to K a while back too when it had that speed issue.

I also support that No, because it was not necessary.
I reported the problem on Jul 10th and got very little / no support.

@CoffeeOne Little did we know. You are correct of course: this does appear to be the exact same bug, I believe. But at the time no one thought it was the dll; it appeared to be a one-off related to Windows. And here it is: TCEC, running Windows, runs into the exact same issue. With one No vote, I would withdraw.

If Houdart was the one voting No, it might be a while before I upgrade Houdini. It just doesn’t sit right with me.

I see no reason to withdraw. Everyone wants to see Stockfish play. It is not like it is going to aggravate an injury. If it happens again...whatever. And certainly, if Stockfish qualifies anyway, it deserves to advance.

@snicolet updated the wiki script to cross compile with Ubuntu 18.04

https://github.com/glinscott/fishtest/wiki/Building-stockfish-on-Windows#cross-compilation-with-ubuntu-1804

Thanks!

There is a reference to libwinpthread-1.dll in that wiki page, should we link to a non-buggy version?

@snicolet I will update that page in the weekend ;)

@ppigazzini good info in that cross compile page (the wine trick is nice), that should allow us to include some of it in travis... will have a look.

I think it should be possible to do it without a sed on the Makefile, will let you know.

@ppigazzini the build instruction without a 'sed' on the Makefile is:

make profile-build ARCH=x86-64-modern COMP=mingw PGOBENCH="wine ./stockfish.exe bench" -j

@vondele wiki script updated w/ your suggestion and using a couple of functions to build SF for different platforms ('pgo/no pgo' build according to the builder CPU architecture), here a snippet of code (I prefer to have readable code in the wiki).

#!/bin/bash
# Functions to build Stockfish for Windows with the mingw cross compiler.

# Plain build; $1 is an optional ARCH suffix such as -modern or -bmi2.
_build_sf () {
  make build ARCH=x86-64$1 COMP=mingw -j
  strip stockfish.exe
  mv stockfish.exe ../../stockfish-x64$1.exe
  make clean
}

# Profile-guided build; the bench used for profiling runs under wine.
_build_sf_pgo () {
  make profile-build ARCH=x86-64$1 COMP=mingw PGOBENCH="wine ./stockfish.exe bench" -j
  strip stockfish.exe
  mv stockfish.exe ../../stockfish-x64$1-pgo.exe
  make clean
}

# function calls
_build_sf_pgo
_build_sf_pgo  -modern
_build_sf      -bmi2

Aside from the library issue, is it also possible that one of those threads had a stack overflow due to high depth and exited without crashing the entire program?
The stack size reservation on the provided binary is 2MB.

I guess as a safe measure we should raise it to match x86-64 Linux default, which is 8MB.

I figured that it seems it may be one of those C++11 loopholes, there is no way to specify std::thread stack size!

@noobpwnftw we've fixed a low stack issue in macOS before see (NativeThread in thread_win32_osx.h), so we could presumably do that in mingw as well.

Using -fstack-usage in gcc, the maximum stack usage of search is currently 4128 bytes and qsearch is 3568. Given that search can recurse ~245 times and qsearch about 10 or so, we're very close to 1Mb usage. So 2Mb would be enough, but if we have 1Mb we could fail. Note that the stack size reservation of the main program thread, is not necessarily equal to that of the created threads.

Edit: I've seen your https://github.com/official-stockfish/Stockfish/pull/2303 now, that's a good idea.
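The arithmetic behind "very close to 1Mb" can be spelled out as follows. This is a rough sketch that uses only the figures quoted above (frame sizes from -fstack-usage and the approximate recursion counts); it ignores the frames of any other functions on the call path, so it is a lower bound rather than a measurement. The function name is made up for the example.

```python
# Figures quoted in the comment above.
SEARCH_FRAME_BYTES = 4128    # stack usage of one search() frame
QSEARCH_FRAME_BYTES = 3568   # stack usage of one qsearch() frame
SEARCH_RECURSIONS = 245      # approximate maximum search() recursion
QSEARCH_RECURSIONS = 10      # approximate qsearch() recursion on top

MIB = 1024 * 1024


def estimated_stack_bytes(search_depth=SEARCH_RECURSIONS,
                          qsearch_depth=QSEARCH_RECURSIONS):
    """Lower-bound estimate: only search() and qsearch() frames are counted."""
    return (SEARCH_FRAME_BYTES * search_depth
            + QSEARCH_FRAME_BYTES * qsearch_depth)


total = estimated_stack_bytes()
print(total)             # 1047040 bytes
print(MIB - total)       # 1536 bytes of headroom below 1 MiB
print(total / (2 * MIB)) # about half of a 2 MiB reservation
```

So with these numbers a 1 MiB thread stack leaves only about 1.5 KiB of slack before anything else is counted, while 2 MiB leaves roughly half the stack free.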

@CoffeeOne or whoever had reproducible hangs with the new libwinpthread DLL, does applying https://github.com/official-stockfish/Stockfish/pull/2303 and without downgrading the DLL change anything?

@noobpwnftw worth testing, but unlikely (since @CoffeeOne sees the hangs during a bench at low depth, i.e. low stack usage).

Does anybody know (or can construct) a position which reaches selective depth = 250 for nominal depth = 13? Could be a good candidate position to add to bench!

note that even a position which reaches such high seldepth quickly (i.e. seconds) would be useful, even if it reaches it at higher depth. In that case, we could make this an additional test in the travis CI.

Here is a branch which writes some debug messages in the terminal concerning the stop flag and the depths reached by threads, allows to change the MAX_DEPTH on a thread-by-thread basis, test for infinite search explosion even from the starting position, that sort of things:
https://github.com/snicolet/Stockfish/tree/search_explosion2

• the messages are written on lines beginning with [DEBUG_HANG], you can search for this word in the code

• usage:
./stockfish setoption name Threads value 3 go wtime 10000 btime 10000 winc 5000 binc 5000

• by modifying line 391 of search.cpp, we can change maximum depth of each thread

• by commenting line 562 of search.cpp, we can force the emergency time check in check_time() to trigger

• by modifying line 1040 of search.cpp, we can simulate search explosion for each thread

===> does this branch help testing the stack usage?

@vondele the script https://github.com/official-stockfish/Stockfish/issues/2291#issuecomment-530572074 uses the worker process:

  • concurrency=N
  • one instance of stockfish running 'go infinite' w/ N-1 threads
  • one instance of stockfish running 'bench' w/ 1 thread

The instance of stockfish running 'bench' seems to hang only w/ concurrency>2, and it hangs very fast when using concurrency=max CPU threads. I have no evidence that 'stockfish bench' hangs w/ only one thread.

@CoffeeOne has done more testing, so he could confirm that 'stockfish bench' hangs w/ only one thread.

@noobpwnftw https://github.com/official-stockfish/Stockfish/pull/2303 hangs running the script w/ concurrency=8 (latest gcc built by msys2)

PS D:\__test_worker> C:\Python27\python.exe .\test.py
2.7.16
Iteration 1 of 100
Verifying signature of stockfish-pthreads2303.exe ...
bench_nps=1119493
Iteration 2 of 100
Verifying signature of stockfish-pthreads2303.exe ...
bench_nps=1074747
Iteration 3 of 100
Verifying signature of stockfish-pthreads2303.exe ...
bench_nps=1113040
Iteration 4 of 100
Verifying signature of stockfish-pthreads2303.exe ...
Traceback (most recent call last):
  File ".\test.py", line 67, in <module>
    main()
  File ".\test.py", line 62, in main
    bench_nps=verify_signature()
  File ".\test.py", line 39, in verify_signature
    p_out, p_err = p.communicate()
  File "C:\Python27\lib\subprocess.py", line 478, in communicate
    stderr = _eintr_retry_call(self.stderr.read)
  File "C:\Python27\lib\subprocess.py", line 125, in _eintr_retry_call
    return func(*args)
KeyboardInterrupt

[EDIT] at first I suspected the python code on Windows, so I rewrote that code using the best practice for subprocess and buffers. But I haven't experienced hangs w/ stockfish built using gcc from MingW-w64 or downgrading libwinpthread-1.dll , so the latest libwinpthread-1.dll from msys2 seems to be the culprit.

@ppigazzini OK, so the possibility that a stack overflow triggered the hang seems eliminated.
Thanks for testing, it's good to know.

@snicolet good idea, just giving unconditionally extensions makes it easy to test stack usage. With this:

diff --git a/src/search.cpp b/src/search.cpp
index 79942bcdf..0b5ec4d2b 100644
--- a/src/search.cpp
+++ b/src/search.cpp
@@ -1011,6 +1011,10 @@ moves_loop: // When in check, search starts from here
                && pos.pawn_passed(us, to_sq(move)))
           extension = ONE_PLY;

+      extension = ONE_PLY;
+      if (PvNode)
+      std::cout << ss->ply << std::endl;
+
       // Calculate new depth for this move
       newDepth = depth - ONE_PLY + extension;

we easily reach MAX_DEPTH from the startpos, also for PV lines. This allows testing (together with e.g. ulimit -s 1024) what we need as a stack limit. Quick testing under Linux indeed shows we need more than 1MB, but we're fine with 1.5MB.

Testing this modified code with a mingw compile under wine, shows no problem with stack size in those conditions. This might not be identical to testing this 'natively' under windows.

I can only provide two positions (originally posted in a thread started by Uri Blass on talkchess) where the search very quickly arrives at the 50-move rule.

position fen k1b5/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/8/8/K1B5 w - - 0 1
position fen k1r5/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/8/8/K1B5 w - - 0 1

SF used to have (still has?) problems finishing the search on these.

> @ppigazzini OK, so the possibility of a stack overflow may have triggered the hang seems eliminated.
> Thanks for testing, it's good to know.

@noobpwnftw @vondele @snicolet at the moment we have:

  • worker process - simple 'go infinite' and 'bench':

    • SF built w/ msys2 hangs very fast

    • SF built w/ mingw-w64 seems fine

  • games at VLTC w/ high CPU:

    • SF built w/ msys2 sometimes hangs

    • SF built w/ mingw-w64 no data

These could be two different unrelated problems.
w/ the fast hangs in the worker process I don't understand how SF can finish long games at VLTC ...

The chance of a hang caused by libwinpthread is related to how many go and stop commands are sent (each time a search starts or stops, there is a wait/signal), not to the time control.
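To make that argument concrete, here is a toy model. It assumes (and this is purely an assumption for illustration, not something measured in this thread) that every wait/signal has the same small, independent probability of hanging. Under that assumption the chance of at least one hang depends only on how many waits happen, which is why move count would matter and time control would not. The function name and the example numbers are hypothetical.

```python
def hang_probability(per_wait_probability, waits):
    """Chance of at least one hang in `waits` independent wait/signal events.

    Toy model: each event hangs with the same probability, independently.
    """
    return 1.0 - (1.0 - per_wait_probability) ** waits


# Illustrative numbers only: the per-wait probability is made up.
for waits in (100, 1000, 10000):
    print(waits, hang_probability(1e-4, waits))
```

A long game at slow TC and a short game at fast TC with the same number of moves would come out the same in this model, since only the number of go/stop cycles enters the formula.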

@noobpwnftw that is intuitive, but why would there be hangs on a 'go infinite', where these are essentially absent?

I guess he still has to send a stop after go infinite to check whether the engine responds or not...

@vondele in the worker process the hang is w/ the SF running 'bench', but I see this hang only when there is another SF running 'go infinite' (more threads, higher probability)

(@snicolet as SF maintainer you should use Windows as main OS :-P )

so no hang in the go infinite process (it still does sync_cout, which at least grabs a mutex, which I assume might require calling into the library), only in the bench process ? So, it seems this bug in the dll triggers with higher likelihood when another process is active, which would explain why it is rare in games (cutechess is more or less idle during games). Maybe you see it more often during games if you play at STC with a few threads, but have concurrency as well in the games (i.e. concurrency option to cutechess)?

@vondele @noobpwnftw @tomtor
the worker hangs in p_out, p_err = p.communicate() after the command stockfish bench, the previous SF instance ('stockfish go infinite') could do anything.

    with open(os.devnull, 'wb') as f:
      p = subprocess.Popen([engine, 'bench'], stderr=subprocess.PIPE, stdout=f, universal_newlines=True)
      p_out, p_err = p.communicate()

here the complete function:

def verify_signature():
  engine='stockfish_250819.exe'
  concurrency = 48
  if concurrency > 1:
    with open(os.devnull, 'wb') as f:
      busy_process = subprocess.Popen([engine], stdin=subprocess.PIPE, stdout=f, stderr=subprocess.STDOUT)
      busy_process.stdin.write(enc('setoption name Threads value %d\n' % (concurrency-1)))
      busy_process.stdin.write(enc('go infinite\n'))
      busy_process.stdin.flush()

  try:
    bench_sig = ''
    print('Verifying signature of %s ...' % (os.path.basename(engine)))

    with open(os.devnull, 'wb') as f:
      p = subprocess.Popen([engine, 'bench'], stderr=subprocess.PIPE, stdout=f, universal_newlines=True)
      p_out, p_err = p.communicate()

    for line in p_err.splitlines():
      if 'Nodes searched' in line:
        bench_sig = line.split(': ')[1].strip()
      if 'Nodes/second' in line:
        bench_nps = float(line.split(': ')[1].strip())

    if p.returncode != 0:
      raise Exception('Bench exited with non-zero code %d' % (p.returncode))

  finally:
    if concurrency > 1:
      busy_process.communicate(enc('quit\n'))
      busy_process.stdin.close()

  return bench_nps

EDIT_000 : btw also the worker master code hangs (IMO here the python code is sub optimal)

def verify_signature(engine, signature, remote, payload, concurrency):
  if concurrency > 1:
    busy_process = subprocess.Popen([engine], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    busy_process.stdin.write(enc('setoption name Threads value %d\n' % (concurrency-1)))
    busy_process.stdin.write(enc('go infinite\n'))
    busy_process.stdin.flush()

  try:
    bench_sig = ''
    print('Verifying signature of %s ...' % (os.path.basename(engine)))
    with open(os.devnull, 'wb') as f:
      p = subprocess.Popen([engine, 'bench'], stderr=subprocess.PIPE, stdout=f, universal_newlines=True)
    for line in iter(p.stderr.readline,''):
      if 'Nodes searched' in line:
        bench_sig = line.split(': ')[1].strip()
      if 'Nodes/second' in line:
        bench_nps = float(line.split(': ')[1].strip())

    p.wait()
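Since the script above blocks forever in communicate() when the bench hangs, a timeout turns the hang into something the script can detect and report. The following is a minimal sketch, assuming Python 3.3 or later (the traceback above is from Python 2.7, where communicate() has no timeout parameter). The function name run_with_timeout is made up for the example and is not part of the worker code.

```python
import subprocess


def run_with_timeout(command, timeout_seconds):
    """Run `command`, returning (finished, stderr_text).

    finished is False if the process had to be killed after the timeout.
    """
    process = subprocess.Popen(
        command,
        stdout=subprocess.DEVNULL,   # bench output on stdout is not needed
        stderr=subprocess.PIPE,      # bench statistics arrive on stderr
        universal_newlines=True,
    )
    try:
        _, err = process.communicate(timeout=timeout_seconds)
        return True, err
    except subprocess.TimeoutExpired:
        # Treat the timeout as a hang: kill the child, then collect
        # whatever it managed to write before it stopped responding.
        process.kill()
        _, err = process.communicate()
        return False, err
```

Something like run_with_timeout([engine, 'bench'], 600) would then report a hang as finished == False instead of stalling until a KeyboardInterrupt; the 600 seconds is an arbitrary placeholder and would have to be chosen well above the normal bench duration.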

@vondele

  • -concurrency 2 option.Threads=4
.\cutechess-cli.exe -repeat -rounds 1000 -tournament gauntlet -resign movecount=3 score=400 -draw movenumber=34 movecount=8 score=20 -concurrency 2 -engine cmd=stockfish-master.exe option.Hash=8 option.Threads=4 -engine cmd=stockfish-master.exe option.Hash=8 option.Threads=4 -each proto=uci tc=10+0.1 -openings file=2moves_v1.pgn format=pgn order=random plies=16

Indexing opening suite...
Started game 1 of 1000 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT)
Started game 2 of 1000 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT)
Terminating process of engine Stockfish 150919 64 POPCNT(2)
Finished game 2 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT): 0-1 {White's connection stalls}
Score of Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT: 1 - 0 - 0  [1.000] 1
Finished game 1 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT): * {No result}
Score of Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT: 1 - 0 - 0  [1.000] 1
Elo difference: inf +/- nan, LOS: 84.1 %, DrawRatio: 0.0 %
Finished match
  • -concurrency 1 option.Threads=4
.\cutechess-cli.exe -repeat -rounds 1000 -tournament gauntlet -resign movecount=3 score=400 -draw movenumber=34 movecount=8 score=20 -concurrency 1 -engine cmd=stockfish-master.exe option.Hash=8 option.Threads=4 -engine cmd=stockfish-master.exe option.Hash=8 option.Threads=4 -each proto=uci tc=10+0.1 -openings file=2moves_v1.pgn format=pgn order=random plies=16

Indexing opening suite...
Started game 1 of 1000 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT)
Finished game 1 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT): 1/2-1/2 {Draw by adjudication}
Score of Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT: 0 - 0 - 1 [0.500] 1
Started game 2 of 1000 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT)
Finished game 2 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT): 1/2-1/2 {Draw by adjudication}
Score of Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT: 0 - 0 - 2 [0.500] 2
Started game 3 of 1000 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT)
Finished game 3 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT): 1/2-1/2 {Draw by insufficient mating material}
Score of Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT: 0 - 0 - 3 [0.500] 3
Started game 4 of 1000 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT)
Finished game 4 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT): 1/2-1/2 {Draw by adjudication}
Score of Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT: 0 - 0 - 4 [0.500] 4
Started game 5 of 1000 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT)
Finished game 5 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT): 1-0 {White wins by adjudication}
Score of Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT: 1 - 0 - 4 [0.600] 5
Started game 6 of 1000 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT)
Finished game 6 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT): 1/2-1/2 {Draw by 3-fold repetition}
Score of Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT: 1 - 0 - 5 [0.583] 6
Started game 7 of 1000 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT)
Finished game 7 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT): 1/2-1/2 {Draw by adjudication}
Score of Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT: 1 - 0 - 6 [0.571] 7
Started game 8 of 1000 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT)
Terminating process of engine Stockfish 150919 64 POPCNT(1)
Elo difference: 88.7 +/- 116.0, LOS: 92.1 %, DrawRatio: 75.0 %
Finished match
Finished game 8 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT): 0-1 {White's connection stalls}
Score of Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT: 2 - 0 - 6 [0.625] 8

well, anyway, I think this all firmly points to the same cause, i.e. a bug in the .dll, and I don't see any indications to the contrary.

@vondele I don't know the SF code, what libwinpthread does, or how we use it, so I wanted to be sure to add useful information to this issue.

much appreciated, indeed.

> @CoffeeOne or whoever had reproducible hangs with the new libwinpthread DLL, does applying #2303 and without downgrading the DLL change anything?

@noobpwnftw
I compiled the branch threads of https://github.com/noobpwnftw/Stockfish.git
It does not change anything for me using up to date MSYS2 environment.
I only need to try a profile build with my machine, which does not finish because of the hanging bench.

> @CoffeeOne has done more testing, so he could confirm that 'stockfish bench' hangs w/ only one thread.

Yes, I have hangs with a normal bench (1 thread), making a profile-build impossible.
I documented it all in the two issues (the Stockfish one, and a bit more info here: https://github.com/msys2/MINGW-packages/issues/5610)

@CoffeeOne So that proves your problem is not related to thread stack size at least for MINGW.
I have made another branch: https://github.com/noobpwnftw/Stockfish/tree/threads2 which uses native Windows API to create threads, can you please test it and see if it changes anything?

@noobpwnftw
I have hangs in bench with your branch threads2, too.

@CoffeeOne That's odd, since that branch does not use pthread to create search threads.
Can you please post the outputs of g++ -dM -E -x c++ - < /dev/null | grep WIN?

$ g++ -dM -E -x c++ - < /dev/null | grep WIN
#define _WIN32 1
#define _WIN64 1
#define __WINT_MAX__ 0xffff
#define __WINT_MIN__ 0
#define __WIN32 1
#define __WIN64 1
#define __WINNT 1
#define __WINNT__ 1
#define __WIN32__ 1
#define __SIZEOF_WINT_T__ 2
#define WIN32 1
#define WIN64 1
#define __WINT_TYPE__ short unsigned int
#define __WINT_WIDTH__ 16
#define WINNT 1
#define __WIN64__ 1

Well, it looks like your problem is not related to libwinpthread, it looks like a mis-compile.

@noobpwnftw can you explain how you get to that conclusion? Is there no other dependence on the .dll (e.g. for mutex or condition variables or atomics ....). I assume that would imply the dll is not any more needed to run the binary.

@CoffeeOne can you compile your code with optimize=no (after a make clean) and see if that matters?

Done. The likelihood of hangs goes down, but it still hangs.
I remember that I made the problem go away when I modified cond.c in the winpthread library and built the library myself. So it must have to do with condition variables; I guess they are used in the mutex calls.
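For readers less familiar with the pattern under suspicion, here is a small sketch of a start/idle handshake built on a mutex plus condition variable. It is written in Python for brevity and is only loosely modeled on how a search thread is typically parked and woken; the class and method names are illustrative and this is not Stockfish code. The point is that every search start and every search end goes through a wait on the condition variable, so a defect in the wait/notify implementation would show up as a thread that never wakes.

```python
import threading


class Worker:
    """A thread that idles on a condition variable between 'searches'."""

    def __init__(self):
        self._cond = threading.Condition()
        self._searching = True      # stays True until the idle loop is reached
        self._exit = False
        self.searches_done = 0
        self._thread = threading.Thread(target=self._idle_loop)
        self._thread.start()
        self.wait_for_search_finished()

    def _idle_loop(self):
        while True:
            with self._cond:
                self._searching = False
                self._cond.notify_all()     # wake anyone waiting for "finished"
                # wait_for re-checks the predicate, so a spurious or early
                # wakeup cannot let the thread continue by mistake.
                self._cond.wait_for(lambda: self._searching)
                if self._exit:
                    return
            self.searches_done += 1         # stand-in for the real search

    def start_searching(self):
        with self._cond:
            self._searching = True
            self._cond.notify_all()

    def wait_for_search_finished(self):
        with self._cond:
            self._cond.wait_for(lambda: not self._searching)

    def shutdown(self):
        with self._cond:
            self._exit = True
            self._searching = True
            self._cond.notify_all()
        self._thread.join()
```

Calling start_searching() followed by wait_for_search_finished() in a loop exercises one wait on each side per cycle. If a notification were lost inside the library, one of those two waits would block forever, which is the kind of hang described in this thread.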

@vondele Yes, since given the compiler flags, https://github.com/noobpwnftw/Stockfish/blob/afe4631e053419b91bf3110c69e6ed45fd429770/src/thread_win32_osx.h#L38 will be true and SF will use native APIs for mutex, https://github.com/noobpwnftw/Stockfish/blob/afe4631e053419b91bf3110c69e6ed45fd429770/src/thread_win32_osx.h#L83 will also be true and threads are created with native APIs.
And atomic operations on simple bool types are very likely to be compiled like lock cmpxchg instructions anyway.

At least no obvious reasons I can see that will make use of pthread, do you mind uploading one of your faulty compiles?

@noobpwnftw
When I compile with
make build ARCH=x86-64-modern COMP=gcc -j
I see that the created exe file is still dependent on libwinpthread:

$ ldd stockfish.exe
        ntdll.dll => /c/WINDOWS/SYSTEM32/ntdll.dll (0x7ffb38b20000)
        KERNEL32.DLL => /c/WINDOWS/System32/KERNEL32.DLL (0x7ffb36bd0000)
        KERNELBASE.dll => /c/WINDOWS/System32/KERNELBASE.dll (0x7ffb35bb0000)
        msvcrt.dll => /c/WINDOWS/System32/msvcrt.dll (0x7ffb36f10000)
        libwinpthread-1.dll => /mingw64/bin/libwinpthread-1.dll (0x64940000)
        libstdc++-6.dll => /mingw64/bin/libstdc++-6.dll (0x6fc40000)
        libgcc_s_seh-1.dll => /mingw64/bin/libgcc_s_seh-1.dll (0x61440000)

libstdc++-6.dll is dependent on libwinpthread-1.dll, but that is not the problem.
The problem is, does Stockfish use it anywhere? If Stockfish doesn't, and the speculation is correct that it will still cause a hang due to some underlying implementation, then almost every multi-threaded program will have problems regardless of how it spawns threads, and I think that is highly unlikely to be the case.

A miscompile at -O0 (optimize=no) however, is very unlikely. One way to test is to use an old toolchain that works fine, create a binary, but use a newer libwinpthread.

@noobpwnftw
I can't follow your arguments anymore.
When I make a non-static build (so DLLs are needed) of your branch threads2, then copy the exe elsewhere together with libgcc_s_seh-1.dll and libstdc++-6.dll but without winpthread, and try to start stockfish.exe in a normal Windows cmd, I get this:
[image]
That is at startup; I do not even get to the point of entering a single command, like uci.
So libwinpthread is still needed.

@CoffeeOne the argument is that libwinpthread is needed because it is used in libstdc++... which is indeed sound.

Yes, but it's needed in runtime, when stockfish is executed, too. Also in noob's thread2 branch.

dll will be loaded at program startup, so that's not impossible. Let's see what noob can figure out.

You are not going to like what I found, I have pushed a new commit on threads2 with a naive implementation of ConditionVariable, to test if std::condition_variable_any is compatible with our own implementation of Mutex, I suspect not.

When I get home I’ll try to rewrite our calls to mutex and/or condition variables with custom yielding spinlocks, and see what happens.

Oh, our intentions crossed :-)

@noobpwnftw
I don't know if I should have done it, but I pulled your last commit from threads2, and tested.
It compiles fine, but similar result.
The executable is still dependent on libwinpthread at runtime (it won't even start without the dll)
and it still hangs.

@CoffeeOne
Well, this is miserable: after switching to the custom ConditionVariable, I checked that the only remaining import reference to libwinpthread in the Stockfish binary is pthread_mutex_destroy, which has no code reference within Stockfish. The binary still requires the DLL to start, though, and I don't think there is any way to remove that, because other runtime libraries like libstdc++-6.dll would still need it.
This is my test compile of threads2 branch which I think is free of anything that relies on pthread, except for some unknown runtime functions but if they break, they'd break a lot of things.
testcompile.zip

Can you also post one of your compiles of my branch(which hangs) since I have not experienced any hangs with both versions of libwinpthread? Thanks.

I add my compilation of threads2, it was compiled with: make build ARCH=x86-64-modern COMP=mingw, so it does not depend on any external dlls, and it's not stripped.
stockfish.zip
and it hangs on this PC :)

I cannot use your testcompile; my testing machine has an AMD FX CPU, so I need a modern compile.

testcompile2.zip
This is a modern pgo compile from MINGW GCC 9.2, with all depending DLLs from a fresh install after pacman -Syuu.

I have checked your compile:
It doesn't hang on my system, and it has no reference to pthread condition variables (checked via disassembly).
Stack size is at 8MB; it has unusually high stack usage
[image] but that should be within a safe range.

So I still think your problem is unrelated to 1) stack size, 2) libwinpthread.

So I thought the crash at TCEC was due to the faulty libwinpthread dll? But somehow @CoffeeOne has another crash problem with a different cause? Are we still reasonably certain downgrading the dll will fix the crash issue at TCEC?

Thx.
Your compilation also hangs.
It's timing dependent: the hangs, interestingly, happen a lot more often when executing inside the Mingw-w64 64 bit shell (instead of Windows cmd).
Are you sure, that the winpthread lib is not called anymore via std::mutex usage in stockfish in your branch?

Yes, I am very sure, because there are not even any such function imports from the DLL except for one:
[image]
Compared to the TCEC binary:
[image]

So now one thing remains is for you to use a downgraded libwinpthread and see if the hang persists. If so, I need to look into the exact differences between the library and find how that still affects us.

It'd be a very serious issue if some underlying calls to pthread from libstdc++ in MINGW would be bugged, that would affect virtually everything so I think that may not be the case, but who knows.

When I combine your testcompile2 with the older 5325 version of the libwinpthreads lib, I have hangs, too.
I don't have hangs with the 5325 library with latest official-stockfish.

Just as a reference, I have compiled my branch with MSVC, removing the _MSC_VER limit so that it uses the same custom implementations of mutex and condition variables as testcompile2, to eliminate the chance of a wrong implementation. Can you please also test this one?
Stockfish-MSVC.zip

It also hangs

Great, so my implementation has a problem at least on your system.

@noobpwnftw two systems: it hangs also w/ my Intel 3770k.

.\cutechess-cli.exe -repeat -rounds 1000 -tournament gauntlet -resign movecount=3 score=400 -draw movenumber=34 movecount=8 score=20 -concurrency 2 -engine cmd=stockfish-msvc.exe option.Hash=8 option.Threads=4 -engine cmd=stockfish-msvc.exe option.Hash=8 option.Threads=4 -each proto=uci tc=1+0.01 -openings file=2moves_v1.pgn format=pgn order=random plies=16
Indexing opening suite...
Warning: 2 opening repetitions vs 1 games per encounter
Started game 1 of 1000 (Stockfish 160919 64 vs Stockfish 160919 64)
Started game 2 of 1000 (Stockfish 160919 64 vs Stockfish 160919 64)
Finished game 2 (Stockfish 160919 64 vs Stockfish 160919 64): 1-0 {White wins by adjudication}
Score of Stockfish 160919 64 vs Stockfish 160919 64: 0 - 1 - 0  [0.000] 1
Started game 3 of 1000 (Stockfish 160919 64 vs Stockfish 160919 64)

...

Started game 25 of 1000 (Stockfish 160919 64 vs Stockfish 160919 64)
Terminating process of engine Stockfish 160919 64(3)
Finished game 16 (Stockfish 160919 64 vs Stockfish 160919 64): 1-0 {Black's connection stalls}
Score of Stockfish 160919 64 vs Stockfish 160919 64: 7 - 13 - 4  [0.375] 24
Finished game 25 (Stockfish 160919 64 vs Stockfish 160919 64): * {No result}
Score of Stockfish 160919 64 vs Stockfish 160919 64: 7 - 13 - 4  [0.375] 24
Elo difference: -88.7 +/- 138.2, LOS: 9.0 %, DrawRatio: 16.7 %
Finished match

The Ubuntu cross-compile runs the same cutechess match fine.

Since I'm unable to reproduce those hangs locally, I have to bother you repeatedly, but I want to get to the bottom of this.
Here is another implementation of condition variables with semaphores (pushed to threads2), which may fix a race condition. Can you please help with testing? Thanks.

stockfish-MINGW.zip
Stockfish-MSVC.zip

Both hang.

The first one (mingw) crashes at start.
The second one (msvc) hangs after 24 games.

@ppigazzini You need the dlls for the first one.

@CoffeeOne to have the same ground truth please run this tournament at very STC w/ the Abrok compile

.\cutechess-cli.exe -repeat -rounds 1000 -tournament gauntlet -resign movecount=3 score=400 -draw movenumber=34 movecount=8 score=20 -concurrency 2 -engine cmd=stockfish_19091500_x64_modern.exe option.Hash=8 option.Threads=4 -engine cmd=stockfish_19091500_x64_modern.exe option.Hash=8 option.Threads=4 -each proto=uci tc=1+0.01 -openings file=2moves_v1.pgn format=pgn order=random plies=16

These builds remove all custom mutex and condition variable code entirely and rely on the compiler defaults.
stockfish-MINGW.zip
Stockfish-MSVC.zip

The Mingw hangs for me with the newest libwinpthreads,
and works with older 5325 version of the lib.
The MSVC version works for me.

EDIT: I have to take a break now, will be back in ~ 45 minutes

@noobpwnftw the MSVC is running fine (450/1000 games). I will update the results in half an hour.

Great! So there is some progress, I probably know the reason, will file a PR. Thanks very much.

EDIT:
These are compiles after cleaning up the mess from my investigation:
stockfish-MINGW.zip
Stockfish-MSVC.zip

Expect hangs with the latest libwinpthread, but not with the older one.
MSVC version should also work.

If it turns out as expected, then I can put things together and explain to you how this happened.

@noobpwnftw I confirm the results of @CoffeeOne : MSVC finished the tournament w/o hangs, the MINGW hangs fast w/ the newer libwinpthread and is running fine w/ the older 5325 version (400/1000 games).
EDIT: the results are for the https://github.com/official-stockfish/Stockfish/issues/2291#issuecomment-531597603

@noobpwnftw results for https://github.com/official-stockfish/Stockfish/issues/2291#issuecomment-531598592:

  • mingw hangs fast w/ latest libwinpthread and runs fine w/ the older one (1000/1000 games)
  • msvc runs fine (1000/1000 games)

Yes, same for me, and similar to your previously attached versions:
Hangs: the MINGW version with the latest libwinpthread.
Works: MINGW with the 5325 libwinpthread, and the MSVC version.

@CoffeeOne for the sake of a complete confirmation, please also test this one:
Stockfish-MSVC.zip
It should not hang.

Looks good to me => no hangs

@ppigazzini
I ran this 4 threads match with 2 games at the same time so the CPU was ~ 100%.
I stopped at game 604 => no stalled engines until this point

@noobpwnftw it runs fine (1000/1000 games)

Completely unrelated, but as a web developer who's never written a single line of C and C++ you guys are doing some god level debugging that's seriously impressive to me. Not that I understand a single bit of what's going on exactly 🤣.

Thanks! I think I have a convincing conclusion for my investigations, please see https://github.com/official-stockfish/Stockfish/pull/2307.

@adentong we are taking applications for web developers to improve fishtest, take a look. :)

@ppigazzini Haha I can certainly do some work over the weekends whenever I'm not on call :D.

So I have pushed this branch: https://github.com/snicolet/Stockfish/tree/yielding_spinlock4

As promised, I have tried to implement a custom Mutex class based on the idea of yielding spinlocks, and to declare ConditionVariable to be an instance of std::condition_variable_any :

typedef std::condition_variable_any ConditionVariable; 

My implementation no longer uses specific Windows code, no #if defined(_WIN32) conditional code, so the code path should be the same for Mac, Linux and Windows.

It seems to run fine on my Mac. If @ppigazzini and @CoffeeOne could test this branch and tell if it works on Windows, it would be an interesting data point :-)
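
For readers who have not seen the pattern, here is a minimal sketch of what a yielding spinlock built only on std::atomic_int could look like. It is not the code from the yielding_spinlock4 branch: the spin threshold of 64, the try_lock member and the count_with_mutex helper are assumptions made purely for illustration.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical yielding spinlock; a sketch, not the branch's implementation.
class Mutex {
  std::atomic_int locked{0};

public:
  void lock() {
    int spins = 0;
    // exchange() returns the previous value: 0 means we acquired the lock.
    while (locked.exchange(1, std::memory_order_acquire) != 0) {
      // Spin briefly, then give the time slice back to the OS scheduler.
      if (++spins >= 64) {  // 64 is an arbitrary illustrative threshold
        std::this_thread::yield();
        spins = 0;
      }
    }
  }
  bool try_lock() { return locked.exchange(1, std::memory_order_acquire) == 0; }
  void unlock() { locked.store(0, std::memory_order_release); }
};

// condition_variable_any accepts any lockable type, including the Mutex above.
typedef std::condition_variable_any ConditionVariable;

// Illustrative helper: several threads increment a plain counter under the lock.
long count_with_mutex(int n_threads, int n_iters) {
  Mutex m;
  long counter = 0;
  std::vector<std::thread> pool;
  for (int t = 0; t < n_threads; ++t)
    pool.emplace_back([&] {
      for (int i = 0; i < n_iters; ++i) {
        std::lock_guard<Mutex> guard(m);  // works because Mutex has lock()/unlock()
        ++counter;
      }
    });
  for (auto& th : pool) th.join();
  return counter;
}
```

If the lock is correct, count_with_mutex(4, 10000) returns 40000. Note that the lock itself has no pthread dependency, but std::thread and std::condition_variable_any may still reach the platform threading library underneath.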

@noobpwnftw
I am sorry, but with your latest threads2 I still have hangs

@CoffeeOne
You'd need to use old libwinpthread or there will always be hangs under MINGW regardless of the fix.

@snicolet dynamic build w/ gcc version 9.2.0 (Rev2, Built by MSYS2 project):

  • hangs fast w/ the latest version of libwinpthread (as expected I suppose)
  • runs fine w/ the older version of libwinpthread (1000/1000 games)

@snicolet I like the simplicity of a custom spin-lock. However, under heavily congested conditions it may cause the lock to spin for quite some time, unlike most OS implementations, which usually spin in user-land for a while and then go into kernel alert queues that are almost FIFO. We don't use such locks very often, so I think heavy but reliable ones are OK.

@snicolet
https://github.com/snicolet/Stockfish.git
Branch yielding_spinlock4
works 100% for me with latest libwinpthreads
👍
I even ran the test of @ppigazzini

....
Started game 1000 of 1000 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT)
Finished game 999 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT): 1-0 {White wins by adjudication}
Score of Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT: 316 - 348 - 335  [0.484] 999
Finished game 1000 (Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT): 1/2-1/2 {Draw by adjudication}
Score of Stockfish 150919 64 POPCNT vs Stockfish 150919 64 POPCNT: 316 - 348 - 336  [0.484] 1000
ELO difference: -11
Finished match

@ppigazzini Are you sure that you compiled the right branch?

@noobpwnftw These are yielding spinlocks, they spin only a little bit and then yield to the OS, so I think they don't put too much pressure even in congested situations. We used them in Stockfish instead of std::mutex a couple of years ago right after the transition to C++11 for the YBWC parallel alpha-beta (before Lazy SMP), because they handled hyper-threading better. See that test which gave a +140 Elo result, it was the good old time :-)

http://tests.stockfishchess.org/tests/view/54fae0500ebc5902160ebfd9

That said, in our current problem, the only advantage I see in my custom Mutex class is that it is independent of pthread, since it only relies on std::atomic_int. But most probably std::condition_variable_any still uses pthread.

To get completely rid of pthread, we would need to remove the condition variable cv in the Thread class. Do you see how to do that, using only mutexes?

@CoffeeOne yes, I downloaded Stockfish-yielding_spinlock4.zip from GitHub.
To be sure I'll redo all from ground zero.

@CoffeeOne The first time I took the lazy way and built dynamically using the trick make profile-build ARCH=x86-64-modern COMP=gcc -j; this time I'm building statically the proper way with make profile-build ARCH=x86-64-modern COMP=mingw -j

@CoffeeOne @snicolet I confirm the hangs of https://github.com/snicolet/Stockfish/tree/yielding_spinlock4 built static w/ latest libwinpthread. I have Stockfish 160919 vs your Stockfish 150919 as output.

.\cutechess-cli.exe -repeat -rounds 1000 -tournament gauntlet -resign movecount=3 score=400 -draw movenumber=34 movecount=8 score=20 -concurrency 2 -engine cmd=stockfish.exe option.Hash=8 option.Threads=4 -engine cmd=stockfish.exe option.Hash=8 option.Threads=4 -each proto=uci tc=1+0.01 -openings file=2moves_v1.pgn format=pgn order=random plies=16
Indexing opening suite...
Warning: 2 opening repetitions vs 1 games per encounter
Started game 1 of 1000 (Stockfish 160919 64 POPCNT vs Stockfish 160919 64 POPCNT)
Started game 2 of 1000 (Stockfish 160919 64 POPCNT vs Stockfish 160919 64 POPCNT)
Finished game 2 (Stockfish 160919 64 POPCNT vs Stockfish 160919 64 POPCNT): 1-0 {White wins by adjudication}
Score of Stockfish 160919 64 POPCNT vs Stockfish 160919 64 POPCNT: 0 - 1 - 0  [0.000] 1
Started game 3 of 1000 (Stockfish 160919 64 POPCNT vs Stockfish 160919 64 POPCNT)
Finished game 1 (Stockfish 160919 64 POPCNT vs Stockfish 160919 64 POPCNT): 1/2-1/2 {Draw by adjudication}
Score of Stockfish 160919 64 POPCNT vs Stockfish 160919 64 POPCNT: 0 - 1 - 1  [0.250] 2



Started game 39 of 1000 (Stockfish 160919 64 POPCNT vs Stockfish 160919 64 POPCNT)
Terminating process of engine Stockfish 160919 64 POPCNT(1)
Finished game 28 (Stockfish 160919 64 POPCNT vs Stockfish 160919 64 POPCNT): 0-1 {White's connection stalls}
Score of Stockfish 160919 64 POPCNT vs Stockfish 160919 64 POPCNT: 11 - 16 - 11  [0.434] 38
Finished game 39 (Stockfish 160919 64 POPCNT vs Stockfish 160919 64 POPCNT): * {No result}
Score of Stockfish 160919 64 POPCNT vs Stockfish 160919 64 POPCNT: 11 - 16 - 11  [0.434] 38
Elo difference: -46.0 +/- 96.0, LOS: 16.8 %, DrawRatio: 28.9 %
Finished match

@snicolet Replacing the condition variables with an atomic notify counter is possible. However, it is not possible to remove pthread entirely from MINGW compiles, since cxa_alloca, iostream and other things still have dependencies on it within the runtime library itself. That's why it still hangs even without a single use of pthread in our code.
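
To make the "atomic notify counter" idea concrete, here is a minimal sketch, assuming a worker that polls a counter instead of sleeping on a condition variable. The Worker struct, its member names and the 50 microsecond polling interval are hypothetical and are not taken from any branch mentioned in this thread.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical worker that is woken through an atomic counter, not a CV.
struct Worker {
  std::atomic<int>  notifications{0};      // bumped once per job by the controller
  std::atomic<bool> exit_requested{false};
  std::atomic<int>  jobs_done{0};
  std::thread       runner;                // declared last: atomics exist before it starts

  Worker() : runner([this] { idle_loop(); }) {}
  ~Worker() { stop(); }

  void notify() { notifications.fetch_add(1); }

  void stop() {
    exit_requested.store(true);
    if (runner.joinable()) runner.join();
  }

  void idle_loop() {
    int seen = 0;
    for (;;) {
      // Read the exit flag first: if it is set, the counter read below
      // already includes every notify() issued before stop().
      const bool stopping = exit_requested.load();
      if (notifications.load() != seen) {
        ++seen;                  // consume one notification
        jobs_done.fetch_add(1);  // stand-in for the real work
        continue;
      }
      if (stopping) break;
      std::this_thread::sleep_for(std::chrono::microseconds(50));
    }
  }
};

// Illustrative helper: post n jobs, stop the worker, report how many ran.
int run_jobs(int n) {
  Worker w;
  for (int i = 0; i < n; ++i) w.notify();
  w.stop();
  return w.jobs_done.load();
}
```

The design trade-off is that polling costs some wake-up latency and CPU time in exchange for not depending on the condition variable implementation. As noted above, the C++ runtime may still use pthread internally.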

@ppigazzini @CoffeeOne Thanks for testing!

Can I ask for yet another test... This removes our custom Mutex implementation, just using std::

https://github.com/vondele/Stockfish/tree/noCustomMutex

would be interesting to see if this hangs in the various cases, and for a non-hanging version, if the (threaded) performance is similar to master.

@vondele @noobpwnftw
https://github.com/vondele/Stockfish/tree/noCustomMutex
tournament 400 games.

  • dynamic build w/ gcc 9.2 by msys2, using libwinpthread-1.dll:

    • latest by msys2 : hang

    • older by msys2 : fine

    • 8.1 by mingw-w64 : fine

  • dynamic build w/ gcc 8.1 by mingw-w64, using libwinpthread-1.dll:

    • latest by msys2 : hang

    • older by msys2 : fine

    • 8.1 by mingw-w64 : fine

@vondele
https://github.com/vondele/Stockfish/tree/noCustomMutex
2.6% slowdown wrt master, both dynamic builds w/ gcc 9.2 by msys2, same dlls (switching libwinpthread-1.dll has no effect on performance)

$ bash bench-parallel.sh ./stockfish-master.exe ./stockfish-noCM.exe 100
run   1 /100
run   2 /100
run   3 /100
run   4 /100
run   5 /100
...
 98  1911486  1856040  -55446
 99  1907220  1854419  -52801
100  1905518  1861733  -43785

base =    1904572 +/- 5351
test =    1854418 +/- 5065
diff =     -50153 +/- 965
speedup = -0.026333

I think I can make it not hang under any conditions, without damaging performance, but it may look ugly.
@ppigazzini Can you test my https://github.com/official-stockfish/Stockfish/pull/2307 after "Wake up again in case of a data race."? It should not hang even with new DLLs under MINGW.
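
The PR's code is not reproduced in this thread. As an assumption about the general idea only, one common defensive pattern against a lost wake-up is to wait with a timeout and re-check the predicate, so that a missed notification delays the thread instead of hanging it. The Gate type, the 10 ms timeout and the gate_demo helper below are hypothetical.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical gate: waiters re-check the flag periodically even without a notify.
struct Gate {
  std::mutex m;
  std::condition_variable cv;
  bool ready = false;

  void open() {
    {
      std::lock_guard<std::mutex> lk(m);
      ready = true;
    }
    cv.notify_one();
  }

  // Returns how many times the wait timed out before the flag was seen.
  int wait_until_open() {
    std::unique_lock<std::mutex> lk(m);
    int timeouts = 0;
    // wait_for with a predicate returns false only if it timed out
    // while the predicate was still false.
    while (!cv.wait_for(lk, std::chrono::milliseconds(10), [this] { return ready; }))
      ++timeouts;
    return timeouts;
  }
};

// Illustrative helper: another thread opens the gate after a short delay.
bool gate_demo() {
  Gate g;
  std::thread opener([&g] {
    std::this_thread::sleep_for(std::chrono::milliseconds(25));
    g.open();
  });
  g.wait_until_open();
  opener.join();
  return g.ready;
}
```

Whether a timed wait is enough depends on where the library bug actually sits, so this is a sketch of the technique rather than a claim about what fixed the hang.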

Updated my msys2 installation this morning, compiled current master and experienced no problems so far.

@joergoster what is your CPU? At the moment we have hangs on Intel 3770k, Amd FX, Intel Xeon (bmi2) and the TCEC CPU.

make profile-build ARCH=x86-64-modern COMP=mingw -j (or COMP=gcc to have a dynamic build) and run this:

.\cutechess-cli.exe -repeat -rounds 1000 -tournament gauntlet -resign movecount=3 score=400 -draw movenumber=34 movecount=8 score=20 -concurrency 2 -engine cmd=stockfish.exe option.Hash=8 option.Threads=4 -engine cmd=stockfish.exe option.Hash=8 option.Threads=4 -each proto=uci tc=1+0.01 -openings file=2moves_v1.pgn format=pgn order=random plies=16

@ppigazzini CPU i5-4570
compiled with make profile-build ARCH=x86-64-bmi2 comp=mingw -j3

However, just noticed that the compiles only run under msys2 prompt.
Under Windows prompt it doesn't run anymore. :-(
This is possibly related to a major Windows update this morning. I need to investigate.

@noobpwnftw https://github.com/official-stockfish/Stockfish/pull/2307 so far so good (400/1000). Dynamic build w/ msys gcc 9.2 and latest msys2 libwinpthread.

@noobpwnftw 1000 games completed successfully.
Same speed of the master:

$ bash bench-parallel.sh ./stockfish-master.exe ./stockfish-t2.exe 100
run   1 /100
...
 99  1961511  1952508  -9003
100  1906369  1889749  -16620

base =    1934715 +/- 4579
test =    1923861 +/- 5288
diff =     -10854 +/- 1144
speedup = -0.005610

Just for reference, here is a version using atomic instead of condition variable: https://github.com/snicolet/Stockfish/tree/hang

Does it hang?

@snicolet No, it runs fine w/ the latest libwinpthread (ie the problematic one).
Slowdown 2% wrt master.

$ bash bench-parallel.sh ./stockfish-master.exe ./stockfish-hang.exe 20

base =    1706058 +/- 20454
test =    1665974 +/- 22636
diff =     -40083 +/- 16241
speedup = -0.023495

@noobpwnftw here the script to test the speed

#!/bin/bash
if [[ $# -ne 3 ]]; then
  echo "usage:" $0 "base test n_runs"
  echo "example:" $0 "./stockfish_base ./stockfish_test 10"
  exit 1
fi

base=$1
test=$2
n_runs=$3

# temporary files initialization
cat /dev/null > base000.txt
cat /dev/null > test000.txt
cat /dev/null > tmp000.txt

# preload of CPU/cache/memory
($base bench >/dev/null 2>&1)&
($test bench >/dev/null 2>&1)&
wait

# bench loop: SMP bench with background subshells
for k in `seq 1 $n_runs`;
  do
    printf "run %3d /%3d\n" $k $n_runs

    # swap the execution order to avoid bias
    if [ $((k%2)) -eq 0 ];
      then
        ($base bench >/dev/null 2>> base000.txt)&
        ($test bench >/dev/null 2>> test000.txt)&
        wait
      else
        ($test bench >/dev/null 2>> test000.txt)&
        ($base bench >/dev/null 2>> base000.txt)&
        wait
    fi
  done

# text processing to extract nps values
cat base000.txt | grep second | grep -Eo '[0-9]{1,}' > base001.txt
cat test000.txt | grep second | grep -Eo '[0-9]{1,}' > test001.txt

for k in `seq 1 $n_runs`;
  do
    echo $k >> tmp000.txt
  done

printf "\nrun\tbase\ttest\tdiff\n"
paste tmp000.txt base001.txt test001.txt | awk '{printf "%3d  %d  %d  %+d\n", $1, $2, $3, $3-$2}'
paste base001.txt test001.txt | awk '{printf "%d\t%d\t%d\n", $1, $2, $2-$1}' > tmp000.txt

# compute: sample mean, 1.96 * std of sample mean (95% of samples), speedup
# std of sample mean = sqrt(NR/(NR-1)) * (std population) / sqrt(NR)
cat tmp000.txt | awk '{sum1 += $1 ; sumq1 += $1**2 ;sum2 += $2 ; sumq2 += $2**2 ;sum3 += $3 ; sumq3 += $3**2 } END {printf "\nbase = %10d +/- %d\ntest = %10d +/- %d\ndiff = %10d +/- %d\nspeedup = %.6f\n\n", sum1/NR , 1.96 * sqrt(sumq1/NR - (sum1/NR)**2)/sqrt(NR-1) , sum2/NR , 1.96 * sqrt(sumq2/NR - (sum2/NR)**2)/sqrt(NR-1) , sum3/NR  , 1.96 * sqrt(sumq3/NR - (sum3/NR)**2)/sqrt(NR-1) , (sum2 - sum1)/sum1 }'

# remove temporary files
rm -f base000.txt test000.txt tmp000.txt base001.txt test001.txt
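
For reference, the statistics computed by the final awk line can be sketched in C++ as follows. It mirrors the script's formulas (sample mean, 1.96 times the standard error, and relative speedup); the names Summary, summarize and speedup are illustrative, and at least two samples are assumed.

```cpp
#include <cmath>
#include <vector>

struct Summary {
  double mean;        // sample mean
  double half_width;  // 1.96 * standard error of the mean (about 95%)
};

// Same formula as the awk script: population std / sqrt(N - 1), times 1.96.
// Assumes x.size() >= 2.
Summary summarize(const std::vector<double>& x) {
  const double n = static_cast<double>(x.size());
  double sum = 0.0, sum_sq = 0.0;
  for (double v : x) {
    sum += v;
    sum_sq += v * v;
  }
  const double mean = sum / n;
  const double pop_var = sum_sq / n - mean * mean;  // population variance
  return {mean, 1.96 * std::sqrt(pop_var) / std::sqrt(n - 1.0)};
}

// Relative speedup of test over base: (sum_test - sum_base) / sum_base.
double speedup(const std::vector<double>& base, const std::vector<double>& test) {
  double sb = 0.0, st = 0.0;
  for (double v : base) sb += v;
  for (double v : test) st += v;
  return (st - sb) / sb;
}
```

Applied to the per-run nps differences, summarize gives the "diff = ... +/- ..." line. A difference whose interval excludes zero, such as -50153 +/- 965 above, indicates a measurable slowdown.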

Seeing how @noobpwnftw closed his PR, I guess a practical question is: if SF makes it to the superfinals, do we 1) downgrade libwinpthread, 2) use a different build of SF (e.g. abrok), 3) change the custom implementation of mutex/condition variable in SF, or 4) something else?

If we are keeping MINGW compiles, then libwinpthread must be downgraded to be 100% sure it doesn't deadlock elsewhere. We can fix our own use case, or not use CV at all (i.e. use atomic instead), but we cannot fix their STL, and we cannot avoid using the STL altogether.

After https://github.com/official-stockfish/Stockfish/commit/db00e1625eb46517f61085ffd3bcd28779e71220 the speedup of msys2 build (pgo & lto) vs Abrok one (pgo & lto) is only 0.26%

$ bash bench-parallel.sh ./stockfish_19091615_x64_bmi2.exe ./stockfish_msys2.exe 100

base =    1793776 +/- 9052
test =    1798456 +/- 9169
diff =       4679 +/- 3190
speedup = 0.002609

@ppigazzini Interesting script for the parallel execution, I do almost the same locally to test speed but I like to

a) show the md5 signature of the two binaries (to avoid looking for speed difference when the binaries are the same :-))
b) do fewer iterations and use bench at depth 22 instead of 12 (to reduce the impact of the launching time and hash initialization time)

# show the md5 signatures of the two tested binaries
md5 $testdir/testedpatch
md5 $testdir/master

# two long bench runs (depth 22), in parallel
$testdir/testedpatch  bench 16 1 22 > /dev/null && echo "testedpatch"  &
$testdir/master       bench 16 1 22 > /dev/null && echo "master"

@snicolet not a big difference in precision

  • using bench
$ bash bench-parallel.sh ./stockfish_msys2_ss3.exe ./stockfish_msys2_ss4.exe 10
run     base    test    diff
  1  1620088  1747310  +127222
  2  1771216  1679313  -91903
  3  1740901  1762447  +21546
  4  1745168  1755928  +10760
  5  1760994  1768283  +7289
  6  1769748  1773421  +3673
  7  1760269  1759544  -725
  8  1762447  1765360  +2913
  9  1754485  1750173  -4312
 10  1751608  1753765  +2157

base =    1743692 +/- 27586
test =    1751554 +/- 16520
diff =       7862 +/- 32465
speedup = 0.004509
  • using bench 16 1 22
$ bash bench-parallel.sh ./stockfish_msys2_ss3.exe ./stockfish_msys2_ss4.exe 10
run     base    test    diff
  1  1663089  1655786  -7303
  2  1662813  1639512  -23301
  3  1663466  1658005  -5461
  4  1663692  1672258  +8566
  5  1731483  1740323  +8840
  6  1785393  1784149  -1244
  7  1779448  1778672  -776
  8  1750255  1748615  -1640
  9  1778701  1773629  -5072
 10  1728167  1764137  +35970

base =    1720650 +/- 32824
test =    1721508 +/- 35972
diff =        857 +/- 9455
speedup = 0.000499

I have written some code to try to get a reduction of the hang: https://github.com/official-stockfish/Stockfish/issues/2309

In the forum thread about the preparation of the super-final binary, Pasquale remarked that the Abrok versions are built using MinGW too:

On Tuesday, 24 September 2019 at 02:24:32 UTC+2, [email protected] wrote:
Past Abrok builds were slow because the Ubuntu 16.04 MinGW was not able to do a Profile Guided Optimization. Now Ubuntu 18.04 MinGW is able to do a PGO. The compiler is always MinGW, built by msys2, by mingw-w64, by ubuntu.

Now I am confused, is there a risk that the Abrok builds use the buggy condition variable implementation too?

https://groups.google.com/forum/?fromgroups=#!topic/fishcooking/xc3DM_xzxYA

I think so, depends on what libraries are being used... that's why it needs testing.

@ppigazzini @CoffeeOne
Is my understanding correct that you have managed to fix the libraries for your compiler, and that you are now able to build a Stockfish binary that no longer crashes when you compile the latest master with the latest patches, matching the speed of the Abrok builds?

@Krgp
Did you update the pthread DLL on your machine?

I think it says which version of mingw is used at the top of the abrok web
page? Does that help?

I thought it was an old version but I'm not sure, I'm not an expert on that
stuff.

On the top of the Abrok page it says: "They are compiled with gcc/mingw 7.3 on Ubuntu 18.04."

But how can we be sure? This is the problem: what if the Abrok people decided to update their compiler but didn't change the page...

Is there a way to get a binary and tell which compiler produced it?

@snicolet different versions of MinGW, only the msys2 one uses the bugged libwinpthread

gcc version 9.2.0 (Rev2, Built by MSYS2 project)

gcc version 8.1.0 (x86_64-posix-seh-rev0, Built by MinGW-W64 project)

gcc version 7.3-posix 20180312 (GCC) Ubuntu 18.04

Nevertheless @CoffeeOne tested that the proposed bugfix is working fine
https://github.com/msys2/MINGW-packages/issues/5610#issuecomment-533830377

@snicolet Ubuntu 19.04 uses this:
gcc version 8.3-posix 20190406 (GCC)

But PGO build works only w/ Ubuntu 18.04, so if Abrok will upgrade to a newer Ubuntu version, SF will be terribly slow :)

x86_64-w64-mingw32-c++-posix -Wall -Wcast-qual -fno-exceptions -std=c++11 -fprofile-generate -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -msse3 -mpopcnt -DUSE_POPCNT -flto   -c -o syzygy/tbprobe.o syzygy/tbprobe.cpp
x86_64-w64-mingw32-c++-posix: fatal error: Killed signal terminated program cc1plus
compilation terminated.
make[2]: *** [<builtin>: misc.o] Error 1
make[2]: *** Waiting for unfinished jobs....

@snicolet please freeze, in good time, the commit that you want to use in the SuFi.

See pull request https://github.com/official-stockfish/Stockfish/pull/2327 for an attempt to show at startup time which compiler was used to compile Stockfish.
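
As a rough illustration of how a binary can report its own compiler, here is a minimal sketch based on predefined compiler macros. It is not the code from the pull request: the function name compiler_info and the output format are assumptions.

```cpp
#include <string>

// Hypothetical helper: builds a human-readable description of the compiler
// from predefined macros. Clang is checked first because it also defines __GNUC__.
std::string compiler_info() {
  std::string s = "Compiled by ";
#if defined(__clang__)
  s += "clang++ " + std::to_string(__clang_major__) + "." +
       std::to_string(__clang_minor__) + "." + std::to_string(__clang_patchlevel__);
#elif defined(_MSC_VER)
  s += "MSVC " + std::to_string(_MSC_VER);
#elif defined(__GNUC__)
  s += "g++ " + std::to_string(__GNUC__) + "." + std::to_string(__GNUC_MINOR__) +
       "." + std::to_string(__GNUC_PATCHLEVEL__);
#else
  s += "unknown compiler";
#endif
#if defined(__MINGW64__)
  s += " on MinGW64";
#elif defined(__MINGW32__)
  s += " on MinGW32";
#endif
  return s;
}
```

This only identifies the compiler. It does not tell which libwinpthread version the binary was linked against or loads at run time, which is what matters for the hang discussed here.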

Closing this issue now that the reasons for the hang are well understood (buggy external pthread library for GCC 9.1 and 9.2 compiler suite on MinGW64). What remains to be done is documenting the reasons properly in our Wiki pages.

Thanks to everybody here for all the investigations and testing during the last two weeks!
