I've seen this many, many, many, many, many times. When SF seems to be losing and yet the eval is sort of flatlined, it would randomly push a pawn and suicide instead of playing for 50 moves. Can we make it so that SF plays for 50 moves as much as possible and not suicide? Even if this doesn't get merged into master it's useful to have in a special tournament build.
Happened yet again in CCC just now. Threw away the fortress that Leela was completely clueless about.
Two examples of Stockfish recently throwing a fortress away vs Leela:

98... b4 by Stockfish 310320 64 BMI2
98... b4: 4B3/2k5/2nb4/1pr2p2/p4P2/P7/1P1Q4/1K6 b - - 45 98
Move 87. b4 by Stockfish 20200407DC
87. b4: 3nq3/r2n2p1/1k2p1N1/1p1pP1NP/p1pP2Q1/P1P5/1P3RK1/8 b - - 64 87
It's helpful to post the FEN. Thx.
I've edited my comment to include the FEN right before Stockfish's move for both games.
It would be interesting to see whether bae019b53e5c2bfcf0d69b4ebfc52b4f4de762eb influenced that behavior close to the 50-move rule.
Just happened again in Superfinal game 26, in which SF needlessly made pawn moves that made the defense much more difficult.
@vondele Might be responsible for the otherwise inexplicable play in https://www.chess.com/computer-chess-championship#event=ccc13-finals&game=63
@Alayan-stk-2 could you try to investigate that a bit more carefully, i.e. see whether the behavior really changes with this patch? @joergoster, is that something you could look into?
@adentong Game 26 of TCEC Superfinal was very likely already lost anyways.
@vondele I will take a look.
It looks like the blunder in game 63 was not 161. .. Kf5-e4, but the following 162. .. Rc3-c2+.
Instead 162. .. Be6-d7 seems to hold.
7 Threads, 2 GB Hash, 6-man syzygy bases, 3 min, multipv=10:
info depth 29 seldepth 37 multipv 1 score cp -75 nodes 1845617389 nps 10253372 hashfull 973 tbhits 10812407 time 180001 pv e6d7 f2e2 c3c2 e2d1 e4d3 f8f3 d3e4 d1c2 e4f3 c2b3 f3e4 b3c3 e4f5 c3b4 f5e6 b4c5 d7c8 c5b6 e6e7 b6c6 e7e6 e5g3 c8a6 g3h4 a6c8 h4e1 c8a6 e1a5 a6c8 a5d2 e6e7 d2f4 e7e6 c6b6 e6e7 b6c5 e7e6
info depth 29 seldepth 57 multipv 2 score cp -2042 nodes 1845617389 nps 10253372 hashfull 973 tbhits 10812407 time 180001 pv c3c2 f2g3 e6d7 g3h4 c2c6 h4g5 e4d3 f8d8 d7h3 d8d5 d3e4 d5d8 h3f5 e5d6 f5h3 d6g3 e4f3 d4d5 c6c1 g5h4 c1h1 d5d6 h3g4 h4g5 h1h5 g5g6 h5c5 d6d7 g4h5 g6f6 c5c6 f6e7 c6g6 d8f8 f3g3 c7c8q g3g2 c8c2 g2g3 c2c7 g3h3 c7c3 h3h4 d7d8q
info depth 29 seldepth 44 multipv 3 score cp -2183 nodes 1845617389 nps 10253372 hashfull 973 tbhits 10812407 time 180001 pv e4d3 f2g3
info depth 28 seldepth 37 multipv 4 score cp -5446 nodes 1845617389 nps 10253372 hashfull 973 tbhits 10812407 time 180001 pv c3c4 f2e2 c4c2 e2d1 c2g2 c7c8q e6c8 f8c8 e4d3 d1c1 g2g6 c8c5 d3e4 c1d2 g6g2 d2c3 g2g1 c5a5 g1g6 a5a8 e4e3 a8c8 g6b6 c8c5 e3e4 c3d2 b6b2 c5c2 b2b7 c2c8 b7a7
info depth 28 seldepth 35 multipv 5 score cp -5446 nodes 1845617389 nps 10253372 hashfull 973 tbhits 10812407 time 180001 pv c3c6 f2e2 c6c2 e2d1 c2g2 c7c8q e6c8 f8c8 e4d3 d1c1 g2g6 c8c5 d3e4 c1d2 g6g2 d2c3 g2g1 c5a5 g1g6 a5a8 e4e3 a8c8 g6b6 c8c5 e3e4 c3d2 b6b2 c5c2 b2b7 c2c8 b7a7
info depth 28 seldepth 33 multipv 6 score cp -5446 nodes 1845617389 nps 10253372 hashfull 973 tbhits 10812407 time 180001 pv c3c1 f2e2 c1c2 e2d1 c2g2 c7c8q e6c8 f8c8 e4d3 d1c1 g2g6 c8c5 d3e4 c1d2 g6g2 d2c3 g2g1 c5a5 g1g6 a5a8 e4e3 a8c8 g6b6 c8c5 e3e4 c3d2 b6b2 c5c2 b2b7 c2c8 b7a7
info depth 28 seldepth 37 multipv 7 score cp -5827 nodes 1845617389 nps 10253372 hashfull 973 tbhits 10812407 time 180001 pv c3b3 c7c8q e6c8 f8c8 b3b2 f2g3 b2b3 g3h4 b3f3 h4g5 f3f5 g5g6 f5f2 c8h8 e4d3 e5f6 f2f1 g6f7 f1a1 f7e6 a1a5 e6f5 d3e2 f6e5 e2f2 h8d8 f2e3 f5g4 e3e4 d8f8 e4d3
info depth 28 seldepth 41 multipv 8 score cp -5827 nodes 1845617389 nps 10253372 hashfull 973 tbhits 10812407 time 180001 pv c3h3 f2e2 h3e3 e2d2 e3d3 d2c2 e6d7 c7c8q d7c8 f8c8 d3a3 c2d2 a3a6 c8h8 a6a2 d2c3 a2a3 c3b4 a3a6 h8h4 e4d3 b4c5 a6a5 c5d6 d3c3 h4h8 c3d3 h8c8 d3e4 c8c1 a5a6 d6c5 a6a5 c5b4 a5a2
info depth 28 seldepth 34 multipv 9 score cp -5845 nodes 1845617389 nps 10253372 hashfull 973 tbhits 10812407 time 180001 pv e6f5 f2e2 c3c2 e2d1 c2c3 d1d2 c3d3 d2c2 f5d7 f8d8 d7f5 d8e8 d3h3 c7c8q f5c8 e5g7 e4f5 e8c8 h3h7 c8f8 f5e4 f8e8 e4f5 g7e5 h7b7 e5g3 f5g4 e8d8 b7b5 g3d6 g4f3 c2c3 b5b7
info depth 28 seldepth 30 multipv 10 score cp -14896 nodes 1845617389 nps 10253372 hashfull 973 tbhits 10812407 time 180001 pv c3e3 f8e8 e3f3 f2e2 e6d7 c7c8q d7c8 e5d6 e4f5 e2f3 c8a6 e8e5 f5g6 f3e3 a6c4 e3d2 g6g7 d2c3 g7h7 c3b4 h7g7 b4c5 c4a2 e5d5 g7f6
bestmove e6d7 ponder f2e2
Could someone please confirm?
This is the position command taken from the logfile:
position startpos moves e2e4 c7c6 d2d4 d7d5 b1c3 d5e4 c3e4 b8d7 f1d3 g8f6 e4g5 d8c7 g1f3 h7h6 g5e6 c7d6 e6f8 d7f8 c2c3 c8g4 e1g1 d6d5 d3e2 f8g6 h2h3 g4f5 f1e1 e8g8 e2f1 a8d8 d1e2 f8e8 c3c4 d5a5 b2b3 e7e6 c1b2 g6e7 e2e3 f5h7 e3f4 a5f5 f4c7 d8d7 c7h2 e7g6 e1e3 f5a5 a2a4 a5c7 h2c7 d7c7 b3b4 g6f8 f3d2 e8d8 b4b5 f8d7 d2b3 h7c2 b3a5 d8c8 b2a3 f6e8 a3b4 g7g5 f2f3 c2g6 e3e1 g8g7 e1d1 d7f6 b4e1 f6h5 d1d2 c8a8 d2b2 e8f6 g2g3 g7h7 c4c5 f6d5 a5c4 f7f6 b5c6 b7c6 a4a5 h5g7 a5a6 g7f5 e1f2 h7g8 g3g4 f5g7 b2b7 c7f7 c4a5 a8c8 a1a3 h6h5 a3b3 f6f5 b7f7 g6f7 b3b7 c8c7 a5c4 h5g4 h3g4 g7e8 c4e5 e8f6 f2e1 f7e8 e1a5 c7c8 g4f5 c8a8 f5e6 d5f4 g1f2 f6d5 a5d2 f4e6 f1c4 e6d8 b7b2 d8f7 c4d5 c6d5 e5f7 g8f7 d2g5 e8c6 g5f4 a8d8 f4e5 d8e8 f2e3 f7e6 b2h2 c6b5 h2b2 b5a6 b2a2 a6b5 a2a7 e8e7 a7a5 e7b7 e3f2 e6d7 a5a3 b5c4 a3a8 c4d3 a8a3 d3c4 f3f4 c4b5 f2g3 d7e6 g3g4 b7f7 g4g5 f7f5 g5g4 f5f8 a3a7 f8f5 a7a5 b5e2 g4g3 f5f8 g3f2 e2d3 a5a3 d3b5 a3b3 b5d7 b3b6 e6f5 f2f3 d7e6 c5c6 f8a8 c6c7 a8a3 f3e2 a3a2 e2d3 a2a3 d3c2 a3a4 c2b3 a4c4 b6b8 f5e4 b8h8 e6d7 h8d8 d7h3 d8h8 h3d7 h8h7 d7a4 b3a3 a4b5 h7h8 b5d7 h8h7 d7e6 a3b3 e6g4 h7g7 g4e6 g7g6 e6d7 g6g3 d7e6 b3b2 e4f5 g3g7 f5e4 b2b3 e6f5 g7g5 f5d7 g5g1 d7e6 g1g2 e6h3 g2g1 h3e6 g1g3 e6f5 b3b2 f5d7 g3a3 c4b4 b2c3 b4c4 c3d2 d7e6 f4f5 e4f5 a3a6 e6d7 a6a8 f5e6 a8h8 e6f7 d2e3 c4c3 e3f4 f7e7 h8h7 e7e6 f4g5 c3c1 g5g4 c1f1 g4g3 f1c1 g3f2 c1c3 f2e2 d7c8 h7h8 c8d7 e2d2 c3c4 h8h7 d7c8 d2e3 c4c1 h7h8 c8d7 e3f4 c1f1 f4g3 f1c1 g3f3 e6f5 h8h7 c1c3 f3e2 f5e6 e2f2 d7c8 h7h8 c8d7 f2e2 c3c2 e2f3 c2c1 h8h7 c1c2 h7g7 c2c1 f3e3 c1c2 g7g1 e6f5 g1g7 d7e6 g7h7 f5g5 h7h8 c2c3 e3d2 c3c4 h8e8 g5f5 d2e3 c4c3 e3f2 e6d7 e8d8 d7e6 d8h8 f5e4 h8f8
And this is the relevant part of the logfile. Thanks to @Alayan-stk-2 for providing it!
3388782 >Stockfish(1): position startpos moves e2e4 c7c6 d2d4 d7d5 b1c3 d5e4 c3e4 b8d7 f1d3 g8f6 e4g5 d8c7 g1f3 h7h6 g5e6 c7d6 e6f8 d7f8 c2c3 c8g4 e1g1 d6d5 d3e2 f8g6 h2h3 g4f5 f1e1 e8g8 e2f1 a8d8 d1e2 f8e8 c3c4 d5a5 b2b3 e7e6 c1b2 g6e7 e2e3 f5h7 e3f4 a5f5 f4c7 d8d7 c7h2 e7g6 e1e3 f5a5 a2a4 a5c7 h2c7 d7c7 b3b4 g6f8 f3d2 e8d8 b4b5 f8d7 d2b3 h7c2 b3a5 d8c8 b2a3 f6e8 a3b4 g7g5 f2f3 c2g6 e3e1 g8g7 e1d1 d7f6 b4e1 f6h5 d1d2 c8a8 d2b2 e8f6 g2g3 g7h7 c4c5 f6d5 a5c4 f7f6 b5c6 b7c6 a4a5 h5g7 a5a6 g7f5 e1f2 h7g8 g3g4 f5g7 b2b7 c7f7 c4a5 a8c8 a1a3 h6h5 a3b3 f6f5 b7f7 g6f7 b3b7 c8c7 a5c4 h5g4 h3g4 g7e8 c4e5 e8f6 f2e1 f7e8 e1a5 c7c8 g4f5 c8a8 f5e6 d5f4 g1f2 f6d5 a5d2 f4e6 f1c4 e6d8 b7b2 d8f7 c4d5 c6d5 e5f7 g8f7 d2g5 e8c6 g5f4 a8d8 f4e5 d8e8 f2e3 f7e6 b2h2 c6b5 h2b2 b5a6 b2a2 a6b5 a2a7 e8e7 a7a5 e7b7 e3f2 e6d7 a5a3 b5c4 a3a8 c4d3 a8a3 d3c4 f3f4 c4b5 f2g3 d7e6 g3g4 b7f7 g4g5 f7f5 g5g4 f5f8 a3a7 f8f5 a7a5 b5e2 g4g3 f5f8 g3f2 e2d3 a5a3 d3b5 a3b3 b5d7 b3b6 e6f5 f2f3 d7e6 c5c6 f8a8 c6c7 a8a3 f3e2 a3a2 e2d3 a2a3 d3c2 a3a4 c2b3 a4c4 b6b8 f5e4 b8h8 e6d7 h8d8 d7h3 d8h8 h3d7 h8h7 d7a4 b3a3 a4b5 h7h8 b5d7 h8h7 d7e6 a3b3 e6g4 h7g7 g4e6 g7g6 e6d7 g6g3 d7e6 b3b2 e4f5 g3g7 f5e4 b2b3 e6f5 g7g5 f5d7 g5g1 d7e6 g1g2 e6h3 g2g1 h3e6 g1g3 e6f5 b3b2 f5d7 g3a3 c4b4 b2c3 b4c4 c3d2 d7e6 f4f5 e4f5 a3a6 e6d7 a6a8 f5e6 a8h8 e6f7 d2e3 c4c3 e3f4 f7e7 h8h7 e7e6 f4g5 c3c1 g5g4 c1f1 g4g3 f1c1 g3f2 c1c3 f2e2 d7c8 h7h8 c8d7 e2d2 c3c4 h8h7 d7c8 d2e3 c4c1 h7h8 c8d7 e3f4 c1f1 f4g3 f1c1 g3f3 e6f5 h8h7 c1c3 f3e2 f5e6 e2f2 d7c8 h7h8 c8d7 f2e2 c3c2 e2f3 c2c1 h8h7 c1c2 h7g7 c2c1 f3e3 c1c2 g7g1 e6f5 g1g7 d7e6 g7h7 f5g5 h7h8 c2c3 e3d2 c3c4 h8e8 g5f5 d2e3 c4c3 e3f2 e6d7 e8d8 d7e6 d8h8 f5e4 h8f8
3388782 >Stockfish(1): isready
3388785 >Stockfish(1): go wtime 7374 btime 8455 winc 5000 binc 5000
Would something like this make sense, conceptually (the actual code would look different)?
if (eval > 10 against self && depth > N) { // everything loses
don't select pawn push as best move;
don't select captures as best move unless it's "free";
}
It doesn't gain Elo in rating lists, and it would rarely change anything when adjudication is used; but on the odd occasion that SF sees it's lost and randomly picks a 50-move-rule reset, it would make things so much less frustrating.
Arguably not worth the hassle; it's simply frustrating that when SF sees that everything loses, its defense becomes less challenging.
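The proposal above, written as a minimal C++ sketch purely to make it concrete. Everything here is hypothetical and not Stockfish code: `ToyRootMove`, `pick_move`, the `isFreeCapture` flag, and reading "eval > 10 against self" as a loss threshold in centipawns (e.g. 1000 for ten pawns) are all assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified root-move record; not Stockfish's RootMove.
struct ToyRootMove {
    std::string uci;     // move in UCI notation, e.g. "e7d7"
    int score;           // centipawns from the side to move's point of view
    bool isPawnPush;     // pawn move: resets the 50-move counter
    bool isCapture;      // capture: resets the 50-move counter
    bool isFreeCapture;  // hypothetical flag: a capture judged harmless
};

// Proposed rule: if even the best move scores below -lossThreshold and the
// search is deeper than minDepth, pick the best move that is neither a pawn
// push nor a non-free capture. Falls back to the plain best move otherwise.
const ToyRootMove& pick_move(const std::vector<ToyRootMove>& moves,
                             int depth, int minDepth, int lossThreshold) {
    assert(!moves.empty());

    const ToyRootMove* best = &moves.front();
    for (const auto& m : moves)
        if (m.score > best->score)
            best = &m;

    const bool everythingLoses = best->score < -lossThreshold;
    if (!everythingLoses || depth <= minDepth)
        return *best;

    const ToyRootMove* quiet = nullptr;
    for (const auto& m : moves) {
        const bool excluded = m.isPawnPush || (m.isCapture && !m.isFreeCapture);
        if (!excluded && (!quiet || m.score > quiet->score))
            quiet = &m;
    }
    return quiet ? *quiet : *best;
}
```

The fallback matters when every legal move resets the counter. The sketch deliberately does not check whether the surviving move is actually the stauncher defense, which is exactly the objection raised in the reply.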
No, that wouldn't make much sense: e.g. the optimal move giving the longest path to mate could be a pawn push or a capture, or, unless one captures, the next move is a fork on K and Q, etc. It would get ugly very quickly. SF plays the best move; one can only try to improve its understanding of what the best move is.
@vondele Do you think there's anything else that can be done for this, other than the patch you just merged today? If not then I'll close the issue.
Let's keep it open for a few days, in case @joergoster sees any relationship with the commit mentioned previously. If not we can close. Thanks.
SF plays the best move, one can only try to improve its understanding of what is a bestmove.
Yeah, I generally agree; but in lost positions where everything is horrible, there is no good way to differentiate "challenging moves" from "delay mate longer but give easy play to the opponent". It doesn't seem really fixable.
@vondele I thought it was quite obvious from my posts above that one possible cause of this issue is the thread voting patch. See the highlighted PV line at the end of my second post.
The question remains, though, why one thread (or even more than one?) keeps flying through the plies and reaching depth 73 without noticing that this root move is losing. It is possible that the patch you mentioned is causing this. OTOH, a single-threaded search doesn't show this problem.
The short time-control may also play a role here, idk.
if (pos.rule50_count() < 90)
return ttValue;
It seems very likely to me that 90 is more drastic than required and backfires, while the rare ill effects of GHI (graph history interaction) could be alleviated with a smaller margin, say 94.
But the question is how to test it. Normal TCs rarely trigger it. Maybe just on those 50-move positions?
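To make the 90 vs. 94 comparison concrete, here is a toy C++ model of the guard quoted above, assuming it simply disables TT cutoffs once the halfmove (rule50) counter reaches the threshold. `tt_cutoff_allowed` and `plies_without_cutoff` are hypothetical names, not Stockfish functions.

```cpp
#include <cassert>

// Toy model of the guard: a stored TT value may cut the search only while
// the halfmove counter is below `threshold`. Close to the 50-move limit
// (100 plies) a stored value may have been reached with a different counter,
// which is one form of graph history interaction (GHI).
bool tt_cutoff_allowed(int rule50Count, int threshold) {
    assert(rule50Count >= 0);
    return rule50Count < threshold;
}

// How many counter values in 0..99 have the cutoff disabled, i.e. for how
// many plies before the 50-move limit the guard forces a real search.
int plies_without_cutoff(int threshold) {
    int n = 0;
    for (int r = 0; r < 100; ++r)
        if (!tt_cutoff_allowed(r, threshold))
            ++n;
    return n;
}
```

Under this model, a threshold of 90 turns cutoffs off for the last 10 plies before the 50-move limit, while 94 does so only for the last 6.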
@joergoster I indeed didn't see that you highlighted the interesting fact that the different threads must have had a difference of depth ~40 in that run. That's an indication of a potential issue, but one would need to understand better.
@vondele Yes, it would certainly help to know whether this also happened in other cases.
I must admit I didn't see this huge jump in depth at first as I skipped between the highlighted parts of the log. It's very suspicious, especially when considering seldepth...
The output shows a seldepth of only 9. The line chosen as the PV was not forced at all, so I fail to see the reason for this abysmal seldepth. SF expected Ke1 and a quick exchange into a 7-man TB draw (a set of TBs it must have at CCC).
The 50mr counter on Rc2+ was at 77, so there is a good chance that this bug isn't directly caused by 50mr.
Happened again in game 94 of the SuFi. Leela was completely clueless and had no idea how to convert the endgame, but SF pushed a pawn, which helped Leela. Even if the endgame was objectively lost, I strongly believe that if SF hadn't pushed the pawn, it would have drawn the game due to Leela's cluelessness. We need a patch that avoids pawn moves in completely lost positions. Such a patch could be tested for non-regression and applied for tournament play only. I imagine doing so definitely won't hurt, since if a position is already completely lost it doesn't matter which move is played, but it would exploit Leela's bad endgames and possibly draw some otherwise lost endgames. @vondele @Alayan-stk-2
@adentong link, fen, move played, better move, + deep analysis to correctly assess the position. In this case, the cutechess log would be useful as well (to check the actual depth etc).
TCEC S17 Sufi Game 94, position after 143. Ra1: 8/p3kp2/Pp2p3/1n2PpP1/5P2/1Kp5/8/R7 b - - 68 143.
TCECfish played the instantly suicidal Nc7, supposedly after evaluating 530 million nodes. The better move was Kd7. My 20200407 Homefish quickly switches away from Nc7 / c2 / Nd4+ (moves it does initially consider at low depths), after which it prefers Kd7 forever. The linked Lifish also mirrors my Homefish's behavior.
Even after move 163, when the position is already objectively lost (5k2/1R6/4p1n1/4PpP1/3K4/8/8/8 b - - 10 163), pushing f4 is essentially just conceding. Even if it may not have mattered at that point in this particular case, not pushing the pawn would have been a stauncher defense. Also @nickolasreynolds, it seems that in the other issue I opened, https://github.com/official-stockfish/Stockfish/issues/2643, SF also chose an instantly losing move which Homefish refutes very quickly, but SF with 176 threads somehow fails to do so.
I also tried Stockfish 070420 x64 with 64 threads 5 times, and it never considered the 143...Nc7? blunder that throws the game. It just plays Kd7 from start to finish.
After 143...Nc7 it shows a fail low after 2 seconds with evals above +2.50, and after 10 seconds it goes to +4.5.
I use partial 7-man TBs though, and don't know whether that makes a difference.
143...Kd7 appears to be a sure draw.
Analysis by Stockfish 070420 64 POPCNT:
143...Ke7-d7 144.Ra1-h1 Kd7-e8 145.Kb3-b4 Nb5-c7 146.Kb4xc3 Nc7xa6 147.Rh1-h8+ Ke8-e7 148.Rh8-h7 Ke7-e8 149.g5-g6 f7xg6 150.Rh7xa7 Na6-c5 151.Ra7-g7 g6-g5 152.f4xg5 Ke8-f8 153.Rg7-c7 Nc5-e4+ 154.Kc3-d4 Ne4xg5 155.Kd4-e3 Kf8-e8 156.Rc7-c6 Ke8-e7 157.Rc6xb6 Ke7-d7 158.Ke3-f4 Ng5-e4 159.Rb6-a6 Ne4-c5 160.Ra6-d6+ Kd7-e7 161.Rd6-d4 Nc5-e4 162.Rd4-a4 Ke7-d8 163.Kf4-f3 Kd8-d7 164.Ra4-a6 Ne4-g5+ 165.Kf3-f4 Ng5-e4 166.Ra6-b6 Ne4-f2 167.Rb6-d6+ Kd7-e7 168.Kf4-f3 Nf2-e4 169.Rd6-b6 Ke7-d7 170.Rb6-b8 Ne4-g5+ 171.Kf3-f4 Ng5-e4 172.Rb8-h8 Kd7-e7 173.Rh8-h3 Ke7-e8 174.Rh3-f3 Ke8-f7 ................
White is better: +/- (0.73) Depth: 76/87 00:02:54 10212MN, tb=105077340
The TCEC log is this:
SF seems to have been deciding between 143...c2 and 143...Nc7, both of which were losing moves.
786578331 >Stockfish 20200407DC(137): isready
786579048 >Stockfish 20200407DC(137): go wtime 25230 btime 32864 winc 5000 binc 5000
@vondele So there have now been at least two instances where SF chose a losing move which homefish, supposedly on inferior hardware, refutes quickly. Could it be that 176 threads are too many threads and SF doesn't scale well past a certain number of threads?
Also https://github.com/official-stockfish/Stockfish/issues/2620#issuecomment-612852816 mentioned above. I think we should take a look at it to see if it's a legitimate issue.
@adentong I'm actually running tests right now on the FEN posted by @nickolasreynolds (thanks!). I'll report a bit later. The question is not whether SF can refute the move (of course it can), but whether it always finds the move. So, can you run the search (e.g. for 1 sec) on Homefish ~200 times and see what the distribution of bestmoves is? I'm doing that right now on 250 threads.
@vondele Will do. I'll report back once I finish.
@vondele From the log snippet provided, it looks like a thread was picked that didn't find the refutation of Nb5-c7 in the given time.
@joergoster Yes, the question is why... I can reproduce similar results locally: running tens of repetitions with the following settings (varying movetime and the number of threads), up to 5 different moves are picked (4 of which are wrong):
setoption name Threads value 250
setoption name Hash value 80000
position fen 8/p3kp2/Pp2p3/1n2PpP1/5P2/1Kp5/8/R7 b - - 68 143
go movetime 1000
# count #bestmove #analysis (200s)
216 e7d7 -> cp 110
44 e7d8 -> cp 471
7 e7e8 -> cp 421
5 b5c7 -> cp 400
setoption name Threads value 250
setoption name Hash value 80000
position fen 8/p3kp2/Pp2p3/1n2PpP1/5P2/1Kp5/8/R7 b - - 68 143
go movetime 2153
# count #bestmove
141 e7d7
65 e7d8
2 b5c7
setoption name Threads value 50
setoption name Hash value 80000
position fen 8/p3kp2/Pp2p3/1n2PpP1/5P2/1Kp5/8/R7 b - - 68 143
go movetime 2153
# count #bestmove
197 e7d7
11 b5c7
setoption name Threads value 50
setoption name Hash value 80000
position fen 8/p3kp2/Pp2p3/1n2PpP1/5P2/1Kp5/8/R7 b - - 68 143
go movetime 10000
# count #bestmove
65 bestmove e7d7
2 bestmove b5c7
1 bestmove c3c2 -> cp 400
The situation seems worse with many threads, but that may be a coincidence. I wonder whether there are just many threads with the wrong move as a search result, or whether there is a different reason. Since it seems to be reproducible, maybe it can be analyzed.
Edit: the distribution depends a bit on the version. The version I used was about 2 weeks old; current master gives:
setoption name Threads value 250
setoption name Hash value 80000
position fen 8/p3kp2/Pp2p3/1n2PpP1/5P2/1Kp5/8/R7 b - - 68 143
go movetime 1000
160 bestmove e7d7
13 bestmove b5c7
12 bestmove e7d8
5 bestmove e7e8
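Tallies like the ones above can be reproduced by collecting the output of many repeated searches and counting the `bestmove` lines. A small C++ sketch, assuming the engine output of all runs has been concatenated into one stream; `tally_bestmoves` and `print_tally` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Counts how often each move follows "bestmove" at the start of a line in
// UCI output, e.g. the concatenated logs of many "go movetime 1000" runs.
std::map<std::string, int> tally_bestmoves(std::istream& in) {
    std::map<std::string, int> counts;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string token, move;
        if (words >> token && token == "bestmove" && words >> move)
            ++counts[move];
    }
    return counts;
}

// Prints "count bestmove move" lines, most frequent first.
void print_tally(const std::map<std::string, int>& counts, std::ostream& out) {
    std::vector<std::pair<std::string, int>> rows(counts.begin(), counts.end());
    std::stable_sort(rows.begin(), rows.end(),
                     [](const auto& a, const auto& b) { return a.second > b.second; });
    for (const auto& row : rows)
        out << row.second << " bestmove " << row.first << '\n';
}
```

For example, a driver could call `print_tally(tally_bestmoves(std::cin), std::cout)` on the piped-in logs; `ponder` moves and `info` lines are ignored.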
This is IMO the danger of too many reductions on multicore. They definitely help at 20+0.2" with 8 threads, but they're very risky with high thread counts, and their gains should diminish at long TC, because depth is much more needed at short TCs. In general, I think there is an issue with multithreaded LTC reliability.
Probably though this is something more buggy than just reductions.
@vondele Thank you, very interesting! I think the main issue here is the very small amount of time available (probably all 178 threads want to probe the hash table and the TBs almost simultaneously), but could you please also check with my 50-move rule patch https://github.com/official-stockfish/Stockfish/commit/bae019b53e5c2bfcf0d69b4ebfc52b4f4de762eb removed?
My apologies
@NKONSTANTAKIS yes, you need to relax about this. If you want to comment things like that, please take the code, make the changes, and run the tests. Otherwise it isn't particularly helpful.
@joergoster with that patch reverted on master, the results look similar (250 threads, 1000 ms search):
4 bestmove b5c7
87 bestmove e7d7
4 bestmove e7d8
3 bestmove e7e8
@vondele Thanks, much appreciated! I just wanted to be sure this patch of mine is not the root cause. ;-)
@vondele I just hope what I'm doing is not seriously obstructive. If my posts are 90% useless and 10% of some use, I am happy. But I have to be careful not to put myself on auto-ignore, else this 10% will pass unnoticed...
All in all, I am living it ;)
Uploaded a file with 200 searches on this FEN with 250 threads. The code has a modification so that one can see the bestmove for each thread at voting time, together with depth and score.
diff --git a/src/search.cpp b/src/search.cpp
index a7e90a0..31f7b69 100644
--- a/src/search.cpp
+++ b/src/search.cpp
@@ -284,6 +284,10 @@ void MainThread::search() {
for (Thread* th: Threads)
minScore = std::min(minScore, th->rootMoves[0].score);
+ for (Thread* th: Threads) {
+ std::cout << "xxx " << UCI::move(th->rootMoves[0].pv[0], rootPos.is_chess960()) << " " << th->completedDepth << " " << th->rootMoves[0].sc
+ }
+
// Vote according to score and depth, and select the best thread
for (Thread* th : Threads)
{
For example, for one search, sorted, with identical entries counted:
4 xxx b5c7 36 -441 504
1 xxx b5d4 36 -441 504
1 xxx c3c2 33 -406 1617
2 xxx c3c2 34 -406 1666
2 xxx c3c2 35 -406 1715
1 xxx b5c7 35 -403 1820
1 xxx c3c2 37 -370 3145
1 xxx b5c7 32 -355 3200
1 xxx b5c7 36 -366 3204
1 xxx b5d4 33 -357 3234
1 xxx b5c7 37 -367 3256
1 xxx b5c7 37 -366 3293
1 xxx b5d4 37 -366 3293
1 xxx c3c2 37 -366 3293
1 xxx c3c2 34 -349 3604
1 xxx b5d4 35 -349 3710
3 xxx b5c7 35 -348 3745
2 xxx b5d4 35 -348 3745
5 xxx b5d4 36 -349 3816
5 xxx c3c2 36 -349 3816
1 xxx b5d4 36 -348 3852
4 xxx c3c2 36 -348 3852
12 xxx b5c7 37 -349 3922
8 xxx b5d4 37 -349 3922
9 xxx c3c2 37 -349 3922
1 xxx b5c7 36 -346 3924
19 xxx b5c7 37 -348 3959
2 xxx b5d4 37 -348 3959
9 xxx c3c2 37 -348 3959
30 xxx b5c7 38 -349 4028
2 xxx b5d4 38 -349 4028
13 xxx c3c2 38 -349 4028
44 xxx b5c7 38 -348 4066
3 xxx c3c2 38 -348 4066
19 xxx b5c7 39 -349 4134
38 xxx b5c7 39 -348 4173
No single thread stands out; they're just all wrong.
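The fourth column of the `xxx` lines is consistent with a per-thread vote weight of (score - minScore + 14) * completedDepth, with minScore = -441 in this search: for example (-406 + 441 + 14) * 33 = 1617 and (-348 + 441 + 14) * 39 = 4173. Assuming that is the weight the voting code used, the C++ sketch below sums it per move. It is a simplification (the real code selects a best thread rather than just a move, and may apply further rules), and `ThreadResult`, `vote_weight` and `voted_move` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One thread's result at voting time, as printed in the "xxx" lines.
struct ThreadResult {
    std::string move;  // best root move of this thread
    int depth;         // completed depth
    int score;         // raw internal score as printed, not centipawns
};

// Assumed per-thread vote weight: (score - minScore + 14) * completedDepth.
int vote_weight(const ThreadResult& r, int minScore) {
    return (r.score - minScore + 14) * r.depth;
}

// Simplified vote: sum the weights per move and return the heaviest move.
std::string voted_move(const std::vector<ThreadResult>& threads) {
    assert(!threads.empty());

    int minScore = threads.front().score;
    for (const auto& t : threads)
        minScore = std::min(minScore, t.score);

    std::map<std::string, long long> votes;
    for (const auto& t : threads)
        votes[t.move] += vote_weight(t, minScore);

    const auto best = std::max_element(
        votes.begin(), votes.end(),
        [](const auto& a, const auto& b) { return a.second < b.second; });
    return best->first;
}
```

Since the vote only aggregates what the threads report, it cannot rescue a move that no thread prefers, which matches the observation that no single thread stands out.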
Let's try with less logarithmic reductions?
Is it possible / reasonable to gradually reduce reductions and increase pruning thresholds for each new thread as the number of threads increases?
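A hedged C++ sketch of that proposal, with made-up parameters: each additional thread scales an illustrative reduction down a little more than the previous one, never below a floor. `base_reduction` is only a generic log(depth) * log(moveCount) shape, not Stockfish's actual reduction table, and the step (1% per thread) and floor (0.5) are hypothetical tuning knobs; pruning thresholds could be scaled the same way.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Illustrative LMR-style reduction that grows with log(depth) * log(moveCount).
// Generic shape only, not Stockfish's reduction table.
double base_reduction(int depth, int moveCount) {
    assert(depth >= 1 && moveCount >= 1);
    return std::log(double(depth)) * std::log(double(moveCount)) / 2.0;
}

// Hypothetical per-thread scale: thread 0 keeps the full reduction, each
// further thread reduces a bit less, never below floorScale.
double reduction_scale(int threadIdx, double step, double floorScale) {
    assert(threadIdx >= 0);
    return std::max(floorScale, 1.0 - step * threadIdx);
}

double thread_reduction(int depth, int moveCount, int threadIdx) {
    const double step = 0.01;       // made-up: 1% less reduction per thread
    const double floorScale = 0.5;  // made-up: never below half the base value
    return base_reduction(depth, moveCount) * reduction_scale(threadIdx, step, floorScale);
}
```

With 250 threads most helpers would sit at the floor, so in practice the step would have to be tied to the thread count rather than fixed.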
Not trying to point any fingers here, since it is still not very clear to me which compile of which version of SF was sent, but since it looks like it was a compile after 4/2/20, could #2603 have played a role in this?
Master still currently has the following line:
https://github.com/official-stockfish/Stockfish/blob/221893bf679f70098e6f751fded2fe843471c6be/src/search.cpp#L300
Are we sure that this is not supposed to be a less than or equal to sign instead of a greater than sign?
It would also help if we could see the diff of the version that was actually sent.
EDIT: replaced incorrect link with actual link to line in current master
EDIT 2: could also potentially be related to #2643
Just for reference, so that we don't point to random patches in the past: the results for SF 10 are clearly worse than master, so it is definitely not some recent regression:
setoption name Threads value 250
setoption name Hash value 80000
position fen 8/p3kp2/Pp2p3/1n2PpP1/5P2/1Kp5/8/R7 b - - 68 143
go movetime 1000
32 bestmove b5c7
6 bestmove b5d4
73 bestmove e7d7
97 bestmove e7d8
and similar for SF 9:
34 bestmove b5c7
16 bestmove c3c2
96 bestmove e7d7
60 bestmove e7d8
1 bestmove e7e8
1 bestmove e7f8
Is this another example?
TCEC S17 Sufi Game 95, position after 32. ... Be7: 5r2/1k2b2p/2q1p3/Pp1bPrpB/2pP4/6QP/2RB1PP1/5RK1 w - - 2 33.
TCECfish played the instantly suicidal Rb2, supposedly after evaluating 28 billion (!) nodes. My 20200407 Homefish quickly (0.0s) switches away from Rb2, after which it prefers Be2 and then Bg4 forever. I only ran one test; perhaps people with beefier machines can do multiple runs up to 28 billion nodes to confirm that my Homefish's behavior is typical of non-supercomputers.
(It's also notable that Stockfish lost _both_ sides of this opening, something that really shouldn't be possible at this level and time control.)
And another possible example from the forums:
bsda...:
Looks like SF played another weird suicide move in TCEC game 98 https://www.tcec-chess.com/archive.html?season=17&div=sf&game=98
On move 30 SF plays a5?? and then shows a losing eval 2 moves later
FEN: r1rb2k1/1bpnqn2/pp1p4/3Pp1p1/PP2PpPp/NQNB1P1P/5B2/R1R3K1 b - - 10 30
Bryan:
The latest SFdev only wants a5 at d=2, and then prefers other moves to d=50 and 39 billion moves. So I can't replicate a5 in the latest SFdev.
Things are not that easy: running SF with e.g. 50 threads, it really likes Rb2 after a lot of time and 14 billion nodes (too lazy to wait another 4.5 minutes), believing it's not suicidal at all:
Analysis by Stockfish 070420 64 POPCNT:
33.Rc2-b2 Kb7-a8 34.Kg1-h2 Rf8-b8 35.Bd2-b4 Be7xb4 36.Rb2xb4 Rf5-f4 37.Rf1-d1 Qc6-c8 38.Kh2-g1 Qc8-f8 39.Rd1-b1 Rf4xd4 40.Rb4xb5 Rb8xb5 41.Rb1xb5 g5-g4 42.Bh5xg4 Rd4-d3 43.Bg4-f3 Bd5xf3 44.g2xf3 Rd3-b3 45.Rb5-b6 Rb3xb6 46.a5xb6 c4-c3 47.Qg3-h4 Ka8-b7 48.Qh4-a4 Qf8-c5 49.Qa4-a7+ Kb7-c6 50.Qa7-c7+...........
The position is equal: = (0.00) Depth: 53/47 00:02:17 7056MN, tb=8050284
33.Rc2-b2 Rf8-b8 34.Rf1-b1 Kb7-a8 35.Bh5-e2 Bd5-e4 36.Rb1-e1 Be4-d5
The position is equal: = (0.00) Depth: 54/65 00:04:10 12985MN, tb=14949214
33.Rc2-b2 Rf8-b8 34.Rf1-b1 Kb7-a8 35.Bh5-e2 Bd5-e4 36.Rb1-c1 Be4-d5
The position is equal: = (0.00) Depth: 55/9 00:04:23 14313MN, tb=15835853
BTW, what's up with Depth 55/9?
I have seen that before but seems strange.
The low seldepth has been seen in a game blundered at CCC. I have a hard time believing the seldepth reporting is legit in those situations, but in that other case SF expected a quick transition to TB that was incorrect (one of the expected moves was a blunder).
The low seldepth can happen, IMO. I'm not sure this is the real issue.
The cycle detection mechanism just came to mind. By detecting no-progress via transposition a priori, duplicate search is avoided, but what happens when the few in-between moves turn a 50-move win into a 50-move draw, or a draw into a loss? Pretty rare, since 3 things need to coincide:
Pure speculation here, since cycle detection was AFAIR introduced just before SF9.
Nvm it was just after SF9 https://github.com/official-stockfish/Stockfish/commit/91a76331ca27b40d63f0031fbd7b9e41ead354d4
So basically, sorry for the noise, and hopefully someone finds an imaginative solution in that GHI-ish area.
There is another case mentioned in this thread in the German CSS forum:
I saw an incredible search depth of 245 (!) and an evaluation of 0.00 from Stockfish when it played this move.
I'm more and more inclined to think that this is an issue with TB scores flooding the hash table, which are being stored with maximum depth and thus will hardly ever be replaced!
@AndyGrant already changed this for Ethereal here https://github.com/AndyGrant/Ethereal/commit/12dd95fc467ba6a28cb5ac6790fc64726affcddd
Maybe we should apply this, too. We are free to revert it, and in the meantime people could grab the version from the abrok site and give feedback.
Stockfish does the following
tte->save(posKey, value_to_tt(value, ss->ply), ttPv, b,
std::min(MAX_PLY - 1, depth + 6), MOVE_NONE, VALUE_NONE);
TB scores have "inflated" depths, but you could argue the +6 is a fair adjustment, since the TB scores are "true" values. Personally, I don't think that argument holds weight, so I opted to just saved at the actual depth, as that maintains the most consistency in how I deal with the TT.
Related to this, but not really to the thread as a whole, I considered the idea of flagging TT entries as coming from the TB or not, and giving those maximal depth but also the highest priority for replacement.
I lack the power to test that to an extent that could justify such an overhaul.
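A toy C++ model of why the stored depth matters for TB entries, and of the flagging idea above. Assumptions: within a bucket, the entry with the lowest depth - 8 * age is overwritten (a generic depth-preferred policy, not Stockfish's exact one), and a hypothetical `fromTB` flag can make tablebase entries the first to go. `ToyEntry`, `worth` and `victim` are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy transposition-table entry; real engines pack this much more tightly.
struct ToyEntry {
    std::uint64_t key = 0;
    int depth = -1;       // depth the value was stored with
    int age = 0;          // searches (generations) since it was written
    bool fromTB = false;  // hypothetical flag: value came from a TB probe
};

constexpr int kAgeWeight = 8;  // assumption: cost of one generation of age

// Lower worth means replaced first. Without the flag, a large stored depth
// (e.g. an inflated TB depth) keeps an entry alive; with preferTBEviction,
// TB-flagged entries are evicted before everything else.
int worth(const ToyEntry& e, bool preferTBEviction) {
    if (preferTBEviction && e.fromTB)
        return -1000000;
    return e.depth - kAgeWeight * e.age;
}

// Index of the entry in the bucket that a new entry would overwrite.
template <std::size_t N>
std::size_t victim(const std::array<ToyEntry, N>& bucket, bool preferTBEviction) {
    static_assert(N > 0, "bucket must not be empty");
    std::size_t v = 0;
    for (std::size_t i = 1; i < N; ++i)
        if (worth(bucket[i], preferTBEviction) < worth(bucket[v], preferTBEviction))
            v = i;
    return v;
}
```

In this model, an old TB entry stored with depth 245 outlives fresh depth-20 and depth-30 entries unless the flag is honored, which is the "hardly ever replaced" effect described earlier.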
@AndyGrant Yes, you're right. But even this depth + 6 might cause trouble.
The repeated observation in this thread of reaching very high search depths with a 0.00 score leads me to this guess.
That raises another question, which is whether or not storing any TB hit into the TT is worthwhile.
Let's assume that the engine has access to 6-man Syzygy on a recent SSD.
Is storing a TB hit into the TT solely done to avoid a TB lookup? Assuming SyzygyProbeDepth is set such that no TB probes are restricted, it would appear to me that any position which would look up a TB hit in the TT, would also find the same exact score just a few steps of the search later.
I'm not convinced that depth + 6 serves any substantiated purpose. I'm also not convinced that on modern hardware with 6-piece Syzygy (7-piece could be another story) there is any purpose in hashing TB hits. If a given TB position is actually important, it will make an impact on its parent node, its grandparent node, and so on.
Note that the tests above reproduce one of the issues without TB. There might be several issues however.
@AndyGrant I fully agree. See https://github.com/joergoster/Stockfish/commit/db04a51bf0d93de892b1b25de4c7f0b0a0ea38a1 where I don't save TB scores at all, but let the parent node do this as for every other score. With my limited testing I was not able to measure any drawback. :-)
Trying to read search.cpp from scratch with a fresh look, I noted that the two functions called value_to_tt and value_from_tt used to be mathematical inverses of each other, in the sense that
value_to_tt(value_from_tt) == identity
value_from_tt(value_to_tt) == identity
This property seems to have been broken by https://github.com/official-stockfish/Stockfish/commit/be5a2f015e45886e32867b4559ef51dd694a3cec , can this fact be relevant for the current discussion?
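For reference, a minimal C++ sketch of the classic pair: mate scores are converted from distance-to-root to distance-to-node when stored and back when probed, so `value_from_tt(value_to_tt(v, ply), ply) == v` for every v. The constants are assumed values for illustration, and this is not a copy of the current search.cpp.

```cpp
#include <cassert>

// Assumed constants for illustration: mate scores lie within MAX_PLY of
// VALUE_MATE.
constexpr int VALUE_MATE = 32000;
constexpr int MAX_PLY = 246;
constexpr int VALUE_MATE_IN_MAX_PLY = VALUE_MATE - MAX_PLY;
constexpr int VALUE_MATED_IN_MAX_PLY = -VALUE_MATE_IN_MAX_PLY;

// Search values measure mate distance from the root; the TT stores the
// distance from the current node, so the ply is added on store...
int value_to_tt(int v, int ply) {
    return v >= VALUE_MATE_IN_MAX_PLY  ? v + ply
         : v <= VALUE_MATED_IN_MAX_PLY ? v - ply : v;
}

// ...and removed again on probe. Non-mate values pass through unchanged.
int value_from_tt(int v, int ply) {
    return v >= VALUE_MATE_IN_MAX_PLY  ? v - ply
         : v <= VALUE_MATED_IN_MAX_PLY ? v + ply : v;
}
```

Any adjustment applied in only one of the two directions (for example extra handling in the probe path) can break this round trip.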
Unlikely, as the behavior discussed in this issue was also seen in SF9 and SF10: https://github.com/official-stockfish/Stockfish/issues/2620#issuecomment-616963653
I have pushed a pull request there: https://github.com/official-stockfish/Stockfish/pull/2666
I'm not sure there is a problem here. I don't think the engine should ever change its move because its opponent doesn't know how to convert its advantage. The goal shouldn't be winning engine tournaments; it's helping humans analyze positions.
@USGroup1 Different goals for different folks. There will never be universal agreement, just as there is no right or wrong answer as to what the goal should be. You might want to consider trying my fork of Stockfish, as I am also a corr player and tailor my fork more towards long-term analysis, while keeping it current with development Stockfish. Check out the honey branch of SF @MichaelB7. You can also grab the latest release under the release tab.
@USGroup1 I have to disagree. This isn't about beating leela in tournaments. It's about SF with a high enough thread count will sometimes just make completely nonsensical and losing moves. Of course if it weren't for TCEC no one would probably even realize this problem exists, but it's a legitimate problem nonetheless. Of course a nice byproduct of fixing this would be to lose a few less games against leela, but that's really beside the point.
@vondele Huh, did I miss something? Do we know what is causing these blunders?
I don't think so ...
When I look at a number of FENs (https://github.com/official-stockfish/Stockfish/pull/2666#issuecomment-626117311), most of the positions were clearly lost, and the one that was not has been fixed. However, maybe I overlooked a FEN? I propose that a new issue be opened for each case, with an analysis showing which move clearly holds the draw.
Not every lost game is worth an issue however...