Stockfish: NNUE eval rotate vs mirror

Created on 18 Aug 2020  ·  65 Comments  ·  Source: official-stockfish/Stockfish

NNUE eval currently rotates the board to evaluate from Black's point of view, rather than mirroring it. In shogi this makes sense, as the start position has rotational symmetry and there is no castling. In chess the start position is mirrored and there is side-specific castling, so rotating seems unintuitive.

Is this an oversight or intentional?

NNUE

All 65 comments

If this is an oversight it will be a costly one because EvalList::put_piece is used in training too, and a lot of training effort will be wasted.

> a lot of training effort will be wasted.

Flipping the position of pieces should not be too difficult... (I didn't carefully examine the code.)

> If this is an oversight it will be a costly one because EvalList::put_piece is used in training too, and a lot of training effort will be wasted.

How so?

The current code can still create the training data we need for a new updated net.

@gekkehenker So nets trained on the outputs of previous nets can be saved/reconstructed without much effort? That's good news.

> @gekkehenker So nets trained on the outputs of previous nets can be saved/reconstructed without much effort? That's good news.

I was talking about the training data that Sergio had to generate to create those nets.
Training data is just a FEN, an eval score, and a game result. As long as the training data is still present, generating a "bugfixed" net is a day's work.

Is this actually a bug and, if so, does it affect anything (i.e. produce wrong results)?

@nodchip @tttak @dorzechowski might be able to comment on whether this is a bug or a mere convention. I suspect (but do not know) that it matters, because the rotated position is not as strong as the mirrored one when it comes to castling rights (even if castling rights are not an explicit feature, the position should implicitly be stronger or weaker). The impact could be NNUE playing weaker (by how much is an open question).

It could be why training with castling rights as a feature didn't work well

This was an oversight. It happened because I didn't change the basic structure of NNUE when I ported it to Stockfish.
Still, training has worked well so far. I guess this is because the input feature is halfkp.

Halfkp is one-hot encoding of the following combination:

  • The position of a king.
  • The position of a piece other than king.
  • The type of the piece above.

In other words, halfkp has independent nets for each position of a king. On White's turn, NNUE mostly trains the net for the king on e1. On Black's turn, NNUE mostly trains the net for the king on d1 (e8 after rotation). White's king is sometimes on d1 and Black's king is sometimes on e1, but with relatively low probability. This may be why the training works well.
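As a rough illustration of the one-hot encoding described above, here is a minimal C++ sketch of how a (king square, piece kind, piece square) triple could map to a single feature index. The names `halfkp_index`, `kPieceKinds` and the 10-kinds-by-64-squares layout are assumptions made for this example; the actual index layout in Stockfish's `half_kp.cpp` differs in detail.

```cpp
// Simplified, hypothetical HalfKP-style indexing (not Stockfish's exact layout).
// Ten non-king piece kinds: 5 types (P, N, B, R, Q) x 2 colours (own, opponent).
constexpr int kSquares         = 64;
constexpr int kPieceKinds      = 10;
constexpr int kFeaturesPerKing = kPieceKinds * kSquares;       // 640
constexpr int kTotalFeatures   = kSquares * kFeaturesPerKing;  // 40960

// ksq:       square of the perspective side's king (0 = a1 ... 63 = h8)
// pieceKind: 0..9, which non-king piece this is
// psq:       square of that piece
constexpr int halfkp_index(int ksq, int pieceKind, int psq) {
    return ksq * kFeaturesPerKing + pieceKind * kSquares + psq;
}

static_assert(halfkp_index(0, 0, 0) == 0, "first feature");
static_assert(halfkp_index(63, 9, 63) == kTotalFeatures - 1, "last feature");
```

Because the king square is the leading factor, each king square selects its own 640-feature slice in this sketch, which is the sense in which halfkp "has independent nets for each position of a king".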

But there is a concern here. With the default learn command options, the trainer mirrors 50% of the positions it reads from the training data file.

The trainer mirrors positions for "augmentation". In machine learning, "augmentation" means extending the training data. For example, in image processing, developers sometimes scale, rotate, or translate input images to virtually increase the amount of input data.

On both White's and Black's turns, NNUE will then train the nets for the king on both d1 and e1. I'm not sure whether this is good or bad for training. To investigate the effects of mirroring, we could lower the mirroring percentage, or set it to 0.
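To make the augmentation step concrete, here is a minimal sketch of mirroring a position with a configurable probability. The function `maybe_mirror` and the representation of a position as a plain list of square indices are assumptions for illustration only; this is not the learner's actual code.

```cpp
#include <random>
#include <vector>

// Horizontal mirror of a square index (a-file <-> h-file), 0 = a1 ... 63 = h8.
constexpr int mirror_file(int sq) { return sq ^ 7; }

// Hypothetical augmentation step: with probability mirror_percentage / 100,
// return the position with every occupied square mirrored horizontally.
std::vector<int> maybe_mirror(std::vector<int> squares,
                              int mirror_percentage,
                              std::mt19937& rng) {
    std::uniform_int_distribution<int> roll(0, 99);
    if (roll(rng) < mirror_percentage)
        for (int& sq : squares)
            sq = mirror_file(sq);
    return squares;
}
```

With `mirror_percentage` at 0 the position is never changed, and at 100 it is always mirrored, so a king on e1 (square 4) would always be presented as a king on d1 (square 3).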

Rotating/mirroring is used in learner code which I'm not too familiar with. I'm not sure it's a bug for a network without castling as an input because without castling rotated positions are equivalent. If we want to include castling, it would need to be changed I think.

so, a first experiment could be to fix it in SF, and just run a test on the framework to see if this is a regression with the current net, i.e. if the net is sensitive to that. That's just a data point, but I hope somebody can submit a test.

After that, we might have to fix both player and learner and retrain a net. I think it is better to fix this now, even if the effect might not be large.

@nodchip I don't think this is a bug.
I think this "rotate180" is just for reversing the view, and has nothing to do with whether start position has rotational symmetry or not.
Even if start position of shogi does not have rotational symmetry, I think we can still use the code in this part of NNUE.
I'm sorry if I'm mistaken.

I'm not sure there is anything to fix in SF. If I understand it correctly, NNUE eval always looks at a position from side to move (sente in shogi terms), so we literally rotate the board 180 degrees after each move. For instance, White rook on a1 (= square number 0), from Black POV is seen as a rook on square h8 (63). It's like side to move is always White.
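The square arithmetic behind this can be sketched in a few lines. The version here uses plain `int` squares rather than Stockfish's `Square` enum, and `make_square`, `flip_rank` and `flip_file` are simplified stand-ins written for this example; only the `sq ^ 0x3F` form of `rotate180` is taken from the thread.

```cpp
// Squares are numbered 0 = a1, 1 = b1, ..., 7 = h1, 8 = a2, ..., 63 = h8.
constexpr int make_square(int file, int rank) { return rank * 8 + file; } // both in 0..7

constexpr int rotate180(int sq) { return sq ^ 0x3F; } // flips both file and rank
constexpr int flip_rank(int sq) { return sq ^ 0x38; } // rank 1 <-> rank 8, file unchanged
constexpr int flip_file(int sq) { return sq ^ 0x07; } // a-file <-> h-file, rank unchanged

constexpr int A1 = make_square(0, 0), D1 = make_square(3, 0), E1 = make_square(4, 0);
constexpr int E8 = make_square(4, 7), H8 = make_square(7, 7);

static_assert(rotate180(A1) == H8, "a1 is seen as h8 after rotation");
static_assert(rotate180(E8) == D1, "Black's king on e8 lands on d1 after rotation");
static_assert(flip_rank(E8) == E1, "with a rank flip it lands on e1 instead");
static_assert(rotate180(E8) == flip_file(flip_rank(E8)), "rotation = rank flip + file flip");
```

The last assertion is the crux of the discussion: rotating differs from a rank flip by exactly one horizontal mirror, which is why Black's king starts on d1 rather than e1 from its own point of view.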

@tttak I'm not sure whether this is a bug that decreases Elo. I think it might be better to flip the rank instead of rotating by 180 degrees, because the starting position is vertically symmetric. But I also think we don't have to fix it, because the trainer mirrors 50% of positions. My gut feeling is that we don't have to fix it.

@dorzechowski Black king on e8 from white POV is seen as a king on square d1 from black POV. This is different from the starting position of white king on e1 from white POV. I wonder whether this affects the training. As I said above, my gut feeling is that we don't have to fix it.

@nodchip That's true but I was talking about Stockfish playing code. There we use rotate180() only in one place, in order to track piece type and square of a given PieceId from both POVs and I think it's correct because we just look at the board from the other side. I'm not very familiar with learner code so I'm not talking about this part.

but isn't this 'from the other side' not equivalent i.e. king d1 vs e1 is not equivalent. A king on d1 never has castling rights while one on e1 might have them. A net should thus give a different evaluation even if castling rights is not an input feature. It is a bit related to the older discussion of psqt being a-d + flip or a-h, IIUC.

How can a net know about castling rights if they are not among the inputs?

By the way, in the current concept, king on d1 may have castling rights if it's Black king (because e8 = d1 if we switch sides).

> so, a first experiment could be to fix it in SF, and just run a test on the framework to see if this is a regression with the current net, i.e. if the net is sensitive to that. That's just a data point, but I hope somebody can submit a test.

https://tests.stockfishchess.org/tests/view/5f3d20cea95672ddd56c63bc

ELO: -195.00 +-36.9 (95%) LOS: 0.0%
Total: 224 W: 12 L: 126 D: 86
Ptnml(0-2): 34, 53, 18, 7, 0 
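For readers unfamiliar with how a win/loss/draw count relates to an Elo figure, here is a minimal sketch assuming the usual logistic Elo model, with draws counted as half a point. The function name `elo_from_wld` is made up for this example, and it does not reproduce the error bar or LOS that fishtest reports.

```cpp
#include <cmath>

// Logistic Elo difference implied by a match score (draws count as half a point).
// The score must be strictly between 0 and 1, otherwise the result is infinite.
double elo_from_wld(int wins, int losses, int draws) {
    const double games = wins + losses + draws;
    const double score = (wins + 0.5 * draws) / games;
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

Calling `elo_from_wld(12, 126, 86)` with the counts above gives roughly -195, in line with the reported figure.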

If we flip file instead of rotate 180, we also need to fix the trainer, and train a new net.

> Rotating/mirroring is used in learner code which I'm not too familiar with. I'm not sure it's a bug for a network without castling as an input because without castling rotated positions are equivalent. If we want to include castling, it would need to be changed I think.

@dorzechowski I don't understand this, doesn't the network have implicit knowledge about castling because we train with an evaluation value that has been generated at depth > 0? If there is legal castling in a given position an evaluation at depth might castle and return the score after castling, while castling in the rotated position isn't possible so the score is just plain wrong for the rotated position?

I wanted to do this test because it was not obvious how much this would impact the strength, my "not obvious" was the reason why maybe I misunderstood vondele's words. This change makes the net see black's positions as if they were mirrored horizontally (doesn't sound too big of a difference does it? especially because without castling rights this doesn't change the game). IMO this suggests at least one of:

  1. the network was relying heavily on whether it was evaluating a position for white or for black and it learned to recognize that from the king position
  2. it didn't generalize enough to handle mirrored positions, which without castling rights shouldn't make a difference.
  3. the network somehow had knowledge about castling that this interfered with, similar to what gvreuls suggests

@gvreuls I have absolutely no idea what kind of implicit knowledge the network contains. Black and White can probably be distinguished with pretty good accuracy because White king would be probably 95% of the time on king side (mainly e1 or g1) while Black king on queen side (d1 and b1 - after rotation). But I just don't know if this kind of guesswork is correct, it would be much clearer to have an explicit castling input, adapt the training and make a new net.

> This change makes the net see black's positions as if they were mirrored horizontally

No.
It is NOT equivalent to ANY symmetry transformation of the position.
The "net" always "looks from both sides". (If you make a null move, the two "views" are exchanged; this has nothing to do with the code in search.cpp.) What you did was change the view from Black's side only. Of course the old weights won't work well.

> 1. the network was relying heavily on whether it was evaluating a position for white or for black and it learned to recognize that from the king position
> 2. it didn't generalize enough to handle mirrored positions, which without castling rights shouldn't make a difference.

Very unlikely. The "training" code sometimes flips the board horizontally.

> 3. the network somehow had knowledge about castling that this interfered with, similar to what gvreuls suggests

To summarize, you simply broke it, by giving data in a different format. The observed strength loss has nothing to do with "generalization" bla bla. ("Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" In the wrong order, in our case.)

I retract my earlier statement that it is not difficult to permute the weights to convert to new format.

If the code is incorrect, then we need to fix it instead of pretending that it doesn't affect anything. Of course, if after fixing everything and training a new net there is no gain, then we can say it is just another way of doing things and move on.

I only just learned this. I thought the board was mirrored vertically for Black, not rotated. I had turned off horizontal mirroring during training so that the net would learn that it has castling rights only at e1. I think that's why the test failed badly.

Note that the release is a beta release. ASilver is beta testing now.

The source code change is:

```diff
src/types.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/types.h b/src/types.h
index ce4c2dbb..d1a47ae3 100644
--- a/src/types.h
+++ b/src/types.h
@@ -565,7 +565,7 @@ constexpr Square to_sq(Move m) {

 // Return relative square when turning the board 180 degrees
 constexpr Square rotate180(Square sq) {
-  return (Square)(sq ^ 0x3F);
+  return flip_rank(sq);
 }

 constexpr int from_to(Move m) {
```

I trained a net with flip instead of rotate180, and also a net using rotate180 from the same data for comparison (100M d12 fen, loop 1, from SF classical eval), with mirror_percentage at 0%.
Source of training data: https://discordapp.com/channels/435943710472011776/737073080081317948/738816872279179294
It seems to be working pretty well, but these nets are still too weak compared to fully trained nets (by ~200 Elo) to draw conclusions. At least the training is faster; maybe there is a higher ceiling too?
```
   PLAYER  :  RATING  ERROR  POINTS  PLAYED    (%)
 1 flip    :    21.8   14.6   531.0    1000  53.1%
 2 rotate  :     0.0   ----   469.0    1000  46.9%
```

@Serianol What engines did you use to measure the elo? We need to use an engine with flip for a net with flip, and an engine with rotate180 for a net with rotate180.

@nodchip I used a flip engine for the flip net and a rotate one for the rotate net of course. In fact, I even tried to change the binary for the flip net (using the rotate one) as an experiment and it lost about 150-200 elo. I also trained both nets with mirror_percentage at 0%

@Serianol Thank you. How would the Elo change if we set mirror_percentage to 50% for both a flip net and a rotate180 net? My expectation is "rotate with 0% mirror_percentage" < "flip with 0% mirror_percentage" < "rotate with 50% mirror_percentage" = "flip with 50% mirror_percentage".

Just noticed that hybrid eval is on for learning. So for positions with large imbalance, the back propagation is against sf eval, not the nnue. That's probably not what you want.

The hybrid problem also counts for adding in tempo.

I got all of that stuff out before doing my test above

Since flip and rotate require matching the correct engine w/ the correct net I wonder if it would make sense to include a version number in the nets? This would prevent the engine from loading an incompatible net. Also features like castling rights, en passant, and maybe others should result in incompatible nets.

It does include it already.
To minimize disruption, maybe add castling rights together with switching to flipping?
And in training, shouldn't only positions without castling rights be mirrored?

@mstembera this is explicitly one reason for using the sha, and for strongly emphasizing that there is a default net and that others will likely not work. The sha is an extreme way of versioning. There will be plenty of chances to change the format in incompatible ways, and I don't want to spend resources on maintaining what is compatible and what is not.

@vondele The sha naming convention is good but it just checks the name of the net matches the contents. You could have a net that uses flipping w/ a correct sha name and one that uses rotate180 that also has a correct sha name but they won't work w/ the same binary. I was thinking more like the first 4 bytes of the binary would hold a version number. @sf-x says a version number is already included but I'm not sure where.

the way I see it is that the sha is also a 'version number' and the default of 'EvalFile' specifies which version you need to use, giving some freedom to those who want to play with version, at their own risk, nevertheless.

If we would go with embedding the net in the binary (https://github.com/official-stockfish/Stockfish/issues/3030), we might as well get rid of the EvalFile option.

Perhaps it's just a point of view but to me the sha is more like a uniqueness number. Version would not be unique to just one specific net. Anyway I just suggested it in case it helped solve some migration issues between player and network training tools.

The network file indeed includes a version number, although it is more like a magic number:
https://github.com/official-stockfish/Stockfish/blob/530fccbf272ffe424ae53a464b91db148cc968ae/src/nnue/nnue_common.h#L71-L72
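The kind of header check being discussed might look like the following minimal sketch. The constant `kExpectedVersion` is a placeholder value invented for this example, the helper names are hypothetical, and little-endian byte order is an assumption; see the linked `nnue_common.h` lines for the real constant.

```cpp
#include <cstdint>

// Hypothetical format version; the value is a placeholder for this example only.
constexpr std::uint32_t kExpectedVersion = 0x00000002u;

// Decode the first four bytes of a net file as a little-endian 32-bit integer
// (byte order is an assumption of this sketch).
constexpr std::uint32_t decode_le32(std::uint8_t b0, std::uint8_t b1,
                                    std::uint8_t b2, std::uint8_t b3) {
    return  static_cast<std::uint32_t>(b0)
         | (static_cast<std::uint32_t>(b1) << 8)
         | (static_cast<std::uint32_t>(b2) << 16)
         | (static_cast<std::uint32_t>(b3) << 24);
}

// Refuse to load a net whose header does not match what this binary expects.
constexpr bool header_is_compatible(std::uint32_t header) {
    return header == kExpectedVersion;
}

static_assert(header_is_compatible(decode_le32(0x02, 0x00, 0x00, 0x00)), "matching version loads");
static_assert(!header_is_compatible(decode_le32(0x01, 0x00, 0x00, 0x00)), "other layouts are rejected");
```

Under such a scheme, a switch from rotate to flip would bump the constant, so a rotate net would be rejected by a flip binary regardless of its file name.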

For those who have trouble parsing the C++ NNUE implementation, it might help to look at the Cfish code:
https://github.com/syzygy1/Cfish/blob/90cb3aed6b0587618fd929d05c9d02af65c64855/src/nnue.c#L756-L789

It seems common sense that rotating by 180 degrees cannot be optimal. Of course switching to a vertical flip requires training a new network.

Since castling and learning were mentioned in this thread, my 2 cents;

It would be good to add awareness of castling rights at the same time as vertical flipping. I don't think it is necessary to include en passant in the evaluation, as en passant can be handled by the search quite easily. Just avoid training the network on positions with en passant rights (very similar to how ep positions don't need to be included in TB files).

To add castling rights, it seems one could enlarge the number of king positions from 64 to 67, where the extra three positions are for:

  • king on E1 with 0-0 and 0-0-0 rights
  • king on E1 with 0-0 right
  • king on E1 with 0-0-0 right

Now a move that loses a castling right can be treated like a king move.
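A minimal sketch of that extended king index, under the assumption that the position has already been oriented so the king's home square is e1. The function `king_bucket` and the particular numbering 64..66 are made up for this example; only the idea of three extra "king positions" comes from the comment above.

```cpp
// Hypothetical extended king index: 0..63 are ordinary king squares,
// 64..66 encode "king on e1 with castling rights".
constexpr int E1 = 4;             // 0 = a1 ... 63 = h8
constexpr int kKingBuckets = 67;

constexpr int king_bucket(int ksq, bool can_oo, bool can_ooo) {
    return (ksq != E1 || (!can_oo && !can_ooo)) ? ksq   // no rights: plain square
         : (can_oo && can_ooo)                  ? 64    // both 0-0 and 0-0-0
         : can_oo                               ? 65    // 0-0 only
         :                                        66;   // 0-0-0 only
}

static_assert(king_bucket(E1, true, true) == 64, "both rights");
static_assert(king_bucket(E1, false, false) == E1, "rights lost: ordinary e1");
```

Since losing a right changes the bucket, a rook move that forfeits castling would be handled by the same code path as a king move in this scheme.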

The main problem with this approach is that it doesn't work for Chess960. To make it work for Chess960, 5x3=15 extra king positions have to be added (king on B1, C1, D1, F1, G1; unfortunately B1 and G1 are not equivalent). This would also seem to require a lot of training effort just for Chess960, but that seems inevitable anyway if optimal Chess960 support is a goal. (The cheap way out is to ignore Chess960 castling rights in the eval, as is now the case generally.)

Perhaps there is a better way to deal with castling, but at least this is one way that could be implemented without too much hassle (as far as I can tell).

When training a network on search scores output by SF-NNUE, it should be considered whether one really wants to train the network on NNUE evaluations multiplied by 5/4 and with Tempo added:
https://github.com/official-stockfish/Stockfish/blob/530fccbf272ffe424ae53a464b91db148cc968ae/src/evaluate.cpp#L945-L949

(Dampening seems less of a problem as long as the training searches are on positions with rule50_count() == 0.)
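To illustrate the concern, here is a minimal sketch of the scaling described above and an approximate inverse that a data generator could apply to a static evaluation before using it as a training target. The value of `kTempo` is an assumption for this example (check the engine source for the real constant), and both function names are hypothetical.

```cpp
// Tempo value is an assumption for this sketch; check the engine source for the real one.
constexpr int kTempo = 28;

// Scaling as described in the thread: raw NNUE output times 5/4, plus Tempo.
constexpr int scaled_eval(int raw_nnue) { return raw_nnue * 5 / 4 + kTempo; }

// Approximate inverse. Integer division makes the round trip lossy,
// so the recovered value can be off by one.
constexpr int unscaled_eval(int scaled) { return (scaled - kTempo) * 4 / 5; }

static_assert(scaled_eval(100) == 153, "100 * 5 / 4 + 28");
static_assert(unscaled_eval(153) == 100, "round trip is exact here");
static_assert(unscaled_eval(scaled_eval(101)) == 100, "round trip loses one unit here");
```

Note that this only inverts a static evaluation; a search score is the scaled evaluation of some leaf position, so removing the scaling from search output is an approximation at best.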

I trained two builds with this, using identical content (1.45 billion positions) and parameters, including mirror at 50%. In both cases there was no gain for mirror over rotate at any stage of their development. In fact, rotate actually performed better, though well within the error margins. They also converged at slightly different points. There was no hint that there is any Elo gain to be achieved here. I tested them against a 3rd net, and even in head-to-head.

```
Score of Stockfish NNUE 256 vs Stockfish NNUE 256mirror: 224 - 202 - 574 [0.511]
Elo difference: 7.64 +/- 14.04

1000 of 1000 games finished.
```

@ASilver you could upload both nets to fishtest and have two branches play against each other with the respective nets. That should yield better statistics quickly.

How does the "50% mirror" work? If for each position with the king at E1, the mirrored position with the king at D1 is also learned, I would not be all too surprised that rotating by 180 degrees and vertical mirroring give similar performance.

(If this is what happens during learning, it would seem to make sense to confine at least one (and perhaps even both) of the kings to the a-d files, which would reduce the network size by at least 25%. If the network is trained symmetrically, the resulting weights should also be symmetrical and we only need one side of them. Of course this stops being true as long as there are castling rights in existence and taken into account.)
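One way to confine the king to the a-d files is to mirror the whole position horizontally whenever the king stands on the e-h files. The sketch below is an assumption-laden illustration of that idea (the names `orient` and `king_bucket32` are invented here), and it is only valid when castling rights are ignored, as the comment above points out.

```cpp
constexpr int file_of(int sq) { return sq & 7; }   // 0 = a-file ... 7 = h-file
constexpr int rank_of(int sq) { return sq >> 3; }  // 0 = rank 1 ... 7 = rank 8

// If the king is on files e-h, mirror every square so the king lands on a-d.
constexpr int orient(int ksq, int sq) {
    return file_of(ksq) >= 4 ? (sq ^ 7) : sq;
}

// With the king confined to a-d, only 8 ranks x 4 files = 32 king buckets remain.
constexpr int king_bucket32(int ksq) {
    return rank_of(orient(ksq, ksq)) * 4 + file_of(orient(ksq, ksq));
}

static_assert(orient(4, 4) == 3, "king on e1 is mirrored to d1");
static_assert(king_bucket32(4) == king_bucket32(3), "e1 and d1 share weights");
static_assert(king_bucket32(60) == 31, "e8 mirrors to d8, the last bucket");
```

In this sketch the number of king buckets drops from 64 to 32, and every other piece square is passed through the same `orient` call so that the mirrored position stays consistent.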

With the network as it is and vertical flipping (instead of rotating), it seems to me that positions with castling rights should not be 50% mirrored in the learning phase.

I apologize, but the first net is a private project as yet, and the second one was trained, at request, identically. One additional point though: hybrid was removed from the patched binary as it would have yielded different build results.

From a discussion on Discord, a thought emerged that there should be no need for either flip or rotate. The handcrafted eval always evaluates for White, and so should the NNUE eval. Any transformations should be done during training only.

edit. This seems to be the only relevant bit https://github.com/official-stockfish/Stockfish/blob/84f3e867903f62480c33243dd0ecbffd342796fc/src/nnue/features/half_kp.cpp#L40, the piece_list_fb and piece_list_wb are always both updated in types.h, one with rotate180

@Sopel97 In my understanding the NNUE evaluation assumes that white is to move, which means that some mirroring operation is needed when black is to move.

It would be possible not to mirror the weights in black's perspective, but that would seem to break the incremental updating mechanism.

During network training, the targets should be NNUE only, not (NNUE*5/4 + Tempo) and not the classical eval.
Otherwise we get a hybrid network :)

On depth-8 training: why not depth 10 or depth 12? It is a time/quality trade-off.
It would be logical to run an experiment, for example [D10 net] vs [D8 net].

Here's my assertion: the mirror network embeds mirror symmetry into the network structure, and will increase the accuracy of evaluation, especially when data augmentation is not used during training. However, using the rotation network is not a bug, just not as accurate.

Given a mirror-symmetric position, the mirror network will formulate the evaluation as v = F(contempt, [f1,f2,...], [f1,f2,...]), thus switching sides gives the same value; whereas the rotation network does not exploit this.

There is this paper that came out in September.

Finite Group Equivariant Neural Networks for Games

Not sure this helps, but it probably doesn't hurt to point it out here.

So, there has been quite intensive discussion and testing of this on Discord, in particular by @noobpwnftw and @Sopel97, and the conclusion seems to be that rotate is not a bug but is by design. Basically, it appears to be a condensed way to avoid having CR as an input and also to imply the KvK position in a halfkp architecture. Using the same training procedure, the best flip net was -16 Elo vs the best rotate net.

So, I believe we can close the issue.

I have since trained nets with it and had no such issues, plus the person who had implemented it was @nodchip and when it was pointed out to him, he stated it was a mistake. Whether or not it has no detrimental effects is another story, but it was not by design.

Well, originally it was by design for shogi and halfkp was designed with it in mind. So it kind of appears logical to me that rotate works better with halfkp. My flip nets were also worse. Until we find a net arch that works better with flip I think we should stay with rotate.

> Well, originally it was by design for shogi and halfkp was designed with it in mind. So it kind of appears logical to me that rotate works better with halfkp.

Yes, but with a fundamental difference: shogi has no castling, so rotating has no effect on the evaluation. If I rotate a position in chess, the kings will be in a position that cannot castle, and the evaluation of a position may presume castling rights in a search. Of course with no CR, rotating is irrelevant as in shogi or in chess positions where castling is no longer a factor.

Maybe rotate is not better because of castling but because of typical opening positions for white being different from typical opening positions for black? In the opening position and when castling short, white has its king on the e-h files, and black on the a-d files (after rotate).

This could be verified by training a net which has separate features for black (+kDimensions/2*perspective)

Are nets trained on the flip-mirror positions of training positions?

Inputs are not transformed in any way. With that said though I have a branch https://github.com/Sopel97/Stockfish/tree/mirror that implements optional mirroring on the feature level and is used in the trainer. With 50% mirroring the results were bad, I've noted slightly better results when putting each position twice - mirrored and not - but it's questionable whether it was due to mirror or effectively more data.

If my explanation for the superiority of rotate over flip is correct, then I would indeed expect that the best current nets are not trained on flip-mirror positions and that doing so would not help (apart from having a larger training set).
