Leela-zero: leelaz doesn't always choose a move that guarantees a win in opening

Created on 22 Nov 2017 · 4 Comments · Source: leela-zero/leela-zero

When white passes on the second move, black can see that passing wins the game with 100% probability. However, pass is not always chosen, because moves during the opening are randomized. In my testing, black failed to pass in about 25% of cases.

wontfix

All 4 comments

I think it would be good to force black to pass so that white learns not to pass.

White passes in the first place because it can't yet always see that black passing is absolutely the best move and wins instantly. The program must learn this. Why enforce perfect knowledge by hand here and not for other random positions or problems? The program can still learn even though the bad move isn't refuted instantly 100% of the time. The learned probability distribution will pull more strongly toward pass next time, making it less likely that black fails to pass even with randomization. And white will also see earlier that passing refutes its own pass.

If one side can capture a large group in the opening but the randomization lets the opponent off the hook, should we add rules there as well to make sure the program plays the right move?

The point is to let the program learn, not to add rules until you have GNU Go 4.0.

What I meant to say is this:
If we are enforcing random moves at the beginning of the game, we should stop enforcing them.

If we are not, then everything is working correctly.

It does not enforce a random move. It picks proportionally to the visit count, though, so it will not necessarily play the best move. If the policy prior for pass were very high in these cases, the odds of choosing something else would be minuscule. But they are not, and it does learn that this is wrong.
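
To make the difference between "random move" and "proportional to visit count" concrete, here is a minimal sketch in Python. It is not leelaz's actual code: the function names and the visit counts are made up for illustration, and the counts are chosen only so that pass holds 75% of the visits, roughly matching the 25% failure rate reported above.

```python
import random
from collections import Counter


def pick_proportional(visit_counts, rng):
    """Pick a move with probability proportional to its visit count."""
    moves = list(visit_counts)
    weights = [visit_counts[m] for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]


def pick_best(visit_counts):
    """Pick the most-visited move (no randomization)."""
    return max(visit_counts, key=visit_counts.get)


# Hypothetical visit counts after search; NOT measured from leelaz.
visits = {"pass": 750, "D4": 150, "Q16": 100}

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 10_000
picks = Counter(pick_proportional(visits, rng) for _ in range(trials))
non_pass_rate = 1 - picks["pass"] / trials

print("best move:", pick_best(visits))
print("fraction of games where pass was not chosen:", non_pass_rate)

# Once the network has learned that pass wins, the visits concentrate on
# pass and the chance of sampling anything else becomes very small.
sharp_visits = {"pass": 9990, "D4": 10}
sharp_picks = Counter(pick_proportional(sharp_visits, rng) for _ in range(trials))
sharp_non_pass_rate = 1 - sharp_picks["pass"] / trials
print("same fraction with a sharp distribution:", sharp_non_pass_rate)
```

With the first set of counts, the most-visited move is always pass, yet sampling still picks something else about a quarter of the time. With the second set, a non-pass move is almost never sampled, which is the state the training is expected to converge toward.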
