Stockfish: Is it reasonable to do fishtest against other engines in addition to the current master?

Created on 14 Jan 2019 · 7 Comments · Source: official-stockfish/Stockfish

Suppose there's an additional test. Let the new version play some games against Komodo, Lc0, and Houdini, and record the points it scores. Then let the master play against the same three, and compare its points with the new version's. If the new version gets more wins and fewer losses, then it should definitely be accepted. Otherwise, whether to accept it should be reconsidered.

All 7 comments

Such a test would apparently directly improve SF's competitiveness in tournaments such as CCC or TCEC.

I'll gladly install Houdini and Komodo on my testing rig if you pay for them.

I think instead of Houdini or Komodo we should test against Leela since she's apparently quickly becoming our biggest competitor as well as being a NN engine.

This is really, really, really useless stuff.
1) Throughout the history of SF testing there has always been clear evidence that gaining Elo in self-play -> gaining Elo vs. any opponent. And the number is the same (if you take error bars into account). No sufficient proof of anything else has ever been provided by anyone.
2) Testing vs. other engines will require a much bigger number of games because the error bars will be doubled. And it will provide basically zero additional data but will load fishtest with completely useless tests. Not to mention that you need to buy commercial engines/have a GPU for Leela.
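
For anyone wondering where "doubled" can come from, here is a rough sketch of one way to get that figure. It is not how fishtest computes anything. It assumes every game has the same score standard deviation (`sigma`, an illustrative value), that games are independent, and that the same total game budget is either spent on one direct patch-vs-master match or split evenly between a patch-vs-opponent match and a master-vs-opponent match. All names below are made up for the illustration.

```python
import math

def se_direct(sigma, total_games):
    """Standard error of the mean score in one direct patch-vs-master match."""
    return sigma / math.sqrt(total_games)

def se_indirect(sigma, total_games):
    """Standard error of (patch score - master score) when each is measured
    in its own match against a third engine, with the budget split in half."""
    per_match = total_games / 2
    se_each = sigma / math.sqrt(per_match)
    # Variances of two independent estimates add, so the difference
    # has sqrt(2) times the standard error of either estimate.
    return math.sqrt(2) * se_each

sigma = 0.4    # assumed per-game score standard deviation (illustrative)
games = 40000  # assumed total game budget (illustrative)

direct = se_direct(sigma, games)
indirect = se_indirect(sigma, games)
ratio = indirect / direct                     # 2, up to floating-point rounding
matched = se_indirect(sigma, 4 * games)       # 4x the games restores the direct error bar

print(f"direct SE:   {direct:.5f}")
print(f"indirect SE: {indirect:.5f}")
print(f"ratio:       {ratio:.3f}")
```

Under these assumptions the indirect comparison has twice the error bar for the same number of games, so it would need about four times as many games to reach the same resolution as a self-play test.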

@adentong Why not gate the promotion of Leela nets with matches versus Stockfish instead? Every machine there is perfectly capable of running Stockfish, and it is a more reliable source for measuring performance for whoever is interested.

This suggestion adds complexity, randomness, noise, confusion and costs where none is needed. The history of Elo gains over the last 10 years by Stockfish is unmatched by any other traditional A/B chess search engine. I do not believe there will be any tangible benefits from the suggestion that would increase Elo at a faster rate than what SF gains now. Just my $.02, yomd (your opinion may differ).

Perhaps for endgame "start position" experimentation, if the opponent is similar in strength to SF (either before or after a patch), there may be some value in this. But in general I would expect noise to greatly increase.
