Stockfish: Is it reasonable to do fishtest against other engines in addition to the current master?

Created on 14 Jan 2019 · 7 Comments · Source: official-stockfish/Stockfish

Suppose there's an additional test. Let the new version play some games against Komodo, Lc0, and Houdini, and record the points it gets. Then let master play against the same three, and compare its points with the new version's. If the new version scores more wins and fewer losses, then it should definitely be accepted. Otherwise, whether to accept it should be reconsidered.
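A minimal sketch of the comparison proposed above, assuming each gauntlet is summarized as hypothetical win/draw/loss totals and that scores are converted to an Elo estimate with the standard logistic formula. The function names and numbers are illustrative only, not fishtest's actual code.

```python
import math


def score(wins: int, draws: int, losses: int) -> float:
    """Fraction of points scored: a win counts 1, a draw 0.5, a loss 0."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_from_score(s: float) -> float:
    """Logistic Elo difference implied by a score fraction 0 < s < 1."""
    return -400.0 * math.log10(1.0 / s - 1.0)


def gauntlet_gain(new_wdl: tuple, master_wdl: tuple) -> float:
    """Elo estimate of the new version minus that of master, both measured
    against the same pool of opponents (hypothetical helper)."""
    return elo_from_score(score(*new_wdl)) - elo_from_score(score(*master_wdl))


# Hypothetical combined totals against the same opponents (e.g. Komodo, Lc0, Houdini).
new_version = (120, 700, 180)  # wins, draws, losses
master = (110, 700, 190)
print(round(gauntlet_gain(new_version, master), 1))
```

The sketch only compares point estimates; it says nothing about error bars, which is exactly what the replies below focus on.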

Source

Silver-Fang

👍1

Most helpful comment

This is really really really useless stuff.
1) Throughout the whole history of SF testing there has always been clear evidence that gaining Elo in self-play means gaining Elo against any opponent, and the amount is the same (if you take error bars into account). Nobody has ever provided sufficient proof of anything else.
2) Testing against other engines would require a much bigger number of games, because the error bars would be doubled. And it would provide basically zero additional data while loading fishtest with completely useless tests. Not to mention that you would need to buy the commercial engines and have a GPU for Leela.
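The error-bar point in 2) can be checked with a back-of-the-envelope calculation, assuming every game is an independent trial with the same per-game standard deviation σ and a fixed total budget of G games. A direct new-vs-master match spends all G games on one estimate, while the proposed scheme splits them into two gauntlets of G/2 games and subtracts the results, so the two variances add. The per-game σ and the budget below are hypothetical values chosen for illustration.

```python
import math


def se_direct(sigma: float, total_games: float) -> float:
    """Standard error of a head-to-head new-vs-master match using every game."""
    return sigma / math.sqrt(total_games)


def se_gauntlets(sigma: float, total_games: float) -> float:
    """Standard error of (new vs X) minus (master vs X), each gauntlet getting
    half the games. The two estimates are independent, so their variances add."""
    half = total_games / 2
    return math.sqrt(sigma**2 / half + sigma**2 / half)


sigma, budget = 0.4, 40_000  # hypothetical per-game std dev (score units) and game budget
ratio = se_gauntlets(sigma, budget) / se_direct(sigma, budget)
print(ratio)  # about 2: the error bar doubles for the same number of games
```

Since the standard error shrinks with the square root of the number of games, matching the precision of the direct test under these assumptions would take four times as many games.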

Vizvezdenec on 14 Jan 2019

👍2

All 7 comments

Such a test would, apparently, directly improve SF's competitiveness in tournaments such as CCC or TCEC.

Silver-Fang on 14 Jan 2019

👍1

I'll gladly install Houdini and Komodo on my testing rig if you pay for them.

gvreuls on 14 Jan 2019

I think instead of Houdini or Komodo we should test against Leela, since she's apparently quickly becoming our biggest competitor, as well as being an NN engine.

adentong on 14 Jan 2019

Vizvezdenec on 14 Jan 2019

👍2

@adentong Why not gate the promotion of Leela nets with matches versus Stockfish instead? Every machine there is perfectly capable of running Stockfish, and it is a more reliable source for measuring performance for whoever is interested.

noobpwnftw on 14 Jan 2019

This suggestion adds complexity, randomness, noise, confusion and costs where none are needed. The history of Elo gains by Stockfish over the last 10 years is unmatched by any other traditional A/B chess search engine. I do not believe the suggestion would bring any tangible benefit that increases Elo at a faster rate than what SF gains now. Just my $.02, YOMD (your opinion may differ).

MichaelB7 on 14 Jan 2019

Perhaps for endgame "start position" experimentation, if the opponent is similar in strength to SF (either before or after a patch), there may be some value in this. But in general I would expect the noise to increase greatly.