Stockfish: Calibrate contempt against leela

Created on 6 Jul 2019 · 23 Comments · Source: official-stockfish/Stockfish

I'm not the first to highlight this, but as far as I know no effort has been dedicated to addressing this important topic.

The default of 24 is probably a tad optimistic against Lc0. Should some research be put into determining a sweet-spot value instead?

A value such as 0, 5, 10 or 15 is arguably more effective. The base value could be lowered, letting dynamic contempt play a more important role. Any ideas?

All 23 comments

The author of the StockFish.Polyglot version will now set contempt to 0 in his release.

https://chess.massimilianogoi.com/articles/08-07-19-The-Contempt-Coherence-factor-on-Stockfish/

My gut feeling argues for safety against Lc0, since that engine has proven to be highly resilient and doesn't blunder as much as weaker engines. But I haven't yet found any evidence on whether this setting improves or cripples SF in any way against a stronger net.

Well, even against Lc0, it probably should be slightly positive with the white pieces and slightly negative with black. Although if openings are preselected, that's not always true, as some preselected openings favor black over white.

This is not a SF issue. SF default contempt is just that "default" and not calibrated against any specific opponent. If you are concerned w/ using a specific contempt against Lc0 (in TCEC final for example) that should be discussed in the fishcooking forum.

I started a topic over there.

@mstembera It's wrong to say it's not a SF issue when many say a value lower than the default might yield a better score against Lc0 than leaving the default unchanged.

Well it’s certainly not a coding or programming issue and it is clearly more appropriate for discussion in the fishcooking forum. That’s my $.02 anyway.

The issue is not technical, but in my opinion this particular topic is recurrent because of the lack of guidance on how to configure these settings optimally. I would add that I witnessed an engine tournament manager complaining about the Hash settings and how the SF dev team was not proactive in providing guidance "on the spot" for it. Obviously, such a request is low priority and I'm not throwing anything bad at the team for this. I simply want to support my point with anecdotal evidence that the need for guidance is real. Also, the default values for those settings are challenged often, which to me is another manifestation of the same thing.

If you look at other chess engines, some provide guidance in the form of an online user manual or online FAQ.

Ex: http://www.cruxis.com/chess/manual/index.html?some_frequently_asked_question.htm

If I do a quick search, I see no such thing regarding the SF configurable parameters.

If we had such a technical reference on how and when to change those parameters, general users and tournament engine managers would be able to figure out what to do on their own. Also, we could easily avoid recurrent discussions by simply referring to the online doc on what Contempt setting to use or what Hash size is optimal for a given scenario.

Sadly I have only a shallow knowledge on those topics. I can draft something in the form of a FAQ, and upon review we could update the SF technical docs with it.

It also depends on the long-term vision for SF: whether you guys want it to have a professional look (like Houdini) or simply be a toolbox that others will package with their own guidance (like SF.Polyglot).

Sounds good?

I believe one reason why you do not see definitive guidance is that it literally can vary by OS, CPU, etc. With that in mind, here are some thoughts from someone who's been dealing with these issues for the last 30 years.

Hash - it varies by time control and amount of RAM. A general rule of thumb is to never assign more than 1/4 of the RAM on the PC, and always keep it small enough that it fills 40-70% for the given time control and core count.

Contempt - the default settings were tested against an equal Stockfish, an engine of equal strength. If you feel they are too high, feel free to reduce to zero. They are too high for my personal taste, but in theory you should be fine if you leave it alone. I have found that by reducing contempt you will perhaps lose fewer games, might draw more games than usual and perhaps also win fewer games. I am not convinced that reducing it to zero will be beneficial in terms of final score against an Lc0-like engine. This is not a case where one person has the magic answer. We have spent a lot of resources on testing contempt; the current default values should be fine.

Threads - real cores vs hyperthreading - there are two camps here: one says the number of real cores, one says the number of logical cores minus one for the OS. A middle ground is 3/4 the number of logical cores. When hyperthreading first came out, it was clearly a poor choice for chess engines. Modern CPUs seem to handle it much better now, at least for SF. I personally use logical minus one for analysis. If you try to use hyperthreading with certain engines, e.g., Crafty, it is noticeably worse on my machine. SF behaves completely differently from Crafty in that regard and it appears to be fine with logical cores minus one on my 12 real core machine ( threads ). I invite others to post their opinions as well, as I am certain there are many different opinions. Trying to get a consensus, even for a FAQ, would be challenging and perhaps even divisive. If you just follow the hash guidance above, leave contempt alone and set threads to logical cores minus one, you should be fine and nobody here would have a right to complain.
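For anyone who wants to script these rules of thumb, here is a minimal sketch that turns them into UCI setoption commands. It only encodes the guidance in this comment (Hash at most 1/4 of RAM, Threads at logical cores minus one, Contempt left at the default 24). The helper names suggest_uci_options and hash_fill_ok are made up for illustration, total RAM has to be supplied by the caller, and the fill check assumes the engine reports hashfull in permill, as the UCI protocol describes.

```python
import os


def suggest_uci_options(total_ram_mb, logical_cores=None, contempt=24):
    """Build UCI setoption lines from the rules of thumb in this thread.

    total_ram_mb  -- installed RAM in MB (supplied by the caller)
    logical_cores -- logical core count; detected with os.cpu_count() if omitted
    contempt      -- left at the Stockfish default of 24 unless overridden
    """
    if logical_cores is None:
        logical_cores = os.cpu_count() or 1
    hash_mb = max(1, total_ram_mb // 4)       # upper bound: a quarter of RAM
    threads = max(1, logical_cores - 1)       # leave one logical core for the OS
    return [
        f"setoption name Hash value {hash_mb}",
        f"setoption name Threads value {threads}",
        f"setoption name Contempt value {contempt}",
    ]


def hash_fill_ok(hashfull_permill):
    """True if the reported hashfull (permill) is inside the 40-70% target."""
    return 400 <= hashfull_permill <= 700


if __name__ == "__main__":
    for line in suggest_uci_options(total_ram_mb=16384, logical_cores=24):
        print(line)
```

The Hash value here is only the ceiling from the 1/4-of-RAM rule. If hashfull stays well below 400 permill at your time control, a smaller table is probably enough.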

Good luck!

Dynamic contempt is an Elo gain in self-play vs an equal opponent.
Static contempt is not an Elo loser vs itself and is a gain vs lower-tier engines.
How are these two related, tbh? By using the same name and the same function? Their purposes are completely different.
Contempt vs Leela: it's all cool, but you need some data. We have data that contempt = 24 is not a big regression vs master and fits the purpose of killing lower engines really well.
For TCEC finals and stuff? Yes, maaaaybe it has some negative 1-2-3 Elo effect (although no one has ever proven it), but there is no data supporting this claim. Also, people greatly overestimate its effect. With a 200+ Elo difference contempt brings an extra 20; this is not that much tbh (but significantly > 0). Against Leela, who is "within 50 Elo", the contempt effect is almost non-existent (apart from decreasing the draw rate).

Static contempt is possibly an Elo loser vs stronger opposition.

That would be expected, but if SF is losing by 10 Elo, is losing 2 Elo more a big deal? If it's losing by 100 Elo, does it even matter if it loses by 10 Elo less or 20 Elo more? Reducing the draw rate with only a small expected-score hit against a slightly stronger opponent actually improves the win/loss ratio and increases the likelihood of winning an H2H tournament.

The theory goes that Leela is significantly stronger than SF with most pieces on the board but gets weaker with fewer pieces on the board. There are some hints of this. But we lack serious testing data, i.e. 100K games of SF-Leela with one contempt value and 100K games with another.
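To make the win/loss argument concrete, here is a small sketch with made-up numbers. It assumes SF scores 48% with an 80% draw rate at one contempt value, and 47.5% with a 60% draw rate at another. These figures are purely hypothetical and are not measured data. The function names are also just for illustration. The match-win probability is computed exactly for independent games with fixed win/draw/loss probabilities, which is itself a simplification.

```python
def wdl_from_score(score, draw_rate):
    """Split an expected score and a draw rate into (win, draw, loss) rates."""
    win = score - draw_rate / 2.0
    loss = 1.0 - draw_rate - win
    if win < 0 or loss < 0:
        raise ValueError("score and draw_rate are inconsistent")
    return win, draw_rate, loss


def match_win_probability(win, draw, loss, games):
    """Exact probability of finishing a match with more wins than losses."""
    size = 2 * games + 1
    dist = [0.0] * size          # dist[k]: P(wins - losses == k - games)
    dist[games] = 1.0
    for _ in range(games):
        nxt = [0.0] * size
        for i, p in enumerate(dist):
            if p == 0.0:
                continue
            nxt[i] += p * draw
            if i + 1 < size:
                nxt[i + 1] += p * win
            if i - 1 >= 0:
                nxt[i - 1] += p * loss
        dist = nxt
    return sum(dist[games + 1:])


# Hypothetical underdog: slightly lower score, much lower draw rate.
drawish = wdl_from_score(0.48, 0.80)
sharp = wdl_from_score(0.475, 0.60)
for name, (w, d, l) in (("drawish", drawish), ("sharp", sharp)):
    print(name, "W/L ratio:", round(w / l, 3),
          "P(win 100-game match):", round(match_win_probability(w, d, l, 100), 3))
```

Under these assumptions the lower-draw-rate setting has both the better win/loss ratio and the better chance of winning a 100-game match, despite the half-point-percent lower expected score. Whether contempt actually moves the draw rate and score by amounts like these against Leela is exactly what would need testing.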
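As a rough check on why game counts of that order come up, here is a back-of-the-envelope sketch of how many games it takes for a 95% error margin to shrink to a given Elo difference. It assumes near-equal engines, independent games, the usual logistic Elo model, and a 60% draw rate. The draw rate is a hypothetical figure and games_needed is an illustrative helper, not anything from fishtest.

```python
import math


def expected_score(elo_diff):
    """Expected score for a given Elo advantage under the logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff, draw_rate, z=1.96):
    """Games needed so that z standard errors equal the score gap of elo_diff.

    Uses the per-game variance of a result in {0, 0.5, 1} around a mean of 0.5,
    which is (1 - draw_rate) / 4. This is only an approximation away from 50%.
    """
    variance = (1.0 - draw_rate) / 4.0
    score_gap = expected_score(elo_diff) - 0.5
    return math.ceil((z * math.sqrt(variance) / score_gap) ** 2)


for elo in (2, 10, 40):
    print(elo, "Elo ->", games_needed(elo, draw_rate=0.6), "games")
```

With these assumptions, resolving a difference of a couple of Elo takes tens of thousands of games per setting, while a 40 Elo gap shows up within a few hundred.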

And this also asks what default contempt is good for. At tournaments, we can send SF with a custom value if needed. Default is what is used at fishtest, in rating lists and by unaware users for whom the small differences are likely inconsequential.

Suppose SF Dev faces SF 10. Doesn't it make sense to set contempt for SF 10 to 0, as SF Dev is considerably stronger now?

SF 10 is black. It may be a good idea to lower the default contempt in that position. The trade-off is that, for two pawns, black can't castle and white gets active rooks. In total that's about one pawn of positional compensation, so black is better, but by no more than about a pawn.

Alternatively, maybe it's worth trying SF 10 with contempt 0 against Leela as black.

rnbqkbnr/pppppppp/8/8/8/8/1PPPPPP1/RNBQKBNR w KQ - 0 1

For information, CCC is running a bonus round with 5 engines: 3 Leelas (Lc0, JHorthos aka Terminator and Leelenstein), SF Contempt = 0 and SF Contempt = 100 aka TurboFish.

https://www.chess.com/computer-chess-championship#event=ccc-9-the-gauntlet-bonus-iii

The result so far:

  PLAYER        : RATING  POINTS  PLAYED  (%)
1 Stockfish     : 3545.3  21.0    36      58
2 Terminator    : 3522.5  19.5    36      54
3 Lc0           : 3501.1  17.0    34      50
4 Leelenstein   : 3468.4  15.0    34      44
5 Turbofish     : 3462.8  15.5    36      43

It's worth mentioning that SF C=0 is in the lead overall and also h2h vs all engines, with around 25% of the tournament played. In the previous bonus rounds, SF C=24 (default) played against the Terminator and Lc0 and lost h2h by a small margin. Here are the results of the previous rounds with C=24 versus the Leelas for comparison.

https://www.chess.com/computer-chess-championship#event=ccc-9-the-gauntlet-bonus.
Lc0 (JHorthos) 53.5/100
SF 46.5/100.

https://www.chess.com/computer-chess-championship#event=ccc-9-the-gauntlet-bonus-ii
Lc0 (net 42482) 102/200
SF 98/200

I'll need to do some more analysis on those numbers. Also, I consider this data to be of higher quality than a typical FishTest benchmark because of the strength of the hardware used by CCC. So for me, those results carry more weight than typical FT runs.

From what I have observed of that round so far, time management with SF C=100 seems really poor compared to C=0. It seems that holding on to all its pieces with such a high contempt complicates the search and makes it less efficient. That seems logical, because more pieces means a longer search for a given depth.

I'll follow that event closely for any possible confirmation demonstrable through some number crunching.

Looks like 0 yields a better score than the default 24. 24 is a tad optimistic.

Sorry, but it's what, 40 games?
Are you seriously drawing conclusions from that?

I was never a big fan of moving contempt off zero, but once we did, I got on board with it. There are two options: we change contempt based on fishcooking results and not some random 40 games played in an environment that we do not even control, or, and this is the maintainers' call, we can say we're sick of this shit and just move it to zero and be done with it.
Thankfully, we are blessed with maintainers who do not make rash, impulsive, hasty decisions like that, and this is clearly one area where we should take our time. As for moving to a process that does not involve fishcooking, I see that path as highly unlikely. But an argument could be made that it's the user's prerogative: we, the SF team, provide our best options as tested, and just like any other option, the user can pick and choose what he wants: number of threads, hash size, pondering, multipv, slowmover, contempt value. I'm personally fine with the current default settings, but if somebody wants to run contempt with some other value, fine, do what you think you should; you are already doing that. But don't expect us to change contempt to zero based on your 40 games using zero with big hardware. That is preposterous, and the "I trust my 40 games on big hardware more than your proven statistical approach" (paraphrasing, of course) shows arrogance, ignorance and a total lack of appreciation of the fishcooking process. I'm going to quit now, but I could go on. Recommended reading: "How to Win Friends and Influence People". So far you are batting zero in that respect.

I appreciate you sharing your side of the story on this matter. Maybe I didn't explain myself correctly. I meant that the quality of the hardware offsets the low game count, so the data seems meaningful to me. For reference, I tried to replay some of the games myself by selecting the same depth as given in the PGN file (SF can reach depth 40 and above in the middlegame), and it took a few hours to replay a simple blitz game that had run for less than 10 minutes there. I can run multiple 250 STC test runs from fishtest in that time. If you read what I wrote correctly and openly, I never said fishcooking is not a good statistical approach. Also note that I didn't conclude anything or ask for anything either. I just provided data in a public forum and gave a bit of an interpretation. You simply interpreted what I said as arrogant, which was not the case in any way. I hope my previous statements are now clearer to you. I'm running fishtest workers day and night to support the SF devs and will continue to do so. I won't escalate this further by replying to the book remark, which was rude and totally useless.

I meant that the quality of the hardware offsets the low game count, so the data seems meaningful to me.

This is very, very incorrect. High-end hardware doesn't make a small sample size disappear.
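To put a number on the sample-size point, here is a rough sketch of the 95% Elo interval implied by a score such as 21.0/36 from the table earlier in the thread. The win/draw/loss split is not given there, so the 50% draw rate below is an assumption, as is treating the games as independent. The function names are illustrative only.

```python
import math


def elo_from_score(score):
    """Elo difference implied by an expected score under the logistic model."""
    score = min(max(score, 1e-6), 1.0 - 1e-6)   # keep the logarithm finite
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(points, games, draw_rate, z=1.96):
    """Approximate confidence interval (low, high) in Elo for a match result."""
    score = points / games
    win_rate = score - draw_rate / 2.0
    # Variance of one game result in {0, 0.5, 1}: E[x^2] - E[x]^2
    variance = win_rate + draw_rate / 4.0 - score ** 2
    margin = z * math.sqrt(variance / games)
    return elo_from_score(score - margin), elo_from_score(score + margin)


low, high = elo_interval(21.0, 36, draw_rate=0.5)
print("36 games:   %+.0f to %+.0f Elo" % (low, high))

# Same score percentage over 100 times as many games, for comparison.
low, high = elo_interval(2100.0, 3600, draw_rate=0.5)
print("3600 games: %+.0f to %+.0f Elo" % (low, high))
```

Under these assumptions the 36-game interval is wide enough to include zero, so a 58% score over that few games cannot separate the two settings. Hardware speed changes how strong each game is, not how many independent results there are.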

For comparison, Komodo 13.02 with -10 seems to play tougher against SF Dev at +24.

This was also my impression when I tried negative contempt on Houdini a few months back.

Negative contempt against stronger opponents makes games harder for the stronger side to win.

@Alayan-stk-2 Then how can we benefit from data provided by external sources other than fishtest? How can we weight it to compare it to SF test results? Some bonus rounds (Bonus II for example) are 200-game runs, yes, I agree, but they were played over a 3-4 day period. If we convert that to a node count or average nodes per second or something, could that be meaningful in any way? Again, I'll do some number crunching on my end in the near future. No conclusion yet. It might take some time because I want to wait for the bonus round to finish to have maximum data, and I'll cross-check everything to be sure the methodology I use can extract anything meaningful from that.

@OuaisBla Just FYI, SF contempt 100 is known to be over 40 Elo weaker than SF contempt 0.
http://tests.stockfishchess.org/tests/view/5cf2ac170ebc5925cf087454
On the other hand, SF contempt 24 is known to be within 1 or 2 Elo of SF contempt 0.

@mstembera Thanks for the information, I really appreciate it.

I propose to move this discussion in the forum, as it becomes less urgent now.
Closing :-)
