Stockfish submissions to TCEC

Created on 6 Apr 2019 · 41 Comments · Source: official-stockfish/Stockfish

@mcostalba @snicolet
No SF version has been submitted to this season's TCEC Division Premier, which is about to start. Apparently they emailed us (I'm not sure who) on Wednesday and haven't received a reply.

I have suggested they use a recent version from abrok, specifically this one:

Author: Marco Costalba
Date: Sat Apr 6 02:03:15 2019 +0200
Timestamp: 1554508995

Fix a missing assignment in previous commit  ...

as the last one before their deadline.

Can you contact TCEC to confirm this is the one to use, please?

Can we name someone to take charge of TCEC submissions (@Alayan-stk-2 :-) ?), preferably with someone else as backup. Or officially ask them to just use the latest abrok version each time, so that we're not relying on unreliable communications?

The main question right now is, which version should TCEC use in Div P starting tomorrow?

I'm absolutely OK with assigning someone to take care of tournament submissions.

I would rather not name anyone myself. If Stephane agrees, people could state their availability for the role here in this thread, and a decision will be taken based on community feedback (for now I prefer not to set strict election rules; if needed, they will be formalized later).

Regarding tomorrow's TCEC, there's no reason now to avoid the current master.

For the current TCEC, they have received an email agreeing a version, so the urgent part is done now :-)

Having someone assigned to handle version submission to TCEC would be of limited use if that person has no authority to answer anything but "use the latest abrok exec from before the deadline".

Decisions should be mindful of community input, but a person assigned to submissions should, for example, be able to ask for non-default contempt or other meaningful parameter changes that we have good reason to believe would help.

@Alayan-stk-2 I agree with you on this point. This role should have some flexibility in picking the binary, as long as the engine is _still_ the official Stockfish and not something else. It should be an engine not necessarily from the current master, but IMO picked from this official SF repository. The flexibility I see is in the commit to pick from, the compiled binary, and the choice of UCI parameter values.

@mcostalba @Alayan-stk-2
The next TCEC deadline is for TCEC Cup submissions. I believe the deadline is the end of the current Premier Division, which will likely finish in around 7 days, probably sometime on 2019-04-27.

We need to submit a version today.

Aloril42: xoto10 alayant about 18h left until DivP is over.

I have merged https://github.com/official-stockfish/Stockfish/pull/2108 for TCEC.

Nobody stepped up, so I think I can close this one.

@mcostalba I didn't step up because I saw this as a duty to be available to handle these matters, and I can't always be.

However, it seems that each time SF is to play in a new bonus/div at TCEC, the TCEC people have some trouble getting info on updates, and this would be less of an issue if I could confirm updates.

Aloril42: @alayant Could you be/become person who sends tcecSF updates for TCEC? (Would be basically making sure latest is OK and sending email about it.. and parameters if they are changed) alternatively @xoto10 could be too though not seen here, seen in discord though.

Right now, I'm not appointed to do so, so even when I'm available and am sure of what we should send, my word isn't quite enough.

Would anybody mind ?

@Alayan-stk-2 please forgive me but I am not able to parse: "Right now, I'm not appointed to do so, so even when I'm available...".

This issue was opened to ask if someone would step up. Nobody did, so I closed it. If something has changed, I am glad to reopen it.

So I ask again:

We are looking for someone to take care of tournament submissions. This role should have some flexibility in picking the binary, as long as the engine is still the official Stockfish and not something else. It should be an engine not necessarily from the current master, but picked from the official SF repository. The flexibility is in the commit to pick from, the compiled binary, and the choice of UCI parameter values.

If someone is interested, please simply write: "Yes, I candidate myself for this role".

Yes, I candidate myself for this role :slightly_smiling_face:

OK, I will leave this open a few more days to see if someone else shows up.

I'm okay with alayant being a submitter

It has been a week now, and nobody else has stepped up or voiced opposition. So I presume we can proceed?

@Alayan-stk-2 sorry for my late reply!

Yes, of course. I'm ok with @Alayan-stk-2 as submitter for TCEC.

@mcostalba Thanks.

Could I also handle submissions to CCC? This issue mostly mentions TCEC, so coolchess would like confirmation. I assume that if I'm trusted for one, I can be for the other. :slightly_smiling_face:

@Alayan-stk-2 yes, this is good for CCC too

@Alayan-stk-2 I have received the following email from Anton Mihailov (TCEC):

Premier Division of TCEC Season 16, where your engine participates, is going to start soon.
The last move of the bonus gauntlet after the League 1 Playoff is the deadline for updates.
Please, send us your latest stable version and any additional information that you would like the admin to know.

Keep Aloril and Kan on CC. They are top professionals and will keep the smooth running of TCEC as usual.

For information about the season, see http://www.chessdom.com/tcec-season-16-information-and-participants/. All the best in S16!

@snicolet Don't worry, I'm keeping an eye on it.

@Alayan-stk-2 I recently found out that TCEC now allows specifying up to 128GiB of hash, and other engines are making use of it. SF is still set to 64GiB. I watched the Livelog for a bit while SF was playing, and hashfull was over 70% quite often. More than 50% is usually suboptimal. Could you ask them to bump us to 128GiB next time you submit? It would be nice if they could confirm we don't suffer a significant nps drop; a few percent is expected and normal, but it should probably be less than 5%.
@vondele Do you agree?

Generally I would say more hash is better if we make significant use of it. In my experience nps is almost independent of the hash size once the hash is filled (which obviously requires high depth). I hope total memory is sufficient for 2 engines using 128GB of hash each plus 6-men TBs.

Yes, I confirmed the TCEC machine has 1TB in total.
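A rough memory budget for that setup, as a sketch only. It assumes (none of this is stated in the thread) that a full 3-6 piece Syzygy set is about 150 GB, that it is counted once because both engines share the OS page cache for the same files, and that "1TB" means 1 TiB.

```python
GIB = 1024**3
GB = 1000**3

def memory_budget(hash_gib_per_engine, engines=2, tb_bytes=150 * GB,
                  total_bytes=1024 * GIB):
    """Return (used_bytes, free_bytes); TB size and total RAM are assumptions."""
    used = engines * hash_gib_per_engine * GIB + tb_bytes
    return used, total_bytes - used

for hash_gib in (64, 128):
    used, free = memory_budget(hash_gib)
    print(f"{hash_gib:>3} GiB hash each: {used / GIB:6.1f} GiB used, "
          f"{free / GIB:6.1f} GiB free")
```

Under these assumptions, even 128 GiB per engine leaves well over half of the memory free.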

Here are some numbers TCEC pointed me to.
See pinned comments here for Blue:
https://discord.com/channels/479003439125495819/503252511134842885
The nps difference here is insignificant.

Also here:
https://discord.com/channels/479003439125495819/656532253471670314/658407338616553492
These show a significant difference after 1 minute but not much after 5 minutes.

The SuFi is 120'+10", the same TC as the match being played right now, and hashfull in the Livelog seems to hit roughly 70% much of the time. However, our time usage varies a lot from move to move: some moves hit over 90% hashfull and some stay below 50%.

> I watched the Livelog for a bit while SF was playing and hashfull was over 70% quite often. More than 50% is usually sub optimal.

Do we have test data to back this up? An issue with these extreme configurations is that they see very little testing.

I remember we had some data on low-to-medium hash sizes (I can't find it in UsefulData, but I remember a table that went up to 1GB), but at 64GB and 128GB we are mostly guessing at what the Elo-optimal hash size is.

Too big a hash isn't always better: first, there are slowdowns; second, the hash still suffers from 3-fold pollution.

The 'slowdowns' are a dangerous argument. IMO, as soon as the hash is larger than the caches, the slowdowns are mostly related to properties of the search (e.g. it explores more or fewer new nodes), especially with a half-used hash. When the search differs, raw speed is somewhat meaningless (e.g. if most visited nodes are hash hits, the search is quite effective even if a bit slower). I once measured speed at various large hash sizes, all at full hash (i.e. not the standard bench), and it is mostly the same (it might depend a bit on hardware, of course):

That's on 127 threads:

| Hash (MB) | nps |
| --: | --: |
| 1024 | 155083772 |
| 2048 | 154900915 |
| 4096 | 155914852 |
| 8192 | 159769535 |
| 16384 | 163382859 |
| 32768 | 163083055 |
| 65536 | 161142317 |
| 131072 | 162743338 |
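A quick sketch that restates the table as relative speed, using only the numbers above:

```python
# nps at full hash on 127 threads, copied from the table above.
nps_by_hash_mb = {
    1024: 155083772,
    2048: 154900915,
    4096: 155914852,
    8192: 159769535,
    16384: 163382859,
    32768: 163083055,
    65536: 161142317,
    131072: 162743338,
}

baseline = nps_by_hash_mb[1024]
for hash_mb, nps in nps_by_hash_mb.items():
    change = (nps / baseline - 1) * 100
    print(f"{hash_mb:>7} MB: {nps / 1e6:6.1f} Mnps ({change:+.1f}% vs 1024 MB)")

# Largest relative gap between any two rows.
spread = max(nps_by_hash_mb.values()) / min(nps_by_hash_mb.values()) - 1
print(f"max/min spread: {spread:.1%}")
```

On these numbers the fastest and slowest rows differ by about 5.5%, there is no slowdown as the hash grows, and 131072 MB comes out about 1% faster than 65536 MB in this single measurement.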

This is only a non-issue when everything works just right: for example, the OS rarely migrates TT memory, the machine doesn't have much NUMA congestion, and so on. It gives the false impression that one can use as large a hash as possible without any consequences.

For example, the other box gets severe slowdowns for anything beyond 16GB. Its RAM runs at full speed on all channels, so this is not technically a hardware "problem"; it is what it is, just with faster cores and more nodes.

@Alayan-stk-2 Since 50% and 70% are percentages, they are independent of total size. Here is some quick math, though. When the hash is 50% full, any single entry has a 50% chance of being current. There are 3 entries per cluster, so when a new entry has to be stored, there is a 0.5 * 0.5 * 0.5 = 12.5% chance that a current entry will have to be evicted to make room. At 70% full this becomes 0.7 * 0.7 * 0.7 = 34.3%, so almost 3x as likely.
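The same back-of-the-envelope estimate as a small function. Like the comment, it assumes each entry is independently current with probability equal to the fill rate, which is a simplification (Stockfish's real replacement policy also weighs depth and age).

```python
def prob_cluster_all_current(fill, entries_per_cluster=3):
    """Chance that every entry in a cluster is current, so storing a new entry
    must evict a current one (entries assumed independent)."""
    return fill ** entries_per_cluster

p50 = prob_cluster_all_current(0.5)
p70 = prob_cluster_all_current(0.7)
for fill in (0.5, 0.7, 0.9):
    print(f"{fill:.0%} full: {prob_cluster_all_current(fill):.1%}")
print(f"70% vs 50%: {p70 / p50:.2f}x as likely")
```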

@noobpwnftw Yes, I agree that testing such cases is very important. That is why I asked TCEC for the numbers linked above, and I don't think they point to any problems.

Another small improvement for submission would be to request slightly fewer than the maximum 176 threads available on the TCEC machine. The machine has 4x Intel Xeon E5-4669 v4 CPUs at 2.2 GHz (88 physical cores / 176 threads). Leaving 1, 2, or 4 (one per NUMA node) threads free for the OS, SF's non-search threads, etc. would let the scheduler preempt the search threads less often and should actually help performance. I tested 35 threads vs 36 threads on a 36-thread machine, and 35 was a slight win.
https://tests.stockfishchess.org/tests/view/5f361d0411a9b1a1dbf18e83
I would request 175 or 174 or 172 threads.
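A sketch of the requested configuration expressed as UCI commands. `Threads` and `Hash` are Stockfish's UCI options (Hash in MB); the helper names and the one-thread-per-NUMA-node reservation are illustrative assumptions, and how TCEC actually applies the settings is not covered here.

```python
def search_threads(logical_threads=176, numa_nodes=4, reserved_per_node=1):
    """Threads to request if `reserved_per_node` hardware threads are left
    free on each NUMA node for the OS and non-search work."""
    return logical_threads - numa_nodes * reserved_per_node

def uci_setup(threads, hash_mb):
    """UCI command sequence configuring threads and hash (Hash is in MB)."""
    return [
        "uci",
        f"setoption name Threads value {threads}",
        f"setoption name Hash value {hash_mb}",
        "isready",
    ]

threads = search_threads()  # 176 - 4 * 1 = 172
for command in uci_setup(threads, 128 * 1024):  # 128 GiB of hash
    print(command)
```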

Yes, I agree that it might be wise to leave a few threads free. Maybe 172 is right. However, the test could be sensitive to TC (e.g. cutechess reacting too slowly, since a few ms matter at that TC).

@Alayan-stk-2 @vondele Will one of you be able to notify TCEC of the config tweaks discussed above?

Yes, I have done so.

Thanks!
I just found out that TCEC will conduct a "hashsize" test after League 1. Here is the info:
"Testing Stockfish 64GiB vs 128GiB, 2 games from starting position, 120min+10s. Will link to resulting log files in #enginedev-log after test."
The livelog data should be interesting.

Aloril has confirmed to me that the DivP submission for SF is ok and that SF will use the default net and 172 threads as requested.

The two 120'+10" TCEC test games of SF NNUE on 172 threads with 64GiB vs 128GiB hash have concluded and are accessible here:
https://tcec-chess.com/#div=hashsize&game=1&season=19
The compressed log file is also available for download here:
https://tcec-chess.com/loglive/archive/TCEC_Season_19_-_Hash_Size_Test.log.xz
I don't believe there is any speed issue with 128GiB, and the log also shows that we do run into situations where the hash is over 90% full. I would like to thank TCEC for running these tests. We should notify them of our decision before the start of DivP.
@vondele @Alayan-stk-2 Thank you.

As extracted from the info string just before the bestmove:

[Plots: depth, nps, hashfull]

Nothing in particular stands out, and the observation that with 64GiB we are indeed quite frequently at hashfull > 90% holds, so I think we can go with 128GiB.
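A sketch of how such numbers could be pulled out of an engine log: for each `bestmove`, it keeps the last preceding `info` line that reports depth, nps and hashfull. It assumes the lines come from a single engine (a TCEC log interleaves both engines, so filter first); the exact TCEC log line format is not checked here. Note that UCI reports `hashfull` in permill, so 900 means 90%.

```python
import re

# "<key> <integer>" pairs we care about in a UCI "info" line. The \b before the
# key keeps "seldepth" from being read as "depth".
FIELD_RE = re.compile(r"\b(depth|nps|hashfull)\s+(\d+)\b")
WANTED = {"depth", "nps", "hashfull"}

def final_info_per_move(lines):
    """For each 'bestmove', return depth/nps/hashfull from the last preceding
    'info' line that reports all three. Lines must come from one engine."""
    moves, last = [], None
    for line in lines:
        tokens = line.split()
        if "bestmove" in tokens:
            if last is not None:
                moves.append(last)
            last = None
        elif "info" in tokens:
            fields = {key: int(value) for key, value in FIELD_RE.findall(line)}
            if WANTED.issubset(fields):
                last = fields
    return moves

def share_above(moves, permill=900):
    """Fraction of moves whose final hashfull exceeds `permill` (900 = 90%)."""
    if not moves:
        return 0.0
    return sum(m["hashfull"] > permill for m in moves) / len(moves)
```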

A quick note: there are basically three theoretical slowdowns related to hash size:

  1. When the hash is bigger than the CPU's L3 cache, which ranges from a couple of MB to a few dozen MB depending on the system. Beyond that, TT lookups start to miss the cache.
  2. When the hash is bigger than the CPU's TLB coverage, which is around a couple of GB with large pages. Beyond that, TT lookups also begin to require a page walk for virtual-to-physical address translation.
  3. When the hash is so big that even the page tables won't fit in the CPU caches. Then TT lookups also trigger an extra DRAM read to access the last level of the page tables. That should be somewhere in the tens of TB range; 1 GB large pages should push this transition beyond the PB range.
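Rough arithmetic behind points 2 and 3, as a sketch. It assumes x86-64 with 8-byte page-table entries and counts only the last-level tables; the TLB entry count (1536) and the L3 size (64 MiB) are hypothetical placeholders, not the specs of any particular machine.

```python
KIB, MIB, GIB, TIB = (1024**k for k in range(1, 5))
PTE_BYTES = 8  # x86-64 page-table entries are 8 bytes each

def tlb_coverage(entries, page_bytes):
    """Memory reachable without a page walk when every TLB entry is in use."""
    return entries * page_bytes

def last_level_table_bytes(hash_bytes, page_bytes):
    """Size of the last-level page-table entries needed to map the hash."""
    return hash_bytes // page_bytes * PTE_BYTES

def hash_where_tables_fill(cache_bytes, page_bytes):
    """Hash size at which those page-table entries alone fill `cache_bytes`."""
    return cache_bytes // PTE_BYTES * page_bytes

TLB_ENTRIES = 1536   # hypothetical second-level TLB size
L3_BYTES = 64 * MIB  # hypothetical L3 size

print(f"TLB reach with 2 MiB pages: {tlb_coverage(TLB_ENTRIES, 2 * MIB) / GIB:g} GiB")
for name, page in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB)):
    tables = last_level_table_bytes(128 * GIB, page)
    limit = hash_where_tables_fill(L3_BYTES, page)
    print(f"{name} pages: 128 GiB hash needs {tables / KIB:g} KiB of tables; "
          f"tables alone fill L3 at a {limit / TIB:g} TiB hash")
```

With 2 MiB pages the threshold lands around 16 TiB under these assumptions, consistent with the tens-of-TB estimate in point 3, and 1 GiB pages push it into the PiB range. With 4 KiB pages the same threshold drops to 32 GiB, which is one reason large pages matter for big hashes.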

64 GB and 128 GB are well between slowdowns 2 and 3, so there shouldn't be a big nps difference between these hash sizes. On paper, at least.

The slowdown transitions are easiest to observe with 1-threaded bench using a high-clocked CPU.
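A sketch of such a hash-size sweep driven from Python. It assumes Stockfish's `bench` takes positional arguments in the order hash (MB), threads, depth limit, and that the summary contains a `Nodes/second : N` line (printed to stderr in the versions assumed here); check both against your build. The function names are hypothetical.

```python
import re
import subprocess

NPS_RE = re.compile(r"Nodes/second\s*:\s*(\d+)")

def parse_bench_nps(output):
    """Extract the 'Nodes/second' figure from bench output."""
    match = NPS_RE.search(output)
    if match is None:
        raise ValueError("no 'Nodes/second' line in bench output")
    return int(match.group(1))

def bench_nps(engine, hash_mb, threads=1, depth=13):
    """Run bench with the given hash size (MB) and thread count; return nps."""
    # Assumed argument order: bench <hash MB> <threads> <depth limit>.
    result = subprocess.run(
        [engine, "bench", str(hash_mb), str(threads), str(depth)],
        capture_output=True, text=True, check=True,
    )
    # Search both streams in case the summary goes to stdout in other versions.
    return parse_bench_nps(result.stderr + result.stdout)
```

For example, `for mb in (16, 256, 4096, 16384): print(mb, bench_nps("./stockfish", mb))`. As noted earlier in the thread, a short default-depth bench does not fill a large hash, so a deeper limit may be needed for the bigger sizes to be meaningful.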

The TCEC box has a rather rare configuration that seems free of most problems related to RAM size. Such behavior usually does not transfer to other machines; people need to run benchmarks on their own box to decide what is best.
