Stockfish: @Vizvezdenec

Created on 25 Nov 2020  Â·  21Comments  Â·  Source: official-stockfish/Stockfish

Since you locked the conversation here https://github.com/Vizvezdenec/Stockfish/commit/b5fc79f8adf0aaf2b3172c4a7221b1697d52eb32 I simply reply here:
Unfortunately you confirm you think everything is about you.

Most helpful comment

I was hoping that this kind of discussions would be unnecessary, and that we can be self-organized based on being reasonable and flexible, for example by self-regulating and setting the priority on our own tests manually. I usually don't like strict rules, as they get in the way of making progress. Additionally, in the past years, in large part thanks to @noobpwnftw generous support of the project, waiting/running times have always been relatively short.

Yet, I also observe that that some of our most productive devs have many more tests in the queue. On the one hand, this is a good thing, it is how we make progress. On the other hand, we should make sure that those that have a different development style, submitting only a few patches per week, have also a satisfying testing experience. Ultimately, fishtest crunches through all patches submitted, so each dev essentially gets whatever 'share' he wants. I definitely don't think fishtest should be 'developer' centric, right now it is patch centric, each patch gets its fair share.

So, I don't think we should adjust priorities out of 'fairness' but rather as a tool on our side to make testing attractive for new/irregular devs, and provide an incentive for those to be active, i.e. grow our pool of developers. That is an important long term strategy.

As such I could consider adjust priority based on the number of patch in the queue by a given user. Some options could be

  • a boost for a test coming from a user with a single active test
  • a mild reduction in throughput for users with many tests (e.g. 1/(1+pow(n,1./3))).

All 21 comments

I simply will take an extended break and will see how you will substitute.
Feel free to make PRs if anything passes and submit as much tests as you want (1 per day is what you usually do on average).
I will be doing smth more interesting than arguing with almost inactive developers that don't even participate in any meaningful discussions like SPRT bounds and stuff despite being tagged there a few times.
Feel free to do whatever you want w/o me "abusing system" and other stuff.

Contributing to a project doesn't give you the right to put yourself ahead of others. Rest assured I won't do this

image

So who will do the work @Vizvezdenec did then? And what are the advantages if this is not done?

don't worry.
According to fisherman it's better to not test anything than "abusing" system submitting 8 tests at idle time when no one will submit any test in next 5 hours.
Cause you know idling fishtest is how stockfish improves. More idle time = more elo.
Seems logical and I'm accepting this.

Can't we just have a very pragmatic solution and significantly reduce throughput when multiple very similar variations are tested at the same time, e.g., something like throughput of variants of the same idea should add up to max 100% (or whatever number). When fishtest is idle it does not matter how low throughput is, the tests will anyway be picked up, and otherwise nobody else is blocked by the parallel attemps. It might not be a perfect solution, but it should be something that addresses the valid concerns of both of you (overuse of resources vs. idle workers).

I don't see the problem with Viz applying many patch variations.

If you think Viz is "stealing" resources you can set the priority of his patches lower. Now Viz's patches will just be replaced by more net tunes. Unless you think so lowly of Viz, I don't see how that's a net gain.

Shouldn't worker suppliers have the ability to refuse running tests of which they considered crap?

I've seen worse, but there is a priority system and you never actually managed to overload the test queue except when the server is dead in recent years?

Maybe a limit on concurrent tests from one user would help? Extra tests
automatically put to prio -1 until one of previous tests finishes? Or just
automatically downgrade the throughput if more than 4 (*) tests from one
user? e.g. 8 jobs at 50% throughput.

  • or whatever. Maybe a formula using the total number of active jobs

On Thu, Nov 26, 2020 at 3:28 PM noobpwnftw notifications@github.com wrote:

Shouldn't worker suppliers have the ability to refuse running tests of
which they considered crap?

I've seen worse, but there is a priority system and you never actually
managed to overload the test queue except when the server is dead in recent
years?

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/official-stockfish/Stockfish/issues/3234#issuecomment-734360402,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AFTEM7DFXT3VUCWB2AFJMR3SRZX2XANCNFSM4UCK7ROQ
.

@ianfab @gekkehenker You both suggest lower throughput/priority just as I have suggested here https://github.com/Vizvezdenec/Stockfish/commit/b5fc79f8adf0aaf2b3172c4a7221b1697d52eb32#commitcomment-44524165 but you can read the response below.

@noobpwnftw We are all extremely grateful for the valuable resources you provide and hope we don't take them for granted or waste them. As such I automatically support any suggestions you have. The queue doesn't get overloaded thanks to you. The issues isn't that there are too few resources but that one user is selfishly grabbing more than 50% for himself. As a result all other users have to wait twice as long for their results hindering their ability to contribute.

An easy automatic solution is to assign fishtest resources per user instead of per test.

This is honestly such a pointless argument... STCs usually fail quickly anyways and it's better to keep fishtest running at full throttle than leaving it half empty. I'm sorry but I really fail to see your point of view @mstembera What you're saying would make sense if fishtest is super overloaded with a crapton of tests already, and then Viz submits a dozen tests on top of that, then maybe I can kinda see where you're coming from, but tbh fishtest almost never gets overloaded to such an extent...

But instead of directing the argument at Viz, maybe it's more productive to have an actual discussion on prio and throughput. @vondele What do you think?

I agree with @adentong . Any test (even failed one) is a contribution to Stockfish. Someone has to do such job as launching a lot of tests. I don't think it's constructive to focus on who exactly run the test.

@adentong I think you fail to understand that fishtest is always running at full throttle. The issue is that a majority of the resources get diverted to one selfish user who schedules 6 versions of the same patch. This is hurting the productivity of everyone else overall.

@mstembera Weren't these tests launched at night when fishtest was free enough?

All you need to know that this were not 6 versions of the same patch.
It were 4 versions + 2 versions of completely different patch which I was too lazy to make separate name for so made 5 6 instead of #patchname 1 2 - it was almost in the same place of search.cpp and I'm usually reluctant to make new patch names if I'm not making full reset of code.
But because some approvers tend to approve a lot of stuff not even making a slight glance on what is written (and then I need to stop this and inform author that either this logic is corrupted / it was recently tried / that it's not good to try 22nd patch on the same idea that never produced a single green stc, etc) they think that it's 6 versions of the same patch.
And then they try to teach everyone how to operate fishtest telling about some mythical "full throttle" while every person who actually speaks with noob reasonably frequently knows he has like 10-20k cores he can launch on fishtest at least for like 8 hours/day effectively increasing it calculating power x5 - but there are not enough tasks to load them so he does it only by personal demand (for example, in discord). And he did this multiple times when we actually had flurry of patches loading fishtest up to 50 tasks - which this cores managed to completely clean in 3 hours (and people were submitting new and new tests!). If anything we can write 3x patches, but I understand why noob wants to not give his cores on some tests that can't be committed or useless tuning sessions that produce nothing (especially for classical eval which is kinda obviously a dead end nowadays).
Before making some claims at least get some knowledge or you will look stupid.

@MaximMolchanov What is at night? We are all in different time zones but you can see the status from the screen shot above.
Anyway he does this regularly.

@Vizvezdenec "2 versions of completely different patch"? Please stop lying. Anyone can see for themselves these were all versions of the same patch. All comparing lmrDepth, (contHist[0])[movedPiece][to_sq(move)], and (contHist[1])[movedPiece][to_sq(move)] just to different constants.
https://tests.stockfishchess.org/tests/view/5fbdca4267cbf42301d6b268
https://tests.stockfishchess.org/tests/view/5fbdca7767cbf42301d6b26a
https://tests.stockfishchess.org/tests/view/5fbdcab967cbf42301d6b26c
https://tests.stockfishchess.org/tests/view/5fbdcaf667cbf42301d6b26e
https://tests.stockfishchess.org/tests/view/5fbdcb5367cbf42301d6b270
https://tests.stockfishchess.org/tests/view/5fbdcb8967cbf42301d6b272

@mstembera I'm sorry if what I'm about to say may offend you, but at this point I think you either have a personal vendetta against Viz, despite your claim otherwise, or you're too stubborn and too hung up on technicalities. Six variations of the same idea that either passes or fails in 20k or 30k games is seriously not even going to put a dent in fishtest's efficiency and capability, and that's not even considering the fact that noob is willing to temporarily inject his fleet if ever fishtest doesn't have enough cores to resolve the patches quickly enough. I still think that this whole thread is completely pointless, and as such I will not be making any further comments because I've said all I wanted to say.

I think this is a bit fishy, I have nothing against @Vizvezdenec and do appreciate his extensive contributions to the project, but I do agree with @mstembera that running 12 tests simultaneously is a bit much. My point isn't that the framework can't handle one dev doing this but if one dev is allowed to do this you can't stop other devs from doing the same thing and also run a dozen tests simultaneously, and if more devs start doing this it would affect the framework. This very thread proves that this can lead to conflict among the devs, so there probably should be some rule/maximum.

I was hoping that this kind of discussions would be unnecessary, and that we can be self-organized based on being reasonable and flexible, for example by self-regulating and setting the priority on our own tests manually. I usually don't like strict rules, as they get in the way of making progress. Additionally, in the past years, in large part thanks to @noobpwnftw generous support of the project, waiting/running times have always been relatively short.

Yet, I also observe that that some of our most productive devs have many more tests in the queue. On the one hand, this is a good thing, it is how we make progress. On the other hand, we should make sure that those that have a different development style, submitting only a few patches per week, have also a satisfying testing experience. Ultimately, fishtest crunches through all patches submitted, so each dev essentially gets whatever 'share' he wants. I definitely don't think fishtest should be 'developer' centric, right now it is patch centric, each patch gets its fair share.

So, I don't think we should adjust priorities out of 'fairness' but rather as a tool on our side to make testing attractive for new/irregular devs, and provide an incentive for those to be active, i.e. grow our pool of developers. That is an important long term strategy.

As such I could consider adjust priority based on the number of patch in the queue by a given user. Some options could be

  • a boost for a test coming from a user with a single active test
  • a mild reduction in throughput for users with many tests (e.g. 1/(1+pow(n,1./3))).
  • a boost for a test coming from a user with a single active test

@vondele There was a proposal back then by @MJZ1977 here https://github.com/official-stockfish/Stockfish/issues/3066 but you rejected that idea. Anyway, many devs would be happy with that proposal since I think devs who prefer to submit only 1 or 2 outnumber those who submit greater than 4. They just need to come back from their long vacation. @31m059 I hope you are doing well :)

Sorry but "pruning moves with clean countermove history" (idea taken from laser chess code) and "pruning earlier moves with really bad countermove history" (idea created by me) are different ideas, at least in my opinion.
At least if you add THEM BOTH they wouldn't intersect. So I see it as 2 different ideas on refinement of countermove based pruning.
Also what I can say - you can approve or not approve as much as you want. Or approve 5 tests and then approve 5 more test within 4 more hours.
I will keep submitting tests as I do because all your logic is based on claim that "fishtest lacks resources" - which is, as I said, just false - and I still see you silently ignoring this part of my speech twice (nice technique in arguing, but you are not going to fool anyone with it). If anything it lacks active developers.
Needless to say that when I hit 10 elo gainers in span of 40 days for some reason no one told me that I'm doing smth wrong (and I submitted like 3x of tests on average compared to current days) and now when I'm on a cold streak you happily appear to say smth.
Anyway, I don't want and wouldn't continue discussion on this topic and will remain with my opinion.
So I will mute this thread and mute it author personally to not receive any meaningless input.
Stay tuned to my next attempts to improve stockfish play after I finish my break.
PS currently we have 14 active tests, 5 of which are about classical eval (dead end which should not really be 100% priority ever because it's not really improving stockfish play in main mode), 2 tunings (net that was tuned and retuned 55 times and another classical eval term) and 2 simplifications - so basically 9 / 14 tests are not really going to bring us any elo, thus 70% of fishtest is loaded with stuff... Not really useless, but smth that wouldn't really improve stockfish.
At least it's not filled with my useless tests I guess.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

NightlyKing picture NightlyKing  Â·  7Comments

MoonstoneLight picture MoonstoneLight  Â·  5Comments

d3vv picture d3vv  Â·  4Comments

fun8 picture fun8  Â·  4Comments

maelic13 picture maelic13  Â·  3Comments