It seems that a somewhat recent update to arena tournament pairings has affected black/white assignments for games, sometimes leading to really long streaks of players getting the same color in consecutive games.
To illustrate: in the past April marathon at https://lichess.org/tournament/spring20, looking at the last 50 games played by each of the top 10 players, none(!) of them ever had a streak of more than 3 games of the same color. (And even streaks of 3 games of the same color were somewhat rare.) Clearly the system was set up to prevent long streaks, and anything beyond 3 seemed to be impossible.
In comparison, in the finished Yearly Bullet at https://lichess.org/tournament/5SOczyHK, again looking at the last 50 games for streaks of more than 3 games, I find, among others:
The #1 had streaks of length 4, 4, 4, 4.
The #2 had streaks of length 4, 7.
The #3 had streaks of length 4, 5.
The #4 had a streak of length 4.
The #5 had streaks of length 4, 4, 4, 5.
...
The #10 had a streak of length 9.
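For anyone who wants to repeat this count, a minimal Python sketch is below. It assumes a player's games have already been turned into a string of "W"/"B" characters in the order they were played; the function name long_streaks and the sample history are made up for illustration.

```python
from itertools import groupby


def long_streaks(colors: str, min_len: int = 4) -> list[int]:
    """Return the lengths of all same-color runs of at least min_len games.

    colors is a string such as "WBWWB...", one character per game, in play order.
    """
    runs = [len(list(group)) for _, group in groupby(colors)]
    return [r for r in runs if r >= min_len]


# Hypothetical 12-game history, not taken from either tournament.
history = "WBWWWWBBWBBB"
print(long_streaks(history))  # [4]
```

With min_len=4 this reports exactly the "streaks of more than 3 games" used in the lists above.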
The recent https://lichess.org/tournament/C696te7d is currently showing a similar trend, with long streaks being the standard rather than being an exception.
Unless there is a benefit to not preventing long same-color streaks that I am not seeing, this is probably a bug.
I now realize this is especially weird given that the Yearly Bullet was in January and the Spring Marathon was in April. The ongoing https://lichess.org/tournament/C696te7d shows the same trends as the Yearly Bullet, so maybe the Marathon was running on different/older pairing code when it started in April?
I somehow feel that having those streaks in 106 games (the maximum among the players you mention) or 84 games (the minimum) is likely to happen. I don't want to calculate it myself, but this video explains how likely streaks are with 20 coin flips; with 84 it seems much more likely to happen. I don't think there is any bug.
First of all, I said these streaks are from the last 50 games of all these players, not 106 or 84 games. (That's because the tournament page, when clicking on a player, only shows the last 50 games.) Second of all, why would color assignments be made fully random? Clearly the system was set up before such that, e.g., if someone on a BBB streak played someone not on such a streak, they would get a W in the next game. And if the system indeed would try to limit long streaks like this, the odds of no player ever getting any streak of 9 games would be overwhelming. This has nothing to do with sequences of random coin flips.
And I haven't done an exhaustive search at all, but in the BLM arena I have found a player who had an 11(!) game streak of white games (see #258 - Blauregen, with screenshot below). Even for a simple system which does not take color streaks into account when choosing which players to pair up, but which does take these streaks into account when assigning colors for the game, I think the odds of getting an 11-game streak would already be less than 1 in 10^80. So if pairings were done to prevent streaks, I should not be stumbling upon one.

And especially for arena tournaments, where getting a streak of white games in top events can make a big difference in scoring lots of wins in a row and winning the tournament (which may involve prize money as well), color assignments should be as fair as possible and such streaks should be avoided.
First of all, I said these streaks are from the last 50 games of all these players, not 106 or 84 games. (That's because the tournament page, when clicking on a player, only shows the last 50 games.)
Sorry, I hadn't read that before posting my comment.
Second of all, why would color assignments be made fully random?
Points 1 and 2 add up. I'm almost sure that Lichess makes it less random; if it didn't, there would have been more streaks.
Clearly the system was set up before such that, e.g., if someone on a BBB streak played someone not on such a streak, they would get a W in the next game.
I think Lichess does that (explained in my next answer).
And if the system indeed would try to limit long streaks like this, the odds of no player ever getting any streak of 9 games would be overwhelming. This has nothing to do with sequences of random coin flips.
Coin flip: >7.5% (source)
Random study on Lichess, using the first tournament I saw with at least 50 games (link): 0%. I only checked the first 50 games, but there's a difference, isn't there?
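For reference, the coin-flip figure can be computed exactly rather than estimated. The sketch below is a small dynamic program over the length of the current run, under the assumption that every color is an independent fair coin flip (the "fully random" case, not what Lichess is claimed to do); prob_color_run is a hypothetical helper name.

```python
from fractions import Fraction


def prob_color_run(n: int, k: int) -> Fraction:
    """Exact probability that n fair coin flips contain at least one run of k or more
    identical results (k whites in a row or k blacks in a row)."""
    if n < k:
        return Fraction(0)
    if k <= 1:
        return Fraction(1)
    half = Fraction(1, 2)
    # state[r]: probability that the current run has length r (1 <= r < k)
    # and no run of length k has appeared so far.
    state = [Fraction(0)] * k
    state[1] = Fraction(1)  # the first flip starts a run of length 1
    for _ in range(n - 1):
        new = [Fraction(0)] * k
        for r in range(1, k):
            if state[r]:
                new[1] += state[r] * half          # next flip differs: run restarts
                if r + 1 < k:
                    new[r + 1] += state[r] * half  # next flip matches: run grows
                # r + 1 == k: a run of length k is reached; that mass leaves the chain
        state = new
    return 1 - sum(state)


print(float(prob_color_run(50, 9)))
```

prob_color_run(50, 9) is the chance that one player's last 50 games contain a 9-game streak of either color if colors were pure coin flips.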
And I haven't done an exhaustive search at all, but in the BLM arena I have found a player who had an 11(!) game streak of white games (see #258 - Blauregen, with screenshot below). Even for a simple system which does not take color streaks into account when choosing which players to pair up, but which does take these streaks into account when assigning colors for the game, I think the odds of getting an 11-game streak would already be less than 1 in 10^80. So if pairings were done to prevent streaks, I should not be stumbling upon one.
I think with random pairings this probability is about 2% (I haven't worked it out exactly, but I've made an approximation using other known values).
Yes, you may not have done an exhaustive search, but consider that there are many Lichess users, and you are the one who saw that streak and wanted to post it (say >0.001% of users see such a streak, >0.001% of those want to post it, 30,000 users on Lichess, and on average 2 years to see this). If you work this out, it isn't that unlikely to happen.
And especially for arena tournaments, where getting a streak of white games in top events can make a big difference in scoring lots of wins in a row and winning the tournament (which may involve prize money as well), color assignments should be as fair as possible and such streaks should be avoided.
Why? Isn't randomness fair? You've talked about streaks, but assuming Black is worse than White, you would have to talk about the total number of games with each color, not about streaks.
"Random" isn't the fairest at all. In Swiss events with large numbers of people, any decent pairing system will give players who had W in round 1 a B in round 2. This usually alternates up to the last round, and only rarely does anyone end up with a significant +/- in terms of W/B games. None of these pairings are ever random in any decent software.
Would you then also say it is "fair" and optimal, when a player who played WW is paired against a player who played BB, to flip a coin to determine colors?
Note that what you care about here is the variance in the quantity (#W - #B), as clearly the mean over all players will be 0. And the variance will be significantly less if you try to balance this out actively, rather than doing random pairings.
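To make the variance point concrete, here is a toy simulation (not Lichess's pairing code): every round all players are paired at random, and colors are either a coin flip or given to whichever player has the lower #W - #B. Pairing everyone every round, the player and round counts, and the seed are all assumptions for illustration.

```python
import random
from statistics import pvariance


def simulate(balance: bool, players: int = 1000, rounds: int = 50, seed: int = 1) -> float:
    """Return the variance of (#W - #B) over all players after `rounds` rounds.

    Every round all players are paired at random (an arena-like simplification;
    players must be even). balance=False: colors are a fair coin flip.
    balance=True: the player with the lower #W - #B gets white; ties are a coin flip.
    """
    rng = random.Random(seed)
    diff = [0] * players  # #W - #B for each player
    ids = list(range(players))
    for _ in range(rounds):
        rng.shuffle(ids)
        for a, b in zip(ids[::2], ids[1::2]):
            if balance and diff[a] != diff[b]:
                white, black = (a, b) if diff[a] < diff[b] else (b, a)
            elif rng.random() < 0.5:
                white, black = a, b
            else:
                white, black = b, a
            diff[white] += 1
            diff[black] -= 1
    return float(pvariance(diff))


print("coin flip:", simulate(balance=False))
print("balanced: ", simulate(balance=True))
```

With coin flips the variance should land near the number of rounds, while the balancing rule keeps it close to zero.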
If you want to get into the maths: for a streak of 10 whites, randomly this would be 2^(-10). Now suppose that a pairing system pairs players randomly, and once it pairs them it makes sure that the longer of the two streaks, if it is 3 or longer, is ended there if possible. So a streak of 3 whites has probability 1/8. Now a streak of 4 whites only occurs if two players on a streak of 3 whites are paired against each other, with probability roughly 1/8 * 1/8 = 2^(-6). To get a streak of 5 whites you need two players on a 4-whites streak, with probability 2^(-6) * 2^(-6) = 2^(-12). The exponent doubles every time the streak is extended. Continuing this, we get a streak of 10 whites with probability approximately 2^(-384). So while 2^(-10) means this happens on average about once per tournament with 1000 players playing just 10 games, 2^(-384) means that if every atom in the entire universe flipped such a biased coin every millisecond for the next gazillion years, this still would not happen.
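The recursion in this argument, p(3) = 2^(-3) and p(k) = p(k-1)^2, can be checked in a few lines. This only reproduces the back-of-the-envelope model described here (the exponent works out to 3 * 2^(k-3)); it is not an exact probability for any real pairing system, and streak_exponent is a made-up name.

```python
def streak_exponent(length: int, base_len: int = 3) -> int:
    """Return e such that a streak of `length` has probability about 2**(-e) in the
    simplified model p(base_len) = 2**(-base_len), p(k) = p(k - 1)**2."""
    if length < base_len:
        raise ValueError("the model only covers streaks of at least base_len games")
    e = base_len
    for _ in range(length - base_len):
        e *= 2  # squaring the probability doubles the exponent
    return e


print(streak_exponent(10), streak_exponent(11))  # 384 768
```

The value for 11 games is the 2^(-768) figure used later in this thread.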
Anyway: yes this is a bug, as no credible tournament pairing system would ever do random color assignments, as it increases imbalances and the variance in whites/blacks unnecessarily. How much this affects results is another topic, but unless there is a reason not to implement such a streak prevention system (which was clearly in place before, but has been deactivated now) I don't see why one would not implement that now.
Coin flip: >7.5% (source)
Random study on lichess on the the first tournament I see with at least 50 games (link): 0% I only checked 50 first games, but there's a difference, isn't it?
As I stated in my initial message, the tournament you quote does indeed actively prevent streaks, presumably using different rules than are in place now for determining colors. Indeed in that tournament I bet you can easily scroll through all thousands of players and not even find anyone with a streak of 7 games of the same color, let alone 8, 9, 10, or even 11.
The streak of 11 games is from a recent tournament, where these streaks occur much more often. And as I explained, with some basic maths the probability of such a streak happening with simple streak prevention rules would be 2^(-768).
Instead of trying to respond, or trying to read and understand the tournament module myself (which is written in programming languages I do not understand), I think it's time to ask @ornicar and @niklasf what they think. What do you think?
To narrow down the source of this bug, a search through some recent big marathons gives the following results as to whether pairings were done properly or not (i.e. whether streaks of length 5/6/7/8+ appear). The list is sorted chronologically.
Listing Bullet Shield Arenas separately below:
Except for the 2020 Spring Marathon, and except for some delay in the effect in Shield Arenas, the trend seems to have started some time late last year.
I suspect that something is wrong with commit 10d5bb.
One magic constant, 50, was replaced by 30. I think it should be (maxGroupSize / 2) instead.
Actually, if there are 31 or more pairs, then the user color statistics aren't updated (importantly, userRepo.firstGetsWhite had a side effect).
And it seems that this case often occurs in Bullet Marathons.
This issue dates back to quite long before that commit, so I doubt that's the origin of the bug.
Lichess never had color-streak protection algorithms in arena tournaments (only color-balancing heuristics). Arena tournaments were always announced as quite a different thing from Swiss tournaments, with their own features. It is quite possible that long color streaks are one of them, as the cost of a cheap implementation.
And I have some doubts that, in the current situation, a longest color streak approximately equal to the binary logarithm of the number of games played in the tournament could be considered long.
BTW, I don't know whether the change in the mentioned commit was a typo or a one-minute dirty hack to work around MongoDB not being able to handle so many updates for big tournaments due to the virus effect.
Maybe the protection was not explicit, but implicitly it definitely existed. I doubt that if you do a global search over all tournaments played from the origin of lichess up to late 2019, you will even find a single streak of length, say, 10. While recently streaks as long as 11 have been appearing.
And by your logic, you'd be ok with having a tournament where 1 in every 1000 players starts a tournament with 10 white games, and 1 in every 1000 players starts with 10 black games? Like I wrote before, no decent pairing system would ever allow such color imbalances -- if you e.g. host a 4 round swiss tournament with 10000 players on any website/software, certainly >99% of the players will get two whites and two blacks. While with random color pairings, or calculating with your binary logarithm heuristic, less than half of all players would have a proper color balance.
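The "less than half" figure follows from a binomial count: if each of the 4 colors is an independent fair coin flip, exactly two whites happens C(4, 2) / 2^4 = 6/16 of the time. share_balanced is a hypothetical helper and assumes an even number of games.

```python
from math import comb


def share_balanced(games: int) -> float:
    """Share of players with exactly games/2 whites when every color is a fair coin flip."""
    return comb(games, games // 2) / 2 ** games


print(share_balanced(4))  # 0.375
```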
Just found a 10-blacks streak in a 2015 tournament.
I added some ArenaTournamentColorHistory helper functions for streak-prevention mechanics to my repository. To integrate them into lila, we need one additional int32 per (tourId, userId) pair, although I'm not completely sure that is possible for performance reasons.
Brief possible usage:
Where vals whose names start with _ch_ are instances of the _ArenaTournamentColorHistory_ class.
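The helper functions and their usage are not reproduced here. Purely as an illustration of how a color history could fit into one int32 per (tourId, userId) pair, here is a hypothetical Python sketch: it packs a signed current streak and a signed #W - #B balance into two 16-bit halves, and uses them to pick colors (break the longer streak first, then the balance, then a coin flip). None of these names or rules come from that repository or from lila.

```python
def _to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pack(streak: int, balance: int) -> int:
    """Pack two signed 16-bit fields into one signed 32-bit int.

    streak > 0: that many whites in a row; streak < 0: that many blacks in a row.
    balance: #W - #B so far in the tournament.
    """
    raw = ((balance & 0xFFFF) << 16) | (streak & 0xFFFF)
    return _to_signed(raw, 32)


def unpack(packed: int) -> tuple[int, int]:
    """Return (streak, balance) from a value produced by pack()."""
    return _to_signed(packed, 16), _to_signed(packed >> 16, 16)


def record_game(packed: int, white: bool) -> int:
    """Return the updated history after one more game with the given color."""
    streak, balance = unpack(packed)
    if white:
        streak = streak + 1 if streak > 0 else 1
        balance += 1
    else:
        streak = streak - 1 if streak < 0 else -1
        balance -= 1
    return pack(streak, balance)


def first_gets_white(a: int, b: int, coin: bool) -> bool:
    """Hypothetical rule: the player leaning more towards black (streak, then balance)
    gets white; if both are equal, fall back to the coin flip."""
    streak_a, balance_a = unpack(a)
    streak_b, balance_b = unpack(b)
    if streak_a != streak_b:
        return streak_a < streak_b
    if balance_a != balance_b:
        return balance_a < balance_b
    return coin


h = 0  # empty history: no streak, balanced colors
for white in (True, True, True):
    h = record_game(h, white)
print(unpack(h))  # (3, 3)
```

Keeping both fields in one signed 32-bit value matches the "one additional int32" budget mentioned above; 16 bits per field is far more than any tournament needs.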
@antma Did you happen to find that 10 blacks streak by chance? Or did you do some query search over the database? If it's the latter, can you share what the data told you?
And if indeed that old 10 blacks streak was a freak occurrence that doesn't happen otherwise, it seems that the pairing system was fine until recently, when these streaks became more frequent. If we could roll back to the earlier color assignments, then at least these streaks would not happen so often anymore.
@tmmlaarhoven Found it by parsing a single user's PGN, downloaded about 3 years ago, with a simple Python script. A Bullet marathon could be downloaded from the API "https://lichess.org/api/tournament/${TOUR_ID}/games?moves=false", but the files were too large for me, so I started from a local PGN.
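A stdlib-only sketch of that kind of script (not the one used here) is below. It assumes the PGN, either from the API endpoint above or from a user's export, has been saved as text, and it only reads the standard White and Black tags; colors_for and longest_streak are hypothetical names.

```python
import re
from itertools import groupby

# Matches the standard [White "..."] and [Black "..."] tags, but not [WhiteElo "..."].
COLOR_TAG = re.compile(r'^\[(White|Black) "([^"]*)"\]$')


def colors_for(pgn_text: str, user: str) -> str:
    """Return one 'W' or 'B' per game in pgn_text that `user` played, in file order."""
    wanted = user.lower()
    colors = []
    tags = {}
    for line in pgn_text.splitlines():
        match = COLOR_TAG.match(line.strip())
        if not match:
            continue
        tags[match.group(1)] = match.group(2).lower()
        if len(tags) == 2:  # both White and Black seen for this game
            if tags["White"] == wanted:
                colors.append("W")
            elif tags["Black"] == wanted:
                colors.append("B")
            tags = {}
    return "".join(colors)


def longest_streak(colors: str) -> int:
    """Length of the longest run of the same color, 0 for an empty string."""
    return max((len(list(run)) for _, run in groupby(colors)), default=0)
```

For example, longest_streak(colors_for(open("tournament.pgn").read(), "SomeUser")), where the file name and user name are placeholders. The longest streak is the same whether the file lists games oldest-first or newest-first.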
If a pairing batch generates more than 30 games, then colors are random. Otherwise it uses the user color counter: https://github.com/ornicar/lila/blob/e79af646d5368c688510473ace0510af9f81ed72/modules/tournament/src/main/arena/PairingSystem.scala#L69-L82
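The behavior described here can be modeled in a few lines to show why colors drift in big batches. This is a simplified Python model, not a transcription of PairingSystem.scala: the threshold of 30 pairs is taken from this thread, while the rule "the player with the lower #W - #B gets white" and all names are assumptions.

```python
import random

# The pair-count threshold from this thread; everything else is a simplified model.
MAX_PAIRS_WITH_COLOR_HISTORY = 30


def assign_colors(pairs, balance, rng):
    """Return (white, black) tuples for one batch of pairs.

    pairs: list of (user_a, user_b); balance: dict user -> #W - #B, mutated in place.
    Small batches consult and update the counter; larger batches flip a coin and
    leave the counter untouched, so it goes stale for every player in those batches.
    """
    use_history = len(pairs) <= MAX_PAIRS_WITH_COLOR_HISTORY
    games = []
    for a, b in pairs:
        bal_a, bal_b = balance.get(a, 0), balance.get(b, 0)
        if use_history and bal_a != bal_b:
            a_white = bal_a < bal_b  # the player with fewer whites gets white
        else:
            a_white = rng.random() < 0.5  # coin flip
        if use_history:
            # The counter is only updated on this path (the side effect noted above).
            balance[a] = bal_a + (1 if a_white else -1)
            balance[b] = bal_b + (-1 if a_white else 1)
        games.append((a, b) if a_white else (b, a))
    return games
```

In a busy marathon where most batches exceed 30 pairs, this model rarely consults the counter, which would be consistent with the streaks reported in this thread.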
To add again: penguingim1 is currently on a 13-whites streak in the summer marathon, and all top players have much longer color streaks than they should. So this issue may be closed, but the underlying problem still exists.