Paper: Players experience very high ping/timeouts when anti-xray is enabled (>3000ms)

Created on 6 Dec 2020 · 7 Comments · Source: PaperMC/Paper

What behaviour is expected:

Expected: Anti-xray engine mode 1 or 2 should not affect player connections or ping.

What behaviour is observed:

When anti-xray is enabled in either engine mode, some players are unable to connect, frequently time out, or experience extremely high ping. When it is disabled, there is no ping issue whatsoever.

Steps/models to reproduce:

Unsure of how to reproduce or of the exact causes. It does not affect all players equally. With the affected player who helped me test, we followed this procedure:

1. Enable anti-xray on the Survival server (both engine modes were tested)
2. Restart the Survival server
3. Use the /server command to move from the Lobby server to the Survival server in question
4. Observe one or more of the following:
   - Very high ping, calculated with /papi parse %player_ping%
   - Timeouts while joining or shortly after joining
   - An inability to connect at all
5. Disable anti-xray on the Survival server
6. Restart the Survival server
7. Use the /server command to move from the Lobby server to the Survival server in question
8. Observe typical behavior

A video can be provided if needed. I am not able to record one as of the time of making this report.

Plugin list:

bukkit.yml, spigot.yml, paper.yml, server.properties

Default paper.yml causes no issues: https://pastebin.com/i2vMEkLw
With anti-xray on: https://pastebin.com/bFvaZgUx
spigot.yml: https://pastebin.com/VsfA0KzP
bukkit.yml: https://pastebin.com/nFScUQyW
server.properties: https://pastebin.com/UVqLNZiM

Paper version:

Waterfall version:

Anything else:

I have also tested without plugins and have not been able to find any other way to fix the issue. After much troubleshooting, I found that anti-xray was to blame. Could it perhaps be a packet issue? It seems to affect only some players (an admin of mine and I can connect, but a handful of users and mod staff have this issue). Our bungee network has 8 servers, and only our Survival server (the only one we've used anti-xray on so far) has the issue. Switching to any other server through the proxy yields perfect results and good ping.

bug 1.16

Stixil

All 7 comments

I have a few questions:

Does it happen independently of the number of online players? If not, roughly how many players are required to reproduce this issue?
How many CPU cores can this server use? Which CPU is used? Is the CPU also used for other tasks at the same time?
Does the server lag? Is the server overloaded while this is happening? Can you provide a timings link?
Do the players who are experiencing the problems have a worse internet connection than the other players? Anti-Xray does increase the packet size, especially in engine mode 2, but not dramatically.

Additional information about Anti-Xray that might be relevant here: When the server sends a chunk packet to the client and Anti-Xray is enabled, the server's network manager queues the chunk packet and all subsequent packets (except a few special packets, see NetworkManager.InnerUtil.canSendImmediate(NetworkManager, Packet<?>)) until the chunk packet obfuscation has finished on the async thread. This ensures that the packets are sent to the client in the correct order, while the server can still continue ticking on the Server thread. In principle, this queue can cause higher pings if the thread pool queue is filled up with lots of packets or other tasks to be executed and there are not enough threads available or the CPU is already fully occupied. I cannot reproduce this issue on my server.
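
To make the queueing behaviour described above concrete, here is a minimal sketch of an ordered send queue. It is not Paper's actual NetworkManager code: the class OrderedPacketQueue, its Entry type, and the methods send, markReady and sent are hypothetical names invented for illustration, packets are stood in for by plain strings, and the "send immediately" exceptions are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical illustration only; this is NOT Paper's NetworkManager. */
class OrderedPacketQueue {
    /** A queued packet; ready is false while async work (e.g. obfuscation) is pending. */
    static final class Entry {
        final String name;
        boolean ready;

        Entry(String name, boolean ready) {
            this.name = name;
            this.ready = ready;
        }
    }

    private final Queue<Entry> queue = new ArrayDeque<>();
    private final List<String> sent = new ArrayList<>();

    /** Enqueues a packet, then sends whatever can go out without breaking order. */
    synchronized Entry send(String name, boolean ready) {
        Entry entry = new Entry(name, ready);
        queue.add(entry);
        flush();
        return entry;
    }

    /** Called (for example from a worker thread) once the async work has finished. */
    synchronized void markReady(Entry entry) {
        entry.ready = true;
        flush();
    }

    /** Sends packets from the head of the queue until one is not ready yet. */
    private void flush() {
        while (!queue.isEmpty() && queue.peek().ready) {
            sent.add(queue.poll().name);
        }
    }

    /** Returns a copy of the names of the packets sent so far, in send order. */
    synchronized List<String> sent() {
        return new ArrayList<>(sent);
    }
}
```

In this sketch, a packet queued after a chunk that is not yet ready stays in the queue until markReady is called for that chunk, so everything behind a slow chunk is delayed by the same amount. That is the mechanism by which such a queue could, in principle, raise the observed ping.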

stonar96 on 6 Dec 2020

It is independent of the number of players. During testing it was just me and two staff members (one with the issue and one without). We had the server in maintenance mode to see what was going on.

CPU is an R7 3800X. That server can use up to 6 threads, but usage is usually very low. The CPU also runs the other servers on the network, but they are far smaller and use only 2 threads each, with the rest reserved for the system. Server memory is 12GB.

The server does not lag at all during this. According to spark and the built-in TPS checks, TPS never drops because of this. I can't currently provide a timings report, but I can once I'm able to get back to testing.

The player who helped me test earlier does generally have a weaker connection (Los Angeles to Montreal), but their ping almost always levels out around 120ms. With anti-xray on the server, the ping climbs to 1000-3000ms or higher.

Stixil on 6 Dec 2020

I just did a quick test by adding a 5-second delay before chunk packets are flagged as ready by Anti-Xray. This doesn't affect the ping at all, so it seems the queue is not the issue.

stonar96 on 7 Dec 2020

Could you provide a /protocol dump?
That shows which plugins use ProtocolLib and which packets they listen for.