Paper: Netty Issues - Players Kicked & Cannot Log In & Ping Issues - 1.12.x Onwards

Created on 25 Sep 2017 · 61 Comments · Source: PaperMC/Paper

I am not sure whose domain this falls under: Spigot, Paper, Mojang, etc.

Players are being kicked at random and cannot log back in if there is a player within render distance.

Netty Error: io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed
Paper Version: Paper version git-Paper-1216 (MC: 1.12.2) (Implementing API version 1.12.2-R0.1-SNAPSHOT)

There are documented issues here: https://bugs.mojang.com/browse/MC-118372 and https://github.com/netty/netty/issues/6607

Any further information would be useful. I'm not sure if you are aware of this issue or not.
It has been present on all my servers at least since 1.12.1 but possibly 1.12 also.

Again, apologies if this isn't anything that you can resolve.
Thanks.

bug vanilla

All 61 comments

cannot log back in if there is a player within the render distance

If that's the difference between a player being kicked and not being kicked, that sounds more like a plugin problem than an actual bug on the server. All that specific error means is that the connection itself was closed, hence it couldn't be read.

This was happening at random. I spun up a server on Paper 1218 with no plugins and still got it, although I could only replicate it once. The log-in issue is quite rare on a daily basis; it's the kicks and apparent rollbacks every minute or so that seem to be affected by Netty.

I have run all the mtr scans requested by my host and have confirmed it's not an issue on their side or on the VPS. Judging from the Mojang JIRA link, a troublesome version of Netty is being used; I can only imagine that is the case, as the issues have only arisen since 1.12.

I have the same issue. Players keep disconnecting in a loop; if I give them the other backup IP, they can log in to the server again.

I am sure it is a Netty issue. When players log in, they struggle to load the map and then get disconnected. Reinstalling my BungeeCord server and even my OS made no difference. When a player hits this problem, I give them the IP of another BungeeCord server and then it works: they can log in again without the loading issue and without the repeated disconnects.

Any chance you're both behind BungeeCord? Spigot (and thus Paper) also bundles a newer version of Netty than the vanilla server in order to fix the issue that you've linked.

I tested with both BungeeCord build #1271 and PaperSpigot #1218, and the problem still exists.
It is very hard to test because the issue is random.
This issue has been occurring since 8 ago. As we may know, BungeeCord updated its Netty version recently, so I think PaperSpigot works fine and BungeeCord is the problem.

I'm having the same problem with a similar setup.

Like 0XE4, I used PingPlotter to run some MTR scans and found no issue on the server-side network.
The Netty issue is confirmed. In particular, players with poor connections hit this issue frequently.
Giving the player a new IP to use solves it. I tested this for 4 days.

Another way to help a player get into the server is to teleport them elsewhere. After they join, they stay for around 5-10 seconds loading data; while you can see them, try teleporting them to another chunk. This is a second way to work around the issue temporarily.

I am not behind Bungeecord, never have used it. I am only running PaperSpigot.

~Edit: I am currently looking into the possibility of this also being an issue in ProtocolLib.~
~https://github.com/dmulloy2/ProtocolLib/issues/411~
I don't think this is the same.

Edit2: Rolled back ProtocolLib just in case and the issues persist, so it's not that. Nothing else on my server has changed since 1.12 so in my opinion it must still be Paper.

More links to this issue on 1.12, I can't see anything that points to this issue prior to 1.12
https://www.spigotmc.org/threads/netty-error.248430/
https://www.spigotmc.org/threads/error-on-join-server.269112/
https://www.spigotmc.org/threads/kicked-when-unafk.267038/

Which build of Netty is PaperSpigot currently using? It seems it needs at least 4.1.11.

4.1.15.Final

I can't see anything in Netty's commit log that would indicate a resolution to such an issue. Does disabling use-native-transport in server.properties have any effect? (Really not ideal, nor a long-term solution, but it might help narrow things down...) The error that you're getting is more on the side of the OS itself than inside Paper: an error is being thrown by the kernel when Netty tries to read the connection. If anything on Paper's side is involved, it may be an issue in Netty's native library; it could even be a bad host or something somewhere.

I'm somewhat skeptical about this: for an issue such as this one, I would have expected a larger number of people to be affected and to have reported it sooner, and I wonder if it's more of an issue in the software that people are using to host the server. If you're running on Linux (and have access), is there anything in the kernel logs (dmesg)?

I will check those out at some point today @electronicboy. We have had the issue since 1.12 but I was troubleshooting the plugins first and then working with my host to see if it was an issue there before coming here, which is why it took some time I guess. Currently running CentOS 7 fully up to date with patches etc, it's fully managed by me so I can check those logs.

java -Xmx24G -cp netty-all-4.1.16.Final.jar:paperclip-1220.jar com.destroystokyo.paperclip.Main

This has greatly improved the performance and 'rollbacks' although they do seem to still be occasionally happening and everyone's connections show lower for short periods of time. Re-done mtr testing with my host both ways and can still confirm it is not a hosting issue.

To provide some details: I am using Ubuntu 16.04 with an up-to-date 4.13.3 kernel and BBR enabled. I hope this helps narrow down the issue. Host: OVH

Almost the same as this post (https://www.spigotmc.org/threads/error-on-join-server.269112/)

When a player enters Survival and waits 10 seconds, they are kicked to the lobby with this message in the console: Internal Exception: io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer

Anything in client logs when people are booted from the server?
As I said, this is a native connection issue, be that Netty's native epoll, an issue with the system the server is running on, or even the connection being closed by the client itself (or even Bungee) for some reason. If people can confirm that updating Netty, as above, improves the issue, I personally have nothing against bumping the Netty version. However, I would need more confirmation that it does help, and ideally something from the client logs to at least give some indication of what is happening, even if it's a generic "connection went bye bye".

This is from a client after a random kick during high ping:

[17:32:04] [Client thread/FATAL]: Error executing task
java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask.report(Unknown Source) ~[?:1.8.0_91]
    at java.util.concurrent.FutureTask.get(Unknown Source) ~[?:1.8.0_91]
    at h.a(SourceFile:47) [1.12.2.jar:?]
    at bib.az(SourceFile:991) [1.12.2.jar:?]
    at bib.a(SourceFile:419) [1.12.2.jar:?]
    at net.minecraft.client.main.Main.main(SourceFile:123) [1.12.2.jar:?]
Caused by: java.lang.NullPointerException
    at jx.a(SourceFile:42) ~[1.12.2.jar:?]
    at brz.a(SourceFile:558) ~[1.12.2.jar:?]
    at jx.a(SourceFile:38) ~[1.12.2.jar:?]
    at jx.a(SourceFile:11) ~[1.12.2.jar:?]
    at hv$1.run(SourceFile:13) ~[1.12.2.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:1.8.0_91]
    at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.8.0_91]
    at h.a(SourceFile:46) ~[1.12.2.jar:?]
    ... 3 more

Do you prefer only logs from vanilla clients?

I understand that the client also uses Netty, some more technical players have used the latest build of Netty in a client and have not experienced any random kicks or kicks when logging in, which I found interesting.

I think only the Netty group can fix it, so just wait for the stable version.

I have noticed that the server process starts to eat up the CPU at the same time the ping goes up.
Most players get around 60ms and this shoots to 1100ms+ while htop shows the server process using ~120% CPU.

All ping tests from terminals are not reflecting this jump, they stay normal. This is making me think something within the Minecraft server is causing this.
Lots more troubleshooting and monitoring ahead i think...

Are there any flags that can be used to minimize CPU usage? At the moment it's a very plain 'java -Xmx20G -jar paperclip-1227.jar'

The fact that a player is getting an NPE when they're kicked would imply that something is sending the player packets that it's not too happy about, or that there is a bug in the client itself. I do question whether there is actually anything for us to do here, and whether there is anything we can do.

I have no real reason to believe that this is an issue with Netty on the server (at least from what I've seen), but does using a newer version of Netty on the client solve the issue for those who are experiencing this? Are those experiencing this issue unable to reconnect, but able to connect after they restart their client?

0XE4, just wait for the latest BungeeCord with the latest Netty.

Mostly 'timed out' or 'closed connection' messages.

@FirstReplay Same errors here for players. @samueleycw I do not use BungeeCord, never have!

@FirstReplay our players came across the same issue. I am wondering whether it is a map (or chunk?) issue.

@0XE4 try build 1232; it has felt better so far.

Updated to 1232 from 1230 @samueleycw and so far, it does seem better in terms of ping and rollbacks. Fingers crossed, the server has only been up a few hours and it's currently the middle of the night so I will check again during the day.

The commits for 1231 and 1232 do look like they are fixing the timeout issue.

After the 1.12.2 update we have seen players getting high ping, like over 500ms, for no reason, but our other server, which is on 1.12 on the same machine, is fine. This is without BungeeCord, by the way; with BungeeCord it seems to be a little better but is still just as bad.

Any chance you can test the latest build? Spigot decided to move the keepalive calculations to the main thread after Mojang moved them away; the latest commit should solve that issue. I'm not sure if there are other issues somewhere, but this should at least help with the keepalive ping calculations and potentially prevent some disconnections.

@electronicboy Server has been up almost 7 hours now on 1232 and the ping issues are MUCH better and no one is being randomly kicked from what I can tell. There are still occasional ping spikes but nothing like before, which was into the 1000ms+. Still getting a few rollbacks though when running etc but a lot less too.

@w-o-a-h yup, there are hidden issues inside BungeeCord (maybe Netty). I tested more than 3 hosting companies with a clean BungeeCord server and all of them got high ping: around 60-80 ms on 1.12.1, around 180-250 ms on 1.12.2.

gonna comment here so I get notified of updates to this :D somewhat rare but still getting reports of this, players saying they can get on other (most likely still outdated) servers fine but not mine, and getting random time outs while playing

Since updating to the latest past 1232, we are starting to see issues again. Although not as bad as before, but they are becoming just as frequent.

Okay, so, I have several areas I can point fingers at, but I don't operate my own server anymore due to free time and other commitments. As it stands, the only information I can gather is from the source of the server software and from you guys.

I have a few changes in here that should help provide more time for a client to respond to a keepalive and provide more information when something does happen, ~this isn't a change that I'm happy enough to push up to the main repo yet as I'd rather see if there are any tweaks that I need to make before it hits the repo (be warned, this jar will fail version checks, I recommend removing this asap, be that once you provide the info or any of these changes hit the repo here)~

~https://atlas.valaria.pw/jenkins/job/paper-self/90/~
(this has been merged into master now)

You won't be able to do this on a shared host, but this logger config will attempt to provide a bit more info. I'm not a log4j expert, but I know that this pulls all of the potentially interesting information. Save the file into your server folder, and add -Dlog4j.configurationFile=log4j2.xml as a launch parameter (anywhere before -jar).
https://gist.github.com/electronicboy/86a6a8d5e920015a0e2e749e5b1f902f

@electronicboy, may I know how to solve this issue after adding -Dlog4j.configurationFile=log4j2.xml? Thanks!
https://paste.md-5.net/fikuruvulu.sm

java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

That error won't be new; it's just that this log4j configuration enables the debug logging level, so errors logged at debug level can now be seen. You can ignore it altogether: it managed to load when it needed to in the end, it just had to try different ways of doing it.

[19:36:03] [Server thread/INFO]: wtvrcait[/:] logged in with entity id 230418 at ([worlds/], ,)
[19:36:33] [Server thread/WARN]: wtvrcait was kicked due to keepalive timeout!
[19:36:33] [Server thread/INFO]: wtvrcait lost connection: Timed out
[19:36:33] [Server thread/INFO]: wtvrcait left the game
[19:36:33] [Server thread/WARN]: wtvrcait was kicked due to keepalive timeout!
[19:36:33] [Server thread/WARN]: wtvrcait was kicked due to keepalive timeout!
[19:36:33] [Server thread/WARN]: wtvrcait was kicked due to keepalive timeout!
[19:36:33] [Server thread/WARN]: wtvrcait was kicked due to keepalive timeout!
...

@electronicboy i always see this message: https://paste.md-5.net/gufuwowixu.avrasm

If that debug message is what you're seeing around the time you saw the other, that's to be expected: the client didn't reply to the keepalive. Considering that the build you're using gives 30 seconds for a client to reply, there really isn't much we can do about this. If this happens while the player is connecting to the server, I'd suggest looking into the following, as I'm really not sure there is something that I can accurately hook into in order to combat this:

  • How high is your view distance? Larger distances mean more packets to be sent, which may overload some clients' networks and cause them to not be able to respond to the keepalive in time.
  • Are you using any scoreboards? If so, check the size of world/data/scoreboard.dat. The contents of that file have to be sent to the client; if that's big, it stands to lag out the client before it can respond.
  • Do you have any plugins that might be sending a ton of packets when the player connects, e.g. particle plugins
  • Is there something in your spawn, e.g. a ton of tile entities such as chests that might cause chunk data to be large and take longer time to process on their side
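
For the scoreboard point above, a quick way to see how big that file is might look like the sketch below. The class name and the 1 MiB warning threshold are my own placeholders for illustration; nothing in this thread defines what counts as "too big".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class ScoreboardSizeCheck {
    // Hypothetical threshold for illustration only; the thread gives no actual number.
    static final long WARN_BYTES = 1024L * 1024L;

    // Returns the file size in bytes, or -1 if the file is missing or unreadable.
    static long sizeOrMissing(Path file) {
        try {
            return Files.exists(file) ? Files.size(file) : -1L;
        } catch (IOException e) {
            return -1L;
        }
    }

    // True when the size reaches the (assumed) warning threshold.
    static boolean isSuspiciouslyLarge(long sizeBytes) {
        return sizeBytes >= WARN_BYTES;
    }

    public static void main(String[] args) {
        // Default path follows the world/data/scoreboard.dat location mentioned above.
        Path file = Paths.get(args.length > 0 ? args[0] : "world/data/scoreboard.dat");
        long size = sizeOrMissing(file);
        if (size < 0) {
            System.out.println(file + " not found or unreadable");
        } else {
            System.out.println(file + " is " + size + " bytes"
                    + (isSuspiciouslyLarge(size) ? " (worth investigating)" : ""));
        }
    }
}
```

Run it from the server folder, or pass the path to the file as the first argument if your world folder has a different name.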

On 1243 and seeing some new errors when players try and join, get in for 2 seconds then get kicked:

Thank you for the work so far, the ping/lag is really very low now.

if you got a different ID than it was expecting, that would probably be more akin to a plugin or something in your setup messing with the packet

Hmm, interesting. I don't think we have any that would do that, we don't use hardly any plugins. I'll look into it, thanks.

can you send a list of plugins?

any chance that the people who are getting the "got id: 0 expected id: xxxxx" are running protocols support? Speaking to somebody on IRC and they appear to be getting the same message that might be at the hands of protocolsupport and older clients sending a keepalive when connecting.

What do you mean by Protocols Support @electronicboy? A quick google tells me it's a plugin to allow older clients to connect. We don't use this. We do use ProtocolLib though, the latest build v390.

Hey, I know I am not using Paper, but I wanted to point out that I am on regular Spigot and I am experiencing these issues on my server REGULARLY, as are other specific people. I came across this because I have been looking through the internet for any solutions and have not found any. Assuming Paper does not break anything on the 1.12 / Spigot side, I would be willing to test any changes to this, since I absolutely cannot even join my own server.

Paper already has changes that should improve the situation, based on the reports in this thread we are not certain it is cured but it is better.

If you want to help out, the best way to do that is to start running paper now with the changes we already have present. Then you can let us know your experience with them.

@Zbob750 we will try it out, thanks for your professional team's support. No doubt, it is much improved and better than the Spigot build so far. I monitored this issue for almost two months, and I found that if a player keeps getting kicked and you give them another BungeeCord host IP, they can log in to the server as normal without the read timeout problem; if they switch back to the previous IP, they will be kicked again.

I experience this issue on my server. It should be noted that I do use ProtocolSupport but have tried removing it and the issue seems to persist.

What happens for me is that some players get kicked due to keep alive timeout when trying to join the server. The message goes like "Playername was kicked due to keepalive timeout!".

It does not happen to all players. We think only those with bad/slow internet connections. Also it mostly happens on a certain map with loads of blocks and pretty high view distance.

Would it be possible for you to make a configuration option for the "keepalive timeout time"? If I could increase it to 60 seconds or even 90 seconds using configuration, I could easily test and see if that solves the problem.

if you want to play with the value, you should clone the repo and modify it. This isn't one of those things that should really be configurable, especially given the nature of it we would have to have some sane clamping to prevent stupid values causing issues.

I have considered a system property, which makes it somewhat more only accessible to those who know how to set them, but still has the issue above. if you do create a build and play around with the value, feel free to report back your findings, I'm already iffy on the 30 seconds as it sits on the edge of what the read timeout handler allows inside the netty pipeline, which is not something I really want to increase.

Just to be sure we are talking about the same thing, I'm referring to the line if (!this.processedDisconnect && elapsedTime >= 30000L) { // 30 seconds for a ping reply also, don't fire if already disconnected here: https://github.com/PaperMC/Paper/blob/master/Spigot-Server-Patches/0246-Increase-time-allowed-for-a-keepalive-reply.patch#L36

The reason I'm thinking of that line is because I see the message "{} was kicked due to keepalive timeout!" (from the line just below) in my server console quite often.
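
To make the check being discussed concrete, here is a minimal standalone sketch of that kind of timeout logic. It is not Paper's actual code: the class and method names are made up for illustration, and only the "elapsed time >= limit, and don't fire if already disconnected" condition comes from the line quoted above.

```java
public final class KeepAliveTracker {
    private final long limitMillis;       // how long a client may take to reply
    private boolean pending;              // true while a keepalive awaits a reply
    private long sentAtMillis;            // when the pending keepalive was sent
    private long expectedId;              // id the reply must echo back
    private boolean processedDisconnect;  // set once the kick has fired

    public KeepAliveTracker(long limitMillis) {
        this.limitMillis = limitMillis;
    }

    // Record that a keepalive with the given id was sent at nowMillis.
    public void onSent(long id, long nowMillis) {
        pending = true;
        expectedId = id;
        sentAtMillis = nowMillis;
    }

    // Handle a reply; returns true only when the id matches the one awaited.
    public boolean onReply(long id) {
        if (pending && id == expectedId) {
            pending = false;
            return true;
        }
        return false;
    }

    // Called periodically; returns true at most once, when the limit is exceeded.
    public boolean shouldKick(long nowMillis) {
        long elapsedTime = nowMillis - sentAtMillis;
        if (pending && !processedDisconnect && elapsedTime >= limitMillis) {
            processedDisconnect = true;
            return true;
        }
        return false;
    }
}
```

With a 30000 ms limit, a keepalive sent at time 0 and never answered would trigger the kick on the first check at or after 30000 ms, and the processedDisconnect guard keeps later checks from firing again.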

I'm not sure I feel comfortable checking out the source code and making changes to PaperSpigot. If you could create a config or system property that would be really great.

By the way, what would bad values be? Would setting this to 60s not work? I'm eager to test and see. If it solves or does not solve my problem that is information either way. Gathering some information can't hurt, right?

Is there any kind of progress on this issue?

Players with a bad connection can enter 1.11.2 but from 1.12.x they can't enter, and I really do not know what to tell them.

As a "Merry Christmas now get the heck outta my shop!" kinda deal, I have exposed the keepalive limit to a system property, paper.playerconnection.keepalive. This value is currently not tamed, mainly as I personally cannot decide a sane limit beyond the default. This property is provided as is™, and may be removed in the future or tweaked.

You're more than welcome to set this property in your startup script (if you don't have access to the startup script, you will not be able to configure this (a plugin would be able to tweak this with reflection, but I'm not going to expose this in the paper config due to its nature)), the default value is 30, I've heard from people who've set this to 60 and "magically" all of their issues go away, however 60 is a bit OTT and may cause issues as that isn't the only limiting factor here. I'd suggest slowly bumping it to say, 40, and seeing if that solves your issues. Do NOT expect any form of support for any issues caused by playing with this value!

This should also NOT be treated as the first line of action, ideally, configure stuff like your view distance to support these people properly, ensure that you don't have any plugins slowing down the connection process.

Shamefully, with the client no longer sending keepalive packets to the server during connection, the server connection process is now much more limited in how long it can take during the connection process and how long the client can take before it starts replying to keepalives. The ability to set this limit should provide relief for those who are still having remaining issues, there is really nothing else that I believe we can do on this front.
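
For anyone wondering how a system property like this typically turns into a timeout, here is a rough sketch. The property name and the default of 30 seconds come from the post above; the class name, the parsing, and the fallback to the default on unparsable or non-positive values are my own assumptions and not necessarily what Paper does (the post says the real value is not clamped).

```java
public final class KeepAliveLimit {
    static final String PROPERTY = "paper.playerconnection.keepalive";
    static final long DEFAULT_SECONDS = 30L;

    // Converts a raw property value (seconds) into milliseconds.
    // Assumption: missing, unparsable, or non-positive values fall back to the default.
    static long resolveMillis(String raw) {
        long seconds = DEFAULT_SECONDS;
        if (raw != null) {
            try {
                seconds = Long.parseLong(raw.trim());
            } catch (NumberFormatException e) {
                seconds = DEFAULT_SECONDS;
            }
        }
        if (seconds <= 0) {
            seconds = DEFAULT_SECONDS;
        }
        return seconds * 1000L;
    }

    public static void main(String[] args) {
        // Prints the limit in milliseconds for the current JVM's property value.
        System.out.println(resolveMillis(System.getProperty(PROPERTY)));
    }
}
```

System properties are passed with -D on the java command line, before -jar, which is why the value has to go into the startup script rather than a config file.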

Thanks for this @electronicboy. What would the flag be for the startup script?
-paper.playerconnection.keepalive=30 ?

java …. -Dpaper.playerconnection.keepalive=XX -jar …

java …. -Dpaper.playerconnection.keepalive=60 -jar …

I have added that to the start script, even set it to 120, and nothing helps. Players are still getting kicked. Any other solutions?

I'd suggest reading my post above, the only time this should really be an issue is either during connection or if you have a lag spike and the server doesn't send a keepalive.

I'd suggest keeping your eyes on timings, looking out for the usual issues that cause slow connections, e.g. scoreboard.dat, plugins holding up the login events

Can confirm the new java property paper.playerconnection.keepalive works great. Had an Australian player that hasn't been able to join since the 1.12.2 update because his internet sucks. I just found this tonight and set it to 60 and he was finally able to join after a ~45 second load time. Thanks ^_^

04.03 12:34:29 [Server] WARN MrSkullman_ was kicked due to keepalive timeout!
04.03 12:34:29 [Server] WARN handleDisconnection() called twice

@BBoyJD10 The idea of me referring you here is that you actually read the ticket (and the conversation afterward).

Just as a continuation for what it's worth, this issue has seemed to start again recently and I am still using the flag with 60 as the value.

Today it happened as you describe above.
I did not make any changes on the server.

I have over 1000 ping... All players have ping issues.
When i'm joining into server i get 'Loading terrain' for 5-30 seconds
There is still 20 tps, I have good dedicated server and a lot free ram, cpu and bandwidth...

How can I fix this?

Further discussion on this issue has been locked. There is nothing further we can do on this, and further issue creation or complaints about how we should fix this have no effect on the fact that we've done all that we're really able to do; if you have a way to improve this situation, you're more than welcome to submit a PR. The default setting of 30 seconds should provide enough leeway for the majority of people; for the rest, while discouraged, Paper exposes a system property that allows you to extend this timeout past the default.

Compared to upstream, we have several modifications around this issue:

  1. Keepalives have been moved back to being processed async, which means that you're less likely to be disconnected due to a lag spike.
  2. Reverted some of the changes that Mojang and Spigot have made to keepalive logic, mainly: restoring the changes made by Spigot to the interval at which a keepalive is sent, and also increasing the time allowed for a keepalive reply.

In general, Mojang made some changes in 1.12.2 to fix an issue in the server, meaning that the client no longer blindly sends keepalive packets during connection; this means that we're relying solely on the client to reply to keepalives in a timely manner.

Mitigation for this issue really falls down to the following

  • Ensure that no plugins are blocking the login process (this also includes plugins on async events, such as AsyncPlayerPreLoginEvent). For some events, you can refer to timings for this. For async events there are no timings, but a tool like warm roast might be able to provide insight beyond the typical "binary plugin search" (aka, remove plugins in batches to see if you can locate what is causing the issue).

  • Remove texture packs from being sent to the client on login - applying resource packs on some machines is just too slow. Delaying the sending of a resource pack using a plugin can help the client get past the heavy work of the initial login, as can allowing users to download the pack from your site and install it manually.

  • Check your connection - a common problem is poor connections, e.g. latency (players across the world) or slow throughput. Tweaking options such as view distance, or the recently added max chunk sends per tick, can reduce the load on the network as a player is connecting. Tools like mtr (or WinMTR on Windows) are handy for diagnosing network issues between server and players, run either from the server (where possible) or by your players.

  • Check for anything that causes large amounts of packets on connection - common causes are the ill-fated scoreboard.dat becoming too large (the contents of this are sent to a player when they connect!), plugins which send particles to the client on connection, and blocks which store data inside of them, e.g. chests, shulkers... All of these can increase the amount of data that is needed to connect to the server.

  • Increase paper.playerconnection.keepalive - this should be considered more of a last resort. Increasing the time allowed for a keepalive reply might help those with slower machines or connections to connect, but really, you should aim to see if you can resolve this by using any of the steps above.
    (java properties are set in your startup flags, e.g. java … -Dpaper.playerconnection.keepalive=XX -jar …)
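
The "binary plugin search" from the first point can be sketched as below. This is only an illustration of the idea: the class name is made up, the plugin names are placeholders, the `reproduces` predicate stands in for the manual step of starting the server with only the given plugins and checking whether the kick still happens, and it assumes exactly one plugin is responsible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public final class PluginBisect {
    // Narrows a plugin list down to a single suspect by halving it each round.
    // Assumption: exactly one plugin causes the issue, and `reproduces` reports
    // whether the issue still occurs with only the given plugins enabled.
    static String findCulprit(List<String> plugins, Predicate<List<String>> reproduces) {
        List<String> candidates = new ArrayList<>(plugins);
        while (candidates.size() > 1) {
            int mid = candidates.size() / 2;
            List<String> firstHalf = new ArrayList<>(candidates.subList(0, mid));
            List<String> secondHalf = new ArrayList<>(candidates.subList(mid, candidates.size()));
            // Keep whichever half still shows the problem.
            candidates = reproduces.test(firstHalf) ? firstHalf : secondHalf;
        }
        return candidates.isEmpty() ? null : candidates.get(0);
    }

    public static void main(String[] args) {
        // Placeholder plugin names; the lambda simulates "D" being the culprit.
        List<String> plugins = Arrays.asList("A", "B", "C", "D", "E");
        System.out.println(findCulprit(plugins, enabled -> enabled.contains("D")));
    }
}
```

In practice each round is a server restart with half of the remaining plugins removed, so a list of 16 plugins needs about 4 rounds instead of 16 one-by-one removals.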

That is far from an exhaustive list; anything which could cause a client to take longer than expected to connect can hit this (30 seconds really should be plenty of time to allow players to connect...).

Additionally, Paper Build 232+ In MC 1.15.2+ has an attempted improvement for servers who were using Anti Xray. No promises this will solve everything.