Minecraftforge: [1.12] Another strange Netty crash

Created on 7 Aug 2017  路  4Comments  路  Source: MinecraftForge/MinecraftForge

Similar to #4219, but not quite the same.

That issue (fixed by #4220) led to this exception:

java.lang.ClassCastException: Cannot cast net.minecraft.client.network.NetHandlerLoginClient to net.minecraft.client.network.NetHandlerPlayClient

This issue, as seen here (https://github.com/MinecraftForge/MinecraftForge/issues/4219#issuecomment-317061214), has a different exception:

java.lang.ClassCastException: net.minecraftforge.fml.common.network.handshake.FMLHandshakeMessage$ServerHello cannot be cast to net.minecraftforge.fml.common.network.handshake.FMLHandshakeMessage$ModList

I've just experienced this issue myself with a Forge dev environment, no mods installed (branched from build 2444).

Here's a copy of the fml-junk-earlystartup.log file (the error is here)

Most helpful comment

I want to explain what is the cause exactly, and get some feedback for submitting a PR, which fix would be better.

Before the commit https://github.com/MinecraftForge/MinecraftForge/commit/3fee319bc0513bd06ce956019a15308fb829ce65 fireUserEventTriggered was called from handlerAdded. The way before this commit made the old version work with a hack (that still had a different race condition, the hack should have been applied in handlerAdded). The change made it called directly from the server thread in insertIntoChannel instead of the server netty thread as handlerAdded did.

This means that the server can send a ServerHello packet to the client fromSTART handshake state from server thread. And sometimes, before this method returns the server receives a reply from the client, which is going to be processed in netty server thread, regardless of whether the new state has been set. Then the START state sends the ServerHello packet again, but since client now has the next state and the code just happens to actually use the packet, it crashes.

So here are a few ways I came up to fix it:

  • Revert commit https://github.com/MinecraftForge/MinecraftForge/commit/3fee319bc0513bd06ce956019a15308fb829ce65 and move the reflection hack to handlerAdded. The downside is that it's a reflection hack, and work has been done already to make the new way work correctly. On the other hand this way (without the hack) has been tested over the years and is known to work reliably
  • Less of a hack but still not a very good way would be to set the state to HELLO here before the method returns. This way by the time the packet is sent the new state will be already set. This should entirely fix the race condition, while keeping 2 threads involved. I didn't test this way yet
  • Add instanceof ServerHello check before this line and if it succeeds, call the HELLO state accept, and return. The packet can't be just ignored because then the server HELLO state will never actually be called, the packet actually needs to be resent. I don't know if it causes any side effects but it seemed to work for me
  • Rework all the handshake state code to always set the new state before sending any packets. This effectively does the same thing as the second solution, but in potentially cleaner way. But it's also more work
  • Somehow schedule this code to run in server netty thread but after handlerAdded?

Which one should I submit as a PR?

All 4 comments

Also see this forum thread.

Update:

I was able to get the issue to consistently occur by inserting a delay before the return here.

@Barteks2x had a look at the issue - it appears to be a race condition caused by https://github.com/MinecraftForge/MinecraftForge/commit/3fee319bc0513bd06ce956019a15308fb829ce65, which means this probably does need a rethink.

I want to explain what is the cause exactly, and get some feedback for submitting a PR, which fix would be better.

Before the commit https://github.com/MinecraftForge/MinecraftForge/commit/3fee319bc0513bd06ce956019a15308fb829ce65 fireUserEventTriggered was called from handlerAdded. The way before this commit made the old version work with a hack (that still had a different race condition, the hack should have been applied in handlerAdded). The change made it called directly from the server thread in insertIntoChannel instead of the server netty thread as handlerAdded did.

This means that the server can send a ServerHello packet to the client fromSTART handshake state from server thread. And sometimes, before this method returns the server receives a reply from the client, which is going to be processed in netty server thread, regardless of whether the new state has been set. Then the START state sends the ServerHello packet again, but since client now has the next state and the code just happens to actually use the packet, it crashes.

So here are a few ways I came up to fix it:

  • Revert commit https://github.com/MinecraftForge/MinecraftForge/commit/3fee319bc0513bd06ce956019a15308fb829ce65 and move the reflection hack to handlerAdded. The downside is that it's a reflection hack, and work has been done already to make the new way work correctly. On the other hand this way (without the hack) has been tested over the years and is known to work reliably
  • Less of a hack but still not a very good way would be to set the state to HELLO here before the method returns. This way by the time the packet is sent the new state will be already set. This should entirely fix the race condition, while keeping 2 threads involved. I didn't test this way yet
  • Add instanceof ServerHello check before this line and if it succeeds, call the HELLO state accept, and return. The packet can't be just ignored because then the server HELLO state will never actually be called, the packet actually needs to be resent. I don't know if it causes any side effects but it seemed to work for me
  • Rework all the handshake state code to always set the new state before sending any packets. This effectively does the same thing as the second solution, but in potentially cleaner way. But it's also more work
  • Somehow schedule this code to run in server netty thread but after handlerAdded?

Which one should I submit as a PR?

I investigated it further, and it seems like setAutoRead(false) should make the issue go away, but I think I hit this issue there: https://github.com/netty/netty/issues/3633

Was this page helpful?
0 / 5 - 0 ratings