1.9.0.0.12287
TL;DR: If a database error occurs on the lobby server, the exception serialized over RMI cannot be deserialized on the client side. This causes the client to disconnect from the lobby. The root cause is due to #3877 when the Postgres dependency was moved to the lobby project. Thus, it isn't available to game-headed, and instances of PSQLException cannot be deserialized by the lobby client.
@Cernelius originally reported this issue on the forum:
I was about the give a 2 weeks ban, right clicking, for both name and mac, to <redacted> when I got disconnected from the lobby.
So, I rejoined the lobby and gave the same ban and, again, I was disconnected. The error message was:
"Connection to Server Lost. Please close this instance and reconnect to the lobby."
This is the same thing that happened the first time I tried, as well.
I think this is a serious bug here.
A stack trace was provided:
Full log
ott 03, 2018 10:24:42 PM games.strategy.net.nio.Decoder loop
GRAVE: error reading object
java.io.IOException: java.lang.ClassNotFoundException: org.postgresql.util.PSQLException
at games.strategy.net.nio.Decoder.lambda$loop$0(Decoder.java:71)
at games.strategy.io.IoUtils.readFromMemory(IoUtils.java:60)
at games.strategy.net.nio.Decoder.loop(Decoder.java:67)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: org.postgresql.util.PSQLException
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at java.io.ObjectInputStream.resolveClass(Unknown Source)
at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
at java.io.ObjectInputStream.readClassDesc(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
at java.io.ObjectInputStream.defaultReadObject(Unknown Source)
at java.lang.Throwable.readObject(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
at java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at games.strategy.engine.message.RemoteMethodCallResults.readExternal(RemoteMethodCallResults.java:56)
at games.strategy.engine.message.unifiedmessenger.InvocationResults.readExternal(InvocationResults.java:46)
at games.strategy.net.nio.Decoder.readMessageHeader(Decoder.java:149)
at games.strategy.net.nio.Decoder.lambda$loop$0(Decoder.java:69)
... 3 more
ott 03, 2018 10:24:42 PM games.strategy.engine.framework.GameRunner lambda$start$0
GRAVE: Exception on remote
java.lang.RuntimeException: Exception on remote
at games.strategy.engine.message.UnifiedInvocationHandler.invoke(UnifiedInvocationHandler.java:54)
at com.sun.proxy.$Proxy2.banUsername(Unknown Source)
at games.strategy.engine.lobby.client.ui.LobbyFrame.lambda$createAdminActions$2(LobbyFrame.java:142)
at games.strategy.engine.lobby.client.ui.TimespanDialog.lambda$runAction$4(TimespanDialog.java:213)
at java.util.Optional.ifPresent(Unknown Source)
at games.strategy.engine.lobby.client.ui.TimespanDialog.runAction(TimespanDialog.java:211)
at games.strategy.engine.lobby.client.ui.TimespanDialog.prompt(TimespanDialog.java:206)
at games.strategy.engine.lobby.client.ui.LobbyFrame.lambda$createAdminActions$3(LobbyFrame.java:137)
at games.strategy.ui.SwingAction$1.actionPerformed(SwingAction.java:65)
at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
at javax.swing.AbstractButton.doClick(Unknown Source)
at javax.swing.plaf.basic.BasicMenuItemUI.doClick(Unknown Source)
at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(Unknown Source)
at java.awt.AWTEventMulticaster.mouseReleased(Unknown Source)
at java.awt.Component.processMouseEvent(Unknown Source)
at javax.swing.JComponent.processMouseEvent(Unknown Source)
at java.awt.Component.processEvent(Unknown Source)
at java.awt.Container.processEvent(Unknown Source)
at java.awt.Component.dispatchEventImpl(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Window.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.EventQueue.dispatchEventImpl(Unknown Source)
at java.awt.EventQueue.access$500(Unknown Source)
at java.awt.EventQueue$3.run(Unknown Source)
at java.awt.EventQueue$3.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
at java.awt.EventQueue$4.run(Unknown Source)
at java.awt.EventQueue$4.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
at java.awt.EventQueue.dispatchEvent(Unknown Source)
at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.run(Unknown Source)
Caused by: java.io.IOException: java.lang.ClassNotFoundException: org.postgresql.util.PSQLException
at games.strategy.net.nio.Decoder.lambda$loop$0(Decoder.java:71)
at games.strategy.io.IoUtils.readFromMemory(IoUtils.java:60)
at games.strategy.net.nio.Decoder.loop(Decoder.java:67)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: org.postgresql.util.PSQLException
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at java.io.ObjectInputStream.resolveClass(Unknown Source)
at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
at java.io.ObjectInputStream.readClassDesc(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
at java.io.ObjectInputStream.defaultReadObject(Unknown Source)
at java.lang.Throwable.readObject(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
at java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at games.strategy.engine.message.RemoteMethodCallResults.readExternal(RemoteMethodCallResults.java:56)
at games.strategy.engine.message.unifiedmessenger.InvocationResults.readExternal(InvocationResults.java:46)
at games.strategy.net.nio.Decoder.readMessageHeader(Decoder.java:149)
at games.strategy.net.nio.Decoder.lambda$loop$0(Decoder.java:69)
... 3 more
I wasn't able to reproduce @Cernelius's original error, but I was able to manually trigger a database error and observed the exact same symptom reported above. (@Cernelius, any additional information you can provide beyond your forum post to help us reproduce the actual database error would be appreciated.)
I verified that if I moved the Postgres dependency back to the game-core project, the PSQLException is deserialized correctly, and the user is not disconnected, but rather is simply presented with our standard error dialog.
I tried to see if there was a better way to communicate errors over RMI. For example, by simply sending a string representation of the error, possibly including the stack trace, instead of serializing the actual exception. However, it would require breaking compatibility with the lobby, which I'm not sure we've committed to yet.
Another possibility is to never re-throw instances of SQLException up the call stack, either standalone or as the cause of another exception. That would ensure we never try to send the PSQLExceptions over the wire. The benefit of this solution is that it should only require deploying a new lobby; an updated client would not be required.
If we go with simply moving the dependency back to game-core, note that this is going to require pushing a hotfix for the current stable. Since we've already broke compatibility on master, we'll have to deal with publishing releases from a secondary branch sooner rather than later.
Since we've already broke compatibility on master
:disappointed:
How many changes are there, is reverting those an option?
With the choices suggested, an updated lobby seems like a good option
If we go with simply moving the dependency back to game-core, note that this is going to require pushing a hotfix for the current stable.
This is something we should probably practice. In theory the modus-operandi here would be to check out the tag of the latest release, make updates, then push a new tag to the repo triggering a build. AFAIK that is untested, perhaps not working.
How many changes are there, is reverting those an option?
AFAIK, #4123 is the only incompatible change committed so far. @ron-murhammer, are there any others?
@ssoloff I believe that is correct. I'm fine if someone wants to revert and then re-apply that PR in order to put out a fix in master for this. Otherwise I guess we need to create a bug fix branch and release configuration on it.
From infra point of view, just to confirm, the best way to deploy is to push to prod branch an updated version to this line: PROD_LOBBY_VERSION @ https://github.com/triplea-game/infrastructure/blob/master/roles/host_control.sh#L22
Ideally that would flow through master, at the moment a direct push to prod should have the lobby update itself within 5 minutes when the cronjob picks up and would be simplest.
@ssoloff @ron-murhammer if we revert the incompatible update, then essentially revert some more changes, at least do the necessary work client-side to fix this, push that to master, would mods be able to then download a fixed latest version? Do I understand that correctly? If so that avoids the lobby downtime.
If that is the case we can pretty easily hopefully redo the changes that we popped. Next best does seem like a lobby downtime to deploy a patch.
@ssoloff unless you object, please take the lead on this. I assume you'll work out the code fix, if not please assign some owners/request help on what you need.
One thing we don't understand at this time is just how _severe_ this problem is. I don't know what @Cernelius did to trigger a database error. The root cause may have been restricted to a very narrow window of time. I'd like to wait to hear back from @Cernelius to see if he can reliably reproduce this exception.
If @Cernelius can repro it, then I'll just submit three PRs (revert #4123, fix this issue, reapply #4123) to get this fixed for the mods quickly. Otherwise, I'd like to go the second release branch route (since we're going to need to do that eventually). Ultimately, I'll probably make the lobby-specific change so we can once again remove the Postgres dependency in the client.
In case it wasn't clear from the description, this issue _can_ affect non-mods, as well. If, for example, a database error is raised during login, the ClassNotFoundException will be thrown in a non-mod user's client. In that case, the user is obviously not even connected to the lobby, so there's no unexpected disconnection to worry about, but, at a minimum, the error message is probably going to be confusing.
First of all, I suggest limiting giving info publicly about how users are banned (even tho that particular users doesn't appear to need any help), since anyways everyone in here can go read bunker.
I'm available to help, but I've no idea how am I supposed to try to reproduce this, unless you are volunteering to be banned for 2 weeks (guess we fix an appointment). It happened in that case only; my only guess was that the name was too long, as that was the longest one (but probably just a coincidence).
Also, since when I logged back it was gone, maybe you want to ask in bunker if another mod banned or booted it successfully (without reporting) or he just left the lobby on its own. I think probably it is the second case, but you never know.
@Cernelius
my only guess was that the name was too long, as that was the longest one (but probably just a coincidence)
Thank you. That is the root cause of the problem. Banning a username longer than 40 characters triggers the database error, which then triggers the bug described above.
A registered user shouldn't be able to create an account with a username longer than 40 characters, so the user you were attempting to ban must have been anonymous. As a temporary workaround, please ensure you do not attempt to ban such a user by username. Even if the exception didn't occur in this case, the username ban wouldn't work because of the disparity between the constraints on an anonymous username and the banned username stored in the database.
@triplea-game/devs: Now that we know the root cause, I'm leaning more towards a lobby-specific fix because, even if we restore the Postgres dependency in the client, the operation will still fail, and it is effectively impossible to ban usernames longer than 40 characters. I think we should, at a minimum enforce the same constraints on anonymous usernames that we enforce for registered usernames.
@ssoloff Agree. For now just enforce 40 char max on anonymous usernames and deploy the lobby fix. I'd like to remove anonymous usernames all together but that's a bit bigger of a change.
Sounds good. I'll look into a two PR fix this weekend: 1) to enforce a maximum length for anonymous usernames, and 2) to ensure SQLExceptions are not sent back to the client (which I think is easier than it sounds).
However, note that any lobby update is going to be blocked by #4087. I believe that issue implies we also need to get PreRelease running.
Without reading through it all
this error will occur if the banned user has already been disconnected I have witnessed it before
@ssoloff and all
it comes back to ghosted users
still shown in lobby
just my 2 cents
to explain fully
another mod reacts to hi your a jew and we hate you
etc or whatever other form of bs you want to examine
so a mod reacts and bans
meanwhile the slowest mod gets above
if first mods work goes through first
good example in the forum about our latest culprit
I hope it is related to the length of the user name
but I am doubtfull. Either way 10 digit idea is great
Double checking the p1 label, is the impact here so large that we would immediately deploy a fix once available?
is the impact here so large that we would immediately deploy a fix once available?
An _immediate_ deployment is probably not necessary because there is a workaround (posted in bunker). But as soon as infrastructure is straightened out, we should test on staging and deploy to production at the earliest opportunity.
Most helpful comment
One thing we don't understand at this time is just how _severe_ this problem is. I don't know what @Cernelius did to trigger a database error. The root cause may have been restricted to a very narrow window of time. I'd like to wait to hear back from @Cernelius to see if he can reliably reproduce this exception.
If @Cernelius can repro it, then I'll just submit three PRs (revert #4123, fix this issue, reapply #4123) to get this fixed for the mods quickly. Otherwise, I'd like to go the second release branch route (since we're going to need to do that eventually). Ultimately, I'll probably make the lobby-specific change so we can once again remove the Postgres dependency in the client.
In case it wasn't clear from the description, this issue _can_ affect non-mods, as well. If, for example, a database error is raised during login, the
ClassNotFoundExceptionwill be thrown in a non-mod user's client. In that case, the user is obviously not even connected to the lobby, so there's no unexpected disconnection to worry about, but, at a minimum, the error message is probably going to be confusing.