Trinitycore: [3.3.5] Client freezes in connected message

Created on 22 Feb 2017 · 56 Comments · Source: TrinityCore/TrinityCore

Description:
When the server is running with many connections, the client sometimes freezes at the connected message. This started happening after recent commits.

Current behaviour:
The client freezes at the connected message.

Expected behaviour:
The client must enter the world.

Steps to reproduce the problem:

Try to sign in to a server with many concurrent connections.

Branch(es): 3.3.5

TC rev. hash/commit: b4b031b

TDB version: TDB_full_world_335.62_2016_10_17 + updates

Operating system: Windows

Branch-3.3.5a Comp-Core

Noryad

Most helpful comment

This commit is the problem source: 3ddcf40037bb032c429c9ad6f0817f0796fda535
std::numeric_limits<time_t>::max() causes an overflow in linked respawn (Linked spawns are set to boss + 1-59 secs), this causes a database loop in the respawn function.

The easy fix is to just clear the respawn table with entries lower than 60 sec and replace std::numeric_limits<time_t>::max() with time(NULL) + YEAR, year should be enough

Cheers

A-Metaphysical-Drama on 3 Apr 2017

👍5
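
To illustrate the overflow described in this comment, here is a minimal C++ sketch. It is not TrinityCore's code: `Timestamp`, `kYearInSeconds`, `AdditionOverflows`, `FarFuture` and `LinkedRespawnTime` are hypothetical names, and it assumes a 64-bit signed timestamp in seconds. It only shows why a sentinel equal to the maximum value leaves no room for the 1-59 second offset, while `now + YEAR` does.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: a 64-bit signed timestamp in seconds, standing in for time_t.
using Timestamp = std::int64_t;

// Hypothetical constant; TrinityCore's own YEAR may be defined differently.
constexpr Timestamp kYearInSeconds = 365LL * 24 * 60 * 60;

// True when base + delay cannot be represented, i.e. the addition would be
// signed overflow (undefined behaviour in C++).
constexpr bool AdditionOverflows(Timestamp base, Timestamp delay)
{
    return delay > 0 && base > std::numeric_limits<Timestamp>::max() - delay;
}

// "Effectively never" sentinel that still leaves room for small offsets.
constexpr Timestamp FarFuture(Timestamp now)
{
    return now + kYearInSeconds;
}

// Respawn time of a linked spawn: the boss's respawn time plus a short delay.
inline Timestamp LinkedRespawnTime(Timestamp bossRespawn, Timestamp delay)
{
    assert(!AdditionOverflows(bossRespawn, delay));
    return bossRespawn + delay;
}

// The maximum value cannot absorb even a one-second delay...
static_assert(AdditionOverflows(std::numeric_limits<Timestamp>::max(), 1),
              "max() + 1 overflows");
// ...while now + YEAR (with an example "now" in April 2017) can absorb 59 seconds.
static_assert(!AdditionOverflows(FarFuture(1491177600), 59),
              "now + YEAR + 59 fits");
```

For example, `LinkedRespawnTime(FarFuture(now), 59)` is well defined, whereas adding any positive delay to `std::numeric_limits<Timestamp>::max()` is signed overflow. How exactly that overflow turns into the database loop is not shown here; the sketch covers only the arithmetic.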

All 56 comments

That's existed for a very long time.

Treeston on 22 Feb 2017

It has never happened to me and the number of users is the same, only the code has changed.

Noryad on 22 Feb 2017

How recent? Can you pinpoint a commit where there were fewer client connection issues?

CDawg on 22 Feb 2017

Feb 20 and 21, 2017.
In my case it happens once about 50-60 players are online.

Noryad on 22 Feb 2017

With https://github.com/jackpoz/BotFarm I can usually log in with 1000 users at the same time.

jackpoz on 22 Feb 2017

@jackpoz It has nothing to do with the number of connections as long as the connections are "safe". It is probably some unhandled exception or some unclosed socket.

Palabola on 22 Feb 2017

Last night it happened on f96f1ce; my quick fix was to go back to 1beb2e5. I am not saying that the number of online users is a pattern, just that it does not happen to the first players who log in. For now that is all the information I can provide. I can do more tests tonight.

Noryad on 22 Feb 2017

I suspect it's a client issue, really. This happened with some regularity on retail servers too, back in Wrath.

Treeston on 22 Feb 2017

I can confirm that. I'm not sure how long ago it started, but it wasn't happening before (I discovered it 2 days ago when I tried to log in with 6 clients from the same PC).

xjose93 on 23 Feb 2017

The client must enter the world

does the connected message appear before or after the character selection screen ?

jackpoz on 24 Feb 2017

I reproduced the bug 1 time, the client got stuck between login and character selection (no list of characters)

Aokromes on 24 Feb 2017

does the connected message appear before or after the character selection screen ?

Between login and character selection (no list of characters).
Exactly as @Aokromes says: once the server reaches "that state", anyone who logs out or logs in cannot get the list of characters. Yesterday it happened again on d939018, and this time I did not go back to an earlier version (I can confirm that it works fine up to 1beb2e5). I disconnected the network cable for a few seconds and reconnected it, and the problem was solved. Obviously not a good practice.

Noryad on 24 Feb 2017

I saw this bug in my old server before 2015 (server with 1.5k players)
it's not a "new bug"

Keader on 24 Feb 2017

In my case at least since September 2016 the client never got stuck until now. The problem is that it happens sporadically and i have not identified a pattern.

Noryad on 24 Feb 2017

It happens sometimes for me too, but it's not a new bug; I have seen it since Dec 2016, when I came back to WoW )

Viste on 24 Feb 2017

I've had this happen on my single-player server with just me logging in.

In my case it's usually when I haven't had the server running for days. When the client first starts up, AHBot has to expire a crap ton of auctions (I had mine set to 500,000 for testing) which makes the core unresponsive to login attempts.

MrSmite on 26 Feb 2017

It doesn't seem to be a client problem. After updating to the latest version, TrinityCore rev. 278353673639 2017-02-21 21:02:12 +0500 (3.3.5 branch) (Unix, Release, Static), this bug occurred.
After downgrading, the server worked like before and players can join.

Adizbek on 26 Feb 2017

Can all of you at least try to be helpful? I would like all reports to provide two commit hashes, the one that worked and the one that didn't work

Shauren on 26 Feb 2017

Try downgrading to ae9d01a3245c59a8a8d50516a79b79250337450d; if it still freezes, try 4eae29d421e1d7a28aaa50d401cbbf09c50bd476.

Aokromes on 26 Feb 2017

If it's a random occurrence problem, any sort of "I downgraded to X and it worked again" report is fairly pointless.

Treeston on 26 Feb 2017

👍1

When this happens, some errors appear in the server log:
```
Received unexpected opcode [CMSG_CANCEL_TRADE 0x011C (284)] Status: STATUS_LOGGEDIN_OR_RECENTLY_LOGGOUT Reason: the player has not logged in yet and not recently logout from [Player: Account: 104225]
Received unexpected opcode [CMSG_CANCEL_TRADE 0x011C (284)] Status: STATUS_LOGGEDIN_OR_RECENTLY_LOGGOUT Reason: the player has not logged in yet and not recently logout from [Player: Account: 41985]
Received unexpected opcode [CMSG_CANCEL_TRADE 0x011C (284)] Status: STATUS_LOGGEDIN_OR_RECENTLY_LOGGOUT Reason: the player has not logged in yet and not recently logout from [Player: Account: 105562]
Received unexpected opcode [CMSG_CANCEL_TRADE 0x011C (284)] Status: STATUS_LOGGEDIN_OR_RECENTLY_LOGGOUT Reason: the player has not logged in yet and not recently logout from [Player: Account: 96033]
```

Killyana on 27 Feb 2017

again on f612b1c :(

Moorgoth on 3 Mar 2017

https://github.com/TrinityCore/TrinityCore/issues/19182#issuecomment-282546946

Aokromes on 3 Mar 2017

Just today i got 2 freezes with 90 players, it doesn't freeze at the startup, but after the server has been online for a while.

Rev: 340ce38e01fed2e523784ee7a80f9fa25782d447

Can anyone share the latest Rev where this issue does not happen?

Demonid on 11 Mar 2017

https://github.com/TrinityCore/TrinityCore/issues/19182#issuecomment-282546946

Aokromes on 11 Mar 2017

👍1

So here's what I know so far...

This was the newest rev i tried that STILL has the issue: 4eae29d421e1d7a28aaa50d401cbbf09c50bd476

This rev doesn't have the issue d42faefe9a8c1a3f805e34cf0985dcab109ff8f5, been using it for 2 days now without issues.

After looking through all the commits between those two revisions, I think the issue is related to https://github.com/TrinityCore/TrinityCore/commit/4c27203c8f36dd2a5df0a4ae69fbdc4c9140b29d or the fixup commits after it.

Right now I can't test any more for a couple of days, to give players a sense of stability, but I'll try updating all the way to f57132b795a097a2c4c863a8153b0c1be5e008c0 (one commit before the one I think is causing the issue) and report feedback here.

@Shauren

Demonid on 14 Mar 2017

Could you try reverting 684a5fd3f1895703a52cff7e7f762883c74c5aba?

ariel- on 14 Mar 2017

@ariel- This freeze happens rarely; you can run the server for over 7 days of uptime without any problem.

Killyana on 14 Mar 2017

@Noryad I'm confused, reading through all the commits and doing tons of commit testing.
You originally said that you went to https://github.com/TrinityCore/TrinityCore/commit/1beb2e5fd6e85332173b1f3e414d5b385c3022fb to revert back because of the incident of freezing.
However, later you say that it works UNTIL https://github.com/TrinityCore/TrinityCore/commit/1beb2e5fd6e85332173b1f3e414d5b385c3022fb
is that correct?

CDawg on 15 Mar 2017

I detected it after 1beb2e5.

Noryad on 15 Mar 2017

@Killyana I had it happen 2 days in a row, about 4-5 times each day, once reaching 70+ players. Before reaching 70 players I had it working without issues for 5 days.

@Noryad What rev are you using right now and how many players do you have on average ?

@ariel- Tried but having issues installing Boost 1.63 on Debian

Demonid on 17 Mar 2017

Out of curiosity, did you try tweaking the player save interval? Maybe there's a bug with handling too many players.

MrSmite on 18 Mar 2017

@ariel- : Sadly reverting just this commit would not be enough, as this commit was fixing some compilation errors :(.

joshwhedon on 20 Mar 2017

@Tonghost Can you take a look at this? It looks like your commit https://github.com/TrinityCore/TrinityCore/commit/684a5fd3f1895703a52cff7e7f762883c74c5aba is involved.

Killyana on 20 Mar 2017

So when this issue starts, you have to wait a long time before the authserver responds and you get connected; it can take more than 40 secs. But after some time, it will not respond at all. It looks like something is making it lag.

Killyana on 24 Mar 2017

But after some time, it will not respond at all. It looks like something is making it lag.

Probably a long shot but this is what happens to me if I have too many items from AHBot expiring at the same time and I haven't started the server in a while. Has anything changed in the bot?

MrSmite on 24 Mar 2017

@MrSmite I didn't have AHBot enabled, and the Save Interval was 90 seconds (Default) for 70-90 players on average.

The DB for that server was brand new so there wasn't much to load.

Demonid on 24 Mar 2017

Hi, my server was updated to 8e1e081d6cd3826a46e96cac211d5c12f5d8536b two days ago, but after a few minutes players can't log in. Restarting the authserver does not help; I have to restart the worldserver to fix it, and it happens again very soon after each restart.

Before that I was on 91201f11f805a7bd21291717af8008c6b5e855fe and didn't have login problems.

This is a really critical bug, please fix it soon if you can.

BrayamValero on 2 Apr 2017

Expiring auctions from AHBot also have a noticeable impact on login delay. I'll cap the amount of item deletions we process at the same time (like we do with session updates)

Oh noes, the deletes are executed as stand-alone statements, instead of sending the whole batch wrapped in a transaction! 🤢

EDIT: after transaction-wrapping, cap wasn't needed :P

ariel- on 4 Apr 2017
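
The difference between stand-alone deletes and one transaction-wrapped batch can be sketched as follows. This is a minimal illustration, not the actual commit: `BuildExpiredAuctionBatch` is a hypothetical helper, the table and column names are assumed, and the statements are only built as strings rather than sent to a database.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Builds one batch of statements: every delete sits between a single
// START TRANSACTION and a single COMMIT, so the database can commit once
// for the whole batch instead of once per statement.
// The table name "auctionhouse" and column "id" are assumptions.
std::vector<std::string> BuildExpiredAuctionBatch(const std::vector<std::uint32_t>& auctionIds)
{
    std::vector<std::string> batch;
    if (auctionIds.empty())
        return batch; // nothing expired, so do not open an empty transaction

    batch.reserve(auctionIds.size() + 2);
    batch.push_back("START TRANSACTION");
    for (std::uint32_t id : auctionIds)
        batch.push_back("DELETE FROM auctionhouse WHERE id = " + std::to_string(id));
    batch.push_back("COMMIT");
    return batch;
}
```

With autocommit enabled, each stand-alone DELETE is typically committed on its own, which is where the cost adds up when a very large number of auctions expire at once.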

Expiring auctions from AHBot also have a noticeable impact on login delay

Indeed, as I mentioned earlier (comment). The OP said however that he didn't have AHBot enabled. While I like your fix for wrapping deletions in a transaction, it may not help this exact issue.

MrSmite on 4 Apr 2017

That's why I made the commit a reference to this issue, instead of closing. I'm fully aware of OP not having AHBot active :)

ariel- on 4 Apr 2017

So, we encountered this issue for a few weeks too, after we disabled the Grafana service. However, what we forgot to do was disable the posting of the metrics, which resulted in a 404 Not Found. After disabling the metrics, we did not encounter the issue anymore. I am not saying it is the cause of your particular issue, but it was the cause of ours, with the same symptoms.

joshwhedon on 4 Apr 2017

@joshwhedon Just to know, what rev were you using and how many players did you have on average?

Demonid on 4 Apr 2017

I can't check the rev right now, but it was 3.3.5, no older than 1 month. 350 players at peak hours, 200 on average through the day.
After a day or two, players could not log in anymore. And if the world server was restarted, there was always a huge rollback (as if no DB save had happened during the time when players could not log in).
We haven't had that issue for a week and a half or so (since we disabled the metrics).

joshwhedon on 4 Apr 2017

@joshwhedon Yeah, I can confirm that: after the login freezes, the only way to fix it is a worldserver restart, and the rollback is huge. Our rollbacks were between 10 and 20 minutes, since I was restarting the server as soon as a player reported the login issue to me.

The weird part is that I just checked my configs and didn't have any DB-stressful config enabled, no metrics, no AHBot, so I guess my issue is related to what @A-Metaphysical-Drama said.

Thanks again for the info ;)

Demonid on 4 Apr 2017

This commit is the problem source: 3ddcf40
std::numeric_limits<time_t>::max() causes an overflow in linked respawn (Linked spawns are set to boss + 1-59 secs), this causes a database loop in the respawn function.

The easy fix is to just clear the respawn table with entries lower than 60 sec and replace std::numeric_limits<time_t>::max() with time(NULL) + YEAR, year should be enough

Cheers

Indeed this was the cause, thanks for taking your time :)

ariel- on 12 Apr 2017

Very good catch, 👍

Treeston on 12 Apr 2017

@ariel-
```sql
SQL(p): REPLACE INTO creature_respawn (guid, respawnTime, mapId, instanceId) VALUES (202796, 9223372036854775807, 724, 18)
```

Re3os on 19 Apr 2017

O_o
9.223.372.036.854.775.807

Aokromes on 19 Apr 2017

We all will die, our Sun will die, everything will turn to dust, but this creature will be still not respawned.

offl on 19 Apr 2017

hell, even universe will die and still not respawned.

Aokromes on 19 Apr 2017

If the wiki info is correct ("respawntime: The time when the creature should be respawned in Unix time."),
https://www.epochconverter.com/

Assuming that this timestamp is in nanoseconds (1/1,000,000,000 second):
GMT: Fri, 11 Apr 2262 23:47:16 GMT
Your time zone: 12/04/2262, 01:47:16 GMT+2:00 DST

(edit) - roughly 245 years into the future....
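
As a rough cross-check of the date above, the following C++ sketch converts the logged value into whole years after 1970 under different unit assumptions. `kStoredValue`, `kSecondsPerYear` and `YearsAfterEpoch` are hypothetical names, and the year length is the mean Gregorian year, so the results are approximate. Read as nanoseconds, the value lands about 292 years after 1970, which matches the year 2262; read as plain Unix seconds, as the wiki describes the column, it is vastly further away.

```cpp
#include <cstdint>
#include <limits>

// Value taken from the logged REPLACE INTO creature_respawn query.
constexpr std::int64_t kStoredValue = 9223372036854775807LL;

// It is exactly the largest value a signed 64-bit integer can hold.
static_assert(kStoredValue == std::numeric_limits<std::int64_t>::max(),
              "stored respawn time equals INT64_MAX");

// Approximation: mean Gregorian year of 365.2425 days, in seconds.
constexpr std::int64_t kSecondsPerYear = 31556952;

// Whole years after the Unix epoch for a timestamp counted in
// unitsPerSecond ticks per second (1 = seconds, 1000000000 = nanoseconds).
constexpr std::int64_t YearsAfterEpoch(std::int64_t value, std::int64_t unitsPerSecond)
{
    return value / unitsPerSecond / kSecondsPerYear;
}

// Nanoseconds: 292 years after 1970, i.e. the year 2262.
static_assert(YearsAfterEpoch(kStoredValue, 1000000000) == 292,
              "nanosecond reading gives 292 years");
```

Calling `YearsAfterEpoch(kStoredValue, 1)` gives the reading in plain seconds, which is on the order of hundreds of billions of years.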

BTW, cannot reproduce on a straight core (world DB replaced):

TrinityCore rev. 73ec3a1d3b34 2017-04-19 01:14:14 +0100 (3.3.5 branch) (Win64, Release, Static) (worldserver-daemon) ready...
TrinityCore rev. 73ec3a1d3b34 2017-04-19 01:14:14 +0100 (3.3.5 branch) (Win64, Release, Static) (authserver)
<Ctrl-C> to stop.