Description: Since https://github.com/TrinityCore/TrinityCore/issues/19999 was closed (logically xD), I decided to open a new issue, as @nawuko already suggested there. Just tell me what to do and I can report everything you need without posting millions of comments in this issue. Some weeks ago you sent me some commands on IRC to check the state of a port (an ntp command or something like that, IIRC), but I forgot them; if you want, send me the command again and I will post the results.
As pointed out in that issue, it is related to https://github.com/TrinityCore/TrinityCore/commit/7874bee7bfb70e0e039f91173cff212e9572de09. I have already reverted it, and with the revert everything worked correctly.
I don't have SOAP enabled (default worldserver.conf with only the logs, data folder paths and DB info changed).
Note that I'm on a test server, so I'm the only player on it: no other players are connecting at the same time as me, etc.
Current behaviour: Same as in the closed issue. In my case the wait is a lot shorter, maybe because I have a better dedicated server. If worldserver is restarted ingame with .server restart and then started again by an autorestarter or manually (with ./worldserver) before the socket becomes available (~30 seconds in my case), the following error shows up and you have to try again.
StartNetwork failed to bind socket acceptor
Failed to initialize network
Couldn't bind to 0.0.0.0:8085
A curious note that might be useful: in my case, if I close the worldserver with Ctrl+C, the port becomes available instantly, with no wait time.
Expected behaviour: The port should be available right after closing the worldserver.
Steps to reproduce the problem:
Branch(es): 3.3.5
TC rev. hash/commit: https://github.com/TrinityCore/TrinityCore/commit/348b02155bcadb1cda78d1bcca222c37e170ab5a
TDB version: TDB 3.3.5 63
Operating system: Ubuntu 17.04 (also happened on Debian 7)
$TC - SUP
I also have this problem, but with SOAP turned on it becomes much worse.
worldserver
In worldserver.conf, enable SOAP on port 7878.
The problem shows up when worldserver is restarted or stopped:
.server restart
It does not start again.
Error:
StartNetwork failed to bind socket acceptor
Failed to initialize network
Couldn't bind to 0.0.0.0:7878
terminate called without an active exception
Segmentation fault (core dumped)
worldserver freezes.
The only way to fix it is to reboot the Linux system!
This commit is problematic for TrinityCore:
7874bee
Maybe someone can help so the problem gets solved.
It's not closed, it's locked to users. Don't open a new ticket.
Actually, I want this issue to stay open (it has a better description) and not be locked to users (but @igrc, please stop posting).
There is no need to reboot Linux; restarting any network daemon, such as ssh, helped me.
I don't need to restart the network, but I have to try starting the server a few times before it works.
Not the network itself, but a network daemon, for example the ssh server.
I confirm this problem appears when the server crashes or when you Ctrl+C your worldserver while players are connected. You have to restart the server a second time for it to boot properly.
Confirmed.
After a restart or a crash, the server fails to start up X times, depending on how many players were on the server (I think).
2017-08-31_10:42:28 World initialized in 0 minutes 29 seconds
2017-08-31_10:42:28 StartNetwork failed to bind socket acceptor
2017-08-31_10:42:28 Failed to initialize network
This occurs after this commit: https://github.com/TrinityCore/TrinityCore/commit/7874bee7bfb70e0e039f91173cff212e9572de09
And if you have Metrics enabled, you get a crash when the server starts up after a crash or restart (with metrics disabled it does not crash, but the bind failure described above still occurs).
(I think the crash only happens if Grafana is installed on another machine, not sure.)
Crashlog: https://gist.github.com/Jildor/a0c1466109addd4311e1e3639f3bdda6
@Shauren Can you take a look here?
Try setting your Wired MTU to something really high (like >8192) - I know there was a bug for some Atheros cards (especially AR8161). Works for me :)
@Szone That's some hell of a high-level language you used there. Can you translate it into normal human speech for idiots like me, please?
@Raydor, he's talking about the Maximum Transmission Unit. If your device supports large packets, you can change your MTU value to 9000 bytes (the default is 1500 bytes).
I can confirm this bug/problem on Ubuntu Server 16.04 LTS.
When it hangs, it doesn't come up again.
To fix it I need to take down my restarter and worldserver and wait some time.
Then I start the restarter again and it works.
It seems a temporary solution would be a function in the restarter that makes it wait longer before it restarts the worldserver.
I haven't looked into making that happen.
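A rough sketch of that restarter idea, in Python: instead of sleeping for a fixed time, probe the port and relaunch only once a bind succeeds. The helper names `port_is_free` and `wait_for_port`, the timeout values, and the assumption that a plain bind probe sees the same condition worldserver does are all illustrative and not taken from TrinityCore.

```python
import socket
import time


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
        return True
    except OSError:
        # Typically "address already in use": something still holds the port.
        return False
    finally:
        probe.close()


def wait_for_port(port, timeout=300.0, interval=1.0, host="0.0.0.0"):
    """Poll until the port can be bound; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if port_is_free(port, host):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A restarter loop could call `wait_for_port(8085)` before each `./worldserver` launch and skip or delay the launch when it returns False. The probe does not set any socket options, so it may be stricter than the server's own bind.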
Valgrind reports lots of these when the server stops (Ctrl+C on the console or .server restart ingame). They MIGHT be related to threads being started incorrectly, which would cause problems when closing them in the ways I described. I don't know if this is even related to this issue; I'm just posting it so someone with knowledge of this can confirm:
==3610== 4 bytes in 1 blocks are still reachable in loss record 1 of 54
==3610== at 0x4DAAB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3610== by 0x4FEE2E8: xmalloc (in /lib/x86_64-linux-gnu/libreadline.so.7.0)
==3610== by 0x4FCAB83: rl_set_prompt (in /lib/x86_64-linux-gnu/libreadline.so.7.0)
==3610== by 0x4FCBF81: readline (in /lib/x86_64-linux-gnu/libreadline.so.7.0)
==3610== by 0x1BE8477: CliThread() (CliRunnable.cpp:153)
==3610== by 0x1BE435A: void std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) (functional:1391)
==3610== by 0x1BE35DB: std::_Bind_simple<void (*())()>::operator()() (functional:1380)
==3610== by 0x1BE2373: std::thread::_State_impl<std::_Bind_simple<void (*())()> >::_M_run() (thread:197)
==3610== by 0x6F3383E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==3610== by 0x64356D9: start_thread (pthread_create.c:456)
==3610== by 0x7828D7E: clone (clone.S:105)
I don't know much about valgrind; I just googled how to use it and ran it with some flags (valgrind --log-file="valgrindlog" --leak-check=full --show-leak-kinds=all -v --num-callers=20 --tool=memcheck ./worldserver), so if what I posted is useless or means nothing at all, at least I tried :P.
Off topic: valgrind also reports things in instance_sunwell_plateau, boss_gothik, TCSoapThread and SpellEffectInfo::CalcValue, but since I don't know whether those should be there or not, I'll wait before reporting them.
@Szone wrote: "Try setting your Wired MTU to something really high (like >8192) - I know there was a bug for some Atheros cards (especially AR8161). Works for me :)"
Apparently it also works for me, no error so far.
PS: I have an Intel network card.
Hitting dem driver bugs, heh.
Looks like the same issue is hit here. The best solution proposed there is setting the SO_REUSEADDR socket option. I'm not quite sure where it has to be put (apart from the SOAP socket handling), perhaps in the Socket(tcp::socket&& socket) ctor.
@Olion17 SO_REUSEADDR is a bad choice here - that's exactly the flag which makes worldserver silently fail to accept incoming connections when the port is already in use by another application
That flag doesn't fix the problem anyway, I tried it (at least for me).
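For reference, a minimal Python sketch of what setting SO_REUSEADDR on a listening socket looks like. It only illustrates the socket option itself and is not the worldserver code; `make_listener` and the backlog of 16 are hypothetical. The exact effect of the flag differs between platforms, which matters for the objection above.

```python
import socket


def make_listener(port, host="0.0.0.0", reuse_addr=True):
    """Create a listening TCP socket, optionally with SO_REUSEADDR set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_addr:
            # Has to be set before bind() to influence the bind.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(16)
    except OSError:
        # Do not leak the descriptor if the option or the bind fails.
        sock.close()
        raise
    return sock
```

Calling `make_listener(8085, reuse_addr=False)` gives the default behaviour, which makes it easy to compare both variants on a machine that shows the bind failure.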
Today this happened to me too, twice.
2017-09-27_15:04:04 INFO [server.worldserver] World initialized in 0 minutes 32 seconds
2017-09-27_15:04:04 ERROR [network] StartNetwork failed to bind socket acceptor
2017-09-27_15:04:04 ERROR [server.worldserver] Failed to initialize network
Now I have to start and restart every time.
Any ideas? I tried raising the MTU up to 9000 and even making the restarter wait 1 minute, but it doesn't work. I also run fuser -k PORT/tcp before restarting.
For the moment I have reverted commit 7874bee7bfb70e0e039f91173cff212e9572de09.
Confirmed
confirm
any news?
Or is there any way to kill worldserver when it reports "Failed to initialize network"?
In case of that error worldserver should shut down by itself @Undergarun
It should, with return 1;, but that doesn't happen because the daemon keeps waiting for some thread that was not closed. So I added World::StopNow(ERROR_EXIT_CODE); before the return to fix that case. @Shauren
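As a stopgap for the question above, a wrapper can watch the server's output and kill it when the failure message appears. A hedged Python sketch: `run_and_watch` is a hypothetical helper, and it assumes the "Failed to initialize network" line is written to the process's stdout or stderr. If it only goes to a log file, that file would have to be followed instead.

```python
import subprocess

FAILURE_MARKER = "Failed to initialize network"


def run_and_watch(cmd, marker=FAILURE_MARKER):
    """Run cmd and kill it as soon as one of its output lines contains marker.

    Returns a tuple (marker_seen, exit_code).
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so one loop sees everything
        text=True,
    )
    marker_seen = False
    for line in proc.stdout:
        if marker in line:
            marker_seen = True
            proc.kill()  # the process might otherwise hang instead of exiting
            break
    proc.stdout.close()
    return marker_seen, proc.wait()
```

A restarter could call `run_and_watch(["./worldserver"])` in a loop and treat `marker_seen` as the signal to wait before the next attempt.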
There is also some kind of problem with sockets.
"Prevented sending of [WHAT_EVER_OPCODE] to non existent socket 1 to [Player: Foo GUID Full: XXX Type: Player Entry: 0 Low: X, Account: X]"
It happens with sockets 1 and 2 and with any opcode. I don't know the reason, since I am not good at networking, but it looks like sockets are not closed properly when a player disconnects.
Confirmed with Ubuntu 17.10
increasing the MTU is not working
Confirmed in Ubuntu 16.04.
I don't see this problem through GDB.
confirmed in ubuntu 18.04
I'm getting the same error using Ubuntu 17.10
Same in Debian 9
Summoning @Shauren
@Shauren
We have to wait 2, 3 or even 4 minutes before worldserver and authserver can be started again...
This is bad.
Or just kill all processes that are bound to 8085 and 3724 (e.g. fuser -k 8085/tcp).
@tje3d does this work?
See @n4ndo's comment: https://github.com/TrinityCore/TrinityCore/issues/20032#issuecomment-332889858
It seems it doesn't work.
It takes a few seconds to drop all connections; that's enough for me.
But I have to wait 2 to 3 minutes!
CentOS & Windows Server 2016 (console)
This was not the case before; the server used to start up quickly!
TDB 335.62 / TDB 335.63
@Shauren what was the issue with SO_REUSEADDR being bad again?
Would SO_LINGER help in scenarios of a normal shutdown/restart (no crashes)?
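To make the SO_LINGER question concrete, here is a Python sketch of enabling it with a zero timeout. With that setting, close() resets the connection instead of doing the normal shutdown handshake, so the closing side does not keep the connection in TIME_WAIT; unsent data is discarded. Whether that is appropriate for worldserver is exactly the open question. `set_abortive_close` and `get_linger` are hypothetical names, and the two-int layout assumes the POSIX/Linux `struct linger`.

```python
import socket
import struct

# struct linger { int l_onoff; int l_linger; } on Linux and other POSIX systems
LINGER_FORMAT = "ii"


def set_abortive_close(sock):
    """Enable SO_LINGER with a zero timeout on sock."""
    sock.setsockopt(
        socket.SOL_SOCKET, socket.SO_LINGER, struct.pack(LINGER_FORMAT, 1, 0)
    )


def get_linger(sock):
    """Return (l_onoff, l_linger) as currently set on sock."""
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_LINGER, struct.calcsize(LINGER_FORMAT)
    )
    return struct.unpack(LINGER_FORMAT, raw)
```

The option applies per connection, so in a server it would be set on each accepted client socket rather than only on the listening socket.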
If anyone would like to test https://github.com/TrinityCore/TrinityCore/pull/22935 and see if it makes any difference, it would be really nice :)