Triplea: linode scheduled reboot

Created on 15 Jan 2018 · 40 Comments · Source: triplea-game/triplea

From @prastle
The following Linode(s) have now been assigned a maintenance window in which a reboot will occur. Please note that these times can change as a result of the actions mentioned above. At this time your reboot schedule in UTC is as follows:

2018-01-17 5:00:00 AM UTC - TripleA-Forum
2018-01-17 5:00:00 AM UTC - botserver60_GA_USA
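
For anyone who needs these windows in another time zone, here is a minimal Python sketch that parses the schedule lines as quoted above and renders them at a fixed UTC offset. The names `parse_schedule` and `in_offset` and the -5 hour offset are illustrative assumptions, not part of the Linode notice or any TripleA tooling.

```python
from datetime import datetime, timedelta, timezone

# Schedule lines copied from the Linode notice above.
SCHEDULE = """\
2018-01-17 5:00:00 AM UTC - TripleA-Forum
2018-01-17 5:00:00 AM UTC - botserver60_GA_USA
"""


def parse_schedule(text):
    """Return a list of (hostname, timezone-aware UTC datetime) pairs."""
    windows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line looks like "<date> <12-hour time> AM|PM UTC - <hostname>".
        stamp, _, host = line.partition(" UTC - ")
        when = datetime.strptime(stamp, "%Y-%m-%d %I:%M:%S %p")
        windows.append((host, when.replace(tzinfo=timezone.utc)))
    return windows


def in_offset(when, hours):
    """Render an aware datetime at a fixed UTC offset (no DST handling)."""
    return when.astimezone(timezone(timedelta(hours=hours)))


if __name__ == "__main__":
    for host, when in parse_schedule(SCHEDULE):
        # -5 is only an example offset; substitute your own.
        print(host, when.isoformat(), "->", in_offset(when, -5).isoformat())
```

A fixed offset keeps the sketch dependency-free; for a named zone with daylight-saving rules, the standard `zoneinfo` module would be the usual choice.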

All 40 comments

P1 as restarts are going to kill games, and cause general confusion (which will be more work for us unless we can get in front of this).

For bot restarts, I think we can probably easily shut them down or recreate them as soon as all active games are drained. (so shut down bots on that server until there are no games left).

For forum reboot, a lobby message to notify players + forum thread post will be pretty sufficient I think. @RoiEXLab or @prastle , let's keep our eyes open for a lobby restart.

@RoiEXLab , is the dice server hosted on the forum server, or lobby server?

@DanVanAtta both dice servers, production and staging are hosted on the lobby server

yup

So maybe we fire up the new marti and lobby on a different server,
one that they have already upgraded? Or fire up a brand new server?
Or just let it be down? @DanVanAtta @RoiEXLab

Either way I can't bounce this one ;)

and yes dan i have been bouncing 2 as they have been doing it all good so far

Does bouncing the server solve the problem? It sounds like there will be a restart during the window whether we want it or not.

cor
Except if we restarted now ourselves we could tell them live, but yes, either way it needs a restart

for the lobby forum and marti

@prastle It's probably not worth creating a temporary server for this downtime.
If I understand the people from Linode correctly, the downtime will just be like a slow server restart, so the lobby won't be down longer than 10 minutes.

cor

last one was 2 min but we do need a message in lobby yaml

I guess my real question is: is this the excuse you all needed to force an update? :)

Oops, did I say that out loud?
;)

@RoiEXLab @prastle sounds like we need a lobby message with the times of the downtime. If we can, we do have the option to remove bots from the bot servers and leave those in a shutdown state.

Yes I have been doing that for the last week dan

through last 3

thus 2 bot servers are missing right now; when they return I will remove GA

@prastle cool, you're on top of it then, sounds like we only need to worry about the forum for right now. Do we have a restart window for the lobby server yet? @RoiEXLab or @prastle

no
we do not
What I am bluntly saying is you both have an unexpected window of opportunity here to force an update! We have been using 8304 or newer in the lobby since it was released, for the new TWW. Also, I just updated to yesterday's. I personally think you should make it a forced update, but that is your call; you two are the brains :)

@DanVanAtta I tend to agree with @prastle that we could indeed use it as an opportunity to update software. However if the lobby server is getting restarted at a similar time, there's no way I'll be awake then 😅
@ssoloff If you have the time I'd like you or anyone else who has access to the server to update the forum software during that period ^^
I can tell you about the steps and all issues I had with it so far, and if everything goes wrong, I'll be awake shortly after to fix the remaining issues :P

While it is an opportunity, I would not make the updates at the exact same time. One good rule in operations is not to change two things at once. Say, in this case, the Linode team decides to roll back or does something very weird: what if things do not work right when the server is back up? We would not know which change was to blame.

I would instead suggest we do a software update shortly before or after the window, but not quite during. This way we can verify that the lobby is not broken after each individual update, instead of taking a leap of faith that all updates would work.

@RoiEXLab @prastle again, do we have a time window for the lobby server? I see only the forum server "2018-01-17 5:00:00 AM UTC - TripleA-Forum"

Scratch last question, looks like the answer is TBD still.

cor
np

Makes sense about before or after; guess it was a bad idea. It was just a thought that since, for the first time, it has to reboot,
we could take advantage of this.

@ron-murhammer would you mind adding @prastle to the triplea-game org so we can assign issues to him?

I think we can assign this issue to @prastle , @RoiEXLab and potentially myself for follow up. This should no longer be a p1 assuming the following plan is agreed:

  • wait for notification on when the lobby will be restarted and post an info message to the lobby. @RoiEXLab, would you mind doing the lobby messaging? I can help otherwise. I think @prastle could help here by notifying players in-game.

We already have a forum post about the forum server: https://forums.triplea-game.org/topic/499/the-forum-will-be-down

Otherwise @prastle is on top of the bot server, I don't think there would be any further actions needed.

Thanks, but I don't feel comfortable starting the forum or marti/lobby by myself the first time. If @RoiEXLab wants to give me a crash course I'll try, but I would def prefer if others did it.

Bots np been doing it for months

Also, I added @General-Dru-Zod to the bots

He has been doing it for a while as well.

If you have the time I'd like you or anyone else who has access to the server to update the forum software during that period

@RoiEXLab Unfortunately, 0500 UTC is about 30 mins before I usually shut down for the night (insomnia notwithstanding :smile:). I'm not sure it's a good idea for me to try to follow a production upgrade process for the first time at the tail end of my day.

@ssoloff No problem, we can do it another time then.

  • 2018-01-17 6:00:00 PM UTC - botserver30_SG

Just for your information:

At the time of writing this (8.30 UTC+0) the forum is down.

Ok, it seems every Linode has a fixed reboot time now.
The lobby server looks set to be restarted in 14 hours.
The last server (bot 90 TX) will be restarted in 19 hours.

About the forum: I haven't noticed any downtime this morning a couple hours after @panther2 posted his information, so I thought this issue had been resolved.
It turns out the forum is currently only available via IPv6, not IPv4. I contacted Linode support about this and I'm currently waiting for a response.
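
A quick way to confirm this kind of symptom is to attempt a TCP connection over each address family separately. The sketch below uses only Python's standard `socket` module; the helper name `reachable_families` and the default port 443 are assumptions for illustration, not something taken from the Linode support exchange.

```python
import socket


def reachable_families(host, port=443, timeout=5.0):
    """Try a TCP connect over IPv4 and IPv6; return {"IPv4": bool, "IPv6": bool}."""
    results = {}
    for label, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        try:
            # Resolve only addresses belonging to this address family.
            infos = socket.getaddrinfo(host, port, family=family, type=socket.SOCK_STREAM)
        except socket.gaierror:
            # No address of this family is known for the host.
            results[label] = False
            continue
        connected = False
        for fam, socktype, proto, _canonname, sockaddr in infos:
            try:
                with socket.socket(fam, socktype, proto) as sock:
                    sock.settimeout(timeout)
                    sock.connect(sockaddr)
                    connected = True
                    break
            except OSError:
                # Refused, timed out, or unroutable: try the next address.
                continue
        results[label] = connected
    return results


if __name__ == "__main__":
    print(reachable_families("forums.triplea-game.org"))
```

Note that a `False` for IPv6 can also simply mean the machine running the check has no IPv6 route, so the result is only meaningful from a dual-stack client.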

Good news everyone:
There was an issue with the forum server related to a past unclean uninstallation of the sendmail tool, which left a couple of script files in /etc/network/if-up.d/ that called non-existent files. This prevented the networking configuration from starting properly.
The guy from linode made me aware of this issue by telling me what commands to run ^^
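
To spot leftovers like this, one rough check is to scan the hook directory for absolute paths that no longer exist. The sketch below is a heuristic and not what Linode support ran (the actual commands are not recorded in this thread); the function name `find_dangling_references` and the path-matching regex are assumptions for illustration.

```python
import re
from pathlib import Path

# Heuristic: anything that looks like an absolute path, e.g. /usr/sbin/tool.
# Paths built from shell variables ($DIR/tool) are deliberately not matched.
ABS_PATH = re.compile(r"(?<![\w.$}-])/(?:[\w.+-]+/)*[\w.+-]+")


def find_dangling_references(hook_dir):
    """Map each script in hook_dir to the absolute paths it mentions that do not exist."""
    dangling = {}
    for script in sorted(Path(hook_dir).iterdir()):
        if not script.is_file():
            continue
        missing = []
        lines = script.read_text(errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            stripped = line.strip()
            is_shebang = number == 1 and stripped.startswith("#!")
            # Skip comments, but keep the shebang: a missing interpreter also breaks a hook.
            if stripped.startswith("#") and not is_shebang:
                continue
            for path in ABS_PATH.findall(stripped):
                if not Path(path).exists() and path not in missing:
                    missing.append(path)
        if missing:
            dangling[script.name] = missing
    return dangling


if __name__ == "__main__":
    for name, paths in find_dangling_references("/etc/network/if-up.d").items():
        print(name, "->", ", ".join(paths))
```

Any hit still needs a manual look, since a script may guard the path with an existence test before using it.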

(In case that wasn't obvious):
This means the forum is now up and running again. I took the opportunity and updated the forum software while I was at it...

Hmm for some reason sudo ifdown eth0 makes the server crash
Will investigate later...
A reboot fixes this issue

Have I told you lately that you ROCK? 👍

All our servers have been upgraded by now, and it seems there are no longer any problems.
Closing this issue.
