Openfoodnetwork: DB Backups not being rotated

Created on 20 Jul 2018 · 26 comments · Source: openfoodfoundation/openfoodnetwork

Description

One of our instances had an issue today when its disk space ran out. The culprit was daily database backups going back to February, at ~90MB each.

Expected Behavior

Backups should be removed when they reach a certain age.

Actual Behavior

Backups are made every day and kept indefinitely.

Steps to Reproduce

  1. Have an OFN instance
  2. Wait
  3. Look in the /backups folder

Context

Disk space issue on a production server.

Severity

S2. There is a workaround, which is to use AWS backups to store the files remotely. Some instances do this and some don't.

Possible Fix

Create a new cron job in schedule.rb to periodically delete old files from the backups folder if they are present.
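
A minimal sketch of what that cron job could look like using the whenever gem's config/schedule.rb DSL (the retention window, run time, and backups path are illustrative assumptions, not the instance's actual values):

# config/schedule.rb (whenever gem) -- hypothetical rotation job
every 1.day, at: '4:30 am' do
  # Delete backup files older than 30 days; adjust the path and
  # retention to match the instance.
  command "find /home/openfoodnetwork/apps/openfoodnetwork/backups -mtime +30 -type f -delete"
end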

spike

All 26 comments

👍
On offrance it would be something like this:
find '/home/offrance/apps/openfoodnetwork/backups' -mtime +44 -type f

and then
find '/home/offrance/apps/openfoodnetwork/backups' -mtime +44 -type f -delete

Hmmm... there's a bit of a sysadmin standardisation clash here, as that shouldn't be the path to the backups folder. Maybe we should deal with the OFFrance server migration before implementing this in the main repo?

Also pau suggests using the system utility logrotate for this.

Ok, let's do that. @pacodelaluna, let's talk when you come back in the coming days to plan the server migration, and then we can move this one forward. Thank you all for your support and investigation!

Hi @myriamboure @Matt-Yorkley @pacodelaluna @luisramos0 where are we at with this bug? It's a sev 2...but it hasn't been updated or changed for 3 weeks...

This is ready to be picked up by a dev. The initial problem was manually fixed on french production, so this is basically tech-debt backlog for now.

Ok....so as a sev 2 it needs to be picked up by a dev as a priority.

@Matt-Yorkley @luisramos0 @mkllnk who's going to volunteer to take it on?

Or is it not a severity 2 bug, but tech debt that needs to be managed as in a tech debt backlog?

@Matt-Yorkley Are you talking about /home/openfoodnetwork/apps/openfoodnetwork/backups? Which code is creating the backups? Or was this a problem with a manual backup script that is now obsolete?

Now that I'm looking for it I can't tell what was creating those backups. Unless db2fog adds them to that folder by default if AWS is not set up?
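
For context, db2fog stores dumps via fog and is configured in an initializer; a hedged sketch of its usual setup, from memory of the gem's README (the bucket name and env-var handling are assumptions, so verify against the version in use):

# config/initializers/db2fog.rb -- hypothetical configuration
DB2Fog.config = {
  :aws_access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
  :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY'],
  :directory             => 'ofn-backups',  # placeholder bucket name
  :provider              => 'AWS'
}

If that's what's in use, its rake tasks (db2fog:backup:full to upload a dump, db2fog:clean to prune old ones) would make local rotation unnecessary; it would be worth verifying whether the gem writes anything locally when no provider is configured.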

@luisramos0 found the backups folder full, maybe he has some more clues..?

Yes, I think it still needs to be treated as a priority, as otherwise we might again have a full-disk issue, with a service outage that users would notice... which we absolutely want to avoid... If you agree, then an S2 goes before the Spree upgrade in terms of priority, given our processes ;-) So if you disagree with the priority choice you can argue, but for me it's still S2 (but I'm biased ;-))

@Matt-Yorkley I don't know what was creating the backups. I just manually deleted the backups from Jan and Feb, so I think we still have 4 or 5 months until it happens again.
My priority hint is S3 with a due date :-)

This makes me a bit afraid; I agree that we probably have at least one month, but who knows? We have more users coming onboard so it could go quicker...? Ok to put S3 and say it has to be done in a month, I can put it in my agenda and switch back to S2 in a month... but I find this a bit complicated to be honest, maybe just better to do it :-) @daniellemoorhead process master what do you think?

Yes, I understand. We should get this done.
I am afraid I can't help with getting this done.

Maybe @Matt-Yorkley would have some time to work on it, as product import is moving forward and we are approaching the v1 release?

@myriamboure The new French production server is using Amazon S3 now for backups, so that folder will not be filling up. UK uses Amazon as well, so it's not an issue for us, and I'm not sure Aus has a problem here either, so I think it might just be s5 tech debt now...

Finding out which script was creating the backups would be a useful investigation though, as Maikel pointed out. It could be that we actually don't use the script at all.

Hum... I didn't see that we had decided to use Amazon for backups as well @Matt-Yorkley , I thought it was only for images. I didn't see the discussion on this point in the new French prod Slack channel. I know we have some users who are very reluctant to have their data stored on US servers, where at any point their privacy can be broken. This is also important to know because, if there is an impact on privacy, we need to state in our privacy policy that there is a risk of a privacy breach through that weak point. @RachL I know you were insistent on this point, do you have arguments for not using Amazon for backups?
If we stick with it and accept the potential for a privacy breach then I agree it's lower priority... but let's check that first!

Well, if Amazon is used for backups, it means that our users' personal data ends up on it, and it might be good to let them know and explain our choice of doing so.
Do we know which AWS data center is used? I know they have quite recently opened data centers in France, and those are for sure EU-located data centers. Indeed, if the server is located in the US then the Patriot Act applies, which is even worse than just having Amazon being able to play with the data (I know in theory they can't... but well :) ).
So maybe let's first collect info on what we have today, and try to reach our users to explain the pros and cons before changing anything on the dev side? I mean, French companies (and especially retailers) hate to have their data located on their biggest opponent's servers, but maybe our current users don't mind as long as they know it's under (at least) EU law? (Or maybe they don't mind at all, but I'm guessing using EU data centers couldn't hurt :) ).
From a very personal point of view I find it a bit sad that we manage everything with alternative solutions only to end up having our data on Amazon services... but I also know that in terms of cloud services they are offering something really interesting. Or at least that's what I've understood so far while speaking with sysadmins.

Ok to put S3 and say it has to be done in a month, I can put it in my agenda and switch back to S2 in a month... but I find this a bit complicated to be honest, maybe just better to do it :-) @daniellemoorhead process master what do you think?

IMO this is too complicated, swapping severity levels and setting due dates. Also, it isn't a bug, it's tech debt, so does the s1/2/3 etc bug level apply to this?

Looking at the conversation over the last few hours I believe this isn't a massive problem/risk anymore, regardless of the AWS worries/thoughts.

So, I say we change this issue's title to be a spike to do this:

Finding out which script was creating the backups would be a useful investigation though, as Maikel pointed out. It could be that we actually don't use the script at all.

And we put it in a tech debt backlog somewhere (opportunity to create one for all the small things that need doing?) so it can be managed by the developer team and done when it needs to be done.

Thoughts @myriamboure @RachL @Matt-Yorkley @luisramos0 @mkllnk?

Yep, sounds good, it seems there is no longer a risk of the disk filling up if backups are not stored on disk... (that's what I understood from @Matt-Yorkley's comment). I'm changing the priority to s4 until we know if there is a risk of a privacy breach for our users depending on server location; it would be good to have that info as we need it anyway for our privacy policy document (you do as well for the UK btw @Matt-Yorkley ;-)). But it would be good if someone wants to open and curate a tech debt backlog...

Hey @myriamboure my point was that it doesn't deserve a bug severity on it - it's not a bug, it's tech debt. We need to figure out how to quantify the severity of tech debt...but I don't believe that is with the same label that bugs use.

So, I've removed the bug label for now. And I encourage the developers to come together to discuss from a process perspective how they want to manage tech debt ongoing. Ping @mkllnk @sauloperez @luisramos0 @Matt-Yorkley @kristinalim @HugsDaniel who can take on this challenge :)

Also:

So, I say we change this issue's title to be a spike to do this:

Finding out which script was creating the backups would be a useful investigation though, as Maikel pointed out. It could be that we actually don't use the script at all.

And we put it in a tech debt backlog somewhere (opportunity to create one for all the small things that need doing?) so it can be managed by the developer team and done when it needs to be done.

Is this the right approach for this issue?

I am not sure this is up to date, but it's a sysadmin task, so I'm moving it to the sys-admin backlog column.

I believe this is old. AFAIK all instances are now storing them in S3.

I would say most instances are using S3, but probably not all. This problem could also arise with new instances. And I don't think we are in a position to make S3 mandatory, especially because Amazon is not value-aligned and we would like to find an alternative. Let's make OFN frictionless.

On this topic, I was wondering if we should migrate to the backup gem:
https://github.com/backup/backup
It's more up-to-date and much more popular. But it's also in maintenance mode now. I haven't seen any other up-and-coming backup system that is better.
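
If we did migrate, rotation would come built in via the store's keep option; a minimal sketch assuming the gem's documented DSL with a local PostgreSQL target (the trigger name, database name, user, and path are placeholders):

# config/models/ofn_backup.rb -- hypothetical backup gem model
Model.new(:ofn_backup, 'OFN database backup') do
  database PostgreSQL do |db|
    db.name     = 'openfoodnetwork'  # placeholder
    db.username = 'ofn'              # placeholder
  end

  compress_with Gzip

  store_with Local do |local|
    local.path = '~/apps/openfoodnetwork/backups'
    local.keep = 30  # keep the 30 most recent packages; older ones are deleted
  end
end

Run as backup perform --trigger ofn_backup from cron; once more than 30 packages exist, the gem removes the oldest automatically.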
