Openfoodnetwork: DB Backups not being rotated

Created on 20 Jul 2018 · 26 comments · Source: openfoodfoundation/openfoodnetwork

Description

One of our instances had an issue today when its disk space ran out. The culprit was daily database backups going back to February, at ~90MB each.

Expected Behavior

Backups should be removed when they reach a certain age.

Actual Behavior

Backups are made every day and kept indefinitely.

Steps to Reproduce

  1. Have an OFN instance
  2. Wait
  3. Look in the /backups folder

Context

Disk space issue on a production server.

Severity

S2. There is a workaround, which is to use AWS backups to store the files remotely. Some instances do this and some don't.

Possible Fix

Create a new cron job in schedule.rb to periodically delete old files from the backups folder if they are present.
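
A minimal sketch of what that cron job could look like using the whenever gem's config/schedule.rb DSL (the retention window, run time, and backups path are illustrative assumptions, not the instance's actual values):

# config/schedule.rb (whenever gem) -- hypothetical rotation job
every 1.day, at: '4:30 am' do
  # Delete backup files older than 30 days; adjust the path and
  # retention to match the instance.
  command "find /home/openfoodnetwork/apps/openfoodnetwork/backups -mtime +30 -type f -delete"
end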

spike

All 26 comments

👍
On offrance it would be something like this:
find '/home/offrance/apps/openfoodnetwork/backups' -mtime +44 -type f

and then
find '/home/offrance/apps/openfoodnetwork/backups' -mtime +44 -type f -delete

Hmmm... there's a bit of a sysadmin standardisation clash here, as that shouldn't be the path to the backups folder. Maybe we should deal with the OFFrance server migration before implementing this in the main repo?

Also pau suggests using the system utility logrotate for this.

Ok, let's do that. @pacodelaluna, let's talk when you come back in the coming days to plan the server migration, and then we can move this one forward. Thank you all for your support and investigation!

Hi @myriamboure @Matt-Yorkley @pacodelaluna @luisramos0 where are we at with this bug? It's a sev 2...but it hasn't been updated or changed for 3 weeks...

This is ready to be picked up by a dev. The initial problem was manually fixed on french production, so this is basically tech-debt backlog for now.

Ok....so as a sev 2 it needs to be picked up by a dev as a priority.

@Matt-Yorkley @luisramos0 @mkllnk who's going to volunteer to take it on?

Or is it not a severity 2 bug, but tech debt that needs to be managed as in a tech debt backlog?

@Matt-Yorkley Are you talking about /home/openfoodnetwork/apps/openfoodnetwork/backups? Which code is creating the backups? Or was this a problem with a manual backup script that is now obsolete?

Now that I'm looking for it I can't tell what was creating those backups. Unless db2fog adds them to that folder by default if AWS is not set up?
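
For context, db2fog stores dumps via fog and is configured in an initializer; a hedged sketch of its usual setup, from memory of the gem's README (the bucket name and env-var handling are assumptions, so verify against the version in use):

# config/initializers/db2fog.rb -- hypothetical configuration
DB2Fog.config = {
  :aws_access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
  :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY'],
  :directory             => 'ofn-backups',  # placeholder bucket name
  :provider              => 'AWS'
}

If that's what's in use, its rake tasks (db2fog:backup:full to upload a dump, db2fog:clean to prune old ones) would make local rotation unnecessary; it would be worth verifying whether the gem writes anything locally when no provider is configured.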

@luisramos0 found the backups folder full, maybe he has some more clues..?

Yes, I think it still needs to be treated as a priority, as otherwise we might again have a full-disk issue, with a service outage that users would notice... which we absolutely want to avoid... If you agree, then an S2 goes before the Spree upgrade in terms of priority, given our processes ;-) So if you disagree with the priority choice you can argue, but for me it's still S2 (but I'm biased ;-))

@Matt-Yorkley I don't know what was creating the backups. I just manually deleted the backups from Jan and Feb, so I think we still have 4 or 5 months until it happens again.
My priority hint is S3 with a due date :-)

This makes me a bit afraid; I agree that we probably have at least one month, but who knows? We have more users coming onboard so it could go quicker...? Ok to put S3 and say it has to be done in a month, I can put it in my agenda and switch back to S2 in a month... but I find this a bit complicated to be honest, maybe just better to do it :-) @daniellemoorhead process master what do you think?

Yes, I understand. We should get this done.
I am afraid I can't help with getting this done.

Maybe @Matt-Yorkley would have some time to work on it, as product import is moving forward and we are approaching the v1 release?

@myriamboure The new French production server is using Amazon S3 now for backups, so that folder will not be filling up. UK uses Amazon as well, so it's not an issue for us, and I'm not sure Aus has a problem here either, so I think it might just be s5 tech debt now...

Finding out which script was creating the backups would be a useful investigation though, as Maikel pointed out. It could be that we actually don't use the script at all.

Hum... I didn't see that we had decided to use Amazon for backups as well @Matt-Yorkley , I thought it was only for images. I didn't see the discussion on this point in the new French prod Slack channel. I know we have some users who are very reluctant to have their data stored on US servers, where at any point their privacy can be broken. This is also important to know because, if there is an impact on privacy, we need to state in our privacy policy that there is a risk of a privacy breach through that weak point. @RachL I know you were insistent on this point, do you have arguments for not using Amazon for backups?
If we stick with it and accept the potential for a privacy breach then I agree it's lower priority... but let's check that first!

Well, if Amazon is used for backups, it means that our users' personal data ends up on it, and it might be good to let them know and explain our choice of doing so.
Do we know which AWS data center is used? I know they have quite recently opened data centers in France, and those are for sure EU-located data centers. Indeed, if the server is located in the US then the Patriot Act applies, which is even worse than just having Amazon being able to play with the data (I know in theory they can't... but well :) ).
So maybe let's first collect info on what we have today, and try to reach our users to explain the pros and cons before changing anything on the dev side? I mean, French companies (and especially retailers) hate to have their data located on their biggest opponent's servers, but maybe our current users don't mind as long as they know it's under (at least) EU law? (Or maybe they don't mind at all, but I'm guessing using EU data centers couldn't hurt :) ).
From a very personal point of view I find it a bit sad that we manage everything with alternative solutions only to end up having our data on Amazon services... but I also know that in terms of cloud services they are offering something really interesting. Or at least that's what I've understood so far while speaking with sysadmins.

Ok to put S3 and say it has to be done in a month, I can put it in my agenda and switch back to S2 in a month... but I find this a bit complicated to be honest, maybe just better to do it :-) @daniellemoorhead process master what do you think?

IMO this is too complicated, swapping severity levels and setting due dates. Also, it isn't a bug, it's tech debt, so does the s1/2/3 etc bug level apply to this?

Looking at the conversation over the last few hours I believe this isn't a massive problem/risk anymore, regardless of the AWS worries/thoughts.

So, I say we change this issue's title to be a spike to do this:

Finding out which script was creating the backups would be a useful investigation though, as Maikel pointed out. It could be that we actually don't use the script at all.

And we put it in a tech debt backlog somewhere (opportunity to create one for all the small things that need doing?) so it can be managed by the developer team and done when it needs to be done.

Thoughts @myriamboure @RachL @Matt-Yorkley @luisramos0 @mkllnk?

Yep, sounds good, it seems there is no longer a risk of the disk filling up if backups are not stored on disk... (that's what I understood from @Matt-Yorkley's comment). I'm changing the priority to s4 until we know if there is a risk of a privacy breach for our users depending on server location; it would be good to have that info as we need it anyway for our privacy policy document (you do as well for the UK btw @Matt-Yorkley ;-)). But it would be good if someone wants to open and curate a tech debt backlog...

Hey @myriamboure my point was that it doesn't deserve a bug severity on it - it's not a bug, it's tech debt. We need to figure out how to quantify the severity of tech debt...but I don't believe that is with the same label that bugs use.

So, I've removed the bug label for now. And I encourage the developers to come together to discuss from a process perspective how they want to manage tech debt ongoing. Ping @mkllnk @sauloperez @luisramos0 @Matt-Yorkley @kristinalim @HugsDaniel who can take on this challenge :)

Also:

So, I say we change this issue's title to be a spike to do this:

Finding out which script was creating the backups would be a useful investigation though, as Maikel pointed out. It could be that we actually don't use the script at all.

And we put it in a tech debt backlog somewhere (opportunity to create one for all the small things that need doing?) so it can be managed by the developer team and done when it needs to be done.

Is this the right approach for this issue?

I am not sure this is up to date, but it's a sysadmin task, so I'm moving it to the sys-admin backlog column.

I believe this is old. AFAIK all instances are now storing them in S3.

I would say most instances are using S3, but probably not all. This problem could also arise with new instances. And I don't think we are in a position to make S3 mandatory, especially because Amazon is not value-aligned and we would like to find an alternative. Let's make OFN frictionless.

On this topic, I was wondering if we should migrate to the backup gem:
https://github.com/backup/backup
It's more up-to-date and much more popular. But it's also in maintenance mode now. I haven't seen any other up-and-coming backup system that is better.
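
If we did migrate, rotation would come built in via the store's keep option; a minimal sketch assuming the gem's documented DSL with a local PostgreSQL target (the trigger name, database name, user, and path are placeholders):

# config/models/ofn_backup.rb -- hypothetical backup gem model
Model.new(:ofn_backup, 'OFN database backup') do
  database PostgreSQL do |db|
    db.name     = 'openfoodnetwork'  # placeholder
    db.username = 'ofn'              # placeholder
  end

  compress_with Gzip

  store_with Local do |local|
    local.path = '~/apps/openfoodnetwork/backups'
    local.keep = 30  # keep the 30 most recent packages; older ones are deleted
  end
end

Run as backup perform --trigger ofn_backup from cron; once more than 30 packages exist, the gem removes the oldest automatically.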
