Darktable: Database integrity check on startup and/or exit

Created on 14 Apr 2020 · 25Comments · Source: darktable-org/darktable

Is your feature request related to a problem? Please describe.
While debugging some strange GUI problems in Darktable, I've found out that the library.db database file was corrupt.

sqlite> pragma integrity_check; *** in database main *** On tree page 16 cell 0: 2nd reference to page 957 On tree page 18 cell 22: invalid page number 958 wrong # of entries in index image_position_index wrong # of entries in index images_filename_index wrong # of entries in index images_film_id_index wrong # of entries in index images_group_id_index Error: database disk image is malformed
I do daily backups, but I've found out that the backup files from the last two days are corrupt, too. I've worked yesterday without any problems in Darktable, but in fact, the database was broken. I've started with deleting ~1500 old images in Darktable today and after that, the database was finally unusable.

Describe the solution you'd like
I would like to have an option to run a PRGAMA integrity_check on startup - or even better on exit.
This should be optional (for example, two buttons: Check database and Skip today).
If this is implemented as an exit dialog, it would be also nice to see a backup option. SQLite offers a Backup API, so this should be possible even if the database is locked by Darktable.

If the database is corrupt, this prevents a user from working with the database, unaware that at some point it is finally broken.

Alternatives
Backup is only nice-to-have, because we already has the XMP files as backups. But XMP can be disabled in options so there may be users out there who has their edits only in library.db.

Additional context
A "optimize database" option may be added to the exit dialog. This can run a VACUUM.

Edit: Link to the SQLite Backup API Spec: https://www.sqlite.org/c3ref/backup_finish.html#sqlite3backupinit

no-issue-activity

Source

cytrinox

All 25 comments

somewhat related to #4337 and #4379 (vacuum is already implemented)

parafin on 14 Apr 2020

As I said on IRC - vacuum is already done, it just doesn't do vacuums every time but on certain heuristic (configureable)

Regarding implementation details of it. I thought about it yesterday when the topic was discussed and I think it should follow this logic:

On start -> quick check unconditional
on Close -> quick check unconditional.

those are supposed to be "very quick" and can detect problems immediatelly. However I'm not sure those would detect problems that the integrity_check pragma found in your case.

For integrity checks i think those are better at startup since if the DB's malformed before starting dt, your edits WILL get lost if you don't use XMP files. During darktable run, database damage is hardly possible, however darktable doesn't guarantee ACID since most writes except I think db upgrades aren't transactional, so damages are possible on unclean exits and you'd only detect those after the fact... So i thought about following logic:

introduce clean_shutdown internal option that gets set to FALSE on every startup and to TRUE on every clean shutdown.
if on startup clean_shutdown is FALSE, do full integrity_check
do full integrity_check on startup every "set number" of startups and on every vacuum action based on same heuristic as vacuum

for backup:
As @cytrinox mentioned on IRC, raw processor from evil company has "every month/every week/every-exit" choice for database backup. I'd also do backup on vacuum separate from those options (as a check).

Now once we have backups and integrity checks there's the problem of messaging this to user. As I said I think having quick check at startup + full check on unclean shutdown + x startups means that in case of detecting problems, we could communicate to user "yo db's fucked, restore from bakap or start fresh?" and provide with choices: "Restore from last backup/remove DB and start clean/ close darktable"

johnny-bit on 15 Apr 2020

clean_shutdown option has to be implemented very careful to actually work. Consider power cutting out near the time this option was set to FALSE. Can you guarantee that this change has been written to disk? fsync probably solves this issue, but one has to think this through.

parafin on 15 Apr 2020

This is probably a timing issue - I'd set and write it as false at every startup before even db is initialized and then set and write it as last thing to be written to any file before actual darktable closed (so man calls after last close is done on db :) ) That technically should be enough, yes?

johnny-bit on 15 Apr 2020

no. that's what I'm saying - without doing fsync there is no ordering guarantee

parafin on 15 Apr 2020

👍1

SQLite itself calls fsync() on close, so writing the flag after darktable closed the database handle should be IMHO really safe.

cytrinox on 15 Apr 2020

closing isn't the critical part, opening is more interesting.

parafin on 15 Apr 2020

My idea for opening would be:

read clean_shutdown from config, remember it
write FALSE to clean_shutdown, fdatasync it.
open up database, do integrity checks etc...

for closing it would be a bit in reverse

do final vacuums, checks etc, close db, fsyncit
close down everything else that shuts down on darktable
write "true" to clean_shutdown as last possible write before program termination.

this should guarantee that clean shutdown is FALSE as long as darktable has open db, right?

johnny-bit on 15 Apr 2020

Not sure if this is good but: Instead of modify everytime the config file, introduce a new file, for example .config/darktable/shutdown.state. Write something like dirty at startup and clean at shutdown into it. Some people keep dotfiles (the darktable config would be an ideal candidate) in git and a temporary flag may be unneeded noise.

cytrinox on 15 Apr 2020

ok, mere existence of shutdown.state file would be nice way of communicating state ;)

johnny-bit on 15 Apr 2020

There's already database lock, isn't there? Also syncing file deletion/creation require syncing the whole filesystem instead of just fdatasync of one file. As I said, this idea is not so simple to implement correctly.

parafin on 15 Apr 2020

👍1

What kind of long test are we talking about at the startup? VACCUM or something else?

parafin on 15 Apr 2020

ok, so: darktablerc the clean_shutdown thingie is guaranteed to have "false" as value only during the darktable operations... and you don't git stuff during app run, don't you? I sync my ide configs after I exit them.

as for duration of tests: on rather large db (~over 700MB!) the tests take ~12sec for integrity_check and ~1sec for quick_check.

So those NEED to be communicated and integrity check is a bit time-expensive.

johnny-bit on 15 Apr 2020

👍1

Comparison quick_check vs. integrity_check:

````
sqlite> pragma integrity_check;
* in database main *
On tree page 16 cell 0: 2nd reference to page 957
On tree page 18 cell 22: invalid page number 958
wrong # of entries in index image_position_index
wrong # of entries in index images_filename_index
wrong # of entries in index images_film_id_index
wrong # of entries in index images_group_id_index
Error: database disk image is malformed

sqlite> pragma quick_check;
* in database main *
On tree page 16 cell 0: 2nd reference to page 957
On tree page 18 cell 22: invalid page number 958
Error: database disk image is malformed
````

So I think quick_check is fine.

cytrinox on 15 Apr 2020

OK, so that gives me an idea overall that satisfies every usecase (I think):

Checks

Judging by performance characteristics and checking logic - quick_check should be done always on startup. I don't see much sense to have it on close, but diff'nt srokes...

The suggested setting of clean_shutdown when false guarantees that procedure for closing darktable didn't happen so it being false should suggest doing full integrity_check instead of quick_check. The difference in detection according to sqlite is that full integrity_check also detects index problem

Repairs

there are possible repairs (but only found by integrity_check) in form of REINDEX command but only for wrong indexes. And VACUUM might help with data allocation...
However Repairs would be last case scenario and I wouldn't even consider them from any standpoint. Going "Attempt repair?" to "Nah, still fucked" would be imo more frustrating than "restore from backup/start fresh/close darktable" because with "close darktable" user might to try and fix his database ;)

Possible corruption causes

SQLite lists pottential cases for failures and some are obvious for darktable:

dt sets PRAGMA synchronous = OFF which could result in database corruption if a power failure or hard reset occurs prior to all content reaching persistent storage.
dt sets PRAGMA journal_mode=MEMORY which could result in database corruption if an application crash happend in the middle of a write transaction
all other known possibilites not related to dt settings are listed in https://sqlite.org/howtocorrupt.html

Backup times & storage

The mentioned "every month/every week" have side problems of storing date, but I wouldn't worry about that. Usually first run of fresh darktable creates (empty) backup file and that can be also used as reference date.
"every exit" would be my preference since it'd be freaking easy to implement ;)
also I'd do backup on every vacuum, since "vacuum into" is there ;) but full in-place vacuum + backup of vacuumed data is imo better due to very nice sqlite backup api.

in terms of storage - there are users that have library.db nearly 1GB of size, so unconstrained backup is "bad". I'd introduce an option for keeping backups to X backups back... however I'd default that number to 3 and limit it to max 10.

restore would be done by trying to restore newest one. If that failed, we have a problem (restore would also call integrity check). I think we could loop restore process to check for last known good...

johnny-bit on 15 Apr 2020

Limiting to max 10 seems to low to me. On a weekend day, I start/close darktable very often and if I choose "every exit" I've lost all backups from friday on sunday evening :smiley:
Please let the max limit open or very large, 999 would be a good start.

SQLite files has a very good compression ratio. A 1GIB file could get down to 200-300 MiB. I would recommend to put these backup files into a zip or tar(gz|bzip2) archive. As this is not directly possible with the backup API (it can only copy to a database handle, no direct byte stream), I recommend to create the backup file first at same location, but with .autobak suffix, then put the file into a ZIP archive, after that delete the autobak file. Restore is easy because the file must be only extracted from ZIP, there is no need to use the backup API again.

Another option may be to do a check and vacuum, close all handles to the database and read it directly into a ZIP archive. But I don't think we must so restrictive with disk space usage.

That's just a braindump, maybe I get a better idea after sleep about it...

cytrinox on 15 Apr 2020

My "idea" of not zipping the most recent backup is because then you can literally do anything on it like on normal database so just rename file and you're good to go! with unzipping that requires additional space + processing but who am I to judge ;)

johnny-bit on 16 Apr 2020

Why this error comes basically
On page 6139 at right child: 2nd reference to page 4378
Page 686 is never used
Anyone? Reason?
And please give the solution for this too
Thank you :)

sharma-byte on 16 Apr 2020

@sharma-byte - it's because you have invalid page references in DB index. technically it may be possible to restore db to working state by doing VACUUM on it followed by REINDEX but it's not guaranteed.

johnny-bit on 16 Apr 2020

@johnny-bit Sorry i did not get you can you please elaborate this "invalid page references in DB index".
So, what you mean to say is their is no permanent solution to this?

sharma-byte on 16 Apr 2020

This is nuanced a bit, but essentially the db is malformed and there is problem :)

if you have sqlite 3.29 or newer you can try running:

sqlite3 database_file.db ".recover"

if you have a bit older sqlite you can try running:

sqlite3 broken.db ".dump" | sqlite3 unbroken.db

and then replace broken one with unbroken etc...

anyway - at current stage, without backups (darktable does backup on every version change so you can try restore that) you can just remove broken db and start "fresh". depending on which db file is malformed (data.db or library.db) you loose yur presets or your filmrolls.

johnny-bit on 16 Apr 2020

👍1

Thank you so much :) Version that i am using is 3.8.7.1 and this issue is
coming every now and then.

On Thu, Apr 16, 2020, 2:13 PM Hubert Kowalski notifications@github.com
wrote:

This is nuanced a bit, but essentially the db is malformed and there is
problem :)

if you have sqlite 3.29 or newer you can try running:

sqlite3 database_file.db ".recover"

if you have a bit older sqlite you can try running:

sqlite3 broken.db ".dump" | sqlite3 unbroken.db

and then replace broken one with unbroken etc...

anyway - at current stage, without backups (darktable does backup on every
version change so you can try restore that) you can just remove broken db
and start "fresh". depending on which db file is malformed (data.db or
library.db) you loose yur presets or your filmrolls.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/darktable-org/darktable/issues/4728#issuecomment-614505277,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/APGCYANDJ4BSBL6KLDA22O3RM3AMRANCNFSM4MIBQMJA
.

sharma-byte on 16 Apr 2020

So, here is what is on my mind about all the backup&restore thing:

Basically, this issue/FR is for checking databases to prevent users from working with corrupt databases. How this could be done was already discussed and I think the ideas are pretty good. EOD from my side.

The other thing is: Backups. May I raise some thoughts and questions:

As an experienced user, I always do backups of all my files and systems. I've daily backups from my dotfiles (including library.db) and daily/monthly/yearly snapshots of my RAW and XMP files. Speaking for myself: I'm fine with just checking the database, let me know that there is anything broken and I can restore it by myself.
If the whole library.db could be restored from re-reading the XMP files, the recommended backup strategy for regular users should be to keep backups of their RAWs and XMP files. If the database gets corrupt, darktable could provide a UI workflow to re-create a empty database. After that, a user could simply re-import all the raws.
If we introduce something like scheduled backups, I would expect these features:
- Backup means: I could go back to this backup, not only because the database is corrupted, but because I've done something stupid and want to go back to -X days. Had once this situation with Lightroom where I want to go back 1 week. Not sure if this could be integrated into the darktable GUI workflow. IIRC on Lightroom you could hold shift-key on startup, then a dialog for choosing another catalog or restore from backup appears (not tested it).
- Backup (generally) means too: I could restore any choosen backup on another machine (useful for PC and Notebook usage with a NAS for shared file storage).
- I could configure a default backup path, for example a NAS storage or USB disk. If a scheduled backup is started on exit darktable, this path must be reachable, otherwise skip the backup with a hint.
- The backup file itself must include a user-readable timestamp when is was created.
- It should be prevented that a user could restore a backup created by a newer version of darktable into an older version, as I expect this would lead to other errors.
- With scheduled backups, a keep_max setting would be very useful. I've Lightroom backups on every exit and the history goes back to 2017. But to be honest: I don't think there is a scenario where I want to go back to 2017. But at least I should be able to go back to last 30 days.
- As mentioned before, SQLite files getting very small with compression. If the directions goes into keeping dozens of backup files, compression should be a must-have.
- Compression also helps to bundle all database files into one, single backup-file.
- Because we must dealing with two database (library.db and data.db) the whole thing get's a bit difficult. I would say that a "darktable backup" should contain both files and even including the config file may be useful. When restoring a backup, someone may want to just restore a single item (for example if just the data.db gets corrupt). This requires lots of GUI interaction I think.

I've implemented a backup/restore on a commercial application a few years ago. At some point you always asking yourself if you have just re-implemented a regular backup-tool again and again. Sometimes is may be better to give the job for backups to a dedicated application (like duplicity on Linux, Windows Backup etc.), then you have not to deal with the support for it 😄

@johnny-bit If I understand you right, your plan is to "just" keep a low limited number of plain database copies and if darktable detects curruption, restore (more or less) automatically the latest working database copy. Thats what I would call database snapshots or database savepoints to not confuse with the expected benefits of backups like restoring on another machine. And it leaves the door open for feature-rich backup&restore functionality at some later stage.

For such snapshots, it's fine to not compress them and only keep 2-3 working copies. And it's analog to the process when darktable upgrades from a older version. The old library.db is just moved to a dedicated name. For example, you could use the sqlite backup api to produce a library.db.snap-20200415233412. Sorting and deleting old ones should be easy in this case.

cytrinox on 16 Apr 2020

OK, so as to not confuse things with "full fledged backup" let's call it "snapshots" and be fine with it :wink:

johnny-bit on 16 Apr 2020

👍1

This issue did not get any activity in the past 30 days and will be closed in 365 days if no update occurs. Please check if the master branch has fixed it and report again or close the issue.