Sonarr 🚀 - SQLite on Network Share

Docker and other virtualization systems often fail with SQLite locking or corruption errors when using WAL with the SQLite file on host-shared paths.

Do you have a link for that with some evidence? I'd expect a linux docker to behave largely the same. A linux docker in a windows hyper-v, that's different, since it's essentially a network share unless you mount a datacontainer or mobylinux mount, instead of a windows host mount/share.
I'm not saying it can't go wrong, docker is a special animal, but I need more info to be able to determine the best course of action.

sqlite databases need to live on a SMB/CIFS/NFS path.

It shouldn't, like never. Even other synchronization modes aren't reliable over networks, it's horrible for performance too.

This was already addressed for OSX in #167 - may as well just make it an option.

Euh, no, rule 1 of the Fool Proof Handbook is to never add an option that the user must set to avoid breaking stuff. Either detect the edge-case and deal with it automatically, or throw a big fat warning saying it's unsupported. :smile:
For example, we might be able to detect if the db is on a known reliable fs and use wal in those cases. Or detect it's on a network or cloud drive and simply refuse to start. Or inside a docker, and force non-wal mode. Things like that.

But as I said, need more info.... please

Taloth on 2 May 2017

Updated the description a bit.

To be clear, as the title of the issue said, I'm only discussing docker for windows; which uses CIFS/SMB to mount host paths. It mounts them with the "nobrl" option, which causes lock requests not to be sent to the server (https://github.com/docker/for-win/issues/11). This is unique to docker for windows, though similar problems arise on docker for osx.

If your solution is that network paths are not supported for the database files, then that's fine; it just means that anyone using docker for windows will have massive problems; and perhaps a startup warning that the appdata filesystem must be local would be nice.

I agree that requiring the user to add an option for normal behavior is bad; the flip side of that rule is that anything you set automatically should be able to be overridden by the user. Go ahead and set it on OSX; but let the end user override it if they want. I don't think application code should know about every edge case; that's what configuration files or advanced command line options are for.

At any rate, there are various complaints of Sonarr (and Radarr, and Plex) not working right/at all/being corrupted on docker for windows; CIFS is, I believe, the root cause.

lokkju on 2 May 2017

If your solution is that network paths are not supported for the database files, then that's fine; ...; and perhaps a startup warning that the appdata filesystem must be local would be nice.

That's my preferred solution for network shares, coz it's just inviting disaster regardless of sync mode.
For Docker for Windows I'd just recommend not mounting on the Windows host, but mounting on the mobyvm or using a data volume or datacontainer (volumes_from). I think we might be able to detect that scenario and force non-WAL mode, but I'll have to do some testing with /proc/mounts to see how the volume is mounted.
Tnx for the info. btw: Docker for Windows via hyper-v (win10) or virtualbox (win7)?
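
A minimal sketch of the /proc/mounts check mentioned above, in Python rather than Sonarr's C#. It finds the mount that contains a path and reports its filesystem type. The `NETWORK_FS` set, the function names, and the sample mount table (including the share address) are made up for illustration; they are not Sonarr's actual detection logic.

```python
# Filesystem types treated as "network" here. Illustrative assumption only.
NETWORK_FS = {"cifs", "smb3", "nfs", "nfs4", "9p", "fuse.glusterfs"}

# Made-up example of what /proc/mounts might contain inside a container.
SAMPLE_MOUNTS = """\
overlay / overlay rw,relatime 0 0
proc /proc proc rw 0 0
//10.0.75.1/C /config cifs rw,relatime,vers=3.02,nobrl 0 0
/dev/sda1 /downloads ext4 rw,relatime 0 0
"""


def find_mount(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path."""
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts escapes spaces in paths as \040.
        mount_point = fields[1].replace("\\040", " ")
        fs_type = fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best


def is_network_path(path, mounts_text):
    """True if the mount containing path uses a filesystem listed in NETWORK_FS."""
    mount = find_mount(path, mounts_text)
    return mount is not None and mount[1] in NETWORK_FS
```

On a real system the text would come from reading /proc/mounts. Note that a host-side network mount passed into a container as a bind mount may not show up with a network filesystem type inside the container.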

I agree that requiring the user to add an option for normal behavior is bad; the flip side of that rule is that anything you set automatically should be able to be overridden by the user. Go ahead and set it on OSX; but let the end user override it if they want. I don't think application code should know about every edge case; that's what configuration files or advanced command line options are for.

In our experience you shouldn't. Yes, advanced users are quite capable of making those decisions. But Sonarr isn't intended for advanced users and any option is likely to be abused/misused (we have empirical evidence on that, and dozens of wasted support hours to drive the point home). So any (hidden/config-file only) option should be carefully considered, and avoided as much as possible. There usually is a better solution.
As long as the average user doesn't even bother reading the info tooltips in the UI... _sigh_ I digress.
I'd argue that if Sonarr errs on the side of caution and only uses WAL in cases where it knows it will work, then no option is needed. We just need to find out if that's feasible.

Taloth on 2 May 2017

My understanding of the terminology is that Docker for Windows uses Hyper-V (Windows 10 only), while Docker Toolbox for Windows uses VirtualBox (Windows 7+). In this issue, I'm discussing Hyper-V with MobyLinuxVM; the host paths are from the parent Windows host. Using a config path in the MobyLinuxVM is an option, but there is no easy way to tell docker to do that, and afaik all docker volume drivers on Windows will use CIFS as well.

I'm a long-time backend services coder, so don't think so well about normal user usability issues grin. That said, perhaps taking a progressive enhancement approach for things like WAL mode might be better: by default, use the most compatible journaling mode (DELETE, iirc); if you detect a supported filesystem, enable WAL mode. This will allow for usage on unknown filesystems without code changes.
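
The progressive-enhancement idea could look roughly like this. It is a Python sketch, not Sonarr code: the `WAL_SAFE_FS` allow-list and both function names are assumptions picked for illustration, and which filesystems really belong on such a list is exactly the open question in this thread.

```python
import sqlite3

# Hypothetical allow-list of filesystems where WAL is assumed to be safe.
WAL_SAFE_FS = {"ext4", "xfs", "ntfs"}


def choose_journal_mode(fs_type):
    """Default to the most compatible mode; enable WAL only on known filesystems."""
    return "wal" if fs_type in WAL_SAFE_FS else "delete"


def open_db(path, fs_type):
    """Open the database and return (connection, journal mode SQLite actually set)."""
    conn = sqlite3.connect(path)
    mode = choose_journal_mode(fs_type)
    # PRAGMA journal_mode returns the mode in effect, which may differ from the request.
    actual = conn.execute(f"PRAGMA journal_mode={mode}").fetchone()[0]
    return conn, actual
```

An unknown filesystem type falls through to DELETE, so new or exotic filesystems would work without code changes, at the cost of WAL's performance.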

Still, I agree using sqlite on what is essentially a network filesystem is a bad idea; I just don't know of a better solution. The only other options I can think of involve rsync/unison with inotify, and that has its own problems.

Really, though, this isn't a Sonarr problem; it's a Docker for Windows problem, that they made worse by disabling file locking.

lokkju on 2 May 2017

Hi,

look at this config file. Maybe it's worth a shot: https://system.data.sqlite.org/index.html/artifact?ci=trunk&filename=System.Data.SQLite/Configurations/System.Data.SQLite.dll.config

And just for information: If docker is running on top of a Linux system (virtualized in Hyper-V or not), path mapping works as expected and the database works as expected.

I'm running a Linux VM inside Hyper-V that contains a docker environment containing Sonarr. The storage backend is LVM and the config and data paths are mapped into the container. Works.

Have a look at create_container():

# cat /etc/docker/containers/sonarr.on
container_name="sonarr"
container_hostname="$container_name"
container_image="linuxserver/sonarr"
container_update_auto=1

function stop_container {
    docker stop "$container_name"
}
function start_container {
    docker start "$container_name";
}

function delete_container {
    docker rm "$container_name"
}
function create_container {

    docker create \
        -e PUID=1002 \
        -e PGID=1006 \
        --hostname "$container_hostname" \
        --ip 10.1.1.3 \
        --name "$container_name" \
        --net vidnet \
        --restart always \
        -v /etc/ssl/certs:/etc/ssl/certs:ro \
        -v /dev/rtc:/dev/rtc:ro \
        -v /srv/data/sonarr/config:/config \
        -v /srv/data/sabnzbd/downloads:/downloads \
        -v /srv/data/sonarr/pickup:/pickup \
        -v /srv/data/sonarr/recycle:/recycle \
        -v /srv/videos/tv:/tv \
        "$container_image"
}


function canbestopped_container {
        return 0;
}

And then in /srv/data/sonarr/config:

# ls {logs,nzbdrone}.*
logs.db  logs.db-shm  logs.db-wal  nzbdrone.db  nzbdrone.db-shm  nzbdrone.db-wal

Cu

Grimeton on 11 May 2017

@Grimeton - of course it does. We're specifically discussing Docker for Windows, which uses MobyLinuxVM running on Hyper-V, with paths on the Windows host. In this (standard) configuration, any paths on the Windows host are mounted via SMB/CIFS.

lokkju on 12 May 2017

@lokkju Yeah the question came up, so I clarified it.

Grimeton on 12 May 2017

I have an armv7 docker swarm cluster running a sonarr container among a lot of other things. Every node in the cluster runs a glusterfs server, set up as a replicated volume, and each node mounts that volume locally through the glusterfs FUSE filesystem pointed at localhost. In short, every node has a local filesystem with the same data.

This works for everything except sonarr, which corrupts its sqlite3 database about once every two days on average.

My workaround is to back up the database (.dump) every hour. If the database gets corrupted, it automatically removes all the sqlite databases and restores a new one from the last working dump.
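
For readers who want to try the same dump-and-restore workaround, here is a rough Python sketch using only the standard sqlite3 module. The function names are hypothetical and this is not the script used above; it assumes Sonarr is stopped while a restore runs.

```python
import os
import sqlite3


def dump_database(db_path, dump_path):
    """Write a SQL text dump of the database, like the sqlite3 shell's .dump."""
    conn = sqlite3.connect(db_path)
    try:
        with open(dump_path, "w", encoding="utf-8") as f:
            for statement in conn.iterdump():
                f.write(statement + "\n")
    finally:
        conn.close()


def is_healthy(db_path):
    """Return True if PRAGMA integrity_check reports 'ok'."""
    try:
        conn = sqlite3.connect(db_path)
        try:
            row = conn.execute("PRAGMA integrity_check").fetchone()
        finally:
            conn.close()
        return row[0] == "ok"
    except sqlite3.DatabaseError:
        return False


def restore_from_dump(db_path, dump_path):
    """Delete the database and its WAL/SHM files, then rebuild it from the dump."""
    for suffix in ("", "-wal", "-shm"):
        try:
            os.remove(db_path + suffix)
        except FileNotFoundError:
            pass
    conn = sqlite3.connect(db_path)
    try:
        with open(dump_path, encoding="utf-8") as f:
            conn.executescript(f.read())
    finally:
        conn.close()
```

An hourly job would call `dump_database` only while `is_healthy` returns True, so that a corrupted database never overwrites the last good dump.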

Would be nice to have an option "use WAL" or something like that on the configuration to get rid of this. Or support external relational databases (mysql, postgres, ...). I think external databases would be a lot of work mainly because of version migrations, advanced selects, so on, but the option to use wal or not, should be simple to add.

trunet on 18 Jul 2017

👍3

Just some update, the latest way to use docker on windows is LCOW, which uses "linuxkit" running inside hyper-v. It seems they now use 9p to share the volumes, which also results in a lot of errors and makes any container that uses sqlite in WAL mode unusable or any locking operation for that matter.

I got the answer in #1385 that this is a Microsoft problem. It's still crazy to me that after all these years docker can't handle sqlite + WAL on windows. It's a real shame, since LCOW works great otherwise and is a huge improvement over the old mode and docker toolkit.

Andy2244 on 9 May 2018

This isn't a Docker/Windows/CIFS issue. I get the same behaviour on Docker Swarm on Ubuntu using NFS. Oddly, this worked fine with Kubernetes even though the NFS server was the same.

fergalmoran on 29 Aug 2018

👍1

As others have mentioned, there actually are valid scenarios in which you may have to mount configuration and the database from a network share.
I also run Sonarr within Docker Swarm (with only one replica), and it is quite common for the container to be moved from one node to another when the original one goes offline or while re-balancing load. Local storage isn't an option in this scenario.
For it to work on CIFS, the nobrl option was necessary, and when using NFS, it is very common for background tasks to throw the "The database is locked" error in the logs. Fortunately I haven't seen database corruption yet.
It clearly wasn't built with the intention to be deployed in this manner, so I'm quite surprised that it actually works and performance isn't bad at all. But the database locking is indeed still an issue, and at some point I assume I'll start seeing database corruption.
Given that supporting remote databases would be a huge rewrite, it seems the WAL setting might be an interesting workaround.
It would be great if Sonarr could determine what it should use on its own, but it might be a bit difficult. I mount the network shares on the hosts, and use docker bind volumes, so Sonarr would just see it as any other volume. Maybe it could try to execute one of these commands that cause locking problems, and make the suggestion to change the setting, or something like that.

someCynic on 17 Sep 2018

👍4

This isn't a Windows only issue. I get the same errors on Rancher 2.1 Kubernetes, using NFS Persistent Volume to a ReadyNAS NFSv4.

My research shows it is a known issue with sqlite not playing nice with NFS's locking and the answer might be to allow nolock as an option.

yamlCase on 1 Nov 2018

👍3

I'd like to chime in too with this problem. I use a Docker container for Sonarr. It only happens when I use NFS as the datastore. This would be great to get working, as others also want their persistent data stored on an NFS server. The nolock mount option does nothing in my case. Sonarr appears to be functioning just fine despite these System.Data.SQLite.SQLiteException (0x80004005): database is locked errors, but I could see it leading to greater problems.

My NFS mount options are:

[Mount]
What=freenas.lan:/mnt/Pergamum/Docker
Where=/nas/freenas.lan/Docker
Type=nfs4
Options=noatime,nolock,soft,rsize=32768,wsize=32768,timeo=900,retrans=5,_netdev

Logs:
https://pastebin.com/rXZR7yxq

onedr0p on 24 Nov 2018

The nolock mount option does nothing in my case.

Oh well, it was a stab in the dark and thanks for eliminating that.

yamlCase on 24 Nov 2018

I would love to have the priority bumped on this issue. Out of 30 containers, Lidarr, Radarr & Sonarr are the only applications I run that cannot use NFS for application data. :(

For now I have just stored their application data to the VM instead of my NFS share.

onedr0p on 24 Nov 2018

👍2

Can confirm - this is still an issue in the Sonarr v3 previews.
REALLY sucky...

fergalmoran on 27 Nov 2018

👍2

Just wanted to give my $.02. I have the same issue. Trying to run sonarr in a kubernetes cluster has been... Painful.

I ended up having a container first grabbing a copy of the data from the nfs share and then putting it on a local share. Then start sonarr and have another container do a copy back to the nfs share to have a somewhat reliable backup of it.

It's gross. The db is going to get corrupted someday because the container is gonna crash in the middle of the transfer. And while it's in no way sonarr's fault that sqlite is garbage over network share... It would be really nice to have a fix, or be able to run against mysql/postgres/...

And yeah like mentioned before, the nolock option doesn't solve that issue.

Xaelias on 4 Feb 2019

👍11

Looks like it does this with sqlite on nfs for me too. nolock did not fix the issue. @Xaelias I will probably do the same thing as you.

pryorda on 12 Mar 2019

@markus101 @Taloth would it be possible to include a start up argument to disable wal? Seeing that it's disabled on OSX it would be nice to make it configurable for people that use NFS shares.

onedr0p on 24 Mar 2019

As linked in the comment above, I'm getting 'database disk image is malformed' errors. I have my persistent docker storage mounted from a glusterfs share. I tried using the local disk as suggested by @markus101 and I'm not getting the errors anymore, but I really want the safety and redundancy of the glusterfs server I painstakingly set up.

Can't we opt for a separate mariadb or postgres db instead of sqlite?

tscibilia on 15 Apr 2019

👍6

FYI to everyone saying "nolock didn't work", the option @yamlCase mentioned/linked is a SQLite option used when loading the database file, not an NFS mount option.
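
The link itself isn't preserved here, so as an assumption: this most likely refers to SQLite's `nolock` URI query parameter, which tells SQLite to skip file locking when opening the database. A small Python sketch of what that looks like (the helper name is made up, and whether System.Data.SQLite as used by Sonarr exposes this is not confirmed in this thread):

```python
import sqlite3
from pathlib import Path


def connect_nolock(db_path):
    """Open a database through a file: URI with SQLite's nolock=1 parameter.

    With locking disabled nothing stops two processes from writing at the same
    time, so this is only reasonable when a single process uses the file.
    """
    uri = Path(db_path).resolve().as_uri() + "?nolock=1"
    return sqlite3.connect(uri, uri=True)
```

This sketch stays on the default rollback journal rather than WAL, because WAL needs shared-memory coordination between connections that a lock-free open does not provide.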

yacn on 21 Apr 2019

👍4

+1 would love to see an enhancement to address this.

Grafana and Sonarqube have the option to connect to external persistent data stores such as MySQL. It would be wonderful to see something similar with Sonarr.

benfff85 on 5 May 2019

👍3

This is still a headache for me. If a config flag to disable WAL mode isn't an option, how about just an environment variable that advanced users can set?

lokkju on 25 May 2019

👍3

Did retest this with all the latest docker/lcow/windows stuff and still get disk I/O error and NzbDrone.Core.Datastore.CorruptDatabaseException.

Docker version master-dockerproject-2019-06-05, build c02f389c
Kernel Version: 10.0 18362 (18362.1.amd64fre.19h1_release.190318-1202)
Operating System: Windows 10 Pro Version 1903 (OS Build 18362.145)
4.19.27-linuxkit

Seems the 9p filesystem still lacks compatible locking options and many linux containers won't run correctly via LCOW, see: linux-containers

Andy2244 on 6 Jun 2019

connectionBuilder.JournalMode = OsInfo.IsOsx ? SQLiteJournalModeEnum.Truncate : SQLiteJournalModeEnum.Wal;

If it works with OSX, why couldn't it be used with Linux?

ggzengel on 16 Jun 2019

👍1

@ggzengel that is exactly what I asked. There should be a start up parameter that disables wal

onedr0p on 17 Jun 2019

Hitting the same issues here with config directories hosted on NFS and mounted to container running via rancher2/k8s.

System.Data.SQLite.SQLiteException (0x80004005): database is locked
database is locked
at System.Data.SQLite.SQLite3.Step (System.Data.SQLite.SQLiteStatement stmt) [0x00088] in <61a20cde294d4a3eb43b9d9f6284613b>:0
at System.Data.SQLite.SQLiteDataReader.NextResult () [0x0016b] in <61a20cde294d4a3eb43b9d9f6284613b>:0
at System.Data.SQLite.SQLiteDataReader..ctor (System.Data.SQLite.SQLiteCommand cmd, System.Data.CommandBehavior behave) [0x00090] in <61a20cde294d4a3eb43b9d9f6284613b>:0
at (wrapper remoting-invoke-with-check) System.Data.SQLite.SQLiteDataReader..ctor(System.Data.SQLite.SQLiteCommand,System.Data.CommandBehavior)
at System.Data.SQLite.SQLiteCommand.ExecuteReader (System.Data.CommandBehavior behavior) [0x0000c] in <61a20cde294d4a3eb43b9d9f6284613b>:0
at System.Data.SQLite.SQLiteCommand.ExecuteNonQuery (System.Data.CommandBehavior behavior) [0x00006] in <61a20cde294d4a3eb43b9d9f6284613b>:0
at System.Data.SQLite.SQLiteCommand.ExecuteNonQuery () [0x00006] in <61a20cde294d4a3eb43b9d9f6284613b>:0
at Marr.Data.QGen.UpdateQueryBuilder1[T].Execute () [0x0003b] in C:\BuildAgent\work\5d7581516c0ee5b3\src\Marr.Data\QGen\UpdateQueryBuilder.cs:157
at Marr.Data.DataMapper.Update[T] (T entity, System.Linq.Expressions.Expression1[TDelegate] filter) [0x00000] in C:\BuildAgent\work\5d7581516c0ee5b3\src\Marr.Data\DataMapper.cs:674
at NzbDrone.Core.Datastore.BasicRepository1[TModel].Update (TModel model) [0x0002a] in C:\BuildAgent\work\5d7581516c0ee5b3\src\NzbDrone.Core\Datastore\BasicRepository.cs:125
at NzbDrone.Core.Tv.SeriesService.UpdateSeries (NzbDrone.Core.Tv.Series series, System.Boolean updateEpisodesToMatchSeason) [0x000a9] in C:\BuildAgent\work\5d7581516c0ee5b3\src\NzbDrone.Core\Tv\SeriesService.cs:160
at NzbDrone.Core.Tv.RefreshSeriesService.RefreshSeriesInfo (NzbDrone.Core.Tv.Series series) [0x00213] in C:\BuildAgent\work\5d7581516c0ee5b3\src\NzbDrone.Core\Tv\RefreshSeriesService.cs:110
at NzbDrone.Core.Tv.RefreshSeriesService.Execute (NzbDrone.Core.Tv.Commands.RefreshSeriesCommand message) [0x00072] in C:\BuildAgent\work\5d7581516c0ee5b3\src\NzbDrone.Core\Tv\RefreshSeriesService.cs:175
at NzbDrone.Core.Messaging.Commands.CommandExecutor.ExecuteCommand[TCommand] (TCommand command, NzbDrone.Core.Messaging.Commands.CommandModel commandModel) [0x000f6] in C:\BuildAgent\work\5d7581516c0ee5b3\src\NzbDrone.Core\Messaging\Commands\CommandExecutor.cs:95
at (wrapper dynamic-method) System.Object.CallSite.Target(System.Runtime.CompilerServices.Closure,System.Runtime.CompilerServices.CallSite,NzbDrone.Core.Messaging.Commands.CommandExecutor,object,NzbDrone.Core.Messaging.Commands.CommandModel)
at System.Dynamic.UpdateDelegates.UpdateAndExecuteVoid3[T0,T1,T2] (System.Runtime.CompilerServices.CallSite site, T0 arg0, T1 arg1, T2 arg2) [0x00035] in <35ad2ebb203f4577b22a9d30eca3ec1f>:0
at (wrapper dynamic-method) System.Object.CallSite.Target(System.Runtime.CompilerServices.Closure,System.Runtime.CompilerServices.CallSite,NzbDrone.Core.Messaging.Commands.CommandExecutor,object,NzbDrone.Core.Messaging.Commands.CommandModel)
at NzbDrone.Core.Messaging.Commands.CommandExecutor.ExecuteCommands () [0x00027] in C:\BuildAgent\work\5d7581516c0ee5b3\src\NzbDrone.Core\Messaging\Commands\CommandExecutor.cs:41

namebrandon on 20 Jun 2019

@markus101 @Taloth would you be open to a MR that adds a command line argument to disable WAL? It's only a work around but it's honestly all we have besides adding a way to connect to an external DB.

onedr0p on 24 Jun 2019

@onedr0p Preferably not. As mentioned before, it's a nice workaround for advanced users, but you don't want users to actively have to configure something for it to work, if it can be helped at all. (The irony of that statement doesn't escape me with respect to how long this issue has been open.)
I'd prefer if it works in reverse: use wal only if it's on a local drive.
I can whip something up on a v3 feature branch, but that will have to be tested on various setups.
In fact, if you wish you can try to make the necessary change yourself: Run IDiskProvider.GetMount(...) on the appdata dir during the ConnectionStringFactory call; it should contain the necessary info to determine whether the appdata dir is a local drive and use WAL/Journal accordingly. Getting an IDiskProvider instance is possible by adding it to the ConnectionStringFactory constructor.

Taloth on 24 Jun 2019

I'd prefer if it works in reverse: use wal only if it's on a local drive.

The issue has nothing to do with local vs network, as the underlying problem is about locking and other FS features, which are not implemented on some filesystems or work "funny" or in a limited way on others. As noted, we get similar errors with a 9P "local" filesystem on Docker for Windows mounts. So if WAL needs certain FS features to work correctly, then it needs to probe for the specific features on the FS itself.
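
One way such a probe could work, sketched in Python with a hypothetical `wal_probe` helper: create a scratch database in the target directory, switch it to WAL, write through one connection and read through another. This is an assumption about how probing might be done, not something Sonarr does. It is also only a necessary check: both connections live in one process on one host, so it cannot catch locking that breaks only between hosts or under load.

```python
import os
import sqlite3
import uuid


def wal_probe(directory):
    """Return True if a scratch WAL-mode database in directory can be written and read."""
    path = os.path.join(directory, f".wal-probe-{uuid.uuid4().hex}.db")
    try:
        writer = sqlite3.connect(path, timeout=1)
        try:
            mode = writer.execute("PRAGMA journal_mode=wal").fetchone()[0]
            if mode != "wal":
                return False  # SQLite refused to switch to WAL here
            writer.execute("CREATE TABLE probe (n INTEGER)")
            writer.execute("INSERT INTO probe VALUES (1)")
            writer.commit()
            # A second connection must be able to see the committed row.
            reader = sqlite3.connect(path, timeout=1)
            try:
                return reader.execute("SELECT n FROM probe").fetchone() == (1,)
            finally:
                reader.close()
        finally:
            writer.close()
    except sqlite3.Error:
        return False
    finally:
        # Remove the scratch database and any leftover WAL/SHM files.
        for suffix in ("", "-wal", "-shm"):
            try:
                os.remove(path + suffix)
            except OSError:
                pass
```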

Andy2244 on 24 Jun 2019

@Taloth like @Andy2244 said it is much more than network filesystems, but that is what most people are struggling with including me.

I have opened PR #3180 to add a start up arg to disable wal. I have tested it locally and it should work :)

onedr0p on 24 Jun 2019

🎉2

then it needs to probe for the specific features on the FS itself.

That's a valid point. It might be doable but I'm not sure if we can do that properly for all supported platforms, it's worth an attempt.
Although I have to note that 9P is a network filesystem protocol, not a local filesystem. I'm not sure if it's detected as such by mono and/or /proc/mounts but that we can deal with.
moby used to connect to the windows host via CIFS, also a network filesystem protocol and is already detected as such by sonarr.

Taloth on 24 Jun 2019

PR updated to https://github.com/Sonarr/Sonarr/pull/3183

onedr0p on 25 Jun 2019

Jumping in to share my ~~shitty hack~~ _workaround_ for Kubernetes users that rely on NFS for persistence; use a sidecar container, mount an ext4 image file backed by a local disk or ram, and fsfreeze it every 5 minutes to copy a snapshot of Sonarr's database files.

YMMV, but this seems to work because sqlite still makes atomic writes to disk, even in WAL mode. Using fsfreeze on a real filesystem like ext4 prevents Sonarr from writing further changes until you've finished copying them off to NFS storage.

Tradeoff is that you might lose the last 5 minutes of activity if there's an unexpected outage.

Show YAML

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: sonarr
  name: sonarr
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sonarr
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: sonarr
    spec:
      containers:
      - command:
        - sh
        - -c
        - |-
          dd if=/dev/zero of=/ramdisk/image.ext4 count=0 bs=1 seek=400M; \
          mkfs.ext4 /ramdisk/image.ext4; \
          mount /mnt/sonarr-ramdisk-mount /ramdisk/image.ext4; \
          cp -fvp /sonarr-config/*.* /mnt/sonarr-ramdisk-mount; \
          while true; do \
          sleep 890; \
          sync /mnt/sonarr-ramdisk-mount/*.*; \
          fsfreeze --freeze /mnt/sonarr-ramdisk-mount; \
          sleep 10; \
          cp -fvp /mnt/sonarr-ramdisk-mount/*.* /sonarr-config/; \
          fsfreeze --unfreeze /mnt/sonarr-ramdisk-mount; \
          done;
        image: ubuntu
        imagePullPolicy: Always
        lifecycle:
          preStop:
            exec:
              command:
              - umount /mnt/sonarr-ramdisk-mount;
        name: sonarr-config
        resources: {}
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /ramdisk
          name: ramdisk
        - mountPath: /sonarr-config
          name: sonarr-config
        - mountPath: /mnt/sonarr-ramdisk-mount
          mountPropagation: Bidirectional
          name: sonarr-ramdisk-mount
      - command:
        - sh
        - -c
        - until [ -f "/config/sonarr.db" ]; do sleep 1; done; /init
        env:
        - name: PUID
          value: "1000"
        - name: PGID
          value: "1000"
        image: linuxserver/sonarr:preview
        imagePullPolicy: IfNotPresent
        name: sonarr
        ports:
        - containerPort: 8989
          name: sonarr
          protocol: TCP
        readinessProbe:
          tcpSocket:
            port: 8989
        resources: {}
        volumeMounts:
        - mountPath: /config
          mountPropagation: HostToContainer
          name: sonarr-ramdisk-mount
        - mountPath: /config/Backups
          name: sonarr-config
          subPath: Backups
        - mountPath: /config/MediaCover
          name: sonarr-config
          subPath: MediaCover
        - mountPath: /config/logs
          name: sonarr-config
          subPath: logs
        - mountPath: /config/xdg
          name: sonarr-config
          subPath: xdg
        - mountPath: /media
          mountPropagation: HostToContainer
          name: media
      volumes:
      - emptyDir:
          medium: Memory
          sizeLimit: 400M
        name: ramdisk
      - name: sonarr-config
        persistentVolumeClaim:
          claimName: sonarr-config
      - emptyDir: {}
        name: sonarr-ramdisk-mount
      - hostPath:
          path: /tmp/media
          type: Directory
        name: media
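
A different way to get a consistent copy, for anyone who can't run a privileged sidecar: SQLite's online backup API, which Python exposes as `Connection.backup`. This swaps fsfreeze for a different technique and is not what the YAML above does. The function name and the staging-file approach are assumptions for the sketch; the staging file is meant to sit on local disk so that SQLite itself never writes to the share.

```python
import os
import shutil
import sqlite3


def snapshot_to_share(src_path, staging_path, dest_path):
    """Take a consistent snapshot of a live database and copy it to a share.

    The snapshot goes to a local staging file through SQLite's backup API,
    then is copied to the destination as a plain file and renamed into place.
    """
    src = sqlite3.connect(src_path)
    staging = sqlite3.connect(staging_path)
    try:
        src.backup(staging)
    finally:
        staging.close()
        src.close()
    tmp_path = dest_path + ".tmp"
    shutil.copyfile(staging_path, tmp_path)
    # Rename last so a half-written copy never replaces the previous snapshot.
    os.replace(tmp_path, dest_path)
```

As with the fsfreeze loop, anything written after the last snapshot is lost if the node dies.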

putty182 on 9 Jul 2019

🎉3

A few tips for those on network filesystems:

NFS
- Make sure you are using a very recent NFS server and NFS client
- Make sure you are on a recent version of the linux kernel (afaik, 3.12+ should be ok which is still pretty old)
- If you are using NFSv3, make sure you have rpcbind, lockd, and rpc.statd daemons running on every NFS client and server
- If possible, use NFSv4. If an NFSv3 client can't connect to lockd, locks won't work. The filesystem can still mount though!
- If you use Sonarr only on a single host and it will never move, try local_lock=all.
  - If you use Kubernetes, you cannot use a deployment. It must be a statefulset. Even then, the database can become corrupted if the host/pod is marked abandoned and a replacement is started. A standard eviction/host restart should be fine though.
- Turn off all caching. e.g. lookupcache=none, noac,sync,sharecache,forcedirectio,
- Note: I do not use NFS so there may be more things to consider

GlusterFS
- Make sure you are on at least 3.8.
- Make sure you are using FUSE on a recent kernel (2.x kernels will not work, 3.x not sure what the minimum is)
- Locks should "just work" with the above. If they don't, you have set a volume option which is not compatible.
  - Make sure you set locks.mandatory-locking: forced
  - Use direct-io-mode=enable when mounting
  - Disable all performance translators, especially:
    - performance.strict-o-direct: on
    - performance.stat-prefetch: off
    - performance.write-behind: off
    - performance.open-behind: off
  - See https://docs.gluster.org/en/latest/Administrator%20Guide/Mandatory%20Locks/ for more information
- Note: There are bugs like https://bugzilla.redhat.com/show_bug.cgi?id=1397085 where Sonarr may hang but it should not cause corruption
- Note: There is no lock healing in GlusterFS. If your network connection is bad, locks will be lost!

Note: This probably won't apply to windows as locks are completely broken there

Hope this helps!

SerialVelocity on 31 Aug 2019

Adding yet another voice to the chorus of people here; we definitely need a workaround for filesystems which don't support WAL.

btowntkd on 16 Sep 2019

@btowntkd the issue we found was that disabling WAL causes major lag when using a large collection of series. I still want this problem to be solved don't get me wrong. It seems like SQLite and Sonarr don't play well for our use case. :(

onedr0p on 17 Sep 2019

I have also experienced this issue when running Sonarr with /config mapped to a volume on a remote NFS or CIFS share.

+1, would be awesome to see this issue get resolved.

phishsticks on 5 Oct 2019

@btowntkd the issue we found was that disabling WAL causes major lag when using a large collection of series. I still want this problem to be solved don't get me wrong. It seems like SQLite and Sonarr don't play well for our use case. :(

So then - can we not put this behind an off by default setting? Personally, I would much rather deal with lag than deal with weekly db corruption.

fergalmoran on 8 Oct 2019

👍2

So then - can we not put this behind an off by default setting? Personally, I would much rather deal with lag than deal with weekly db corruption.

I'm not sure I agree no. Weekly DB corruption sounds like you have something else going on.
Me and a couple other people have shown how we work around this specific problem. I've been running this for months now. And never once had an issue.

Don't get me wrong, I would love for Sonarr to have a support for real postgres or something. But until someone does the work, I don't think there is a real solution. Now yes, a flag that allows enabling/disabling the offending option. Sure, that can't really hurt. The default should probably stay what it currently is IMO.

Xaelias on 8 Oct 2019

@Xaelias - so you're agreeing with me then, a flag that is off by default can't really hurt?

@SerialVelocity 's solution doesn't work for me, I'm actually doing something similar to your solution myself but as you say, "it's gross".

fergalmoran on 8 Oct 2019

@fergalmoran I may have misread your proposition. Yeah we probably agree then.

Xaelias on 9 Oct 2019

👍1

Update for the Windows/docker users out there.
I just re-tested the latest Docker Desktop CE 2.1.6.0 (Edge) for Windows and got working sqlite + WAL for Sonarr/Headphones containers!

This is using the default hyper-v "Linux Container" backend + "shared drive" feature, not Lcow/WSL2. So maybe give it a try again and see if the sqlite db's don't corrupt anymore and maybe even network shares might work, not sure how smb over those new "shared drives" behaves. I did not test the latest inotify stuff, but if it works as well a lot of containers should now run correctly from windows bind mounts.
This means if all is stable, we can finally use the Docker CE to get our containers working "natively" on Win10. Currently as hack, i use a hyper-v vm with ClearLinux/Docker + reverse samba4 server for my Sonarr setup, so this "new" way via direct bind mounts might finally work.

2.1.5.0 introduced this:

New file sharing implementation: Docker Desktop introduces a new file sharing implementation which uses gRPC, FUSE, and Hypervisor sockets instead of Samba, CIFS, and Hyper-V networking. The new implementation offers improved I/O performance.

2.1.6.0 this:

Docker Desktop now supports inotify events on shared filesystems for Windows file sharing.

All my quick tests that failed before now work correctly, while using a bind mount from my host NTFS drive. _(You need to add your drive to share via settings and then can directly use it, but make sure the folders exist.)_

Examples via PowerShell:
docker run -it --name=sonarr -v f:\docker\test2:/config -e PGID=0 -e PUID=0 -p 8989:8989 linuxserver/sonarr

docker run -it --name="headphones" -v f:\docker\test1:/config -p 8181:8181 linuxserver/headphones

PS: I assume the same stack (_gRPC, FUSE, and Hypervisor sockets_) is utilized for their WSL2 backend, while the old experimental Lcow backend (_Windows Containers_) will not use it, since it has no "shared drives" option.
Maybe someone brave can test the latest DD/Edge version with latest WSL2(_19018_) and check if it behaves correctly as well? I'm confused how Docker Desktop + WSL2 actually works regarding bind mount from the host.

Andy2244 on 19 Nov 2019

👍1

Just set up a homelab cluster based on Nomad and NFS as a shared data store. Quite discouraged after hours of efforts to find this issue and realize I can't (at least with my skills) get Sonarr up and running. One more vote for some movement on a "real" solution for this or a flag that can be set.

natelandau on 3 Feb 2020

@natelandau If iSCSI is an option for you, using it as a persistent volume will avoid this problem with SQLite and NFS.

chrisadas on 3 Feb 2020

So I came here from a thread on reddit about Bazarr. I have the following setup:

Synology NAS running docker
Virtual DSM on the NAS, which has the VPN enabled, also running docker
Sonarr v3, Radarr v3, Lidarr, Jackett and Bazarr all running in the VDSM docker instance
All of the config data for all of those containers stored in a network share in the VDSM, which maps to the volume in the 'real' DSM
Note that the NFS share exposed to the Virtual NAS maps to exactly the same physical disk as the drive on the 'real' DSM - it's just mapped to expose it to the virtual one.

So the interesting part is that Sonarr, Radarr and Lidarr are all running fine on the virtual DSM, with the configuration stored on the NFS share. I installed Bazarr, and it immediately failed with a locking error which is obviously related to WAL.

Moving Bazarr container onto the 'real' DSM, and storing the config in exactly the same place on the volume, just using the direct path rather than mounting that folder as an NFS share, works just fine.

What's weird is that in theory, Sonarr, Radarr and Lidarr should all fail with the NFS share, if they have WAL enabled...

Either way, another vote here for a DISABLE_SQLITE_WAL option for all of these containers. :)
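For illustration only, here is a minimal Python sketch of what honoring such a flag could look like. `DISABLE_SQLITE_WAL` is the hypothetical variable name requested above, not an option that Sonarr or any of these containers is known to support, and `open_database` is a made-up helper.

```python
import os
import sqlite3


def open_database(path, env=os.environ):
    """Open `path`, using WAL unless the hypothetical DISABLE_SQLITE_WAL flag is set."""
    flag = env.get("DISABLE_SQLITE_WAL", "").strip().lower()
    disable_wal = flag in ("1", "true", "yes")
    conn = sqlite3.connect(path)
    # Fall back to the classic rollback journal when WAL is disabled.
    mode = "DELETE" if disable_wal else "WAL"
    # PRAGMA journal_mode returns the mode that is actually in effect.
    actual = conn.execute(f"PRAGMA journal_mode={mode}").fetchone()[0]
    return conn, actual
```

Calling `open_database("/config/app.db")` with the variable set to `1` would open the database with a rollback journal; the path here is only an example.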

Webreaper on 2 Apr 2020

Jumping in to share my ~shitty hack~ _workaround_ for Kubernetes users that rely on NFS for persistence; use a sidecar container, mount an ext4 image file backed by a local disk or ram, and fsfreeze it every 5 minutes to copy a snapshot of Sonarr's database files.

YMMV, but this seems to work because sqlite still makes atomic writes to disk, even in WAL mode. Using fsfreeze on a real filesystem like ext4 prevents Sonarr from writing further changes until you've finished copying them off to NFS storage.

Tradeoff is that you might lose the last 5 minutes of activity if there's an unexpected outage.
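A rough Python sketch of the freeze, copy, thaw step described above, under some assumptions: the `fsfreeze` tool from util-linux is available and run as root, the database files match `*.db*`, and the destination lives on a different filesystem (for example the NFS mount). The function name `snapshot_frozen` and the injectable `run` parameter are illustrative, not part of the original sidecar.

```python
import shutil
import subprocess
from pathlib import Path


def snapshot_frozen(mountpoint, src_dir, dst_dir, run=subprocess.run):
    """Freeze `mountpoint`, copy SQLite files from src_dir to dst_dir, then thaw."""
    src, dst = Path(src_dir), Path(dst_dir)
    # dst_dir must NOT be on the frozen filesystem, or this copy would block.
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    # Block new writes to the local filesystem while we copy.
    run(["fsfreeze", "--freeze", str(mountpoint)], check=True)
    try:
        # Copy the database together with its -wal and -shm side files.
        for path in sorted(src.glob("*.db*")):
            if path.is_file():
                shutil.copy2(path, dst / path.name)
                copied.append(path.name)
    finally:
        # Always thaw, even if the copy fails, or the application stays blocked.
        run(["fsfreeze", "--unfreeze", str(mountpoint)], check=True)
    return copied
```

Running this every 5 minutes from a scheduler or a loop would approximate the behaviour described, with the same tradeoff of losing up to one interval of changes.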

Came here looking for a solution for the same type of SQLite WAL database corruption issues on Gluster... Seems like the above workaround from @putty182 may be a good idea... Now just have to try to figure out how to translate the workaround to GlusterFS using docker swarm services?

n1nj4888 on 15 Apr 2020

Today I moved from GlusterFS 3.x to 7.x. So far, neither Sonarr nor Radarr seems to be corrupting its database anymore; I'm doing some testing at the moment to confirm.

JohnShortland on 4 May 2020

I feel like SQLite's WAL mode might be unfairly attacked in this thread. The suggestion that Sonarr should introduce an option to disable it entirely seems like an unnecessarily blunt instrument.

The SQLite page on WAL does indeed say that "WAL does not work over a network filesystem". I think it says this because WAL is implemented using shared memory, in this case through a memory-mapped file. Since sharing memory through a memory-mapped file isn't well supported by network filesystems, SQLite can't make the necessary correctness guarantees for separate hosts that are reading the SQLite database off a network share. However, if you can guarantee that no more than one machine is using the SQLite database, I don't think there's anything _inherent_ to the way WAL mode works that should cause corruption. I imagine that for most deployments of Sonarr, having a single instance deployed is reasonable (I can't imagine many people are load-balancing Sonarr or have it set up in an HA configuration).
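To make the shared-memory point concrete, here is a small Python sketch that switches a scratch database to WAL and reports whether the `-wal` and `-shm` side files exist while the connection is open. The `-shm` file is the memory-mapped wal-index mentioned above. The helper name `wal_side_files` and the table `t` are made up for the example.

```python
import os
import sqlite3
import tempfile


def wal_side_files(db_path):
    """Write one row in WAL mode and report which side files exist."""
    conn = sqlite3.connect(db_path)
    try:
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        conn.commit()
        # Check while the connection is still open: SQLite may remove the
        # side files when the last connection closes cleanly.
        return {
            "journal_mode": mode,
            "wal": os.path.exists(db_path + "-wal"),
            "shm": os.path.exists(db_path + "-shm"),
        }
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(wal_side_files(os.path.join(d, "scratch.db")))
```

Pointing the same function at a path on a network share is one cheap way to see whether the share accepts WAL at all, though passing it says nothing about behaviour under concurrent access.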

I experience "database is locked" errors myself when running Sonarr in a Kubernetes cluster, with its configuration database served from an NFSv3 share mounted with the nolock option. I wanted to create a reproducible test case that demonstrates the problem, but I have not been successful so far. What I have tried is simply reading from and writing to a WAL-mode SQLite database on an NFS share mounted inside a Docker container. I ran the example scripts listed in a blog post about WAL, and those executed just fine. This suggests that, at least in very simple scenarios, WAL can work just fine.
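A slightly more aggressive probe than single-threaded scripts might look like the Python sketch below: several threads, each with its own connection, insert into one WAL-mode database and any `sqlite3.OperationalError` (such as "database is locked") is recorded. This is an assumed test harness, not the blog post's scripts and not a model of how Sonarr accesses its database; `stress`, the table `t`, and the default counts are all illustrative.

```python
import os
import sqlite3
import tempfile
import threading


def stress(db_path, writers=4, rows_per_writer=50, timeout=5.0):
    """Run concurrent writers against db_path; return (row_count, errors)."""
    setup = sqlite3.connect(db_path)
    setup.execute("PRAGMA journal_mode=WAL")
    setup.execute("CREATE TABLE IF NOT EXISTS t (writer INTEGER, n INTEGER)")
    setup.commit()
    setup.close()

    errors = []
    errors_lock = threading.Lock()

    def worker(writer_id):
        # Each thread uses its own connection, as separate clients would.
        conn = sqlite3.connect(db_path, timeout=timeout)
        try:
            for n in range(rows_per_writer):
                try:
                    conn.execute("INSERT INTO t VALUES (?, ?)", (writer_id, n))
                    conn.commit()
                except sqlite3.OperationalError as exc:
                    conn.rollback()
                    with errors_lock:
                        errors.append(str(exc))
        finally:
            conn.close()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    check = sqlite3.connect(db_path)
    count = check.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    check.close()
    return count, errors


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(stress(os.path.join(d, "stress.db")))
```

To test a share, pass a path on the NFS mount instead of a temporary directory and compare the error list against a run on local disk.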

I'm interested in reproducing the corruption and locking errors that we see in Sonarr, but I think I need to learn more about how Sonarr interacts with its SQLite database in order to do it. Specifically: does Sonarr _read_ from the SQLite database concurrently from separate threads? Does it _write_ to the SQLite database concurrently from separate threads? Does it use WAL in EXCLUSIVE locking mode?
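On the last question: SQLite's documentation describes using WAL without shared memory when `locking_mode` is set to EXCLUSIVE before the first access to the database. The Python sketch below shows that ordering. Whether Sonarr does anything like this is not established in this thread, and `open_exclusive_wal` is a hypothetical helper.

```python
import os
import sqlite3
import tempfile


def open_exclusive_wal(db_path):
    """Open db_path in WAL mode with EXCLUSIVE locking set before first access."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # PRAGMAs run outside any transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    # Order matters: request exclusive locking first, then enable WAL.
    locking = conn.execute("PRAGMA locking_mode=EXCLUSIVE").fetchone()[0]
    journal = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, locking, journal


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        conn, locking, journal = open_exclusive_wal(os.path.join(d, "x.db"))
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        print(locking, journal)
        conn.close()
```

The cost is that only one connection can use the database at a time, and the locking mode is a per-connection setting, so every connection the application opens would need to set it.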

cjlarose on 6 May 2020

👍1

no plans to change this at this time; closing per markus

bakerboy448 on 18 Nov 2020

👎4

IMO this is a big issue for some users @bakerboy448 & @markus101. I know having an external db may never be supported but maybe some future versions of sqlite may have some features to mitigate this.

Anyways, would it make sense to document this on the FAQ that Sonarr's application data is not supported over NFS/network shares and link to this issue?

onedr0p on 18 Nov 2020

👍1

Has anybody tried putting the DB on a GFS2 share? I have a 3-node Kubernetes cluster, and I fancy creating a "local" PV as a shared LVM thin partition formatted to GFS2 attached to the nodes. In theory, only one pod will access the DB, so it would not matter which node writes the SQLite file on the shared partition. This filesystem was specifically created for shared access, I wonder if SQLite would work on it fine.

immanuelfodor on 27 Nov 2020

@immanuelfodor if running in a kubernetes cluster I suggest iSCSI or other block storage like rook-ceph, longhorn or openebs.

onedr0p on 27 Nov 2020

Is iSCSI a solution that will allow the database on shared storage to work without these issues?

2fst4u on 6 Dec 2020

Based on @onedr0p's suggestion, I've started to experiment with Piraeus (a wrapper around LINSTOR, which is itself a wrapper around DRBD) to provide high-speed NVMe storage for my cluster. It can also use iSCSI under the hood as the network block storage protocol.

Find some of my questions about Piraeus usage here: https://github.com/piraeusdatastore/piraeus-operator/issues/125

My experiment is not yet complete enough to share final conclusions regarding SQLite; I have had a busy week since then.

immanuelfodor on 6 Dec 2020

@2fst4u I'm not using Kubernetes, but on Docker my issue went away once I switched from NFS to iSCSI.

chrisadas on 6 Dec 2020

We're starting to get off in the weeds, but yes, iSCSI will "solve" this issue because under the hood, it works with local copies of the files. It just uses network (async) to report these changes to the NAS.
So there is no reason for SQLite to freak out when used with iSCSI.

Xaelias on 6 Dec 2020
