Core: Serialization failure, Deadlock found when trying to get lock

Created on 9 Mar 2015 · 40 comments · Source: owncloud/core

Steps to reproduce

  1. Prepare a local folder "lots" with 1600+ files
  2. Setup the sync client and let it upload all the files
  3. Rename "lots" to "lots2", wait for sync
  4. Rename "lots2" to "lots3"

Expected result

Renamed properly.

Actual result

Very strange internal server error:

{"reqId":"913558a89371e56c7248d7c0ac24f820","remoteAddr":"127.0.0.1","app":"remote","message":"An exception occurred while executing 'UPDATE `oc_filecache` SET `path` = ?, `path_hash` = ?, `name` = ?, `parent` =? WHERE `fileid` = ?' with params [\"files\\\/lots3\", \"6496454381429272daca5f1453986396\", \"lots3\", \"3\", 8]:\n\nSQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction","level":4,"time":"2015-03-09T09:45:40+00:00","method":"MOVE","url":"\/owncloud\/remote.php\/webdav\/lots2"}

And then the sync client reuploads all the files which takes a while...

Versions

ownCloud 8.0.0 / stable8 found while testing https://github.com/owncloud/core/pull/14752
mariadb-10.0.14-2.3.x86_64
php5-5.6.6-1.1.x86_64
openSUSE Factory

Not sure if this can be reproduced consistently, will try again.

That could be another argument for closeCursor.
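
For illustration only, a minimal sketch of what the closeCursor pattern looks like with Doctrine DBAL (the variable names are placeholders, not the actual ownCloud code path):

```php
// Illustrative sketch only: close the cursor as soon as the result set has
// been consumed, so the statement does not hold on to server-side resources
// longer than necessary.
// Assumes $connection is a Doctrine\DBAL\Connection and $parentId is known.
$stmt = $connection->executeQuery(
    'SELECT `fileid`, `path` FROM `oc_filecache` WHERE `parent` = ?',
    [$parentId]
);
$rows = $stmt->fetchAll();
$stmt->closeCursor(); // free the result set before doing any further work

foreach ($rows as $row) {
    // ... work with the cached entries without keeping the cursor open ...
}
```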

@DeepDiver1975 @icewind1991 @MorrisJobke

Bug blue-ticket mariadb locking sev2-high

Most helpful comment

We were able to find the reason for the error in our configuration. The problem was the MariaDB Galera cluster: when write requests are sent to all servers, deadlocks can occur. The solution was to send all write requests to a single server.
A more detailed explanation can be found under:
http://severalnines.com/blog/avoiding-deadlocks-galera-set-haproxy-single-node-writes-and-multi-node-reads

All 40 comments

Hmm I wonder if it's due to https://github.com/owncloud/core/issues/14752 which introduces a transaction because that's the branch I got the issue on.

@PVince81 Also @felixboehm mentioned that we maybe need to handle aborted transactions because of conflicts somehow.

Hmmm, just now I noticed that after renaming "lots3" to "lots4", I now have two duplicate local folders called "lots3" and "lots4".

Aborted transactions ? You mean when the sync client tries to do something but the server is still busy from the previous parallel operation ?

No, "aborted transaction" as in: the database says "no, I can't do this transaction, because there was a parallel concurrent transaction and this transaction isn't valid anymore".
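
To illustrate that idea (this is only a sketch, not ownCloud's current behaviour): a caller could catch the serialization failure and retry the whole transaction, roughly like this:

```php
// Sketch only, assuming Doctrine DBAL: retry a piece of work when the
// database aborts it with a deadlock / serialization failure (SQLSTATE 40001).
// This is NOT what ownCloud currently does; it only illustrates the idea.
use Doctrine\DBAL\Connection;

function runWithRetry(Connection $conn, callable $work, $maxAttempts = 3) {
    for ($attempt = 1; ; $attempt++) {
        try {
            $conn->beginTransaction();
            $result = $work($conn);
            $conn->commit();
            return $result;
        } catch (\Exception $e) {
            $conn->rollBack();
            $isDeadlock = strpos($e->getMessage(), 'SQLSTATE[40001]') !== false;
            if (!$isDeadlock || $attempt >= $maxAttempts) {
                throw $e; // not retryable, or out of attempts
            }
            usleep(100000 * $attempt); // small backoff before retrying
        }
    }
}
```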

I tried to reproduce the issue but this run didn't trigger it. One thing I observed though is that the sync client sends a MOVE operation for EVERY file inside the renamed folder instead of just renaming the folder. It doesn't seem to always do that though.

I'm trying to sequentially rename "lotsX" to "lotsY" and see what happens, in the hope of reproducing the issue.
There's more weird stuff happening:

{"reqId":"174cfbf66b485a8274cb2bc8071e6a21","remoteAddr":"127.0.0.1","app":"core","message":"unable to rename, file does not exists : files_versions\/lots9.v","level":3,"time":"2015-03-09T10:12:04+00:00","method":"MOVE","url":"\/owncloud\/remote.php\/webdav\/lots9"}

(notice that lots9 is a folder, not a file)

I tried several times and didn't manage to reproduce the deadlock issue.
Maybe it was a one-off or a race condition of some sort... but with the nasty side effect of having the folder duplicate itself.

Maybe this has something to do with one of these bugs that I previously reported?

#10711
#11847
#12659
#12661

We've been trying OC 8.1.0.6 beta2 on a cloud setup (HAProxy in front of the Apache frontends, HAProxy in front of the MariaDB cluster) and there is the same problem. It is not due to renaming a directory, so not due to #14752, because we have the latest version where that is resolved, and we triggered the problem without renaming a directory.

Hi all, we have the same problem with OC 8.0.5 and client version 1.8.4 (build 2531): when we upload big files (e.g. 300 MB), the synchronization aborts with "Internal server error".
In the ownCloud log we can find:

Jul 17 09:29:02 sdcoca15 ownCloud[24989]: {remote} An exception occurred while executing 'UPDATE oc_filecache SET mtime = ?, etag=? WHERE (mtime <> ? OR etag <> ? ) AND fileid = ? ' with params [1437118142, "55a8aebee9f3a", 1437118142, "55a8aebee9f3a", "11251183"]:#012#012SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction

Is this probably the same bug?
Thanks

Is anyone still seeing this with 8.1.3 with transactional redis locking enabled ? (https://doc.owncloud.org/server/8.1/admin_manual/configuration_files/files_locking_transactional.html)
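
For anyone unsure how to enable it, the config.php entries look roughly like this (host/port are placeholders; see the linked admin manual for the authoritative settings):

```php
// config/config.php (excerpt) -- roughly how transactional file locking
// backed by Redis is enabled; check the admin manual linked above for the
// authoritative option names and values.
$CONFIG = array(
    'filelocking.enabled' => true,
    'memcache.locking' => '\OC\Memcache\Redis',
    'redis' => array(
        'host' => 'localhost',
        'port' => 6379,
    ),
);
```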

We are using OC 8.1.2 with transactional filelocking and MariaDB and are still getting this error.

Can you check the "Admin" page? Under "Status" you should see a message that says that locking is enabled.

The message is visible.
Before activating the new locking we often had deadlocks in the table "file_locks" (I am not sure of the correct name, but it was the table for the file locking). After activating the new locking we only have deadlocks while updating mtime and etag in the "file_cache" table.

Also, our behaviour is that after the message the sync client does not sync the new files. I think this is because of the unchanged etag.

Hmmm, the table "file_locks" doesn't exist in 8.1.x. The app "files_locking" is something different and doesn't use the database.
Did you actually upgrade to 8.2RC2 ?

I just upgraded to OC 8.2.0 and got the same error again. The updated file is shown in the web interface but no client downloads the changes because the etag of the parent folder is not changed.

The oc_file_locks table is still in the database and the admin page does not show the message for the activated transactional locking. But I don't think that this has something to do with the error while updating the database. What is ownCloud doing when a DB transaction fails? I would assume that ownCloud retries the change, but it doesn't look like it does.

The exact error message:

{"reqId":"DEPld\/dVX2BKKGWwRuGK","remoteAddr":"137.250.18.11","app":"index","message":"Exception: {\"Exception\":\"Doctrine\DBAL\Exception\DriverException\",\"Message\":\"An exception occurred while executing 'UPDATE oc_filecache SET mtime = ?, etag=? WHERE (mtime <> ? OR etag <> ? ) AND fileid = ? ' with params [1445422901, \"56276735aadd0\", 1445422901, \"56276735aadd0\", 2511469]:\n\nSQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction\",\"Code\":0,\"Trace\":\"#0 \/var\/www\/owncloud-8.2.0\/3rdparty\/doctrine\/dbal\/lib\/Doctrine\/DBAL\/DBALException.php(116): Doctrine\DBAL\Driver\AbstractMySQLDriver->convertException('An exception oc...', Object(Doctrine\DBAL\Driver\PDOException))\n#1 \/var\/www\/owncloud-8.2.0\/3rdparty\/doctrine\/dbal\/lib\/Doctrine\/DBAL\/Statement.php(172): Doctrine\DBAL\DBALException::driverExceptionDuringQuery(Object(Doctrine\DBAL\Driver\PDOMySql\Driver), Object(Doctrine\DBAL\Driver\PDOException), 'UPDATE oc_file...', Array)\\n#2 \\\/var\\\/www\\\/owncloud-8.2.0\\\/lib\\\/private\\\/db\\\/statementwrapper.php(73): Doctrine\\\\DBAL\\\\Statement->execute(Array)\\n#3 \\\/var\\\/www\\\/owncloud-8.2.0\\\/lib\\\/private\\\/db.php(147): OC_DB_StatementWrapper->execute(Array)\\n#4 \\\/var\\\/www\\\/owncloud-8.2.0\\\/lib\\\/private\\\/files\\\/cache\\\/cache.php(300): OC_DB::executeAudited('UPDATE*PREFIX...', Array)\n#5 \/var\/www\/owncloud-8.2.0\/lib\/private\/files\/cache\/changepropagator.php(79): OC\Files\Cache\Cache->update(2511469, Array)\n#6 \/var\/www\/owncloud-8.2.0\/lib\/private\/files\/cache\/updater.php(117): OC\Files\Cache\ChangePropagator->propagateChanges(NULL)\n#7 \/var\/www\/owncloud-8.2.0\/lib\/private\/files\/view.php(1038): OC\Files\Cache\Updater->update('\/thumbnails\/251...')\n#8 \/var\/www\/owncloud-8.2.0\/lib\/private\/files\/view.php(585): OC\Files\View->basicOperation('file_put_conten...', 'thumbnails\/2517...', Array, '\x89PNG\r\n\x1A\n\x00\x00\x00\rIHD...')\n#9 \/var\/www\/owncloud-8.2.0\/lib\/private\/preview.php(1048): OC\Files\View->file_put_contents('thumbnails\/2517...', '\x89PNG\r\n\x1A\n\x00\x00\x00\rIHD...')\n#10 \/var\/www\/owncloud-8.2.0\/lib\/private\/preview.php(894): OC\Preview->storePreview(2517738, 32, 32)\n#11 \/var\/www\/owncloud-8.2.0\/lib\/private\/preview.php(1162): OC\Preview->resizeAndStore(2517738)\n#12 \/var\/www\/owncloud-8.2.0\/lib\/private\/preview.php(770): OC\Preview->generatePreview(2517738)\n#13 \/var\/www\/owncloud-8.2.0\/lib\/private\/preview.php(795): OC\Preview->getPreview()\n#14 \/var\/www\/owncloud-8.2.0\/core\/ajax\/preview.php(63): OC\Preview->showPreview()\n#15 \/var\/www\/owncloud-8.2.0\/lib\/private\/route\/route.php(154) : runtime-created function(1): require_once('\/var\/www\/ownclo...')\n#16 [internal function]: __lambda_func(Array)\n#17 \/var\/www\/owncloud-8.2.0\/lib\/private\/route\/router.php(282): call_user_func('\x00lambda_83', Array)\n#18 \/var\/www\/owncloud-8.2.0\/lib\/base.php(831): OC\Route\Router->match('\/core\/preview.p...')\n#19 \/var\/www\/owncloud-8.2.0\/index.php(39): OC::handleRequest()\n#20 {main}\",\"File\":\"\/var\/www\/owncloud-8.2.0\/3rdparty\/doctrine\/dbal\/lib\/Doctrine\/DBAL\/Driver\/AbstractMySQLDriver.php\",\"Line\":115}","level":3,"time":"2015-10-21T12:21:41+02:00"}

Duplicate of #20226 and #20555 ?

#20555 refers to the oc_file_locks table whereas our problem occurs in the oc_filecache table. Can these two problems be related?

Actually with the new locking this should not happen any more.

But if you observed it on 8.2.0 where the new locking exists, then there might be an additional issue.

Hi,
I am using 8.2.2 and the Windows client 2.1.0.

I added a big file (about 4 GB) to the sync folder.
First, the application tried to sync that file and gave an error about insufficient space. Then I changed the quota and changed the file name...

Then every sync operation gives the following error:

Jul 17 09:29:02 sdcoca15 ownCloud[24989]: {remote} An exception occurred while executing 'UPDATE oc_filecache SET mtime = ?, etag=? WHERE (mtime <> ? OR etag <> ? ) AND fileid = ? ' with params [1437118142, "55a8aebee9f3a", 1437118142, "55a8aebee9f3a", "11251183"]:#12#012SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction


I am not using the Redis cache, only the PHP cache...

Please update to 8.2.2, it solves a few locking issues when using DB locking (like stray locks)

@catborise you seem to have a different issue. So you say it always happens after hitting insufficient space? Maybe that code path somehow doesn't free the database table?

Hello everybody,
we updated to 9.0RC1 on our test system and are still getting the problem.
We are using redis and mariadb.

{"reqId":"SuM+doLziYR6e8RC6p6c","remoteAddr":"xxx.xxx.xxx.xxx","app":"webdav","message":"Exception: {\"Message\":\"An exception occurred while executing 'UPDATE `oc_filecache` SET `mtime` = ?, `etag` = ?, `size`=? WHERE (`mtime` <> ? OR `etag` <> ? OR `size` <> ? OR `mtime` IS NULL OR `etag` IS NULL OR `size` IS NULL) AND `fileid` = ? ' with params [1455720965, \\\"56c48a05b0476\\\", -1, 1455720965, \\\"56c48a05b0476\\\", -1, 2334640]:

SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction\",\"Exception\":\"Doctrine\DBAL\Exception\DriverException\",\"Code\":0,\"Trace\":\"#0 3rdparty/doctrine/dbal/lib/Doctrine/DBAL/DBALException.php(116): Doctrine\DBAL\Driver\AbstractMySQLDriver->convertException('An exception oc...', Object(Doctrine\DBAL\Driver\PDOException))
#1 3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Connection.php(836): Doctrine\DBAL\DBALException::driverExceptionDuringQuery(Object(Doctrine\DBAL\Driver\PDOMySql\Driver), Object(Doctrine\DBAL\Driver\PDOException), 'UPDATE `oc_file...', Array)
#2 lib/private/db/connection.php(183): Doctrine\DBAL\Connection->executeQuery('UPDATE `oc_file...', Array, Array, NULL)
#3 lib/private/files/cache/cache.php(308): OC\DB\Connection->executeQuery('UPDATE `*PREFIX...', Array)
#4 lib/private/files/cache/propagator.php(67): OC\Files\Cache\Cache->update(2334640, Array)
#5 lib/private/files/view.php(1242): OC\Files\Cache\Propagator->propagateChange('.local/share/ev...', 1455720965)
#6 lib/private/files/view.php(1337): OC\Files\View->getCacheEntry(Object(OC\Files\Storage\Wrapper\Encryption), '.local/share/ev...', '/home/.local/sh...')
#7 apps/dav/lib/connector/sabre/directory.php(225): OC\Files\View->getDirectoryContent('/home/.local/sh...')
#8 3rdparty/sabre/dav/lib/DAV/Tree.php(195): OCA\DAV\Connector\Sabre\Directory->getChildren()
#9 3rdparty/sabre/dav/lib/DAV/Server.php(857): Sabre\DAV\Tree->getChildren('home/.local/sha...')
#10 3rdparty/sabre/dav/lib/DAV/Server.php(911): Sabre\DAV\Server->addPathNodesRecursively(Array, Object(Sabre\DAV\PropFind))
#11 3rdparty/sabre/dav/lib/DAV/CorePlugin.php(336): Sabre\DAV\Server->getPropertiesForPath('home/.local/sha...', Array, 1)
#12 [internal function]: Sabre\DAV\CorePlugin->httpPropFind(Object(Sabre\HTTP\Request), Object(Sabre\HTTP\Response))
#13 3rdparty/sabre/event/lib/EventEmitterTrait.php(105): call_user_func_array(Array, Array)
#14 3rdparty/sabre/dav/lib/DAV/Server.php(459): Sabre\Event\EventEmitter->emit('method:PROPFIND', Array)
#15 3rdparty/sabre/dav/lib/DAV/Server.php(248): Sabre\DAV\Server->invokeMethod(Object(Sabre\HTTP\Request), Object(Sabre\HTTP\Response))
#16 apps/dav/appinfo/v1/webdav.php(54): Sabre\DAV\Server->exec()
#17 remote.php(137): require_once('/var/www/ownclo...')
#18 {main}\",\"File\":\"3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Driver/AbstractMySQLDriver.php\",\"Line\":115}","level":4,"time":"2016-02-17T14:56:05+00:00"}

Some cleanup background jobs can also cause the DB to be locked for too long: https://github.com/owncloud/core/issues/22243

But if you observed it on 8.2.0 where the new locking exists, then there might be an additional issue.

8.2.2: https://github.com/owncloud/enterprise/issues/1150

@PVince81 @icewind1991 I could have a look at this instance - what are we looking for? What information is needed to debug this?

@MorrisJobke not sure. It might depend for what query it happens, then need to find out why it happens in the first place.

Looking at @michi-roth's stack trace here https://github.com/owncloud/core/issues/14757#issuecomment-185243499 it looks like the file was modified outside of ownCloud, and at the time of the PROPFIND ownCloud noticed the change and tried to update the cache and propagate it.

@michi-roth did you explicitly modify files outside of ownCloud, directly in the data folder? If not, then this might be a side effect of another issue.
Or potentially an occurrence of a similar issue seen before, where the exception happened after updating the file on disk but before the cache could be updated.

So far there are only two possibly long-running transactions:

Normally, as I said before, the new file locking from 8.2/9.0 should prevent the DB code from being reached in the first place.

Let's see whether there are some code paths that slip through the cracks and still produce this condition.

I found this fix https://github.com/owncloud/core/commit/26939a28730eb7b3417cd02bb6c35ef748cced11 but it seems that it was already in 9.0 RC1, where the issue has been reported again.

@michi-roth can you check whether you have a setting "filesystem_check_changes" in your config.php? Your last stack trace seems to imply that you have explicitly set this value to 1, while the default is 0. Usually setting it to 0 would solve the issue.

For now I'll assume that you had "filesystem_check_changes" set to 1. Or maybe it was an external storage for which update detection is enabled anyway, so the situation is likely to happen there too.
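
For reference, the setting in question lives in config.php; this is only an illustrative excerpt:

```php
// config/config.php (excerpt) -- 0 is the default: ownCloud only trusts its
// own file cache and does not look for outside changes on every access.
// Setting it to 1 makes the scanner run on access, which is what triggers
// the cache updates seen in the stack trace above.
$CONFIG = array(
    'filesystem_check_changes' => 0,
);
```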

Here is my first test run:

  1. Set "filesystem_check_changes" to 1 in config.php
  2. Create a folder "test" in the web UI
  3. Upload about 70 files into the "test" folder
  4. In the source code, add sleep(10) here in the loop https://github.com/owncloud/core/blob/v9.0.1/lib/private/files/cache/cache.php#L521
  5. Rename "test" to "test2" in the web UI.
  6. Set breakpoint in https://github.com/owncloud/core/blob/v9.0.1/lib/private/files/cache/scanner.php#L364
  7. On the CLI with cadaver, do a "ls" on the root folder, the one containing the "test" folder

The sleep call will make sure that every entry in the folder takes 10 seconds to be renamed in the database.
Then, when doing the "ls" / PROPFIND, I continued debugging until the scanner tried to scan "files/test2". At this point I observed that the lock prevents it from entering the folder, and the old cache data is returned.

This test did NOT reproduce the issue and showed that locking worked properly so far.

Side note: the web UI's regular "getstoragestats.php" call will also trigger a scan while trying to compute the used space. That one was also successfully excluded thanks to the lock.

Next step: try the reverse condition: have the scanner run first and then try and rename.

So far I could not reproduce this exact issue. All my attempts at reproducing it led to a deadlock on oc_file_locks, but not on oc_filecache, as documented here: https://github.com/owncloud/core/issues/20555#issuecomment-213423194

I'm out of ideas for this and was unable to reproduce the deadlock on oc_filecache.
With a bit of luck the fix from https://github.com/owncloud/core/issues/20555#issuecomment-214197654 might help mitigate/eliminate this case altogether.

Moving to 9.0.3 to keep in scope, and hopefully you guys will have upgraded to 9.0.2 and reported back whether the fix worked.

Thanks!

Is anyone still seeing this in 9.0.2 ?
Also make sure to have the proper transaction isolation mode: https://doc.owncloud.org/server/9.0/admin_manual/configuration_database/linux_database_configuration.html#mysql-mariadb-with-binary-logging-enabled
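
A quick way to check which isolation level your server is actually using (sketch only; the credentials are placeholders, and the setting itself belongs in the MariaDB configuration as described in the linked documentation):

```php
// Quick sketch to check the effective transaction isolation level; adjust the
// DSN and credentials (these values are placeholders). The linked manual
// recommends READ COMMITTED when binary logging is enabled.
$pdo = new PDO('mysql:host=localhost;dbname=owncloud', 'oc_user', 'secret');
echo $pdo->query('SELECT @@tx_isolation')->fetchColumn(), "\n";
// prints e.g. "REPEATABLE-READ" (MariaDB default) or "READ-COMMITTED"
```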

We were able to find the reason for the error in our configuration. The problem was the MariaDB Galera cluster: when write requests are sent to all servers, deadlocks can occur. The solution was to send all write requests to a single server.
A more detailed explanation can be found under:
http://severalnines.com/blog/avoiding-deadlocks-galera-set-haproxy-single-node-writes-and-multi-node-reads

Verified! No error since I changed my HAProxy configuration, thank you!

@michi-roth thanks a lot for the hint. I raised a ticket for the documentation, maybe having it as a hint there would help other people with this issue: https://github.com/owncloud/documentation/issues/2490

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
