Core: Serialization failure, Deadlock found when trying to get lock

Created on 9 Mar 2015 · 40 comments · Source: owncloud/core

Steps to reproduce

  1. Prepare a local folder "lots" with 1600+ files
  2. Setup the sync client and let it upload all the files
  3. Rename "lots" to "lots2", wait for sync
  4. Rename "lots2" to "lots3"

Expected result

Renamed properly.

Actual result

Very strange internal server error:

{"reqId":"913558a89371e56c7248d7c0ac24f820","remoteAddr":"127.0.0.1","app":"remote","message":"An exception occurred while executing 'UPDATE `oc_filecache` SET `path` = ?, `path_hash` = ?, `name` = ?, `parent` =? WHERE `fileid` = ?' with params [\"files\\\/lots3\", \"6496454381429272daca5f1453986396\", \"lots3\", \"3\", 8]:\n\nSQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction","level":4,"time":"2015-03-09T09:45:40+00:00","method":"MOVE","url":"\/owncloud\/remote.php\/webdav\/lots2"}

And then the sync client reuploads all the files which takes a while...

Versions

ownCloud 8.0.0 / stable8 found while testing https://github.com/owncloud/core/pull/14752
mariadb-10.0.14-2.3.x86_64
php5-5.6.6-1.1.x86_64
openSUSE Factory

Not sure if this can be reproduced consistently, will try again.

That could be another argument for closeCursor.
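
For illustration only, a minimal sketch of what the closeCursor pattern looks like with Doctrine DBAL (the variable names are placeholders, not the actual ownCloud code path):

```php
// Illustrative sketch only: close the cursor as soon as the result set has
// been consumed, so the statement does not hold on to server-side resources
// longer than necessary.
// Assumes $connection is a Doctrine\DBAL\Connection and $parentId is known.
$stmt = $connection->executeQuery(
    'SELECT `fileid`, `path` FROM `oc_filecache` WHERE `parent` = ?',
    [$parentId]
);
$rows = $stmt->fetchAll();
$stmt->closeCursor(); // free the result set before doing any further work

foreach ($rows as $row) {
    // ... work with the cached entries without keeping the cursor open ...
}
```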

@DeepDiver1975 @icewind1991 @MorrisJobke

Bug blue-ticket mariadb locking sev2-high

Most helpful comment

We were able to find the reason for the error in our configuration. The problem was the MariaDB Galera cluster: when write requests are sent to all servers, deadlocks can occur. The solution was to send all write requests to a single server.
A more detailed explanation can be found under:
http://severalnines.com/blog/avoiding-deadlocks-galera-set-haproxy-single-node-writes-and-multi-node-reads

All 40 comments

Hmm I wonder if it's due to https://github.com/owncloud/core/issues/14752 which introduces a transaction because that's the branch I got the issue on.

@PVince81 Also @felixboehm mentioned that we maybe need to handle aborted transactions because of conflicts somehow.

Hmmm, just now I noticed that after renaming "lots3" to "lots4", I now have two duplicate local folders called "lots3" and "lots4".

Aborted transactions ? You mean when the sync client tries to do something but the server is still busy from the previous parallel operation ?

No, "aborted transaction" as in: the database says "no, I can't do this transaction, because there was a parallel concurrent transaction and this transaction isn't valid anymore".
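
To illustrate that idea (this is only a sketch, not ownCloud's current behaviour): a caller could catch the serialization failure and retry the whole transaction, roughly like this:

```php
// Sketch only, assuming Doctrine DBAL: retry a piece of work when the
// database aborts it with a deadlock / serialization failure (SQLSTATE 40001).
// This is NOT what ownCloud currently does; it only illustrates the idea.
use Doctrine\DBAL\Connection;

function runWithRetry(Connection $conn, callable $work, $maxAttempts = 3) {
    for ($attempt = 1; ; $attempt++) {
        try {
            $conn->beginTransaction();
            $result = $work($conn);
            $conn->commit();
            return $result;
        } catch (\Exception $e) {
            $conn->rollBack();
            $isDeadlock = strpos($e->getMessage(), 'SQLSTATE[40001]') !== false;
            if (!$isDeadlock || $attempt >= $maxAttempts) {
                throw $e; // not retryable, or out of attempts
            }
            usleep(100000 * $attempt); // small backoff before retrying
        }
    }
}
```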

I tried to reproduce the issue but this run didn't trigger it. One thing I observed though is that the sync client sends a MOVE operation for EVERY file inside the renamed folder instead of just renaming the folder. It doesn't seem to always do that though.

I'm trying to sequentially rename "lotsX" to "lotsY" and see what happens, in the hope of reproducing the issue.
There's more weird stuff happening:

{"reqId":"174cfbf66b485a8274cb2bc8071e6a21","remoteAddr":"127.0.0.1","app":"core","message":"unable to rename, file does not exists : files_versions\/lots9.v","level":3,"time":"2015-03-09T10:12:04+00:00","method":"MOVE","url":"\/owncloud\/remote.php\/webdav\/lots9"}

(notice that lots9 is a folder, not a file)

I tried several times and didn't manage to reproduce the deadlock issue.
Maybe it was a one-off or a race condition of some sort... but with the nasty side effect of having the folder duplicate itself.

Maybe this has something to do with one of these bugs that I previously reported?

#10711
#11847
#12659
#12661

We've been trying OC 8.1.0.6 beta2 on a cloud setup (HAProxy in front of the Apache frontends, HAProxy in front of the MariaDB cluster) and there is the same problem. It is not due to renaming a directory, so not due to #14752, because we have the latest version where that is resolved, and we triggered the problem without renaming a directory.

Hi all, we have the same problem with OC 8.0.5 and client version 1.8.4 (build 2531): when we upload big files (e.g. 300 MB), the synchronization aborts with "Internal server error".
In the ownCloud log we can find:

Jul 17 09:29:02 sdcoca15 ownCloud[24989]: {remote} An exception occurred while executing 'UPDATE oc_filecache SET mtime = ?, etag=? WHERE (mtime <> ? OR etag <> ? ) AND fileid = ? ' with params [1437118142, "55a8aebee9f3a", 1437118142, "55a8aebee9f3a", "11251183"]:#012#012SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction

Is this probably the same bug?
Thanks

Is anyone still seeing this with 8.1.3 with transactional redis locking enabled ? (https://doc.owncloud.org/server/8.1/admin_manual/configuration_files/files_locking_transactional.html)
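
For anyone unsure how to enable it, the config.php entries look roughly like this (host/port are placeholders; see the linked admin manual for the authoritative settings):

```php
// config/config.php (excerpt) -- roughly how transactional file locking
// backed by Redis is enabled; check the admin manual linked above for the
// authoritative option names and values.
$CONFIG = array(
    'filelocking.enabled' => true,
    'memcache.locking' => '\OC\Memcache\Redis',
    'redis' => array(
        'host' => 'localhost',
        'port' => 6379,
    ),
);
```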

We are using OC 8.1.2 with transactional filelocking and MariaDB and are still getting this error.

Can you check the "Admin" page? Under "Status" you should see a message that says that locking is enabled.

The message is visible.
Before activating the new locking we often had deadlocks in the table "file_locks" (I am not sure of the correct name, but it was the table for the file locking). After activating the new locking we only have deadlocks while updating mtime and etag in the "file_cache" table.

Also, our behaviour is that after the message the sync client does not sync the new files. I think this is because of the unchanged etag.

Hmmm, the table "file_locks" doesn't exist in 8.1.x. The app "files_locking" is something different and doesn't use the database.
Did you actually upgrade to 8.2RC2 ?

I just upgraded to OC 8.2.0 and got the same error again. The updated file is shown in the web interface but no client downloads the changes because the etag of the parent folder is not changed.

The oc_file_locks table is still in the database and the admin page does not show the message for the activated transactional locking. But I don't think that this has something to do with the error while updating the database. What is ownCloud doing when a DB transaction fails? I would assume that ownCloud retries the change, but it doesn't look like it does.

The exact error message:

{"reqId":"DEPld\/dVX2BKKGWwRuGK","remoteAddr":"137.250.18.11","app":"index","message":"Exception: {\"Exception\":\"Doctrine\DBAL\Exception\DriverException\",\"Message\":\"An exception occurred while executing 'UPDATE oc_filecache SET mtime = ?, etag=? WHERE (mtime <> ? OR etag <> ? ) AND fileid = ? ' with params [1445422901, \"56276735aadd0\", 1445422901, \"56276735aadd0\", 2511469]:\n\nSQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction\",\"Code\":0,\"Trace\":\"#0 \/var\/www\/owncloud-8.2.0\/3rdparty\/doctrine\/dbal\/lib\/Doctrine\/DBAL\/DBALException.php(116): Doctrine\DBAL\Driver\AbstractMySQLDriver->convertException('An exception oc...', Object(Doctrine\DBAL\Driver\PDOException))\n#1 \/var\/www\/owncloud-8.2.0\/3rdparty\/doctrine\/dbal\/lib\/Doctrine\/DBAL\/Statement.php(172): Doctrine\DBAL\DBALException::driverExceptionDuringQuery(Object(Doctrine\DBAL\Driver\PDOMySql\Driver), Object(Doctrine\DBAL\Driver\PDOException), 'UPDATE oc_file...', Array)\\n#2 \\\/var\\\/www\\\/owncloud-8.2.0\\\/lib\\\/private\\\/db\\\/statementwrapper.php(73): Doctrine\\\\DBAL\\\\Statement->execute(Array)\\n#3 \\\/var\\\/www\\\/owncloud-8.2.0\\\/lib\\\/private\\\/db.php(147): OC_DB_StatementWrapper->execute(Array)\\n#4 \\\/var\\\/www\\\/owncloud-8.2.0\\\/lib\\\/private\\\/files\\\/cache\\\/cache.php(300): OC_DB::executeAudited('UPDATE*PREFIX...', Array)\n#5 \/var\/www\/owncloud-8.2.0\/lib\/private\/files\/cache\/changepropagator.php(79): OC\Files\Cache\Cache->update(2511469, Array)\n#6 \/var\/www\/owncloud-8.2.0\/lib\/private\/files\/cache\/updater.php(117): OC\Files\Cache\ChangePropagator->propagateChanges(NULL)\n#7 \/var\/www\/owncloud-8.2.0\/lib\/private\/files\/view.php(1038): OC\Files\Cache\Updater->update('\/thumbnails\/251...')\n#8 \/var\/www\/owncloud-8.2.0\/lib\/private\/files\/view.php(585): OC\Files\View->basicOperation('file_put_conten...', 'thumbnails\/2517...', Array, '\x89PNG\r\n\x1A\n\x00\x00\x00\rIHD...')\n#9 \/var\/www\/owncloud-8.2.0\/lib\/private\/preview.php(1048): OC\Files\View->file_put_contents('thumbnails\/2517...', '\x89PNG\r\n\x1A\n\x00\x00\x00\rIHD...')\n#10 \/var\/www\/owncloud-8.2.0\/lib\/private\/preview.php(894): OC\Preview->storePreview(2517738, 32, 32)\n#11 \/var\/www\/owncloud-8.2.0\/lib\/private\/preview.php(1162): OC\Preview->resizeAndStore(2517738)\n#12 \/var\/www\/owncloud-8.2.0\/lib\/private\/preview.php(770): OC\Preview->generatePreview(2517738)\n#13 \/var\/www\/owncloud-8.2.0\/lib\/private\/preview.php(795): OC\Preview->getPreview()\n#14 \/var\/www\/owncloud-8.2.0\/core\/ajax\/preview.php(63): OC\Preview->showPreview()\n#15 \/var\/www\/owncloud-8.2.0\/lib\/private\/route\/route.php(154) : runtime-created function(1): require_once('\/var\/www\/ownclo...')\n#16 [internal function]: __lambda_func(Array)\n#17 \/var\/www\/owncloud-8.2.0\/lib\/private\/route\/router.php(282): call_user_func('\x00lambda_83', Array)\n#18 \/var\/www\/owncloud-8.2.0\/lib\/base.php(831): OC\Route\Router->match('\/core\/preview.p...')\n#19 \/var\/www\/owncloud-8.2.0\/index.php(39): OC::handleRequest()\n#20 {main}\",\"File\":\"\/var\/www\/owncloud-8.2.0\/3rdparty\/doctrine\/dbal\/lib\/Doctrine\/DBAL\/Driver\/AbstractMySQLDriver.php\",\"Line\":115}","level":3,"time":"2015-10-21T12:21:41+02:00"}

Duplicate of #20226 and #20555 ?

#20555 refers to the oc_file_locks table whereas our problem occurs in the oc_filecache table. Can these two problems be related?

Actually with the new locking this should not happen any more.

But if you observed it on 8.2.0 where the new locking exists, then there might be an additional issue.

Hi,
I am using 8.2.2 and the Windows client 2.1.0.

I added a big file (about 4 GB) to the sync folder.
First, the application tried to sync that file and gave an error about insufficient space. Then I changed the quota and changed the file name...

Then every sync operation gives the following error:

Jul 17 09:29:02 sdcoca15 ownCloud[24989]: {remote} An exception occurred while executing 'UPDATE oc_filecache SET mtime = ?, etag=? WHERE (mtime <> ? OR etag <> ? ) AND fileid = ? ' with params [1437118142, "55a8aebee9f3a", 1437118142, "55a8aebee9f3a", "11251183"]:#12#012SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction


I am not using the Redis cache, only the PHP cache...

Please update to 8.2.2, it solves a few locking issues when using DB locking (like stray locks)

@catborise you seem to have a different issue. So you say it always happens after hitting insufficient space? Maybe that code path somehow doesn't free the database table?

Hello everybody,
we updated to 9.0RC1 on our test system and are still getting the problem.
We are using redis and mariadb.

{"reqId":"SuM+doLziYR6e8RC6p6c","remoteAddr":"xxx.xxx.xxx.xxx","app":"webdav","message":"Exception: {\"Message\":\"An exception occurred while executing 'UPDATE `oc_filecache` SET `mtime` = ?, `etag` = ?, `size`=? WHERE (`mtime` <> ? OR `etag` <> ? OR `size` <> ? OR `mtime` IS NULL OR `etag` IS NULL OR `size` IS NULL) AND `fileid` = ? ' with params [1455720965, \\\"56c48a05b0476\\\", -1, 1455720965, \\\"56c48a05b0476\\\", -1, 2334640]:

SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction\",\"Exception\":\"Doctrine\DBAL\Exception\DriverException\",\"Code\":0,\"Trace\":\"#0 3rdparty/doctrine/dbal/lib/Doctrine/DBAL/DBALException.php(116): Doctrine\DBAL\Driver\AbstractMySQLDriver->convertException('An exception oc...', Object(Doctrine\DBAL\Driver\PDOException))
#1 3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Connection.php(836): Doctrine\DBAL\DBALException::driverExceptionDuringQuery(Object(Doctrine\DBAL\Driver\PDOMySql\Driver), Object(Doctrine\DBAL\Driver\PDOException), 'UPDATE `oc_file...', Array)
#2 lib/private/db/connection.php(183): Doctrine\DBAL\Connection->executeQuery('UPDATE `oc_file...', Array, Array, NULL)
#3 lib/private/files/cache/cache.php(308): OC\DB\Connection->executeQuery('UPDATE `*PREFIX...', Array)
#4 lib/private/files/cache/propagator.php(67): OC\Files\Cache\Cache->update(2334640, Array)
#5 lib/private/files/view.php(1242): OC\Files\Cache\Propagator->propagateChange('.local/share/ev...', 1455720965)
#6 lib/private/files/view.php(1337): OC\Files\View->getCacheEntry(Object(OC\Files\Storage\Wrapper\Encryption), '.local/share/ev...', '/home/.local/sh...')
#7 apps/dav/lib/connector/sabre/directory.php(225): OC\Files\View->getDirectoryContent('/home/.local/sh...')
#8 3rdparty/sabre/dav/lib/DAV/Tree.php(195): OCA\DAV\Connector\Sabre\Directory->getChildren()
#9 3rdparty/sabre/dav/lib/DAV/Server.php(857): Sabre\DAV\Tree->getChildren('home/.local/sha...')
#10 3rdparty/sabre/dav/lib/DAV/Server.php(911): Sabre\DAV\Server->addPathNodesRecursively(Array, Object(Sabre\DAV\PropFind))
#11 3rdparty/sabre/dav/lib/DAV/CorePlugin.php(336): Sabre\DAV\Server->getPropertiesForPath('home/.local/sha...', Array, 1)
#12 [internal function]: Sabre\DAV\CorePlugin->httpPropFind(Object(Sabre\HTTP\Request), Object(Sabre\HTTP\Response))
#13 3rdparty/sabre/event/lib/EventEmitterTrait.php(105): call_user_func_array(Array, Array)
#14 3rdparty/sabre/dav/lib/DAV/Server.php(459): Sabre\Event\EventEmitter->emit('method:PROPFIND', Array)
#15 3rdparty/sabre/dav/lib/DAV/Server.php(248): Sabre\DAV\Server->invokeMethod(Object(Sabre\HTTP\Request), Object(Sabre\HTTP\Response))
#16 apps/dav/appinfo/v1/webdav.php(54): Sabre\DAV\Server->exec()
#17 remote.php(137): require_once('/var/www/ownclo...')
#18 {main}\",\"File\":\"3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Driver/AbstractMySQLDriver.php\",\"Line\":115}","level":4,"time":"2016-02-17T14:56:05+00:00"}

Some cleanup background jobs can also cause the DB to be locked for too long: https://github.com/owncloud/core/issues/22243

But if you observed it on 8.2.0 where the new locking exists, then there might be an additional issue.

8.2.2: https://github.com/owncloud/enterprise/issues/1150

@PVince81 @icewind1991 I could have a look at this instance - what are we looking for? What information is needed to debug this?

@MorrisJobke not sure. It might depend for what query it happens, then need to find out why it happens in the first place.

Looking at @michi-roth's stack trace here https://github.com/owncloud/core/issues/14757#issuecomment-185243499 it looks like the file was modified outside of ownCloud, and at the time of the PROPFIND ownCloud noticed the change and tried to update the cache and propagate it.

@michi-roth did you explicitly modify files outside of ownCloud, directly in the data folder? If not, then this might be a side effect of another issue.
Or potentially an occurrence of a similar issue seen before, where the exception happened after updating the file on disk but before the cache could be updated.

So far there are only two possibly long-running transactions:

Normally, as I said before, the new file locking from 8.2/9.0 should prevent the DB code from being reached in the first place.

Let's see whether there are some code paths that slip through the cracks and still produce this condition.

I found this fix https://github.com/owncloud/core/commit/26939a28730eb7b3417cd02bb6c35ef748cced11 but it seems that it was already in 9.0 RC1, where the issue has been reported again.

@michi-roth can you check whether you have a setting "filesystem_check_changes" in your config.php? Your last stack trace seems to imply that you have explicitly set this value to 1, while the default is 0. Usually setting it to 0 would solve the issue.

For now I'll assume that you had "filesystem_check_changes" set to 1. Or maybe it was an external storage for which update detection is enabled anyway, so the situation is likely to happen there too.
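
For reference, the setting in question lives in config.php; this is only an illustrative excerpt:

```php
// config/config.php (excerpt) -- 0 is the default: ownCloud only trusts its
// own file cache and does not look for outside changes on every access.
// Setting it to 1 makes the scanner run on access, which is what triggers
// the cache updates seen in the stack trace above.
$CONFIG = array(
    'filesystem_check_changes' => 0,
);
```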

Here is my first test run:

  1. Set "filesystem_check_changes" to 1 in config.php
  2. Create a folder "test" in the web UI
  3. Upload about 70 files into the "test" folder
  4. In the source code, add sleep(10) here in the loop https://github.com/owncloud/core/blob/v9.0.1/lib/private/files/cache/cache.php#L521
  5. Rename "test" to "test2" in the web UI.
  6. Set breakpoint in https://github.com/owncloud/core/blob/v9.0.1/lib/private/files/cache/scanner.php#L364
  7. On the CLI with cadaver, do a "ls" on the root folder, the one containing the "test" folder

The sleep call will make sure that every entry in the folder takes 10 seconds to be renamed in the database.
Then, when doing the "ls" / PROPFIND, I continued debugging until the scanner tried to scan "files/test2". At this point I observed that the lock prevents it from entering the folder, and the old cache data is returned.

This test did NOT reproduce the issue and showed that locking worked properly so far.

Side note: the web UI's regular "getstoragestats.php" call will also trigger a scan while trying to compute the used space. That one was also successfully excluded thanks to the lock.

Next step: try the reverse condition: have the scanner run first and then try and rename.

So far I could not reproduce this exact issue. All my attempts at reproducing it led to a deadlock on oc_file_locks, but not on oc_filecache, as documented here: https://github.com/owncloud/core/issues/20555#issuecomment-213423194

I'm out of ideas for this and was unable to reproduce the deadlock on oc_filecache.
With a bit of luck the fix from https://github.com/owncloud/core/issues/20555#issuecomment-214197654 might help mitigate/eliminate this case altogether.

Moving to 9.0.3 to keep in scope, and hopefully you guys will have upgraded to 9.0.2 and reported back whether the fix worked.

Thanks!

Is anyone still seeing this in 9.0.2 ?
Also make sure to have the proper transaction isolation mode: https://doc.owncloud.org/server/9.0/admin_manual/configuration_database/linux_database_configuration.html#mysql-mariadb-with-binary-logging-enabled
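
A quick way to check which isolation level your server is actually using (sketch only; the credentials are placeholders, and the setting itself belongs in the MariaDB configuration as described in the linked documentation):

```php
// Quick sketch to check the effective transaction isolation level; adjust the
// DSN and credentials (these values are placeholders). The linked manual
// recommends READ COMMITTED when binary logging is enabled.
$pdo = new PDO('mysql:host=localhost;dbname=owncloud', 'oc_user', 'secret');
echo $pdo->query('SELECT @@tx_isolation')->fetchColumn(), "\n";
// prints e.g. "REPEATABLE-READ" (MariaDB default) or "READ-COMMITTED"
```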

We were able to find the reason for the error in our configuration. The problem was the MariaDB Galera cluster: when write requests are sent to all servers, deadlocks can occur. The solution was to send all write requests to a single server.
A more detailed explanation can be found under:
http://severalnines.com/blog/avoiding-deadlocks-galera-set-haproxy-single-node-writes-and-multi-node-reads

Verified! No error since I changed my HAProxy configuration, thank you!

@michi-roth thanks a lot for the hint. I raised a ticket for the documentation, maybe having it as a hint there would help other people with this issue: https://github.com/owncloud/documentation/issues/2490

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
