Core: race condition in insertOrUpdate queries

Created on 22 Feb 2016  路  28Comments  路  Source: owncloud/core

race condition in insertOrUpdate queries (completely reworked in 9.0 - try to curricumvent them in stable8.1/2 code too) - the reason for this is that those queries are technically two queries (there is a subquery) and if a parallel insert happens after the subquery is executed this could happen

error message:

{"Message":"An exception occurred while executing 'INSERT INTO "oc_preferences" ("configvalue","userid","appid","configkey") SELECT ?,?,?,? FROM "oc_preferences" WHERE "configkey" = ? AND "userid" = ? AND "appid" = ? HAVING COUNT(*) = 0' with params [1455358499.591, "[email protected]", "files_sharing", "last_propagate", "last_propagate", "[email protected]", "files_sharing"]:

SQLSTATE[23505]: Unique violation: 7 ERROR:  duplicate key value violates unique constraint "oc_preferences_pkey"
DETAIL:  Key (userid, appid, configkey)=([email protected], files_sharing, last_propagate) already exists.","Code":0,"Trace":"
#0 3rdparty/doctrine/dbal/lib/Doctrine/DBAL/DBALException.php(116): Doctrine\DBAL\Driver\AbstractPostgreSQLDriver->convertException('An exception oc...', Object(Doctrine\DBAL\Driver\PDOException))
#1 3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Connection.php(996): Doctrine\DBAL\DBALException::driverExceptionDuringQuery(Object(Doctrine\DBAL\Driver\PDOPgSql\Driver), Object(Doctrine\DBAL\Driver\PDOException), 'INSERT INTO "oc...', Array)
#2 lib/private/db/connection.php(151): Doctrine\DBAL\Connection->executeUpdate('INSERT INTO "oc...', Array, Array)
#3 lib/private/db/adapter.php(93): OC\DB\Connection->executeUpdate('INSERT INTO `*P...', Array)
#4 lib/private/db/connection.php(190): OC\DB\Adapter->insertIfNotExist('*PREFIX*prefere...', Array, Array)
#5 lib/private/allconfig.php(217): OC\DB\Connection->insertIfNotExist('*PREFIX*prefere...', Array, Array)
#6 apps/files_sharing/lib/propagation/recipientpropagator.php(83): OC\AllConfig->setUserValue('[email protected]', 'files_sharing', 'last_propagate', 1455358499.591)
#7 apps/files_sharing/lib/mountprovider.php(64): OCA\Files_Sharing\Propagation\RecipientPropagator->propagateDirtyMountPoints(Array)
#8 lib/private/files/config/mountprovidercollection.php(61): OCA\Files_Sharing\MountProvider->getMountsForUser(Object(OC\User\User), Object(OC\Files\Storage\StorageFactory))
#9 [internal function]: OC\Files\Config\MountProviderCollection->OC\Files\Config\{closure}(Object(OCA\Files_Sharing\MountProvider))
#10 lib/private/files/config/mountprovidercollection.php(62): array_map(Object(Closure), Array)
#11 lib/private/files/filesystem.php(425): OC\Files\Config\MountProviderCollection->getMountsForUser(Object(OC\User\User))
#12 apps/files_sharing/lib/cache.php(67): OC\Files\Filesystem::initMountPoints('[email protected]')
#13 apps/files_sharing/lib/cache.php(104): OC\Files\Cache\Shared_Cache->getSourceCache('')
#14 lib/private/files/view.php(1215): OC\Files\Cache\Shared_Cache->get('')
#15 apps/files/appinfo/remote.php(59): OC\Files\View->getFileInfo('')
#16 [internal function]: {closure}(Object(Sabre\HTTP\Request), Object(Sabre\HTTP\Response))
#17 3rdparty/sabre/event/lib/EventEmitterTrait.php(105): call_user_func_array(Object(Closure), Array)
#18 3rdparty/sabre/dav/lib/DAV/Server.php(456): Sabre\Event\EventEmitter->emit('beforeMethod', Array)
#19 3rdparty/sabre/dav/lib/DAV/Server.php(254): Sabre\DAV\Server->invokeMethod(Object(Sabre\HTTP\Request), Object(Sabre\HTTP\Response))
#20 apps/files/appinfo/remote.php(83): Sabre\DAV\Server->exec()
#21 remote.php(132): require_once('/var/www/ownclo...')
#22 {main}","File":"3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Driver/AbstractPostgreSQLDriver.php","Line":63}

and:

error while running hook (OCA\Files_Versions\Hooks::write_hook): An exception occurred while executing 'INSERT INTO "oc_filecache" ("mimepart","mimetype","mtime","size","etag","storage_mtime","permissions","parent","path_hash","path","name","storage") SELECT ?,?,?,?,?,?,?,?,?,?,?,? FROM "oc_filecache" WHERE "storage" = ? AND "path_hash" = ? HAVING COUNT(*) = 0' with params [1, 2, 1455369346, -1, "56bf2c8286739", 1455369346, 31, 92362664, "6b25e3ed6d9bb265731382b35b1d54ff", "files_versions\/data\/scripts\/dtc", "dtc", 17184, 17184, "6b25e3ed6d9bb265731382b35b1d54ff"]:

SQLSTATE[23505]: Unique violation: 7 ERROR:  duplicate key value violates unique constraint "fs_storage_path_hash"
DETAIL:  Key (storage, path_hash)=(17184, 6b25e3ed6d9bb265731382b35b1d54ff) already exists.

Since we implemented a non standard insertIfNotExists we cannot fix this by db specific SQL extensions. While the above use cases are fixed for OC 9 because the code for the propagation has changed other code paths using the current insertIfNotExists may run into this race condition.

Depending on the not exists parameter this race condition might throw the uniqe index violation error. Adding the primary key to the conditions is not always possible, would change the semantics of the query and we could still run into the race condition. It might be possible to use DB specific SQL extensions to ignore the error. Or we just catch the Exception and log it as a warning?

cc @MorrisJobke, @DeepDiver1975, @nickvergessen

Bug core p3-medium statuSTALE technical debt

All 28 comments

cc @icewind1991

Or we just catch the Exception and log it as a warning?

How about this ? ^

But then, what if two processes are trying to write two different values there instead of the same one ? Which one is allowed to legimitately fail ?

@butonic @icewind1991 any update ?

Yeah, had a try with this today: https://github.com/owncloud/core/pull/25512

likely still an issue

I am not sure, if it helps, but in our 9.0.4 installation we see some hundred occurrences of messages containing insertIfNotExist() like the following:

{"reqId":"V@I@CXeneXPr4CZBK2LajwAAAIc","remoteAddr":"some_ip_address","app":"index","message":"Exception: {"Exception":"RuntimeException","Message":"File entry could not be inserted with insertIfNotExist() but could also not be selected with getId() in order to perform an update. Please try again.","Code":0,"Trace":"
#0 /var/www/html/uni-ms/owncloud_9.0.4/apps/workflow/autotagging/cachewrapper.php(149): OC\Files\Cache\Cache->insert('thumbnails/4565...', Array)
#1 /var/www/html/uni-ms/owncloud_9.0.4/lib/private/files/cache/wrapper/cachewrapper.php(106): OCA\Workflow\AutoTagging\CacheWrapper->insert('thumbnails/4565...', Array)
#2 /var/www/html/uni-ms/owncloud_9.0.4/lib/private/files/cache/scanner.php(264): OC\Files\Cache\Wrapper\CacheWrapper->put('thumbnails/4565...', Array)
#3 /var/www/html/uni-ms/owncloud_9.0.4/lib/private/files/cache/scanner.php(204): OC\Files\Cache\Scanner->addToCache('thumbnails/4565...', Array, -1)
#4 /var/www/html/uni-ms/owncloud_9.0.4/lib/private/files/cache/scanner.php(307): OC\Files\Cache\Scanner->scanFile('thumbnails/4565...', 3, -1, NULL, true)
#5 /var/www/html/uni-ms/owncloud_9.0.4/lib/private/files/view.php(1274): OC\Files\Cache\Scanner->scan('thumbnails/4565...', false)
#6 /var/www/html/uni-ms/owncloud_9.0.4/lib/private/files/view.php(1375): OC\Files\View->getCacheEntry(Object(OCA\Workflow\AutoTagging\StorageWrapper), 'thumbnails/4565...', 'thumbnails/4565...')
#7 /var/www/html/uni-ms/owncloud_9.0.4/lib/private/preview.php(1192): OC\Files\View->getDirectoryContent('thumbnails/4565...')
#8 /var/www/html/uni-ms/owncloud_9.0.4/lib/private/preview.php(1155): OC\Preview->storeMaxPreview('thumbnails/4565...')
#9 /var/www/html/uni-ms/owncloud_9.0.4/lib/private/preview.php(771): OC\Preview->generatePreview(456562545)
#10 /var/www/html/uni-ms/owncloud_9.0.4/lib/private/preview.php(796): OC\Preview->getPreview()
#11 /var/www/html/uni-ms/owncloud_9.0.4/core/ajax/preview.php(63): OC\Preview->showPreview()
#12 /var/www/html/uni-ms/owncloud_9.0.4/lib/private/route/route.php(154) : runtime-created function(1): require_once('/var/www/html/u...')
#13 [internal function]: __lambda_func(Array)
#14 /var/www/html/uni-ms/owncloud_9.0.4/lib/private/route/router.php(276): call_user_func('\x00lambda_1875', Array)
#15 /var/www/html/uni-ms/owncloud_9.0.4/lib/base.php(907): OC\Route\Router->match('/core/preview.p...')
#16 /var/www/html/uni-ms/owncloud_9.0.4/index.php(39): OC::handleRequest()
#17 {main}","File":"/var/www/html/uni-ms/owncloud_9.0.4/lib/private/files/cache/cache.php","Line":275}","level":3,"time":"2016-09-21T10:00:09+02:00","method":"GET","url":"/index.php/core/preview.png?file=some_filename&c=1931007350&x=32&y=32&forceIcon=0","user":"some_username"}

Please tell me, if I can provide you more information.

for the preferences case, it seems to happen here:
https://github.com/owncloud/core/blob/v9.1.1/lib/private/DB/Connection.php#L273

at least for the case reported by @karakayasemi which writes LDAP's "lastFeatureRefresh" quite often.

okay, seems that since 9.1.1 the code path is different and setUserValue goes to DBConnection::setValues https://github.com/owncloud/core/blob/v9.1.1/lib/private/AllConfig.php#L223. That one will first attempt an INSERT, and catch the exception. In case of exception it does an UPDATE.

Previous versions might not do that.

Anyway, this means that for oc_preferences the problem would be solved that way, but `insertOrUpdate' might still have problems.

Cross-posting the error from newer oC versions (from #27369 and #28461) so this open issue here will pop up in the search when searching for that error:

Exception: {"Message":"An exception occurred while executing 'UPDATE `oc_filecache` SET `storage` = ?, `path` = ?, `path_hash` = ?, `name` = ?, `parent` =? WHERE `fileid` = ?' with params [\"1\", \"files_encryption\\\/keys\\\/files_trashbin\\\/files\\\/<FILENAME>.d1489350583\", \"21c74ade176a765731b6b68b697b3859\", \"<FILENAME>.d1489350583\", 1274, 925]:\n\nSQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1-21c74ade176a765731b6b68b697b3859' for key 'fs_storage_path_hash'",

Cross-posting from #28512 as suggested by @kdslkdsaldsal:

The race condition can be reliably triggered by doing something like this on a synced folder on the client:

> for x in {0..99}; do mkdir $x; echo $x > $x/test.txt; done
# wait for changes to be synced
> for x in {0..99}; do rm $x/test.txt; done

Since the sync client uses multiple threads, different test.txt files will be attempted to be moved in parallel to the single trashbin folder on the server, eventually trigggering the duplicate key error in oc_filecache.

The trashbin case is a legitimate bug that we can fix by preventing to use the same name.

@mrow4a Do you want to write a smashbox test triggering this? https://github.com/owncloud/core/issues/22563#issuecomment-318592231

@PVince81 (the second backtrace in the original report above says (OCA\Files_Versions\Hooks::write_hook) not trashbin?)
So I think this is something that needs to be generally fixed in insertOrUpdate
CC @dragotin

@guruz will have a look in the issue.

Isnt the original issue with oc_preferences solved by https://github.com/owncloud/core/pull/27576 ? @PVince81

@mrow4a no. The oc_preferences was not using insertIfNotExists and since that PR it now uses it. And if there is a race condition in insertIfNotExists then the preferences thing will likely also suffer from it at some point in random situations.

While I can reproduce issue with trashbin ( @PVince81 what is the priority of fixing this? ), the original issue remains mistery to me

Originally found when googling a reason for https://github.com/owncloud/diagnostics/pull/18#issuecomment-321815568

First: some food for thought: https://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/

AFAICT it seems to make most sense for oc to implement UPDATE, then INSERT, wrapped in a transaction but capable of retrying on an already exists. Similar to https://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE but in php ... should work for all dbs. Why did I study computer science again?

Also IMO we need to get rid of the very long running transaction for filescans. The interfere with the transaction that we need around this UPSERT because doctrine will only count the number of transactions and do a single commit at the end. Locking far too many rows in the process, which raises the chance of a deadlock in Galera cluster situation ... but we need to verify that first. While my UPSERT opinion is mature this is just a gut feeling for now. One piece of :hankey: after another ...

n8

Problem with trashbin just poped up in our smashbox CI, the same as in here https://github.com/owncloud/core/issues/23151 -> https://jenkins.owncloud.com/view/smashbox/job/smashbox-daily-master-basicSync/10/client=2.3/

there's a PR here https://github.com/owncloud/core/pull/28698 but it has troubles with Oracle and also will need more testing before being put into production.

Will likely move it to 10.0.4 to leave more time for testing (and have a 10.0.4beta early for that)

Yup, OK with me. Plus we can also review where else we can replace insertIfNotExists with upsert.

10.0.6 will come with upsert for filecache queries.

The quest is likely not over yet, need to port the other queries to use upsert.

setting to p2 as the upsert is supposed to fix a lot of things

where is the UPSERT PR? It is breaking the sqlite I guess also :> (worth noting here). But definitely upsert should fix a lot of things, not mentioning other things which are not available to sqlite.

@guruz I think I once tried it in some PR, and it did not work on sqlite properly.

moving to "triage" for rescheduling

Was this page helpful?
0 / 5 - 0 ratings

Related issues

laurivosandi picture laurivosandi  路  65Comments

michaelstingl picture michaelstingl  路  64Comments

sonanchenko picture sonanchenko  路  81Comments

dataflake picture dataflake  路  113Comments

razyr picture razyr  路  63Comments