Client: [Regression] Client crashes as soon as sync dir becomes unavailable or when sync dir is not available on startup

Created on 23 Sep 2017  路  31Comments  路  Source: owncloud/client

Expected behaviour

If storage becomes unavailable, the client should notice and pause sync.

Actual behaviour

The client crashes.

Steps to reproduce

  1. Sync to folder /x
  2. Unmount /x
  3. Observe ownCloud crashing

Client configuration

Client version: 2.3.3 (8242)
Operating system: macOS Sierra 10.12.6
OS language: English
Installation path of client: Default

ReadyToTest bug macOS p3-medium

Most helpful comment

@michaelstingl @SamuAlfageme If enough people report this, it might mandate a 2.3.4.. let's see.

All 31 comments

@michaelstingl @SamuAlfageme If enough people report this, it might mandate a 2.3.4.. let's see.

Reproduced & confirmed the regression. (e.g. on OS X):

$ sudo bindfs ~/ownCloud /Volumes/OC
$ diskutil unmountDisk force /Volumes/OC

I think there's some kind of delay on the crash-reporter (last update I see was 18h ago)

I am seeing similar behaviour on Windows 10 Pro x64 (more similar to #6050 actually).
When the user runs the owncloud client and doesn't have write access to the owncloud folder, the client won't start.

Actually it will crash before generating any logging. Or it will just hang. And then crash silently.

Also the crash reporter doesn't work in this case, it will crash as well. So no crash reports

Workaround is to run as administrator. Or to assign write permissions to the stack folder.

It makes sense the client won't work without write permissions, but it should inform the user about the missing permissions.

@valentijnscholten This is a little different from my problem. I don't have problem with permissions, but with non existing paths (e.g. external hard drive not mounted) causing the client to crash.

@progmem64 yes, my issue is more similar to #6050 (folder being readonly), but that is closed and points to this issue for follow up.

@SamuAlfageme @guruz I can't reproduce on linux. You can't unmount the filesystem with the sync folder while the client is open because it keeps the journal file openend. If I put it on an USB stick and plug it out without unmounting, nothing bad happens. I get Error: "unable to open database file" for "/media/kamm/E9B9-AE9E/test/._sync_dba920715f43.db" and the sync fails with something like Unable to find sync dir

@ckamm hmmm.. will try to reproduce on leenux (report in https://github.com/owncloud/client/issues/6050#issue-260077761 was over readonly-FS in Fedora and I assumed the same reason was behind it).

I'm trying the force unmount thing on macOS, and I'm getting #3046, which is not a regression (it not really fixed and still appears in our crash reports).
Force unmounting is not a common scenario, and I can't get it to crash if I unmount normally.

So the regression part could be specific to Windows.

Just saw that the bug report was created for macOS so this can't really be specific to Windows. I've been trying to reproduce it with current master though, not 2.3.3.

@jturcotte @ckamm then I was wrong to assume https://github.com/owncloud/client/issues/6050 and this one were the same. 馃槙

It's also possible that one needs a bit of luck in order to reproduce it, on any of the involved OSes. Getting the OS to unmount the disk at the right moment is quite difficult. I tried a few times on macOS and Windows, but could only get it to crash once out of my 10 tries (and it was #3046, so it's either a different issue, or it's not a regression).

Having a crash report would help figuring out what is happening. @progmem64 Could take note of the time you submit one or many of the crash reports, so that we can fish them out of the crash reporter?

On the crash reporter, i see crash because of the

        ENFORCE(_errId == SQLITE_OK, "Error when closing DB");

in SqlDatabase::close .

Maybe we should remove this ENFORCE and let the code recover. (I thin most of the code should be able to recover)
I don't know if this is the actual reason for this particular crash tough.

@ogoffart Was the stacktrace at all interesting?

Yes, it seems this could be an ASSERT instead. It doesn't fix the bug, but there's a reasonable chance of things working anyway.

https://sentry.io/owncloud/desktop-win-and-mac/issues/370119057/

QtCore in qt_message_fatal
QtCore in QMessageLogger::fatal
libocsync.2.4.0.dylib in OCC::SqlDatabase::close
libocsync.2.4.0.dylib in OCC::SyncJournalDb::sqlFail
libocsync.2.4.0.dylib in OCC::SyncJournalDb::checkConnect
libocsync.2.4.0.dylib in OCC::SyncJournalDb::getSelectiveSyncList
owncloud in OCC::AccountSettings::refreshSelectiveSyncStatus
owncloud in OCC::AccountSettings::AccountSettings
owncloud in OCC::SettingsDialogMac::accountAdded
owncloud in OCC::SettingsDialogMac::SettingsDialogMac

As I said, i am not sure it is indeed the backtrace relative to this one crash.

@ogoffart doesn't look so much like it if it's in accountAdded?
@SamuAlfageme Do you still have the verbose client log?
@progmem64 How were you unmounting? Just by clicking the eject button in Finder? Or unmount in console? Or diskutil eject?

(@guruz: accountAdded is called on startup when the settingsdialog is created)

It might be that #6129 helps with this problem. @progmem64 would you be up for testing the 2.4 nightly?

@progmem64 Yes, exactly. Thank you!

@ckamm Crashed again. Bug report submitted 10.11. 18:20:40 GMT

I don't think that something like #6129 causes this issue. I think you simply just don't check whether /path/to/syncfolder exists before perfoming a sync. Instead you directly try to read/write from/to that /path.

@progmem64 We do check whether the local folder exists in SyncEngine::startSync.

I think your report might be https://sentry.io/owncloud/desktop-win-and-mac/issues/402884797/ which points towards

SyncEngine::finalize
SyncJournalDb::close
SqlDatabase::close
_sqlite3Close
...

possibly invoked exactly from within that "does the folder exist?" check.

But the fact that we attempt to close the db means that it has to have been opened successfully before. Maybe there's a crash bug in sqlite that happens when one attempts to close a db that's no longer available?

@guruz Do you have an opinion? It seems like skipping the sqlite_close call when the db file is no longer reachable might prevent crashes. But there's a lot of uncertainty while we haven't reproduced the issue. So far it's only known to appear on osx.

@progmem64 Can you still answer the questions above? thanks

@progmem64 How were you unmounting? Just by clicking the eject button in Finder? Or unmount in console? Or diskutil eject?

Or is this for a disk not mounted on startup?

@guruz Unmounting by ejecting via finder & diskutil

Tired it with several volume types: exfat usb stick, hfs+ usb hdd, ntfs usb hdd (mounted via paragon)

Can confirm it still reproduces on 2.4.1 (build 9367)

@SamuAlfageme How did you unmount?

@guruz followed the same procedure as in https://github.com/owncloud/client/issues/6049#issuecomment-331865293

I get this on my machine with 2.4 and an idle sync:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=10, address=0x10cdf6020)
  * frame #0: 0x00007fff5a874f6d libsystem_platform.dylib`_platform_memmove$VARIANT$Haswell + 77
    frame #1: 0x00000001083b8d8e libocsync.0.dylib`walIndexTryHdr + 78
    frame #2: 0x00000001083b89b1 libocsync.0.dylib`walIndexReadHdr + 81
    frame #3: 0x00000001083bfd00 libocsync.0.dylib`sqlite3WalCheckpoint + 288
    frame #4: 0x00000001083bfa61 libocsync.0.dylib`sqlite3WalClose + 193
    frame #5: 0x00000001083b4f34 libocsync.0.dylib`sqlite3PagerClose + 180
    frame #6: 0x00000001083c89ed libocsync.0.dylib`sqlite3BtreeClose + 189
    frame #7: 0x0000000108399d6f libocsync.0.dylib`sqlite3LeaveMutexAndCloseZombie + 159
    frame #8: 0x00000001083a59c4 libocsync.0.dylib`sqlite3Close + 276
    frame #9: 0x00000001083a58a7 libocsync.0.dylib`sqlite3_close + 23
    frame #10: 0x000000010834b34b libocsync.0.dylib`OCC::SqlDatabase::close() + 59
    frame #11: 0x0000000108358038 libocsync.0.dylib`OCC::SyncJournalDb::close() + 984
    frame #12: 0x0000000107413281 libowncloudsync.0.dylib`OCC::SyncEngine::finalize(bool) + 129
    frame #13: 0x00000001074139f8 libowncloudsync.0.dylib`OCC::SyncEngine::startSync() + 1352
    frame #14: 0x0000000107453ba4 libowncloudsync.0.dylib`OCC::SyncEngine::qt_static_metacall(QObject*, Q

This looks to me the same as https://sentry.io/owncloud/desktop-win-and-mac/issues/402884797/ mentioned by @ckamm (Why is it marked as 'resolved' in sentry?)

With master branch (newer sqlite3 version) I don't get the crash so easily, I however get it while downloading a file with a different back trace:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=10, address=0x10ffdd000)
  * frame #0: 0x00007fff5a870bc1 libsystem_platform.dylib`_platform_memcmp + 33
    frame #1: 0x000000010970affe libocsync.0.dylib`sqlite3WalBeginWriteTransaction + 126
    frame #2: 0x0000000109709db2 libocsync.0.dylib`sqlite3PagerBegin + 226
    frame #3: 0x00000001096e06ae libocsync.0.dylib`sqlite3BtreeBeginTrans + 606
    frame #4: 0x0000000109718fdd libocsync.0.dylib`sqlite3VdbeExec + 16205
    frame #5: 0x00000001096e4076 libocsync.0.dylib`sqlite3Step + 518
    frame #6: 0x00000001096e3cdd libocsync.0.dylib`sqlite3_step + 125
    frame #7: 0x000000010967f3d4 libocsync.0.dylib`OCC::SqlQuery::exec() + 564
    frame #8: 0x0000000109692779 libocsync.0.dylib`OCC::SyncJournalDb::setDownloadInfo(QString const&, OCC::SyncJournalDb::DownloadInfo const&) + 921
    frame #9: 0x0000000108deb856 libowncloudsync.0.dylib`OCC::PropagateDownloadFile::slotGetFinished() + 1222
    frame #10: 0x0000000108df13dc libowncloudsync.0.dylib`QtPrivate::FunctorCall<QtPrivate::IndexesList<>, QtPrivate::List<>, void, void (OCC::PropagateDownloadFile::*)()>::call(void (OCC::PropagateDownloadFile::*)(), OCC::PropagateDownloadFile*, void**) + 140
    frame #11: 0x0000000108df1343 libowncloudsync.0.dylib`void QtPrivate::FunctionPointer<void (OCC::PropagateDownloadFile::*)()>::call<QtPrivate::List<>, void>(void (OCC::PropagateDownloadFile::*)(), OCC::PropagateDownloadFile*, void**) + 99
    frame #12: 0x0000000108df1276 libowncloudsync.0.dylib`QtPrivate::QSlotObject<void (OCC::PropagateDownloadFile::*)(), QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase*, QObject*, void**, bool*) + 166
    frame #13: 0x000000010a019f7b QtCore`QMetaObject::activate(QObject*, int, int, void**) + 2347
    frame #14: 0x0000000108e61f02 libowncloudsync.0.dylib`OCC::GETFileJob::finishedSignal() + 34
    frame #15: 0x0000000108e66b6a libowncloudsync.0.dylib`OCC::GETFileJob::finished() + 154

BTW
I also get something like

05-14 20:15:40:652 [ warning default ]: QIODevice::write (QFile, "/Volumes/SC256MB/syncfolder/.owncloudsync.log"): device not open

or

05-14 20:35:13:312 [ warning sync.networkjob.get ]: Error while writing to file -1 8192 "Input/output error"

With JOURNAL_MODE=delete I manage to get a a crash only in beginning of sync when creating the tables:

05-14 20:41:33:538 [ warning sync.database ]:   ERROR committing to the database:  "cannot commit - no transaction is active"
05-14 20:41:33:538 [ debug sync.database ]  [ OCC::SyncJournalDb::startTransaction ]:   Database Transaction is running, not starting another one!
05-14 20:41:33:538 [ debug sync.database.sql ]  [ OCC::SqlQuery::exec ]:    SQL exec "CREATE INDEX IF NOT EXISTS blacklist_index ON blacklist(path collate nocase);"
05-14 20:41:33:538 [ warning sync.database.sql ]:   Sqlite exec statement error: 10 "disk I/O error" in "CREATE INDEX IF NOT EXISTS blacklist_index ON blacklist(path collate nocase);"
05-14 20:41:33:538 [ warning sync.database.sql ]:   IOERR extended errcode:  266
05-14 20:41:33:538 [ warning sync.database.sql ]:   IOERR system errno:  5
05-14 20:41:33:538 [ warning sync.database ]:   ERROR committing to the database:  "cannot commit - no transaction is active"
05-14 20:41:33:538 [ warning sync.database ]:   SQL Error "updateErrorBlacklistTableStructure: create index blacklit" "disk I/O error"
05-14 20:41:33:539 [ critical default ]:    ASSERT: "false" in file /Users/guruz/woboq/owncloud/client/mirall/src/common/syncjournaldb.cpp, line 268
05-14 20:41:33:539 [ warning sync.database ]:   Failed to update the database structure!
05-14 20:41:33:539 [ info sync.database ]:  Forcing remote re-discovery by deleting folder Etags
05-14 20:41:33:539 [ warning sync.database.sql ]:   Sqlite prepare statement error: "out of memory" in "UPDATE metadata SET md5='_invalid_' WHERE type=2;"
05-14 20:41:33:539 [ fatal default ]:   ENFORCE: "allow_failure" in file /Users/guruz/woboq/owncloud/client/mirall/src/common/ownsql.cpp, line 269 with message: SQLITE Prepare error

Then while the download it doesn't crash, it just errors 馃憤 :

05-14 20:45:23:350 [ warning sync.database.sql ]:   Sqlite exec statement error: 10 "disk I/O error" in "INSERT OR REPLACE INTO blacklist (path, lastTryEtag, lastTryModtime, retrycount, errorstring, lastTryTime, ignoreDuration, renameTarget, errorCategory) VALUES ( ?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)"
05-14 20:45:23:350 [ warning sync.database.sql ]:   IOERR extended errcode:  266
05-14 20:45:23:350 [ warning sync.database.sql ]:   IOERR system errno:  5
05-14 20:45:23:350 [ warning sync.propagator ]: Could not complete propagation of "default.lay" by OCC::PropagateDownloadFile(0x7fdeee5b9e70) with status 2 and error: "Input/output error"

I'll add a commit that checks for "/Volumes" in sync directory and sets JOURNAL_MODE=delete in code. This should help with most cases..

Hmm. It's really unfortunate we can make sqlite crash like that. Might a strategically placed file.exists() before interacting with the db help? We could check whether _dbFile exists in checkConnect, instead of relying on _db.isOpen().

I've been stressing out the client with forced unmounts both on win and macOS at different points in the sync cycle and I don't seem to be able to make it crash. Closing here as resolved 馃帀

Was this page helpful?
0 / 5 - 0 ratings