Desktop: Automatically recover from timeouts (QNetworkReply::OperationCanceledError)

Created on 2 Nov 2020  路  11Comments  路  Source: nextcloud/desktop

Situation

The NextCloud sync client is installed on a Windows client, running mostly 24/7 connecting to a server over an unstable network.
Sometimes the connection drops, however NextCloud enters a failed state which requires manual intervention, which is very unhandy, considering the fix is only to close and reopen the sync client.

Logs

The (anonymized) log shows the following timeout error:

2020-11-01 18:22:41:791 [ warning nextcloud.sync.networkjob ]:  QNetworkReply::OperationCanceledError "Zeit眉berschreitung bei der Verbindung" QVariant(Invalid)
2020-11-01 18:22:41:791 [ warning nextcloud.sync.credentials.webflow ]: QNetworkReply::OperationCanceledError
2020-11-01 18:22:41:791 [ warning nextcloud.sync.credentials.webflow ]: "Operation abgebrochen"
2020-11-01 18:22:41:791 [ info nextcloud.sync.networkjob.lscol ]:       LSCOL of QUrl("https://cloud.xxxxxx.xx/remote.php/dav/files/company/FolderNameABCDE") FINISHED WITH STATUS "OperationCanceledError Zeit眉berschreitung bei der Verbindung"
2020-11-01 18:22:41:791 [ warning nextcloud.sync.discovery ]:   LSCOL job error "Operation abgebrochen" 0 QNetworkReply::OperationCanceledError
2020-11-01 18:22:41:791 [ debug nextcloud.sync.discovery ]      [ OCC::DiscoveryMainThread::singleDirectoryJobFinishedWithErrorSlot ]:  5 "Operation abgebrochen"
2020-11-01 18:22:41:791 [ debug nextcloud.sync.networkjob ]     [ OCC::AbstractNetworkJob::slotFinished ]:      Network job OCC::LsColJob finished for "/FolderNameABCDE"
2020-11-01 18:22:41:791 [ debug nextcloud.sync.discovery ]      [ OCC::DiscoveryJob::remote_vio_opendir_hook ]: OCC::DiscoveryJob(0x263cdd7270)  ...Returned from main thread
2020-11-01 18:22:41:791 [ debug nextcloud.sync.discovery ]      [ OCC::DiscoveryJob::remote_vio_opendir_hook ]: 5 when opening  msg= "Operation abgebrochen"
2020-11-01 18:22:41:791 [ warning nextcloud.sync.csync.updater ]:       opendir failed for  - errno 5
2020-11-01 18:22:41:791 [ warning nextcloud.sync.engine ]:      ERROR during  csync_update :  "Beim 脰ffnen eines Ordners ist ein Fehler aufgetreten. Operation abgebrochen"
2020-11-01 18:22:41:791 [ warning nextcloud.gui.activity ]:     Item  "D:\\daten\\company_bk\\files\\files_folder"  retrieved resulted in  "Beim 脰ffnen eines Ordners ist ein Fehler aufgetreten. Operation abgebrochen"
2020-11-01 18:22:41:791 [ info nextcloud.gui.activity ]:        Error successfully added to the notification list:  "Beim 脰ffnen eines Ordners ist ein Fehler aufgetreten. Operation abgebrochen"
2020-11-01 18:22:41:791 [ info nextcloud.sync.database ]:       Closing DB "C:/Users/Administrator/AppData/Roaming/Nextcloud/._sync_574955ced5da.db-shm"
2020-11-01 18:22:41:791 [ debug nextcloud.sync.database ]       [ OCC::SyncJournalDb::commitTransaction ]:      No database Transaction to commit
2020-11-01 18:22:41:791 [ info nextcloud.sync.engine ]: CSync run took  7200002 ms
2020-11-01 18:22:41:791 [ info nextcloud.gui.folder ]:  Client version 3.0.2stable-Win64 (build 20200924)  Qt 5.12.9  SSL  OpenSSL 1.1.1g  21 Apr 2020
2020-11-01 18:22:41:791 [ warning nextcloud.gui.folder ]:       SyncEngine finished with ERROR
2020-11-01 18:22:41:791 [ info nextcloud.gui.folder ]:  Folder sync result:  1
2020-11-01 18:22:41:791 [ info nextcloud.gui.folder ]:  the last 8 syncs failed
2020-11-01 18:22:41:791 [ debug nextcloud.gui.folder ]  [ OCC::Folder::slotSyncFinished ]:      Sync failed, keeping last sync's local discovery path list
2020-11-01 18:22:41:791 [ info nextcloud.gui.application ]:     Sync state changed for folder  "https://cloud.xxxxxx.xx/remote.php/dav/files/company/FolderNameABCDE" :  "Error"

Steps to reproduce

  1. Add one synced folder to NextCloud
  2. Let it sync for a while
  3. Let the connection fail for a short amount of time
  4. Client reports "Operation cancelled: An error occured while opening a folder"
  5. Right-clicking and "Enforce sync now" does not seem to work either, even if the connection is back. Need to restart the sync client.

This occurs even there are ridiculously high connection timeout values set.

Client configuration

Client version: Client version 3.0.2stable-Win64 (build 20200924) Qt 5.12.9 SSL OpenSSL 1.1.1g 21 Apr 2020

Operating system: Windows Server 2008

OS language: German

Installation path of client: Default path

bug feature sync engine

All 11 comments

Probably relevant to this:

A GET timeout should possibly only be considered a SoftError?

https://github.com/nextcloud/desktop/blob/6445a61d807974be2148ef424b3efd9ba291fe3e/src/libsync/propagatedownload.cpp#L326

Would it be required to set propagator()->_anotherSyncNeeded = true in such cases?

https://github.com/nextcloud/desktop/blob/6445a61d807974be2148ef424b3efd9ba291fe3e/src/libsync/propagatedownload.cpp#L614

I'd say going to SoftError would be dangerous (it might be assumed down the line that the file is in a "better" state than it is). Maybe reclassifying as a NormalError would be doable but that'd need to be heavily tested.

Obviously I'd rather we find a proper fix to avoid those timeouts in the first place. They're supposed to be transient situation that's why the FatalError is a safe bet: when that happens it's likely everything else asked to the server will fail as well with further timeouts. Thus continuing the sync is generally not going to help get the server load down for instance.

If that's something happening often, it's better to fix the root cause which can be:

  • badly configured server or under powered server
  • a bug in some server component
  • something done by the client triggering a high load

For the last one there's obviously something at play around E2EE right now, spotted in 3.0.2, tried to workaround it for 3.0.3. I'm in discussion with the E2EE app maintainer to try to move this forward.

Obviously I'd rather we find a proper fix to avoid those timeouts in the first place.

Absolutely! I know the connectivity is not the best and I'm not blaming NextCloud for it, there is actually a different bunch of factors to this.

Thus continuing the sync is generally not going to help get the server load down for instance.

I agree constant retrying might not be a good approach, however I think a retry every few minutes would not hurt.
Going to the FatalError state and waiting for a person to manually restart sync is not really reducing the workload either :grimacing:

I assume if the status is set to NormalError, the normal sync process will still be triggered? (i.e. 30s later?)

I'm in discussion with the E2EE app maintainer to try to move this forward.

Thanks a lot!

I assume if the status is set to NormalError, the normal sync process will still be triggered? (i.e. 30s later?)

Got to admit I'm not 100% sure but that'd be my expectation though. It might ignore the file for a little while as well (it's generally not long though, for the next sync or so).

Thank you.

I'd love to try this (i.e. insert some debug logs for me to understand the error handling and classify such errors as NormalError) but I was struggling setting up the entire Windows Build Environment. In case that would work out eventually, I'll report my insights back to you.

I've been working on a check list for windows builds. Paint is still fresh that's why I didn't commit it yet. If you want to try that alternative route and let me know how this goes for you here it is (in your context the dokany step is optional):

We're assuming the following to be installed before this procedure starts:

  • Windows 10 SDK
  • Visual Studio 2019

Steps:

  • In the Visual Studio installer pick the 'desktop development with c++'
    workload
  • Install Qt: https://download.qt.io/official_releases/online_installers/

    • Select the latest LTS release, 64bit MSVC 2017 variant

    • Select QtWebEngine

    • Select source and debug symbols

  • Install Dokany: https://github.com/dokan-dev/dokany/releases
    (use a Dokan_x64.msi, check all the development options)
  • Install chocolatey: https://chocolatey.org/install
  • In an admin cmd:

    • choco install -y git.install openssl cmake ninja

    • RefreshEnv

  • Open a git cmd then:

    • C:\Qt\5.12.9\msvc2017_64\bin\qtenv2.bat

    • "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64

    • git clone https://github.com/madler/zlib

    • mkdir zlib\build

    • cd zlib\build

    • cmake -GNinja -DCMAKE_INSTALL_PREFIX=../../root ..

    • ninja

    • ninja install

    • cd ..\..

    • git clone https://github.com/frankosterfeld/qtkeychain

    • mkdir qtkeychain\build

    • cd qtkeychain\build

    • cmake -GNinja -DCMAKE_INSTALL_PREFIX=C:\Qt\5.12.9\msvc2017_64 ..

    • ninja

    • ninja install

    • cd ..\..

  • Open a new admin cmd then:

    • setx PATH "%PATH%;pathto\root"

    • RefreshEnv

Thank you! I'll give this a shot as soon as I find some time and a clean Windows VM.

(When looking for the build process used I saw the appveyor.yml files but I guess they are outdated now, at least non-functional at the moment?)

Yes... I'd love to see someone revive the appveyor so that we got Windows coverage. Didn't find time for it yet.

So far things went smooth, but what root are you referring to?

* Open a new admin cmd then:

  * setx PATH "%PATH%;path\to\root"

So far things went smooth, but what root are you referring to?

The root folder in which zlib got installed. See a few steps above. Not necessarily nicely named... I'll change it.

Ah whoops, I totally missed this when copying the other instructions.

I assume we add the bin folder inside the root folder to the PATH, not the root folder itself?

Edit:
Wrong assumption, it's only used later on to find zlib development files, I figured

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jospoortvliet picture jospoortvliet  路  4Comments

despens picture despens  路  3Comments

rguenther-dz picture rguenther-dz  路  3Comments

Engineer-of-Stuff picture Engineer-of-Stuff  路  3Comments

TP75 picture TP75  路  3Comments