Client: infinite 'Connection timed out' errors during large upload

Created on 5 Sep 2019  路  14Comments  路  Source: owncloud/client

Client 2.6.0rc1 with server 10.3.alpha

Empty server, client exited.
rm -rf syncfolder; mkdir syncfolder
rm -rf ~/.config/ownCloud\ Testpilot\ Edition

This is a very deep structure, 500000 files in ca 100 subdirectory levels:
mv randfile_500k_105 syncfolder/
-> start client using this syncfolder, connect to server

client grows to 3.158 GB rss during discovery
#### Discovery end #### 211855 ms
Connection timed out

This procedure repeats ca. every 5 Minutes. Nothing appears on the server.
Typical messages from the client log are like this:

09-04 18:10:43:572 [ info sync.engine ]:        #### Discovery end ####################################################  134185 ms
09-04 18:11:03:895 [ info sync.engine ]:        #### Post-Reconcile end ####################################################  154508 ms
09-04 18:11:04:031 [ info gui.folder ]: #### Propagation start ####################################################
09-04 18:11:04:031 [ info gui.application ]:    Sync state changed for folder  "http://localhost/owncloud/remote.php/dav/files/admin/" :  "Sync Running"
09-04 18:11:04:056 [ info sync.networkjob.propfind ]:   PROPFIND of QUrl("http://localhost/owncloud/remote.php/dav/files/admin/") FINISHED WITH STATUS "OperationCanceledError Connection timed out"
09-04 18:11:04:065 [ info gui.account.state ]:  AccountState connection status change:  OCC::ConnectionValidator::Connected -> OCC::ConnectionValidator::Timeout
09-04 18:11:04:066 [ info gui.account.state ]:  AccountState state change:  "Connected" -> "Network error"

Setting kernel TCP timeouts higher, or setting server php timeouts higher does not change anything.
A local discovery that takes 90 seconds or more is very likely to result in a Timeout error.

Also: During such a long local discovery, the client gui is unresponsive. It may take a minute for the tray icon to respond, also for the settings window to switch tab.

bug sev2-high

Most helpful comment

We can reproduce it and are working on a fix.

@michaelstingl @jnweiger @HanaGemela This means rc1 is not ready for general consumption, we will need another rc.

All 14 comments

Workaround:
Compile with the attached crude hack.
Now upload starts immediately after the first discovery. No errors in the log anymore.
extended_timeouts_jw.patch.zip

EDIT: https://gist.github.com/guruz/87f763b64f0038336ac39513a1623324

Tested on windows 10:

  • the unresponsive gui is shown as "(Not responding)" during local discovery.
    image

@ogoffart @ckamm Can it be that https://github.com/owncloud/client/issues/7445 stalls the main thread completely and the timeout timer for the ConnectionValidator was already started and then triggers a timeout or so?
(independant of if the QNAM scheduled the job already or not)

@jnweiger You could try the PR mentioned in https://github.com/owncloud/client/issues/7445#issuecomment-528812780

This is a regression. Testing the same with 2.5.4 on windows10 works fine, given enough RAM.

Tested on Windows with 2.6.1rc1, the behaviour is similar, except that the reported memory consumption is much less. Size rises to 300 ... 500 MB, GUI is mostly frozen, then we get hit by connection timed out and the discovery restarts.

Tested on Windows with 2.5.4, Size rises to similar 300.. 500 MB, the GUI remains reactive,
then upload starts without errors.

With a large file tree of 1.5 Mio files, 2.5.4 reaches a peak of 1.5 GB Memory, then drops back to 1.0 GB and upload starts. (If that much RAM is available).

If RAM is limited, and we get an OOM fault, owncloud disappears without notice. The Windows event log has an 'Application Error' in C:\Windows\System32\ucrtbase.dll

GUI is mostly frozen,

Are you saying it is mostly frozen all the time during discovery? Even like in the first minutes of discovery?

With 2.5.4 the GUI is frozen much later than with 2.6.1 --

The GUI of 2.6.1 becomes very sluggish the moment the discovery starts. It is possible to switch tabs or click the 3 dots menu within the first few minutes, but it seems to get harder every minute.

The moment the discovery is over, and upload begins, the GUI is fine again.

We can reproduce it and are working on a fix.

@michaelstingl @jnweiger @HanaGemela This means rc1 is not ready for general consumption, we will need another rc.

we will need another rc.

Okay 馃憤

This should be much better with current 2.6 daily.

https://github.com/owncloud/client/pull/7462#issuecomment-529927958 is no longer reproducable with 2.6daily20190917

100k Win10

Wall clock timings on my win10 VM:

  • 2.6daily20190917

    • initial sync with ca 10 files, 25 MB: 2 seconds.

    • move in a folder with 100k files after initial sync: 42 seconds discovery (90MB RAM), then upload starts.

    • pause sync after uploading ca 1k files out of 100k: 43 seconds UI freeze. Then tabs can be switched.

  • 2.5.6 stable

    • initial sync with ca 10 files, 25 MB: 4 seconds.

    • move in a folder with 100k files after initial sync: 44 seconds discovery (97MB RAM), then upload starts.

    • pause sync after uploading ca 1k files out of 100k: 43 seconds UI freeze (106 MB RAM), then tabs can be switched.

-> no regression anymore with 100k files.

Hmm. Is this because I did not hit the new 57 seconds timeout with this test?

500k Win10

Wall clock timings on my win10 VM using a bigger tree, to exceed an internal 57s timeout.

  • 2.6daily20190917

    • initial sync with ca 10 files, 25 MB: 2 seconds.

    • move in a folder with 500k files after initial sync: 122 seconds discovery (340MB RAM), then upload starts.

    • pause sync after uploading ca 1k files out of 500k: 177 seconds UI freeze (100%CPU, 340MB RAM). Then tabs can be switched (0% CPU, 40MB RAM).

  • 2.5.6 stable

    • initial sync with ca 10 files, 25 MB: 4 seconds.

    • move in a folder with 500k files after initial sync: 135 seconds discovery (480MB RAM), then upload starts.

    • pause sync after uploading ca 1k files out of 500k: 180 seconds UI freeze (380 MB RAM), then tabs can be switched (0% CPU, 52 MB RAM).

-> no regression any more with 500k files.
Congratulations! Fixed!

Client 2.6.0daily20190917 on Ubuntu 18.04 with server 10.3.alpha

500k Ubuntu18.04

Empty server, client exited.
rm -rf syncfolder; mkdir syncfolder
rm -rf ~/.config/ownCloud\ Testpilot\ Edition

This is a very deep structure, 500000 files in 104 subdirectory levels:
mv randfile_500k_200 syncfolder/
-> start client using this syncfolder, connect to server

client grows to 3.0 GB rss during discovery
Discovery ends after 195 sec, then upload starts.

-> Fixed also for Linux, not only for Win10 :-)

AFAIk on Linux we compile with 64bit so memory usage should be higher than on Windows (32bit).
I just wonder why it is so much higher on Linux than on Windows!?
Maybe differences in measurement.
CC @ogoffart

Was this page helpful?
0 / 5 - 0 ratings