Client: Resuming a paused sync that was uploading a bunch of files and were moved in the client is duplicating one file in both locations

Created on 10 Aug 2017  路  16Comments  路  Source: owncloud/client

Expected behavior

The reconcile phase should match the server and local contents properly and move the last file the client uploaded to the server.

Actual behavior

That last file is being downloaded from the server:

screenshot_2017-08-10_10_14_56

I think it has to do with pausing the sync either:

  • ...before receiving the upload ACK reply from the server with the ETAG (we should wait for it in this case)
  • ...and not storing the last ETAG properly on the sync journal

Steps to reproduce

  1. Create a bunch of large files (I did generate_n_files.sh 130 file_ 15M: 130 files , 15MB size with the file_ prefix)
  2. Pause the sync; you'll be prompted with the 'sync will be terminated' confirmation dialog
  3. Move all the files to a new location (mkdir filezz && mv file_* filezz)
  4. Resume the sync

screenshot_2017-08-10_10_22_07

Server configuration

  • Operating system: ubuntu
  • Web server: Apache/2.4.18 (Ubuntu)
  • Database: 5.7.17-0ubuntu0.16.04.1 - (Ubuntu)
  • PHP version: 7.0.15-0ubuntu0.16.04.4
  • ownCloud version: 10.0.2

Client configuration

Client version: 2.3.3rc1

Logs

  1. Client logfile: I pasted just the related entries (i.e. grep file_114 in this case) - https://gist.github.com/SamuAlfageme/b9e6a34e7fcd3963c7bb6a353995668f

I also have the recorded mitmproxy session recorded, will upload in a minute (after removing sensitive info)

cc/ @ogoffart @guruz @mrow4a

ReadyToTest bug

All 16 comments

Here is what happens:
The PUT is cancelled as we already sent data, but we did not receive the answer yet with the new etag, so we don't write it in the DB.

The solution could be to change what "pause sync" is doing. We should not blindly abort all requests. We should wait for the PUT to finish (possibly with a lower timeout)

@ogoffart :+1:

We should not blindly abort all requests. We should wait for the PUT to finish (possibly with a lower timeout)

Or abort depending on if all data had been sent already. This should be doable since the upload speed limiting gives us this much control over the data flow. So only do a hard abort() if not all data was sent.

Something that crosses my mind: since the same would happen on client crash/being killed is there a solution we could apply both for it and pause? e.g. a 'handshake' when starting a sync to check if last operation(s) before the crash were completed on the server.

@SamuAlfageme that's quite hard when moves are involved i think :-/

In other cases, we can/could do a lot based on filesize/mtime/filehash

when the client crash or is killed, or if the connection drops right at the right moment, there is no way to know if the file was uploaded or not. So we recover in the next sync. But if the file is moved or edited in the next sync, we have a conflict. This is something we have to live with. Only when pausing/aborting the sync, we can do something about it.

@ogoffart you can assign me to it if you want, it does not seem difficult.

I see 2 use cases for timeout of e.g. 5 sec:

  • small PUT
  • new chunking MOVE

The others, it might not make sense because of filesize or because protocol is obsolete (old chunking)

@mrow4a What do you mean? This would be about abort() only. I think what I had proposed in https://github.com/owncloud/client/issues/5949#issuecomment-321499702 is enough..

Wouldn't this be a situation where the local and remote checksums are identical, therefore skipping the download and just populating the etag on the next sync? Aka: is there really an issue here?

I am not sure @ckamm how is that related to aborting, which just waits a bit until the job is finished, doing no harm to nothing.

I made a testcase for this, will add here later.

@mrow4a I think it's the other way around. We're discussing the introduction of complex aborting logic in response to this issue. If the issue doesn't exist, why introduce aborting?

It's possible that my test case doesn't go far enough. But I'd like to know whether you or @ogoffart managed to reproduce the problem with master.

@ckamm I think this issue is you move the files locally that were PUTted in the aborted upload.
And @mrow4a 's patch tries to reduce the likelyhood of

@guruz @mrow4a Oops, I forgot the move-file-away part in my test case. No wonder I didn't see anything.

I'll update it and expect everything will make more sense to me then :)

Was this page helpful?
0 / 5 - 0 ratings