Client: Sync only the file change, not entire file [$1,755]

Created on 14 Dec 2012 · 131 comments · Source: owncloud/client

Dropbox's documentation notes that only the changed parts of a file are synced, not the entire file: https://www.dropbox.com/help/8/en. It would be great if ownCloud could do the same. Especially for large files, syncing the entire file wastes a lot of bandwidth and time unnecessarily.

In my testing with the latest 4.5.4 server and the sync client on Ubuntu 12.04, I prepared a 1 GB text file, appended a few characters to the end, and monitored the traffic and the file on the server. I saw the entire 1 GB file being transferred to the server, and the server actually created a new file.

A reply on the forum indicates that librsync has this feature (http://librsync.sourceforge.net/); maybe csync could be switched over to librsync.

Enhancement Performance bounty

Most helpful comment

Hi all, thanks for all the kind words, much appreciated, it was a fun project and I had fun doing it, hope it stands the test of time. Would like to extend the thanks to all the oC devs who helped throughout the process and especially @ckamm who has so aptly continued the development where I left off, very nice! Thanks again all!

All 131 comments

I think this is ESSENTIAL for a sync client, along with detection of moved files. Maybe combine this with something like rsync -y (trying to find a similar base file on the remote side to speed up uploads), but note that rsync -y can slow down (lib)rsync when working in large directories.

any update on this?

+1 for this. There's a wealth of files that may slightly change each time and are big to sync. Outlook PST, truecrypt, databases, phone backups, ...
Any update on this?

I'd like to see this as well.

Waiting on this to install my first owncloud server.

I hope this will be implemented, seems essential..

  • for some reason my ownCloud client (1.3.0, OS X) has trouble syncing big files with small changes, and always re-uploads entire files...

I also see this as essential. Please.

I agree on this one as well; we need some kind of incremental sync capability before this is 100% usable.

The problem is known, and we will get to it. One day.

Until then, please try to refrain from +1 comments :v:

Can you give an estimate of when we will see the “one day”?
This functionality would open up whole new possibilities like syncing Truecrypt-volume files or Lightroom catalogs.

@Poelziminator Not in any release this year. Note that most work for this will be in the server and the general design.

This is the key feature for everyone who works with TrueCrypt or bigger DB files. Please make it happen.

This as well seems related to the 1.6 »Sync Performance« milestone. @MTRichards @dragotin @danimo?

This is delta file syncing, and likely too complicated to get into 1.6 because it requires major server work too. The idea is to first improve performance on file level sync to get it more efficient, and then increase the granularity of the file comparisons (to file chunks), but going right to file chunks without first getting the file level sync comparisons would hurt performance more than help at this point because of the sheer volume of comparisons required.

I was about to choose ownCloud (with a very high probability of buying Enterprise) for use at our company but decided against it because of this particular issue. I'm a bit shocked that this isn't supported, considering how important this is for (potential) customers using TrueCrypt et al. This isn't so much of a +1 as an "at least one enterprise customer lost to the other guys".

This is very important for virtual machine disk image files. A single-byte change causes multi-gigabyte files to be uploaded to the remote server.

Yes! Great idea!

As this is really the most important feature of a cloud service and it seems that nobody is interested in or working on it, I would like to offer some help with this issue. Is there already some information about what needs to be done, where to start, etc.?

I agree this is a very important feature. ownCloud is such a wonderful piece of software. I tested it today and found its quality up to the mark. Adding delta file sync would make it complete. The web interface, WebDAV, desktop sync, and file sharing have all worked out great on my VPS, alongside an ISPConfig3 setup.

Please initiate this effort; I am willing to buy the Enterprise edition for my company.

This should be an ownCloud priority, for sure! Without this functionality, the EE version is "just" the community one with more support?

I remember hearing that ownCloud wanted to keep the files stored on disk in their entirety. Is this (still) true? Because if so, you could just generate a zsync signature file and write a custom receiver that regenerates the entire file.

If the files are allowed to be broken up (maybe in a future version) then they can be chunked and a very efficient sync endpoint could be made.

What are the thoughts on this? I may consider working on this in my spare time.

@dragotin @danimo @MTRichards what’s the plan on this one? I know it’s a big one, but it’s requested very often and seems to be important to improve performance.

@jancborchardt As long as the server doesn't offer any delta syncs, we can't implement it.

@danimo Can you explain what you mean in a little more detail, please?

@LoZio Currently, we use WebDAV as the communication protocol with the server. Additionally, we can upload in chunks, but only if we transfer the entire file. Delta sync requires another protocol extension. Also, we have no guarantee that the server is holding a hash-wise identical file, since the server does not store file hashes.

So what’s the plan with the server-side regarding this one? @karlitschek @DeepDiver1975 @PVince81

We should do this in the future. But it is more of a long-term feature.

@jospoortvliet @DeepDiver1975 has anyone followed up on @powerpaul17's offer to help with the implementation? Just asking because this is indeed a very important feature - it would be great if it weren't too "long-term".

The lack of this feature is a deal-breaker for me :( It may be tolerable over a LAN, but resyncing entire files with offsite servers is a no-go. I understand it requires server support, so who is the best person to contact about that? Is there a protocol extension proposal?

@menelic Nobody has come back to me, but it seems that the first thing to do is implement the necessary features in the server. I tried looking at the sources but haven't quite found out where to tie in.

@powerpaul17 and anyone who is willing to help, please also join our IRC channel at #owncloud-dev, as well as our developer mailing list. There you can ask questions if you need help.

Thanks!

@powerpaul17 the thing is that WebDAV is used for downloading and uploading files.
So somehow the connector might need to be extended (check out lib/private/connector/sabre) to support requesting/sending partial files.

But the more important question is first to find out how to diff files (xdelta?) and how the server/client can store an older version of the file somewhere to be able to create that diff in the first place, considering that there might be conflicts.

@PVince81 I wouldn't diff the files, because that would require keeping an entire copy of every sync directory to diff against. Instead, I would do something like keep a SHA3 hash of each 1MiB block in each file, as well as the SHA3 hash of the entire file (both as seen on the server). For files less than 1MiB, just sync the entire file whenever the file no longer matches the hash. Anything over 1MiB, only sync the blocks that no longer match the hash. Let the 1 MiB threshold be configurable.
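To make the block-hash idea concrete, here is a minimal Python sketch (purely illustrative: the 1 MiB block size and SHA-3 come from the comment above; the function names and the idea of a server-provided block list are assumptions, not ownCloud code):

```python
# Illustrative sketch of per-block hashing for delta detection.
# Assumptions: 1 MiB fixed blocks, SHA-3 hashes, and a server that can
# return the same kind of signature for its copy of the file.
import hashlib

BLOCK_SIZE = 1024 * 1024  # 1 MiB, proposed above as a configurable threshold

def file_signature(path, block_size=BLOCK_SIZE):
    """Return (whole_file_hash, [block_hash, ...]) for a local file."""
    whole = hashlib.sha3_256()
    block_hashes = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            whole.update(chunk)
            block_hashes.append(hashlib.sha3_256(chunk).hexdigest())
    return whole.hexdigest(), block_hashes

def blocks_to_sync(local_blocks, server_blocks):
    """Indices of blocks whose hash differs from the server-side signature."""
    count = max(len(local_blocks), len(server_blocks))
    return [i for i in range(count)
            if i >= len(local_blocks)
            or i >= len(server_blocks)
            or local_blocks[i] != server_blocks[i]]
```

With fixed blocks like this, an insert near the start of a file shifts every later block, which is exactly the weakness discussed further down the thread.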

I agree. zsync is a simple solution; it is essentially a different way to use the rsync algorithm. Basically the server keeps a static signature file (as @DarthAndroid suggested) and the client downloads it to figure out what it needs to download/upload.

Even if zsync is not used directly, it is a nice approach to look at, as there is very little logic on the server (just calculate a new signature file every time the file changes) and no "history" needs to be kept on either side. The only major downside is that the delta is not as tight as something like xdelta would produce.

@DarthAndroid I thought about your idea while in the subway and came to the following questions/issues:
1) Would your approach work well with file size changes (basically, new chunks that might be inserted/deleted)?

2) Be aware that some files might be stored on external storage on the server (e.g. an SMB server). If such a file changes remotely (not through ownCloud), recomputing its hashes might mean recomputing the hashes of all chunks, which could be expensive and require re-downloading the whole file to a temporary store (smbclient doesn't support partial file download).

3) What file formats only have parts of them changing and would benefit from this approach? A few formats that come to mind are TXT, WAV, PNG, and TIFF files. Other compressed formats like JPG, MP3, OGG, AVI, ZIP, RAR, ODC, DOCX (a zip file), etc. will most likely change completely when worked on.

My concern with this is that it might introduce a very high level of complexity and maintenance costs where the benefit might not be that big (ratio between complexity/time to invest and overall benefit)

@kevincox are you talking about a full-file signature like an MD5/SHA hash? And when that one changes, sync the whole file? Note that the sync client already uses etags (similar to hashes but not based on content) to detect changes. Just wanted to clarify :smile:

I did some experiments in the past with xdelta (binary diff) for another project and noticed that diffing two ZIP files produced a patch that was almost the same size as the ZIP file itself. I had to first extract the ZIP file, then run xdelta on every file, and even extract the JAR files inside (there were JAR files inside the ZIP) to get better compression. Only then was the patch file much smaller. But doing this would require having the compressing/decompressing logic on both sides. The complexity seems to be quite high as well.

@PVince81

No, I don't claim to be an expert, but check the link for more details. Essentially the signature file has a list of hashes for each block in the file. Then you download that file and figure out what blocks the server already has (using a rolling checksum algorithm à la rsync).

As I said I'm not an expert but if you want I can explain what I understand in more detail.

@PVince81

Regarding @DarthAndroid's idea.

1) You probably won't want to hash/checksum fixed-size blocks, because adding or removing one byte at the beginning would force a full re-upload. Instead you would use some sort of hash function to find chunks that are approximately a given size, so that adding or removing one byte only affects one block (see the sketch after this list).

2) This approach would probably work best if the files weren't stored as plain files on the server (this is why I was asking before), so the server would be the master of the file and modifications would have to be done through it. This isn't strictly necessary, but otherwise I don't see how to get around this problem.

3) This is a good point. And this approach is the "all in" method and might not be worth it.
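To illustrate point 1 above, here is a minimal Python sketch of content-defined chunking, assuming an arbitrary rolling hash, window size, and target chunk size (this is not ownCloud or zsync code, just a demonstration of the idea):

```python
# Content-defined chunking: cut where a rolling hash of the trailing window
# matches a bit pattern, so an insert/delete only moves nearby boundaries.
# All parameters here are arbitrary assumptions for the sketch.
WINDOW = 48
BASE, MOD = 257, 1 << 32
MASK = (1 << 13) - 1                     # roughly 8 KiB average chunk
MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024
POW = pow(BASE, WINDOW - 1, MOD)         # weight of the byte leaving the window

def chunk_boundaries(data):
    """Yield (start, end) offsets of content-defined chunks in `data` (bytes)."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * POW) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # add the newest byte
        size = i + 1 - start
        if (size >= MIN_CHUNK and (h & MASK) == MASK) or size >= MAX_CHUNK:
            yield start, i + 1                       # boundary found: cut here
            start, h = i + 1, 0
    if start < len(data):
        yield start, len(data)                       # trailing partial chunk
```

Each chunk would then be hashed and compared, as in the fixed-block sketch earlier, but the boundaries survive insertions and deletions.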

About xdelta: I believe that it decompresses files before generating the delta (then compresses the delta) to help avoid this problem. But there are many formats that are not designed to keep similar data in a similar place in the file. The general advice is "Don't put a compressed jar in a compressed zip", but for an end-user solution you can't expect people to do the "sane" thing, and you should try to handle these situations as gracefully as possible.

I would suggest using the zsync method. Maybe we can take a look at how Dropbox handles this? I think there are some Python files on the client side which implement this.
After we have chosen a possible solution we could distribute the work which needs to be done. Count me in if there is some coding to be done; I would be glad to help in my spare time. Maybe we should get in contact with the core development team and ask them how we can implement this so it gets into mainline?

@kevincox thanks for your input. It seems from the page that zsync does automatically decompress files and their blog says that Eve Online uses it as well.

It would be good to try it out with different file formats, as it seems to use the approach suggested by @DarthAndroid, and to see whether it's worth the effort, as putting this into ownCloud is a big task.

Also, if going the zsync route, we need to check whether the project is regularly maintained in case of bugs.

@rarspace01 feel free to do some experiments/proof of concept. The best would be to submit a PR with a "WIP" header with the work in progress so it can be reviewed/discussed/tested while it is being developed. Also you might want to drop a line about this in the developer mailing list to get more feedback first.

Also, always keep in mind the external storage cases: even if the client might be someday able to make HTTP range requests, ownCloud might still need to download the whole file from SMB/FTP/etc just to be able to return a specific part of it to the client.
So in the worst case that "delta" behavior must be able to fall back to the "regular" syncing in such cases.

A very simple solution would be to not generate zsync signature files in some cases, for example files smaller than a given number of blocks or files on "slow" storage without partial download capabilities. In these cases, when the client requests a signature and gets a 404 (or any other error), it should fall back to a complete upload/download. This may not be ideal because there is an additional request, but it is a pretty good first step. An optimization that tells the client which directories to never consider for delta syncing could be added later if it is found to be useful.

Also note that afaik zsync has no upload capabilities. While the same signature files could be used to compute the necessary data to send there would need to be a custom "patch" format and handler to construct the new file server-side.

@kevincox 's first point to me is something I didn't think about.

The main issues/use cases where I am looking to benefit from smarter/incremental syncing is:

  • Initial sync. I have a folder with 12 GB of files, of which I already have a local copy in the exact folder structure. I'd like to be able to add a sync folder between the local and server copies, and have it detect that the hashes match, and that it doesn't need to transfer any data or update them. This is particularly important since I have 4 different computers I'd like to sync with my cloud, and would love to not redownload the files for each one. Bonus points if owncloud at some point gets the ability to sync peer-to-peer over the LAN, like dropbox. Most of this is a bullet I'm only going to bite once, so I could just suffer through it, but uploading/downloading when the data is already there is just painful.
  • Zip files. I work with a number of zip-based formats (.jar, .docx, .xlsx, etc.) which are in the 20-40 MiB range. My edits are usually only small changes at a time. If it could delta the files inside the .zip and only upload data for the ones that changed, that would save a lot of bandwidth. Zip formats are common enough that I think an argument could be made to give them special handling like this.

Hash-based signatures against the on-disk data would work for case 1, but not case 2. I suggest hash-based signatures because Dropbox uses them not only for partial syncing, but also for deduplication when syncing. If I create a copy of a file, Dropbox will sync it instantly because it doesn't upload anything - it just tells the server that a new file appeared with blocks A, B, and C, and since the server already has those blocks, it doesn't have to request them from the client. This also handles file renames without any transfer of data, as long as the "file create" command is issued before the "file delete" in the sync operation. I imagine Dropbox has additional extensions on top of hash-based stuff, but it seemed like a good place to start.
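A hypothetical sketch of that dedup-style upload; the `missing_blocks`, `put_block` and `create_file` calls are invented for illustration and are not an existing ownCloud (or Dropbox) API:

```python
# Hypothetical client-side flow for block-level dedup: the server is asked
# which block hashes it already has, and only the missing ones are sent.
import hashlib

def upload_with_dedup(server, path, block_size=4 * 1024 * 1024):
    # Split the file into blocks and hash each one.
    blocks = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            blocks.append((hashlib.sha256(chunk).hexdigest(), chunk))

    ordered_hashes = [digest for digest, _ in blocks]

    # 1. Ask the server which of these block hashes it does not store yet.
    missing = set(server.missing_blocks(ordered_hashes))

    # 2. Upload only the missing blocks; a copy or rename uploads nothing.
    for digest, chunk in blocks:
        if digest in missing:
            server.put_block(digest, chunk)

    # 3. Tell the server to assemble the file from the ordered block list.
    server.create_file(path, ordered_hashes)
```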

@PVince81 I also agree this should be supplementary on top of the existing full-download/upload functionality. A simple query to check if the given file supports partial syncing should not cause issues-- either it's a small file, and we don't query because we're just going to sync the whole file anyways, or it's a large file, and the extra query either saves us a ton of bandwidth, or is negligible compared to syncing the file. This allows the server to selectively enable or disable partial syncing for any file or folder based on backend/external capabilities.

@DarthAndroid

Zip files aren't the problem in themselves, but compressed/encrypted files where a small change can completely change the contents are. The solution is in general to uncompress the file, diff it, then recompress it afterwards. This is a very general process and can be adapted to (almost) any delta format that is used.

I need to sync a file of more than 5 GB every time, when only a few KB are modified.

+1 very important basic feature

@hc-bondis yes, mind to send a patch?

Since this cannot be done using standard WebDAV (as far as I know), one has to offer a different method of transfer or a way of dealing with patch files. Is there a discussion group or mailing list for this topic? If someone is developing a "patch", the way it works should be acceptable to the community / core team.

Yes, please use owncloud-devel mailinglist for that: http://mailman.owncloud.org/mailman/listinfo/devel

For the curious ones:
https://tech.dropbox.com/2014/07/streaming-file-synchronization/

Dropbox uses 4 MB chunks/blocks.

It is interesting that they use fixed-size blocks. This has the obvious downside that a "shift" will cause the whole file to be completely uploaded/downloaded, whereas with a variable block size it would only affect a block or two. I can't imagine that the cost of chunking a file would be too large, but there are probably benefits in optimizing the storage of the chunks (no need to specify length, and easy offset calculation of packed chunks). I wonder if, were they redesigning their system, they would switch to variable-sized blocks.

This is in part a dedup decision, so they store a block exactly once but charge everyone for it.

Deduping works _better_ on variably sized blocks.

I wonder... Can you switch mid stream with 200 m users?

It would not be easy.

@kevincox I was actually looking at how rsync does incremental diffs that handle shifts, and it's not terribly complex. You just need two hashes, one cheap with a sliding window (such as "sum all bytes in the block as an unsigned int and let it overflow") and one that's accurate (such as SHA-1). Given a list of (cheap hash, accurate hash) for each block, the other end of the connection can find all unchanged blocks, including ones that have shifted, in O(n) time by selecting potential matches via the cheap hash and then verifying them with the accurate hash. At that point, you know what blocks you have and how they're shifted, and need only request the missing blocks.
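As a rough illustration of that two-hash scheme (not librsync or zsync code; the block size and hash choices are assumptions), something like this finds shifted blocks:

```python
# Weak checksum filters candidate positions, a strong hash confirms them.
# A real implementation would update the weak sum incrementally as the
# window slides instead of recomputing it at every offset.
import hashlib

BLOCK = 4096  # arbitrary block size for the sketch

def weak(data):
    """Cheap checksum: sum of bytes, allowed to wrap (stand-in for Adler-32)."""
    return sum(data) & 0xFFFFFFFF

def signature(old):
    """Map weak checksum -> {strong hash: offset} for each block of the old file."""
    sig = {}
    for off in range(0, len(old), BLOCK):
        block = old[off:off + BLOCK]
        sig.setdefault(weak(block), {})[hashlib.sha1(block).digest()] = off
    return sig

def find_matches(new, sig):
    """Return {new_offset: old_offset} for blocks already present, even if shifted."""
    matches, i = {}, 0
    while i + BLOCK <= len(new):
        window = new[i:i + BLOCK]
        candidates = sig.get(weak(window))            # cheap filter first
        if candidates:
            old_off = candidates.get(hashlib.sha1(window).digest())
            if old_off is not None:                    # strong hash confirms
                matches[i] = old_off
                i += BLOCK                             # jump past the matched block
                continue
        i += 1                                         # otherwise slide by one byte
    return matches
```

Everything not covered by a match is the data that actually has to be transferred.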

Yes, rsync is a different approach than the Dropbox method. The main difference is that rsync can generate small deltas, while the chunking strategy achieves deduplication as well.

This is because in the rsync algorithm a "shift" will generate a completely new set of chunks; however, because the receiver decides dynamically which chunks it has, this is not an issue. This means that you can deduplicate within a single file quite efficiently, but deduplicating across different files is impractical because you would have to hash every byte of every file.

With a fixed chunk system it is very easy to keep a catalog of chunks you have, and you don't need to recalculate them all the time. The rsync algorithm is really good at what it was designed for, but for a deduplicating system it doesn't really cut it.

@MTRichards I wonder if dropbox using fixed size chunks is legacy, and if they would prefer switching to variable sized blocks. It would be interesting to hear some of their thoughts, although they probably aren't too keen to share.

@kevincox did you ever read the analysis of how dropbox works? Someone took it apart with some serious analysis step by step. I'll see if I can find it, it is interesting reading.

@MTRichards No I haven't read much about them, mostly the link posted above. If you could find that write up I would enjoy reading it.

That is mostly the network protocol although looking at syncthing's design would definitely be a good idea.

After reading that, it struck me that it would be reinventing the wheel if a similar process were to be followed here. It seems that this could be a well-defined way of ensuring partial transfers can be resumed, but it could also lead to a simpler implementation of deduplication and partial change transfers, not so?

Based on the architecture of ownCloud (as I understand it) it would be difficult to implement the same protocol. Although there would be advantages to using the same protocol (you might be able to use the syncthing client with an ownCloud server; I would need to look into this more), the actual protocol is very basic and there isn't much work in creating a similar protocol. I was imagining that ownCloud would probably use something over HTTP or even a WebDAV extension, as those are technologies already used in the ownCloud infrastructure and would be minimally invasive. Plus, these technologies already do much of the work for a request/response protocol, so we are already much of the way there.

I think that the syncthing protocol has many advantages that should be considered. I am aware that it is still in active development, but it is fast approaching a stable 1.0. Combining the two would make for a compelling solution.

@kevincox please clarify:

> the actual protocol is very basic and there isn't much work to create a similar protocol

I don't understand what you mean. If you are referring to the pace of development, syncthing is improving quickly and responsibly, with a view to backward compatibility, data security and platform-agnostic use. Or maybe you mean something else entirely? Please explain - and please do look into the possibility of using syncthing, because using a syncthing client with an ownCloud server would make a compelling use case.

@menelic I didn't mean that in a negative way at all. Simplicity is the goal. What I meant is that it simply requests files and chunks. There is nothing complex about it; all of the sync logic is independent of the protocol.

@kevincox thanks for the clarification, I now understand your point. But what does that mean in terms of having a look at syncthing as a possible protocol to use for ownCloud? I'd still encourage you to check it out, also because it can provide some oft-requested features for ownCloud such as incremental file-change sync, LAN-only sync (owncloud/mirall#230), fine-grained repository and folder control through manually definable master and recipient repositories and folders, etc.

If any of you want to implement or support this - I've just put a bounty on this on bountysource as I personally would like to have this ;-)

https://www.bountysource.com/issues/905030-sync-only-the-file-change-not-entire-file

Just a small start, but if enough of the visitors here want this and support it, perhaps somebody can afford spending some time on it and building it in!

I don't know if it should be done the zsync way or via the syncthing protocol, that's not my area of expertise, would love to hear opinions from the sync devs on that.

Note that whoever wants to tackle this needs to do server patches and needs to be aware of possible limitations imposed by external storages.

I've added to this bounty, as I see this as a major hurdle for the adoption of OC over services with software that does delta sync.

I spent a lot of time getting OC implemented at my company. I love OC, but was disappointed to discover this problem (although I understand the reasoning behind it).

I took some money from our budget that would normally go to DropBox and added to the bounty.

Thanks @jospoortvliet and @curtiszimmerman for donating.

I just added a small amount to the bounty for this.

I added a small bounty to solve this problem, I would really like to leave the dropboat...

Added my $15 too :)

We're in for 15 more over here! We hope all these contributions help.

$20 here. :)

Not sure if adding $ to this bounty helps a lot. As long as the server side hasn't changed there is no way this can be implemented in mirall...
Maybe the needed changes on the server side could be something for a summer of code project? http://forum.owncloud.org/viewtopic.php?t=24398&p=71733#p71733

I added it to the GSoC idea page; could somebody fill in the details/contact/find a mentor for this topic?

Delta sync sounds great and would certainly provide an improvement over the current full-file transfer. But can you explain to me the real use case that makes it so important to you? Do you have large files that get modified only by a bit? If I have to upload a 20 MB ppt again -- I don't really care. Do you have problems with transfer times (slow network)? Or do you want to save on your internet plan? Or what is it?

In other words - has anyone measured the real impact of lack of delta sync? Is appending to a 1GB file a real usecase for someone?

It doesn't have to be 1GB, adding a few bytes to a file of a few MB in size is a regular use case for many users (log files).

And lots of users, especially those that host their own cloud server over consumer internet connections that in many cases have small quotas, do care a lot about transfer volume.

The number of comments on this issue shows that lots of owncloud-users care for this feature.

The devil is in the detail with that, and here is why for the most use cases delta sync does not at all help:

Most file formats contain compressed data: pictures, music, office docs, etc. The problem with that is that if a byte changes somewhere in the file, the file gets compressed again, and the compression algorithm changes data over the entire file. That means that all parts of the file have to be transferred again and the delta sync would be useless.

To my knowledge, there is no solution to that yet. So bottom line: delta sync is a great checkmark on the feature list, but in practice of rather limited usefulness.

That is why I think it is good to push this feature further out and use the limited engineering power to become more robust, faster and convenient.

To all who love to do +1: check your own data and find out how many different file formats are in there. How many of them are compressed, and what is the fraction of uncompressed files in your collection?

I don't agree. I mean, of course, you are right in that compressed data will in most cases not benefit from delta sync. But we don't usually update pictures and music, only maybe the metadata (like ID3 tags), and that stuff is not compressed and would again benefit from delta sync.

Office docs depend on the concrete format: docx and xlsx are compressed and will most likely also not benefit from delta sync; for others I'm not so sure.

The main use case for me is log files, and those definitely would benefit a lot from delta sync.

But of course it's your freedom to work on whatever you think best, just be aware that there are a lot of owncloud users that would like to have that feature, and also some that are not owncloud users because of this missing feature.

@dragotin the use case is mostly multimedia files here, I concur.

Since no one has mentioned it, Truecrypt volumes can be large and delta sync can help tremendously with syncing these encrypted volumes across multiple machines. This is my primary reason for wanting delta sync, and it's definitely one of the best features about Dropbox which we could really use. Regarding this crazy "it is not helpful at all"... We can put a number representing that feature's value (say, on bountysource.com) and prove that actually, there are quite a few people who think it will be helpful to them. In fact, using bountysource we can see that this issue is the one people MOST want to see in Owncloud. I added to that bounty because I want that feature, because that feature will be extremely useful for me. Transferring gigabytes of encrypted volume across my wireless network every time two megabytes of the volume changes is not a trivial task. I really think you are kidding yourself if you genuinely believe it won't be helpful for a lot of use cases out there.

For me it's large TrueCrypt containers and large index files where the byte change is minimal and the resulting transfer is enormous. I see that for traditional office files (rather small, a lot of change in the file itself) your argument is true. I think there is enough need for delta sync (just search online for "delta sync googledrive" etc. to get a hint outside of ownCloud) and there is also a significant number of use cases. Implementing something with a benefit for some users and no negative effect for the remaining users is OK for me, even if we as developers have to invest a little more time implementing it. In the end it's a question of prioritization. But if you take a look at Bountysource, this issue seems to be one of the important ones; even here there are 82 comments.

Is somebody able to come up with a description of the file format of TrueCrypt containers? Maybe it makes more sense to consider handling special file types specially rather than implementing a general delta-sync approach. Just thinking.

And: please note that I did not say we should not do it, I just think it's overrated and we can do better things short term. But sorry, I didn't want to bring you guys up, no worry.

I haven't looked at a volume rigorously, but generally it's designed to look like a block of random data (in a size multiple of 512 bytes). There is more information at Wikipedia, but Truecrypt volumes were designed not to have a specific header or magic byte sequence to help identify them with.

TrueCrypt doesn't use any specific file headers. The whole file is AES/etc. encrypted. As a result I wouldn't make a whitelist for delta sync; I would make a blacklist, if a separation is really needed. Maybe a simple >4 MB rule would also be OK. To a blacklist we could add zip/7z, but also docx etc.

I am not a user of TrueCrypt, but it looks like it was discontinued earlier this year. It is also marked as insecure and no longer developed. Is that true or a hoax? Does it affect the non-Windows platforms in the same way?

Does anyone have experience with the BitLocker (which is a supposed replacement of TrueCrypt on Windows) and dropbox/owncloud/google drive? Would it suffer from the same whole volume upload problem when syncing files?

As per handling special cases such as fixing metadata of multimedia files, I imagine that with the underlying webdav transfer one could think of propagating metadata changes without propagating the whole file payload.

Ultimately, I am not an ownCloud developer, but I can imagine that adding delta sync would require a major reworking of the protocol. That's probably a reason why they would like to have a clear understanding of the benefit to their users versus the big effort they have to spend on this (or could spend on fixing other issues which to some other users appear more basic). Maybe something similar applies to Google, which does not seem to support delta sync in Google Drive.

As discussed earlier in this thread, the way to deltasync zip/7Z/docx/xlsx files is to unzip and deltasync the uncompressed contents (i.e., treat it like a folder). Even doing this for just Zip-formatted files (*.zip, *.docx, *.xlsx, *.pptx, *.jar... lots of file formats are just *.zip in disguise) would be very beneficial.
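As a rough sketch of that idea (illustrative only, using Python's standard zipfile module; it says nothing about how the repacked archive would be reassembled byte-identically, which is the objection raised just below):

```python
# Treat a zip-based format (.docx, .xlsx, .jar, ...) as a folder and hash each
# member's uncompressed content, so only changed members would need re-sending.
import hashlib
import zipfile

def member_hashes(path):
    """Map each zip member name to a SHA-256 of its uncompressed data."""
    with zipfile.ZipFile(path) as zf:
        return {info.filename: hashlib.sha256(zf.read(info)).hexdigest()
                for info in zf.infolist()}

def changed_members(old_path, new_path):
    """Members that are new or whose uncompressed content changed."""
    old, new = member_hashes(old_path), member_hashes(new_path)
    return [name for name, digest in new.items() if old.get(name) != digest]
```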

I would argue a major usecase of this is office document files, since they often contain multimedia which causes them to be large, and they are frequently edited.

We put a lot of time and effort into implementing ownCloud for our small business (40 employees). We ditched Dropbox and migrated all our files to ownCloud. We learned about this issue the hard way: OC just can't handle the load (design files, office documents, etc.). We had tons of headaches with files getting out of sync, or taking forever to finish syncing. My staff was almost ready to mutiny, they hated OC so much.

We tried every single help document to try to tweak Apache/PHP/OC performance. We eventually came to this thread, after more digging realized this was the heart of the issue.

I'm still supportive of OC, and thus contributed to the bounty source. But sadly still had to switch back to Dropbox.

@moscicki Without getting too far off topic, the original developer has "discontinued" development of Truecrypt, but the project lives on. There are several active projects around Truecrypt (Truecrypt.ch, istruecryptauditedyet.com, etc), and if anything, it looks like the "discontinuation" of Truecrypt will cause its development to flourish as the torch is passed from the one (or two) previously anonymous developers to a real open source community.

I think that opening up zip files to sync them is not a good idea. You are extremely unlikely to be able to reproduce the file: the resulting zip file may have the same file contents, but it will be a different size and different bytes.

@robertmhoehn This issue is unlikely to be the cause of your problem unless you are dealing with very large files that only get small changes (not office documents and most likely not design files). I think the biggest slowdown issue with owncloud is if you don't switch the server from sqlite to a real database.

@robertmhoehn A lot of the "file never finishes syncing" issues have been fixed in the 1.6.4 client, and even more will be with 1.7 (according to bug reports). As long as you're usually dealing with files in the region of 1-50 MB, you should not be seeing serious issues.

@ssieb Even if you don't unzip, the zip-based office files are still largely the same after small changes - adding a slide to a 4MB pptx leaves almost 60% of the file the same. As long as we use an algorithm that can properly detect/handle shifted bytes, office files will still likely see large improvements. As someone working with an offsite oC server and limited upload bandwidth, smaller deltas are a very big improvement.

Sure, depending on the contents, the zip file might not change much. I am totally in favour of implementing delta changes, just don't try to repack compressed files.

It is sad that I have to run both the OC and Dropbox clients on my machine: OC for all of my files and Dropbox for my 5 GB mounted TrueCrypt partition file. :-(

Eagerly waiting for this feature.

I would also say that this (binary diff + move detection) is essential for a sync client. This has to be solved before ownCloud can be used at scale. When I change a bit of a large file, or change its location, it is abnormal that the file is uploaded again. This causes huge internet traffic and, the most critical problem, we usually do not have time to wait until all changed (large) files have finished uploading.

Very essential performance enhancing functionality in my opinion! Hope this will be picked up soon.

Is there an estimated date for this feature? Issue was opened two years ago... TIA.

Locking for contributors only because the notifications of all those "+1" are going to disturb me too much.

If you want to implement this feature please create a new issue or even better a Pull Request where we can discuss this. I think this issue left enough hints and more "+1" are not going to change anything.

Anyone working on this?

@gadLinux wants to work on it but needs input from the client developers on what direction to take with this. I pinged some of them to see if they have time for this.

I added some brainstorming: https://github.com/owncloud/core/issues/16162#issuecomment-104616147

Note that probably the big thing is in the ownCloud server component.

$1755 folks! Where's the like button on that? :)

Does anyone have an idea for which file types delta sync will actually make sense?

@ahmedammar does delta sync / zsync also detect shifts in a file? For example, if I take a WAV file (no compression) and delete a chunk of audio in the middle, will it still detect this or will it resync the whole file?

Technically zsync could detect moves, but my code doesn’t support that, all bytes moved will be resent, this was to make it easier to get something working quickly. I mentioned this as a future work area.

In which areas are the adaptations necessary to make this work? I'm trying to find out the impact - if bigger refactorings are necessary, I question the current in-depth review ...

Does the current implementation at least help when appending to the end of files? In that case I expect that, since the offsets at the beginning did not change, it would only sync the appended block.

Did anyone here already test the feature? If you did, please report where you saw improvements (file types, use case, etc.).

Appending will only send appended bytes. Regarding where the work needs to be done:

1) zsync - the code wasn't designed with an upload path in mind. I implemented that and took the path of least resistance, meaning dropping moved-block support.
2) oC - again, to support moved blocks we'd need a more complex upload path where moved chunks are processed somehow; this could be as simple as sending a file with moved-block to:from mappings and modifying the assembly code to handle them appropriately.

I recommend we don’t try to do this now but after current simple approach is well tested.

Just wanted to clarify something, the above only applies to the uploader, the downloader will not redownload moved chunks.

@PVince81 Regarding the file types that would benefit from delta sync: we are a publishing house and we work with rather big (up to 500MB) Adobe InDesign files. Small changes to these files are very good candidates for delta sync (Dropbox syncs small changes to these files almost instantly).

The feature is stabilizing in the delta-sync branch and test builds will become available around the time 2.5.0 is released. If things go smoothly it'll be in 2.6.0.

Is this issue currently being resolved? Would like to start resolving this issue but do not want to if someone else is.

Yeah would be nice if this gets closed so I can claim the bounty?

Is this really still not merged? What's the hold-up guys? Can we close this ticket??

@ahmedammar It will be in client 2.6.0
On Friday we released 2.5.0 beta1. So it's coming :)
I don't know how the bounty thing works, but I'll try to get that info.

@guru was the PR finally merged? If yes, close the ticket and we trigger payment on Bountysource.

Thanks! Have submitted the bounty claim too ...

@ahmedammar I don't use Owncloud anymore, but I was one of the contributors to the Bountysource bounty, and I just wanted to thank you for all this work. It was very interesting (and impressive) to read your status updates and commits. I hope you buy at least one beer from the bounty (or a nice dinner)!

Thanks @ahmedammar, really appreciate your contribution and looking forward to your continued participation - as time permits. Delta sync will for sure help to bring ownCloud into new places and motivate people to contribute and use it! Privacy and security and a full concentration on file sync and share are so important in today's world. Again, thank you!

I know that one should not post off-topic comments in issues. However, I want to thank @ahmedammar and all other contributors in this issue. This is a really nice and long awaited feature. Well done, everyone!

I also want to thank you, @ahmedammar, for your work. My first post above is from 5 years ago and I'll be happy to go back to OC if this works!

@LoZio it’s gonna be merged to Nextcloud too.

Hi all, thanks for all the kind words, much appreciated, it was a fun project and I had fun doing it, hope it stands the test of time. Would like to extend the thanks to all the oC devs who helped throughout the process and especially @ckamm who has so aptly continued the development where I left off, very nice! Thanks again all!

This feature will be in the upcoming 2.6 alpha release.
Meanwhile you can try it inside the daily builds 2.6.x https://download.owncloud.com/desktop/daily/

So, let's say I wanted to experiment with this new feature. I just downloaded the 2.6.0alpha1 Desktop Client and enabled delta synchronisation. Unfortunately, without noticing any effect. So my question is: Which ownCloud server version is necessary for this to work as intended?

Thanks in advance and keep up the good work.

@bolandross The server part is only in the master branch of the ownCloud server. It hasn't been released in a production version yet.

Somebody needs to update the FAQ: https://owncloud.org/faq/#partialsyncing
