Cryptomator: Split large files

Created on 21 Jun 2016 · 4 Comments · Source: cryptomator/cryptomator

Related to #85: split large files into small chunks. One unencrypted file --> several encrypted chunks in the vault.

feature-request

All 4 comments

Not feasible. This would "kill" synchronization and could potentially lead to unreadable files. Because of things like synchronization conflicts, this isn't that unlikely. This would also complicate restoring old versions that cloud storage services typically provide. Thanks for your suggestion though!

Edit: For future reference I've added further explanation. Taken from #82.

Let's say we split a file into two or more files and they are eventually synchronized (uploaded one by one!). Keep in mind that we don't have any control over synchronization. Think of this edge case: Multiple people work on the same file (e.g., because you share a vault with your colleague) and now you have multiple uploads at the same time. It doesn't even have to be such an extreme case. Synchronization conflicts can also happen if you have multiple devices and work with synchronization clients that do an awful job of keeping things in sync.

That way you could end up with a split file whose parts have different origins. Let's say we have file.part1 and file.part2. There is a possibility(!) that file.part1 ends up in "another version" than file.part2. Heck, why don't we just throw file.part1 (1) and file.part2 (1) in there, because they just popped up as synchronization conflicts! Have fun putting everything together. 😂

I'm exaggerating, but hopefully you get the idea why there is a small possibility that this could destroy(!) your data or make it unreadable. This would be fatal and that's why we prefer atomic operations. Atomicity is only guaranteed by keeping the file "as a whole".
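To make the "have fun putting everything together" point concrete, here is a minimal sketch that counts how many ways the parts could be recombined once conflict copies appear. The file names come from the example above; the function name `candidate_assemblies` is hypothetical and is not part of Cryptomator.

```python
from itertools import product

def candidate_assemblies(copies_per_part):
    """Return every way to pick one copy of each part.

    copies_per_part is a list with one inner list per part; each inner
    list holds the file names that exist for that part after syncing
    (the original plus any conflict copies).
    """
    return list(product(*copies_per_part))

# Two parts, each with one conflict copy created by the sync client.
copies = [
    ["file.part1", "file.part1 (1)"],
    ["file.part2", "file.part2 (1)"],
]
combos = candidate_assemblies(copies)
for combo in combos:
    print(" + ".join(combo))
```

With two parts and one conflict copy each there are already four candidate combinations, and nothing in the file names says which ones belong together. The count multiplies with every additional part or conflict copy.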

[...]

I'd like to point out that there certainly are technically sophisticated ways to prevent the sync conflict issues I've mentioned. But they would conflict with the user story of #236 that we wanted to solve with #336. There's just a whole lot to think about; we can't just "split" files without negative effects.

Some cloud storage providers limit the size of a single file. Splitting files into chunks could be a lifesaver in such cases. As for the synchronization problem, couldn't it be solved by updating all parts of the file at the same time, even if technically the changes were made in one part only?

For example, each part could contain a version token (GUID). When a file is modified, a new version token is generated, so all parts of the file have to be updated. Now, somewhere else, the repository gets synchronized, one part at a time. This means that during the sync some parts will have the old token and some will have the new one. Cryptomator could use this to detect an update in progress and temporarily block access to the file.
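A rough sketch of the version-token idea, to show what the check might look like. Everything here is an assumption made for illustration: the `Part` structure, the function names, and storing the token next to the payload are hypothetical, encryption is left out entirely, and none of this reflects how Cryptomator actually stores files.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Part:
    index: int          # position of this part within the file
    total: int          # number of parts the file was split into
    version_token: str  # shared by all parts written in the same update
    payload: bytes

def split_with_token(data: bytes, chunk_size: int) -> List[Part]:
    """Split data into parts that all carry one freshly generated token."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    token = uuid.uuid4().hex
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    return [Part(i, len(chunks), token, c) for i, c in enumerate(chunks)]

def reassemble(parts: List[Part]) -> Optional[bytes]:
    """Return the file content, or None if the parts are inconsistent.

    Mixed tokens or missing parts are treated as "sync in progress",
    so the caller can block access instead of returning corrupt data.
    """
    if not parts:
        return None
    tokens = {p.version_token for p in parts}
    totals = {p.total for p in parts}
    if len(tokens) != 1 or len(totals) != 1:
        return None
    total = totals.pop()
    by_index = {p.index: p for p in parts}
    if len(parts) != total or set(by_index) != set(range(total)):
        return None
    return b"".join(by_index[i].payload for i in range(total))

# Simulate a half-finished sync: part 0 is already new, the rest are old.
old = split_with_token(b"abcdefghij", 4)
new = split_with_token(b"ABCDEFGHIJ", 4)
print(reassemble(old) is not None)
print(reassemble([new[0], old[1], old[2]]) is None)
```

The sketch only detects the mixed state. It does not recover the old or the new content while the sync is in progress, and it does not handle conflict copies of individual parts.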

@j-xella Even when doing this, some really hard problems arise. For example, a file where one part is synced to a newer state and the others are still in the old state cannot be read completely by Cryptomator (neither in the new nor in the old state). So when a user accesses such a file, they would see an error or similar saying that the file cannot be accessed.

In addition, Cryptomator is not designed to overcome limitations imposed by the underlying cloud storage; it is designed to encrypt the cloud storage.

@markuskreusch To my understanding, file splitting should not be the default behaviour, but something a user would enable or not when creating a vault, with configurable max chunk size. Then the problem of not being able to access a file during synchronization would be a trade-off of the ability to have big files split.

I (speaking for myself, obviously) would be happy if I had a choice of using an otherwise unusable cloud provider and additional security of possible file size obfuscation in exchange for occasional temporary loss of access to the file.
