Is your feature request related to a problem? Please describe.
The --patch-from option only accepts files that are not larger than 4GB. Most of my files are >5GB and very similar; they only differ in language, so the patch would be about 200MB.
Describe the solution you'd like
I would like zstd to support files larger than 4GB for the --patch-from option.
Describe alternatives you've considered
No alternatives, because I couldn't find any solutions regarding this error.
Additional context
PS Y:\zstd> .\zstd.exe --patch-from=.\en_windows_10_consumer_editions_version_2004_x64_dvd_8d28c5d7.iso .\de_windows_10_consumer_editions_version_2004_x64_dvd_7efdffc7.iso -o patch
zstd: error 42 : Can't handle files larger than 4 GB
Thanks for posting the issue.
The restriction is actually 2GB instead of 4GB. That was a mistake and I just merged a PR to fix it.
It would be nice to lift this restriction entirely at some point. But that will require quite a bit of work. I'm going to aim to have something like that for the next release if there is a lot of interest.
A good (easier) first step might be to just bump the restriction to 4GB. I'd have to look into how involved that would be but it would definitely be a smaller patch than allowing for arbitrarily large files.
What would be the main issue with supporting files larger than 4GB?
@luzeagithub, although the zstd codebase can compress arbitrarily large inputs, and although the Zstd format specification allows arbitrarily distant matches, this codebase internally uses 32-bit integers to represent match offsets. So in practice this implementation can only reference data within 4 GB of the current position in the stream.
--patch-from mode operates as if you're compressing the new version using the old one as a dictionary, which places them in sequence like this: [old file contents...][new file contents...]. If the old file is larger than 4 GB, this implementation will therefore be unable to internally represent matches from a given part of the new file to the corresponding part of the old file, which is the whole point of --patch-from.
This could be fixed by using 64 bit integers to represent offsets internally, but that would be a large, scary refactor...
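To make the arithmetic concrete, here is a toy model of the layout described above. It is only a sketch, not zstd's actual internals: the names `match_offset` and `fits_in_u32` are hypothetical, and it assumes the simplest case where a region of the new file matches the same relative position in the old file.

```python
U32_MAX = 2**32 - 1  # largest value a 32-bit unsigned offset can hold
GiB = 2**30


def match_offset(old_size: int, pos_in_new: int, pos_in_old: int) -> int:
    """Distance back from a position in the new file to a position in the
    old file, assuming the layout [old file contents...][new file contents...]."""
    current = old_size + pos_in_new  # absolute position in the combined stream
    return current - pos_in_old


def fits_in_u32(offset: int) -> bool:
    """True if the offset can be stored in a 32-bit unsigned integer."""
    return 0 < offset <= U32_MAX


# When both files line up, the offset equals the size of the old file.
print(fits_in_u32(match_offset(1 * GiB, 1000, 1000)))  # True
print(fits_in_u32(match_offset(5 * GiB, 1000, 1000)))  # False
```

With two near-identical 5 GB files, almost every useful match sits roughly one old-file-length back, so nearly all of them fall outside what a 32-bit offset can express.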
Ok, thanks for the explanation. It seems I have to use a third-party program (smartversion) to do it. It also has zstd support (which I am using) and works with files >4GB.
I'm going to keep this issue open as a reminder that there is interest in this.
This item is present in our backlog, but no date has been set so far.