Cataclysm-dda: Compress game saves using LZ4 algorithm

Created on 6 Nov 2016 · 14 Comments · Source: CleverRaven/Cataclysm-DDA

A world can take up tens or even hundreds of megabytes on disk, and I've heard many complaints from players about the game using too much disk space. To free up space, a user has to manually compress the save folder when not playing CDDA and decompress it again before playing. For convenience, I wish we could automate this process in game.

(This is different from #14307. #14307 suggests reducing the number of files on disk by storing the thousands of small fragments (o.0.0, o.1.0, ...) in one single database; here I only want to talk about compressing those small files individually.)

We could use the LZ4 algorithm, a very fast compression algorithm that prioritizes compression speed. GitHub repo: https://github.com/Cyan4973/lz4/

I tested with the 62MB game save provided by GenericEnemy3 in #19168. The o.* files to be compressed total 42MB.

    BrettdeMacBook-Air:Shageluk brett$ mkdir O-files
    BrettdeMacBook-Air:Shageluk brett$ mv o.* O-files # to count the total size of o.* files
    BrettdeMacBook-Air:Shageluk brett$ du -sh O-files
    42M     O-files
    BrettdeMacBook-Air:Shageluk brett$ mkdir LZ4-compressed # to count the total size of compressed files
    BrettdeMacBook-Air:Shageluk brett$ time lz4 -1 -m O-files/o.*

    real    0m0.139s
    user    0m0.049s
    sys     0m0.039s
    BrettdeMacBook-Air:Shageluk brett$ mv O-files/*.lz4 LZ4-compressed/
    BrettdeMacBook-Air:Shageluk brett$ du -sh LZ4-compressed/
    2.2M    LZ4-compressed/
    BrettdeMacBook-Air:Shageluk brett$ time lz4 -d -m LZ4-compressed/*.lz4

    real    0m0.147s
    user    0m0.037s
    sys     0m0.067s

| | Size |
|---|---|
| Data | 42MB |
| Compressed | 2.2MB |

The compressed size is only about 5% of the original size. Amazing compression ratio!

| Operation | Time |
|---|---|
| Compress (Level 1) | 139ms |
| Decompress | 147ms |

Compressing or decompressing all of them takes less than 150ms. In game, we would only need to compress or decompress the files required at a given moment. In the test above, compressing or decompressing a single o.* file takes only about 16ms, and almost half of that time is I/O.
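A minimal sketch of the per-file idea, written in Python for brevity. It uses `zlib` from the standard library purely as a stand-in, because LZ4 bindings are a third-party package; the `.z` suffix and both function names are hypothetical and not part of the game.

```python
import zlib
from pathlib import Path

SUFFIX = ".z"  # hypothetical suffix marking a compressed fragment


def write_compressed(path: Path, text: str, level: int = 1) -> Path:
    """Compress one save fragment and write it as <name>.z next to the original name."""
    out = path.with_name(path.name + SUFFIX)
    out.write_bytes(zlib.compress(text.encode("utf-8"), level))
    return out


def read_compressed(path: Path) -> str:
    """Decompress a single fragment only at the moment it is needed."""
    packed = path.with_name(path.name + SUFFIX).read_bytes()
    return zlib.decompress(packed).decode("utf-8")
```

Each fragment is handled on its own, so loading one overmap file never requires touching the others.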

<Suggestion / Discussion> Performance

All 14 comments

See #10784 as another instance of similar discussion.

Some compression would be a great idea, but I'd much prefer something that can be decompressed with common tools. Unless LZ4 offers 20+% better overall quality (ratio, speed, ease of use) than say, LZMA, I'd rather use LZMA as specified in tar or some other common tool.

Alternatively, we could bundle a tool (or a command line switch) to turn saves into their (de)compressed versions.
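The bundled-tool idea could look roughly like the sketch below. It assumes gzip as the format so the result can be opened with common tools; the `pack`/`unpack` mode names, the function names, and the restriction to `o.*` files are all hypothetical choices made for illustration.

```python
import argparse
import gzip
import shutil
from pathlib import Path


def pack(world: Path) -> int:
    """Gzip every uncompressed o.* file in a world folder; return the count."""
    count = 0
    for src in sorted(world.glob("o.*")):
        if src.suffix == ".gz" or not src.is_file():
            continue  # already packed, or not a regular file
        with src.open("rb") as fin, gzip.open(f"{src}.gz", "wb") as fout:
            shutil.copyfileobj(fin, fout)
        src.unlink()
        count += 1
    return count


def unpack(world: Path) -> int:
    """Restore every o.*.gz file to its plain form; return the count."""
    count = 0
    for src in sorted(world.glob("o.*.gz")):
        dst = src.with_suffix("")  # strips only the trailing ".gz"
        with gzip.open(src, "rb") as fin, dst.open("wb") as fout:
            shutil.copyfileobj(fin, fout)
        src.unlink()
        count += 1
    return count


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Pack or unpack a save folder")
    parser.add_argument("mode", choices=["pack", "unpack"])
    parser.add_argument("world", type=Path)
    args = parser.parse_args(argv)
    done = pack(args.world) if args.mode == "pack" else unpack(args.world)
    print(f"{args.mode}ed {done} file(s)")
    return 0
```

Calling `main(["pack", "path/to/world"])` compresses the folder in place, and `main(["unpack", "path/to/world"])` reverses it. The packed files are ordinary `.gz` files, so `gunzip` can read them without the game.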

I'd suggest deflate instead, as it's about twice as fast and should retain an acceptable compression ratio.

It's also common because it's in zlib, and I believe the zlib license is permissive enough (it's permissive enough for everything), so we could just put the source code in a subdirectory like we do for lua and chkjson.

My recommendation is to use zlib. It should also be an optional dependency (possibly enabled by default, but it is essential to allow it to be disabled for easier development). I am very much against bundling the zlib source in the dda repository. If some platforms (Windows) don't supply basic dependencies, then we can supply a dependency archive or something, but there's no reason to bundle it in the source.

AFAIK neither Lua nor chkjson is bundled in the source.

zlib is available on most platforms

Tangent: isn't chkjson built from the game sources?

Why use zlib? AFAIK, LZ4 is one of the fastest open-source compression algorithms.

| Compressor | Ratio | Compression | Decompression |
| --- | --- | --- | --- |
| memcpy | 1.000 | 4200 MB/s | 4200 MB/s |
| LZ4 fast 17 (r129) | 1.607 | 690 MB/s | 2220 MB/s |
| LZ4 default (r129) | 2.101 | 385 MB/s | 1850 MB/s |
| LZO 2.06 | 2.108 | 350 MB/s | 510 MB/s |
| QuickLZ 1.5.1.b6 | 2.238 | 320 MB/s | 380 MB/s |
| Snappy 1.1.0 | 2.091 | 250 MB/s | 960 MB/s |
| Zstandard 0.5.1 | 2.876 | 240 MB/s | 620 MB/s |
| LZF v3.6 | 2.073 | 175 MB/s | 500 MB/s |
| zlib 1.2.8 -1 | 2.730 | 59 MB/s | 250 MB/s |
| LZ4 HC (r129) | 2.720 | 22 MB/s | 1830 MB/s |
| zlib 1.2.8 -6 | 3.099 | 18 MB/s | 270 MB/s |

zlib is believed to generate smaller output, but for these plain-text game saves, LZ4 is already efficient enough.

I'm in favor of whatever is easiest for us to use, so long as it's faster
than paq8*.

It also needs to fulfill a few requirements: it needs to play well with
Windows, and it must not have a huge memory footprint (the game's is big
enough already). Ideally we should deal with something like save corruption
in a sane way, which may become more of an issue with compression (as you
increase the entropy, you decrease recoverability).

I actually don't know what happens when a map file is broken. Whatever
happens now should remain the same with compression.
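As a rough illustration of keeping the broken-file behaviour unchanged, the sketch below treats a damaged compressed blob the same way a loader might treat an unreadable map file. It uses Python's `zlib` module, where `zlib.decompress` raises `zlib.error` on a bad header, a truncated stream, or a failed checksum; the function name and the choice of returning `None` are assumptions.

```python
import zlib
from typing import Optional


def load_fragment(blob: bytes) -> Optional[bytes]:
    """Return the decompressed data, or None if the blob is damaged."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        # Corrupt or truncated input: report it like any other broken map file.
        return None
```

The caller can then follow the same code path it already uses for a map file that fails to parse.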

Anyway, it definitely looks like there's likely to be a port suitable for
this. In any case, looking at some information about it, lz4 looks plenty
lightweight.

> Why use zlib?

Portability

Let's put it this way, zlib is the default due to extreme ubiquity. If you
want to use something else on performance grounds, you need to demonstrate
the player visible performance improvement in the game.
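A small harness along these lines could be used to compare candidates on real save data. It is only a sketch: the `benchmark` function and its return keys are invented here, and it takes any compress/decompress pair of callables, so third-party LZ4 bindings could be passed in alongside the standard-library `zlib` calls.

```python
import time
import zlib
from typing import Callable, Dict


def benchmark(
    compress: Callable[[bytes], bytes],
    decompress: Callable[[bytes], bytes],
    data: bytes,
    repeats: int = 5,
) -> Dict[str, float]:
    """Time a codec on one buffer and report best-of-N timings plus the ratio."""
    best_c = best_d = float("inf")
    packed = compress(data)
    for _ in range(repeats):
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        restored = decompress(packed)
        t2 = time.perf_counter()
        if restored != data:
            raise ValueError("round trip did not reproduce the input")
        best_c = min(best_c, t1 - t0)
        best_d = min(best_d, t2 - t1)
    return {
        "compress_ms": best_c * 1000.0,
        "decompress_ms": best_d * 1000.0,
        "ratio": len(data) / len(packed),
    }
```

For example, `benchmark(lambda d: zlib.compress(d, 1), zlib.decompress, save_bytes)` measures zlib at level 1 on a buffer `save_bytes` read from an actual save file. Numbers from such a harness only show codec cost, not what the player notices, so they would still need to be set against total save and load time.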

> Tangent: isn't chkjson built from the game sources?

Someone could have added it without me noticing at some point, but I don't
see it in the repository.

> Let's put it this way, zlib is the default due to extreme ubiquity. If you want to use something else on performance grounds, you need to demonstrate the player visible performance improvement in the game.

Or to go one step further the performance benefits would have to be extreme to justify the extra development time needed to support a less common library versus having developers work on bug fixes in the existing code base.

> > Tangent: isn't chkjson built from the game sources?
>
> Someone could have added it without me noticing at some point, but I don't see it in the repository.

Is it not built from json.cpp?

It's definitely in there. There's a folder for it and everything

OK. You win.

| Compressor | Compression Time | Decompression Time | Compressed Size |
| --- | --- | --- | --- |
| LZ4 | 30ms | 28ms | 270,164 bytes |
| zlib | 72ms | 46ms | 145,029 bytes |
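To put the table in relative terms, the figures can be compared directly. The snippet below only restates the numbers from the table; the variable names are arbitrary.

```python
# Figures copied from the table above.
results = {
    "LZ4": {"compress_ms": 30, "decompress_ms": 28, "size_bytes": 270_164},
    "zlib": {"compress_ms": 72, "decompress_ms": 46, "size_bytes": 145_029},
}

lz4, z = results["LZ4"], results["zlib"]

# How large zlib's output is relative to LZ4's.
size_ratio = z["size_bytes"] / lz4["size_bytes"]

# How much longer zlib takes in each direction.
compress_slowdown = z["compress_ms"] / lz4["compress_ms"]
decompress_slowdown = z["decompress_ms"] / lz4["decompress_ms"]

# Extra wall-clock time for one full compress + decompress cycle.
extra_ms = (z["compress_ms"] + z["decompress_ms"]) - (
    lz4["compress_ms"] + lz4["decompress_ms"]
)

print(f"zlib output is {size_ratio:.1%} of the LZ4 output")
print(f"zlib compresses {compress_slowdown:.1f}x slower")
print(f"zlib decompresses {decompress_slowdown:.2f}x slower")
print(f"zlib costs {extra_ms} ms more per round trip")
```

In this test zlib produces output a little over half the size of LZ4's, at the cost of tens of milliseconds per round trip.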
