Cataclysm-dda: Save files are massive compared to compressed versions

Created on 17 Aug 2018 · 4 Comments · Source: CleverRaven/Cataclysm-DDA

Describe the problem
Save files get massive quickly, upwards of 100MB in some cases.

To Reproduce

  1. Make a save file
  2. Compress it
  3. See that the compressed version is about 1/10th the size.
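
The comparison in the steps above can be reproduced with a short script. This is a minimal sketch using Python's standard `gzip` module; the function name `compression_ratio` and the synthetic sample data are illustrative assumptions, not part of the game. The sample is far more repetitive than a real save, so the ratio it prints is much better than the roughly 1/10th reported here.

```python
import gzip


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (gzip, default level)."""
    if not data:
        raise ValueError("data must be non-empty")
    return len(gzip.compress(data)) / len(data)


# Synthetic stand-in for a map file: long runs of repeated terrain ids.
sample = (b'"t_grass",' * 5000) + (b'"t_dirt",' * 2000)
ratio = compression_ratio(sample)
print(f"compressed size is {ratio:.1%} of the original")
```

To measure a real save, read the file's bytes and pass them to `compression_ratio` instead of `sample`.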

Expected behavior
A save file should be generated with space efficiency in mind. Load times would also drop, since with a reasonable compression technique the program would only need to read and parse less than 1/6th of the data.

Actual behavior
The save file stores mostly map files, which themselves contain mostly things like "t_grass" or "t_dirt". A quick scan of my 40MB world for "t_grass" found 1,236,377 matches.
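
A scan like this one can be done in a few lines. The sketch below is one possible way to do it, not the method used for the figure above; `count_token` is a hypothetical helper name, and the directory passed to it would be wherever the world's save files live.

```python
from pathlib import Path


def count_token(root: Path, token: bytes) -> int:
    """Count non-overlapping occurrences of token in every file under root."""
    total = 0
    for path in root.rglob("*"):
        if path.is_file():
            total += path.read_bytes().count(token)
    return total
```

Searching for `b'"t_grass"'` (with the quotes) avoids also matching longer ids that merely begin with `t_grass`.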

Additional information
Even just replacing "t_grass" with something shorter would cut the file size by at least 5-10MB (by my calculation, "t_grass" takes up approx. 25% of the entire save file). With an all-encompassing compression technique, my save file would probably go from 40MB to <10MB easily.
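
The estimate can be checked with a little arithmetic. The sketch below assumes "40MB" means 40,000,000 bytes and that each entry is stored as `"t_grass",` (quotes plus a separator), which matches the map file excerpt quoted in this issue; the constant names are illustrative. Under those assumptions the id accounts for roughly 22% of the file counted bare and roughly 31% counted with quotes and separator, which brackets the 25% figure, and a fixed-width three-digit id with no separator saves about 8.65MB.

```python
MATCHES = 1_236_377          # occurrences reported in the issue
SAVE_BYTES = 40 * 1000**2    # "40MB", assumed to mean decimal megabytes

BARE = len('t_grass')        # 7 bytes: the id alone
QUOTED = len('"t_grass",')   # 10 bytes: id plus quotes and separator
FIXED_ID = len('032')        # 3 bytes: fixed-width numeric id, no separator


def share(bytes_per_match: int) -> float:
    """Fraction of the save taken up by MATCHES entries of the given width."""
    return MATCHES * bytes_per_match / SAVE_BYTES


saved_bytes = MATCHES * (QUOTED - FIXED_ID)
print(f"bare id share:   {share(BARE):.1%}")
print(f"quoted id share: {share(QUOTED):.1%}")
print(f"saved by fixed-width ids: {saved_bytes / 1000**2:.2f} MB")
```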

So essentially, I see a map file going from "t_grass","t_grass","t_grass",... to "032032032" (032 being the in-game debug number for "t_grass").
If you want to get technical, you could use a basic compression technique: replace every "032" with a single character (or two characters if a particular map has a large variety of terrain) and store a mapping from that character back to "032". The example then becomes "0:032","000".
Going even further, you could use a delimiter to mark which section of the file holds the coordinates and which holds the terrain, without spelling out "coordinates:[XX,YY,ZZ]".
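
The palette idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the game's save code: the names `encode`, `decode`, and `CODES` are hypothetical, the palette maps each one-character code straight to the string id rather than to a debug number, and it only supports up to 62 distinct ids per map.

```python
import string

# Single-character codes available for palette entries (62 symbols).
CODES = string.digits + string.ascii_letters


def encode(terrain: list[str]) -> tuple[dict[str, str], str]:
    """Return (palette, body); palette maps each code to its terrain id."""
    code_for: dict[str, str] = {}
    body = []
    for tid in terrain:
        if tid not in code_for:
            if len(code_for) == len(CODES):
                raise ValueError("more distinct ids than single-character codes")
            # Assign codes in order of first appearance.
            code_for[tid] = CODES[len(code_for)]
        body.append(code_for[tid])
    palette = {code: tid for tid, code in code_for.items()}
    return palette, "".join(body)


def decode(palette: dict[str, str], body: str) -> list[str]:
    """Invert encode(): expand each code back to its terrain id."""
    return [palette[code] for code in body]


tiles = ["t_grass", "t_grass", "t_dirt", "t_grass"]
palette, body = encode(tiles)
assert decode(palette, body) == tiles
```

Each tile costs one byte in `body` instead of ten, at the price of a small per-map palette.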

Also, I don't know why, but every map file splits a chunk into 4 smaller chunks, each with a whole new header containing version, coordinates, turn_last_touched, etc. These could probably be merged into one whole chunk (might be a little tough because that's some hefty code).

(P5 - Long-term) <Suggestion / Discussion> Performance

All 4 comments

Additional similar issues for reference: #19192, #14307, #10784

After reading some of the issues above more closely, it looks like the main blockers were save compatibility, nobody wanting to redo so much code, and possibly that compressing saves didn't provide any speedup at all.

However, I suggest only changing the text file format to something much more compact. This mostly means changing how strings are written to the file and read back in. The preexisting parsing code can be kept for older saves. New saves will not work on older versions, though (hardly a problem as I see it).
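
One way to keep old saves loadable is to dispatch on the shape of the file. The sketch below is purely illustrative: both layouts, the `format` field, and every function name are assumptions made for the example and do not describe the game's actual save schema.

```python
import json


def parse_legacy(doc: list) -> list[str]:
    """Old layout (assumed here): a plain JSON array of terrain ids."""
    return list(doc)


def parse_compact(doc: dict) -> list[str]:
    """New layout (hypothetical): a palette plus a string of one-character codes."""
    palette = doc["palette"]
    return [palette[code] for code in doc["body"]]


def load_terrain(text: str) -> list[str]:
    """Dispatch on the document shape so saves in either layout still load."""
    doc = json.loads(text)
    if isinstance(doc, dict) and doc.get("format") == 2:
        return parse_compact(doc)
    if isinstance(doc, list):
        return parse_legacy(doc)
    raise ValueError("unrecognised save layout")


old_save = '["t_grass", "t_grass", "t_dirt"]'
new_save = '{"format": 2, "palette": {"0": "t_grass", "1": "t_dirt"}, "body": "001"}'
assert load_terrain(old_save) == load_terrain(new_save)
```

An older build without `parse_compact` would reject `new_save`, which is the one-way compatibility described above.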

Of course there are better ways to compress, but my suggestion is a much easier compromise with healthy benefits.

Summary from previous discussions.
Use a common compression library unless there's a clear benefit to using a more obscure one.
Make it optional at build time so it's not a hard requirement for development builds.
We're not going to do our own compression or save mangling; that's what compression libraries are good at.
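
As a rough illustration of that approach, the sketch below delegates all compression to a standard library (`gzip` here) and keeps it optional. It is an assumption-laden example rather than a proposal for the game's code: `write_save` and `read_save` are hypothetical names, and the build-time switch is modelled as a plain boolean argument. The reader checks for the two-byte gzip signature, so compressed and uncompressed saves both load.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def write_save(text: str, compress: bool) -> bytes:
    """Serialise a save, gzip-compressing it only when the option is on."""
    raw = text.encode("utf-8")
    return gzip.compress(raw) if compress else raw


def read_save(blob: bytes) -> str:
    """Read a save whether or not it was compressed."""
    if blob.startswith(GZIP_MAGIC):
        blob = gzip.decompress(blob)
    return blob.decode("utf-8")


save = '["t_grass", "t_grass", "t_dirt"]'
assert read_save(write_save(save, compress=True)) == save
assert read_save(write_save(save, compress=False)) == save
```

Because the save format itself is untouched, the existing parser keeps working on whatever `read_save` returns.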

Closing due to a lack of actual impact. Save file compression is a nice-to-have, but since it isn't breaking anything per se, it doesn't need an issue.
