Currently, saving the battlescape takes an unacceptable amount of time because it saves arrays of 0s and 1s and so on, and the resulting file is around 50 megabytes even when compressed.
XML is not a good format for anything except small packs of human-modified data. It's definitely not to be used for:
We should figure out a new way to store stuff so that save/load times become acceptable at the very least.
Ideally, the saved games would still be in a human-readable form. The game data can get pretty large, and I don't think that any kind of human-readable format is suited for this task.
YAML is lighter on the markup than XML, though not sure if it suits the needs of OpenApoc saved games.
I remember, back in the day when Daiky was implementing battlescape in OpenXcom he had a similar issue with the YAML saves.
The solution at the time was to compress the binary data and UUENCODE/MIME encode the data so it was still ASCII based in the file (keeping the save readable in a text editor), but still allowed for faster loading/saving.
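For illustration, here is a minimal sketch of that compress-then-encode idea. It uses zlib plus Base64 (the encoding MIME uses) rather than uuencode, and the function names and the sample tile grid are made up for the example; this is not what OpenXcom or OpenApoc actually does.

```python
import base64
import zlib


def encode_blob(raw: bytes, width: int = 76) -> str:
    """Compress raw bytes and return them as line-wrapped ASCII text."""
    packed = zlib.compress(raw, 6)
    text = base64.b64encode(packed).decode("ascii")
    # Wrap the text so the blob stays manageable in a text editor.
    return "\n".join(text[i:i + width] for i in range(0, len(text), width))


def decode_blob(text: str) -> bytes:
    """Reverse encode_blob: drop line breaks, decode Base64, decompress."""
    packed = base64.b64decode("".join(text.split()))
    return zlib.decompress(packed)


# Hypothetical visibility grid: one byte per tile, mostly zeros.
tiles = bytes([0, 0, 1, 0] * 25000)
blob = encode_blob(tiles)
assert decode_blob(blob) == tiles
print(len(tiles), "raw bytes ->", len(blob), "characters of text")
```

The resulting string can sit as a single value inside a YAML or XML document, so the rest of the save stays editable by hand while the bulky arrays are opaque.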
I don't know what OpenXcom uses these days though.
Why would we need human-readable saves?
I don't think there's any benefit to having human-readable saves. There is a huge detriment, however, and that is save size and (derived from it) save/load times.
The only difficulty with binary saves is that when you need to modify the save structure, you will break things faster. Whereas when you parse text files/markup, the parser can just skip over missing data.
That said, I'm quite happy for save files to be binary.
I'd recommend binary files be "chunked" (like PNG files), where you have a magic number (indicating what data is in the chunk), followed by length (though could be optional), then the binary data.
That way you could have a GLOB global chunk with game-wide data, BATL battlescape chunk (which would only be in files that are in the middle of a battle) etc.
It would also mean that if you needed to modify a chunk, you could give it a new chunk magic and write a new loader/saver, so you could still import old games and save them with the new data (if we even cared about that).
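A rough sketch of what such a chunked layout could look like, assuming a 4-byte tag followed by a 4-byte little-endian length (mandatory here, so that unknown chunks can be skipped). The GLOB and BATL tags come from the suggestion above; XTRA, the function names and the payloads are invented for the example.

```python
import struct
from typing import Callable, Dict, Iterator, Tuple


def write_chunk(tag: bytes, payload: bytes) -> bytes:
    """Serialize one chunk: 4-byte tag, 4-byte little-endian length, payload."""
    if len(tag) != 4:
        raise ValueError("chunk tag must be exactly 4 bytes")
    return tag + struct.pack("<I", len(payload)) + payload


def read_chunks(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (tag, payload) pairs from a buffer of concatenated chunks."""
    pos = 0
    while pos < len(data):
        if pos + 8 > len(data):
            raise ValueError("truncated chunk header")
        tag = data[pos:pos + 4]
        (length,) = struct.unpack_from("<I", data, pos + 4)
        pos += 8
        if pos + length > len(data):
            raise ValueError("truncated chunk payload")
        yield tag, data[pos:pos + length]
        pos += length


def load_save(data: bytes, loaders: Dict[bytes, Callable[[bytes], object]]) -> dict:
    """Run the matching loader for each chunk; chunks with unknown tags are skipped."""
    state = {}
    for tag, payload in read_chunks(data):
        loader = loaders.get(tag)
        if loader is not None:
            state[tag] = loader(payload)
    return state


# Placeholder payloads; a real loader would parse them into game objects.
save = (write_chunk(b"GLOB", b"\x01\x02")
        + write_chunk(b"XTRA", b"ignored")
        + write_chunk(b"BATL", b"\x09"))
state = load_save(save, {b"GLOB": list, b"BATL": list})
assert state == {b"GLOB": [1, 2], b"BATL": [9]}
```

Supporting a changed chunk layout would then amount to registering a second loader under the new tag while keeping the old one around for importing older saves.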
I saw this structure in "Generals: Zero Hour" in its *.SCB files. There were a number of text tags, with binary numbers attached to each tag. This is the well-known "dictionary coding" technique.
Like this:
File begins:
3 <- the number of tags
DAMAGE_1_RANGE_2_SIZE_3 <- tags and their numbers (must be defined once)
1_50_2_30_3_4 <- binary data (1_50 is damage 50 and so on)
End of file.
We can easily use this structure. It's very flexible: it can store different kinds of data, and a loader can skip data that is not stored.
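To make the layout above concrete, here is a small sketch of a tag table followed by (tag number, value) pairs. The byte widths (16-bit tag numbers, 32-bit signed values) and the function names are assumptions for the example; the real *.SCB layout is not reproduced here.

```python
import struct


def encode(fields: dict) -> bytes:
    """Write a tag table (number + name), then one (number, value) pair per field."""
    names = list(fields)
    out = bytearray(struct.pack("<H", len(names)))  # number of tags
    for tag_id, name in enumerate(names, start=1):
        raw = name.encode("ascii")
        out += struct.pack("<HB", tag_id, len(raw)) + raw
    for tag_id, name in enumerate(names, start=1):
        out += struct.pack("<Hi", tag_id, fields[name])
    return bytes(out)


def decode(data: bytes, wanted: set) -> dict:
    """Read the tag table, then keep only the values whose tag name is wanted."""
    (count,) = struct.unpack_from("<H", data, 0)
    pos = 2
    names = {}
    for _ in range(count):
        tag_id, name_len = struct.unpack_from("<HB", data, pos)
        pos += 3
        names[tag_id] = data[pos:pos + name_len].decode("ascii")
        pos += name_len
    result = {}
    while pos < len(data):
        tag_id, value = struct.unpack_from("<Hi", data, pos)
        pos += 6
        if names[tag_id] in wanted:
            result[names[tag_id]] = value
    return result


blob = encode({"DAMAGE": 50, "RANGE": 30, "SIZE": 4})
# This loader does not know RANGE and the file has no ARMOR; both are fine.
loaded = decode(blob, {"DAMAGE", "SIZE", "ARMOR"})
assert loaded == {"DAMAGE": 50, "SIZE": 4}
```

Because each tag name is written only once, repeated records cost a couple of bytes per field instead of a full markup element.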
To protect the file from cheaters, we need to calculate its hash and write the hash to the end of the savegame file. Then we need to check the hash each time when loading and say "hey, you are a cheater" if the hash does not match the expected value.
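A minimal sketch of the append-and-verify step, assuming SHA-256 and made-up function names. Note that a plain hash like this can be recomputed by anyone who knows the scheme, so it catches corruption and casual hex-editing rather than a determined cheater.

```python
import hashlib

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def seal(payload: bytes) -> bytes:
    """Append the SHA-256 digest of the payload to the end of the save data."""
    return payload + hashlib.sha256(payload).digest()


def unseal(blob: bytes) -> bytes:
    """Split off the trailing digest and verify it; raise if it does not match."""
    if len(blob) < DIGEST_SIZE:
        raise ValueError("save data is too short to contain a hash")
    payload, digest = blob[:-DIGEST_SIZE], blob[-DIGEST_SIZE:]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("save data does not match its hash")
    return payload


sealed = seal(b"money=1000")
assert unseal(sealed) == b"money=1000"

# Editing the payload without updating the hash is detected on load.
tampered = b"money=9999" + sealed[-DIGEST_SIZE:]
try:
    unseal(tampered)
except ValueError as error:
    print("rejected:", error)
```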
There's any number of "lighter" serialization formats (YAML, TOML) that may do what we want, without having to make our own, and we can just import all the tools etc. from their current ecosystems.
I just never really considered it a priority - in my tests (gcc, Linux x86_64) it spends longer doing the zlib compression than the serialization itself, and much of the serialization time goes to reading the current data (chasing a lot of random pointers through memory, killing the cache and going out to RAM), not to writing the XML. Changing the format won't affect that.
While I'm not saying that the total serialization time won't be improved with a less verbose format, I doubt it'll be a massive difference in itself.
There's probably more low-hanging fruit in changing the compression (something like lz4 would be faster but compress less well; zstd should compress quite a bit better than zlib while being slightly faster; even just tweaking some of the zlib tunables might be useful), or in laying out the gamestate in memory in a more optimal format (possibly changing the "Expensive" sp<> copying through StateRefs, or using allocation pools to give better cache locality and lower heap allocation overhead). Either would give a bigger speedup to the total serialization time.
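As a quick way to try the "zlib tunables" part, something like the following compares compression levels on a sample buffer. The sample here is made-up repetitive XML-like text, so the numbers mean little until it is replaced with a real serialized gamestate; lz4 and zstd are left out because they need third-party bindings.

```python
import time
import zlib


def measure(data: bytes, level: int):
    """Compress data at the given zlib level; return (compressed size, seconds)."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Sanity check that the round trip is lossless.
    assert zlib.decompress(packed) == data
    return len(packed), elapsed


# Hypothetical stand-in for a serialized save; swap in real save data to compare.
sample = b"<tile visible='0' discovered='1'/>\n" * 20000

for level in (1, 6, 9):
    size, seconds = measure(sample, level)
    print(f"level {level}: {size} bytes in {seconds * 1000:.1f} ms")
```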
To protect the file from cheaters, we need to calculate its hash and write the hash to the end of the savegame file. Then we need to check the hash each time when loading and say "hey, you are a cheater" if the hash does not match the expected value.
and delete all saves