Citra: [Feature request] 7zip and/or zip support

Created on 27 Jun 2017 · 18 Comments · Source: citra-emu/citra

I would like to request support for opening roms within 7zip and/or zip files. It can greatly reduce the size of roms on disk.

Thanks!

E-easy enhancement

Most helpful comment

I had a moment and decided to test. I picked a large game, specifically the encrypted No-Intro verified Bravely Second - End Layer (USA) (En,Fr,Es).

4.0GiB (4294967296 bytes) - Uncompressed
2.6GiB (2708261293 bytes) - Max compression 7z
2.6GiB (2709183215 bytes) - HDD format chd

Checking the chd breakdown, I got this:

     Hunks  Percent  Name
----------  -------  ------------------------------------
   661,095    63.0%  Uncompressed
   387,471    37.0%  Copy from self
         8     0.0%  LZMA
         1     0.0%  Deflate
         1     0.0%  FLAC

The large uncompressed chunk may be a result of being encrypted, I'm not sure. I'll have to check into that. I'm not intimately familiar with the 3ds format structure myself. Regardless, the result here is that 7z and chd produce nearly identical file sizes (about 1MiB difference), both save 1.4GiB over uncompressed, but 7z requires full extraction to be used, while chd can be read directly in that format at near-uncompressed speed. Given the advantages, you might be surprised how many people get on board with the format. After adding support to various libretro cores, chd has become extremely popular (for CD/GD-ROM games).

In case it hasn't already been made clear, chd support can be added through the libchdr library:
https://github.com/rtissera/libchdr

EDIT: Also, chd is lossless (perfectly reversible) and perfectly deterministic.
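
To make the comparison easier to check, here is a small sketch that recomputes the savings from the byte counts quoted above. Only the three byte counts come from the thread; the helper name `saved_gib` is made up for illustration.

```python
# Byte counts copied from the comparison above.
UNCOMPRESSED = 4294967296
SEVENZIP = 2708261293
CHD = 2709183215

MIB = 1024 ** 2
GIB = 1024 ** 3


def saved_gib(compressed: int, original: int = UNCOMPRESSED) -> float:
    """Space saved relative to the original, in GiB (hypothetical helper)."""
    return (original - compressed) / GIB


print(f"7z saves  {saved_gib(SEVENZIP):.2f} GiB ({SEVENZIP / UNCOMPRESSED:.1%} of original size)")
print(f"chd saves {saved_gib(CHD):.2f} GiB ({CHD / UNCOMPRESSED:.1%} of original size)")
print(f"chd is {(CHD - SEVENZIP) / MIB:.2f} MiB larger than 7z")
```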

All 18 comments

This has been brought up previously: https://github.com/citra-emu/citra/issues/2157. If you would like to add it, we can discuss it in further detail. If I recall correctly, anything related to ROM loading is due for a rewrite, so this feature won't be happening until after that is complete.

I don't think this is practical to implement, unfortunately. Zip and 7-zip (afaik, though correct me if I'm wrong) don't allow random access inside a compressed file, and 3DS ROMs tend to be quite big, so it's not practical to just load the entire thing into memory. Furthermore, once we start supporting (and encouraging) encrypted ROMs, they will be essentially incompressible. Support for compressed ROMs would have to be done through a specialized format.

IMO we should not bother with this. Citra's job is to load a ROM in a known format; why should anyone go through the trouble of supporting compressed archives and possibly introduce regressions and more bugs to fix?

Even standard prebuilt PCs and laptops these days come with 200+ GB of HDD space, and if it comes to it, a 1TB drive is not that expensive either, while a ROM file will most likely be 2 or 3GB at most.

Agreed with @RavenHome1 and @yuriks… dog days are over when it comes to ROM formats like that. It’s not the same as a GBA binary or whatever where things are stored simply, so it doesn’t make sense like it used to.

1) ROMs can be trimmed, often down to ~2.3 GB for a 4 GB ROM.
2) Yes, zip/7z don't have random access. But I remember that hacked PSPs used the CISO (compressed ISO) format, where every 4 KB (or something like that) was compressed separately. So to access a given offset you only need to decompress one 4 KB block. Not sure if that is possible in Citra.

> So to access a given offset you only need to decompress one 4 KB block. Not sure if that is possible in Citra.

There are some existing compression formats that do that, like bgzf which is used in bioinformatics for compressing genomes.
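
The block-wise idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the actual CISO or BGZF layout: the 4 KiB block size follows the comment above, zlib is an arbitrary stand-in codec, and the names `compress_blocks` and `read_at` are made up here.

```python
import zlib

BLOCK_SIZE = 4096  # 4 KiB, the block size recalled in the comment above


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Compress each block on its own and record where each one starts."""
    blob = bytearray()
    offsets = [0]  # offsets[i] = start of compressed block i; last entry = total length
    for start in range(0, len(data), block_size):
        blob += zlib.compress(data[start:start + block_size])
        offsets.append(len(blob))
    return bytes(blob), offsets


def read_at(blob: bytes, offsets, offset: int, length: int,
            block_size: int = BLOCK_SIZE) -> bytes:
    """Read `length` bytes at `offset` of the original data,
    decompressing only the blocks that the range touches."""
    if length <= 0:
        return b""
    first = offset // block_size
    last = min((offset + length - 1) // block_size, len(offsets) - 2)
    out = bytearray()
    for i in range(first, last + 1):
        out += zlib.decompress(blob[offsets[i]:offsets[i + 1]])
    skip = offset - first * block_size
    return bytes(out[skip:skip + length])
```

The offset table is what buys random access: a read only pays for the blocks it overlaps, at the cost of a slightly worse ratio than compressing the whole file as one stream.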

Though, as stated before, encrypted files cannot be compressed; too much entropy.
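
A quick way to see the entropy point is to compress zero padding and random bytes side by side. This is a sketch only: `os.urandom` output is used as a stand-in for encrypted ROM data (an assumption, since well-encrypted data should look statistically random), and zlib stands in for any general-purpose compressor.

```python
import os
import zlib

SIZE = 1 << 20  # 1 MiB sample

padding = bytes(SIZE)           # zero fill, like cartridge padding
random_like = os.urandom(SIZE)  # stand-in for encrypted data (assumption)


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


print(f"zero padding: {ratio(padding):.4f}")
print(f"random bytes: {ratio(random_like):.4f}")
```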

My main objection to compressing ROM formats, especially some of the modern ones, is that many of them already use 'big archive' compression inside the ROM as part of the game, so you pay the latency twice for smaller gains.

I'd expect that to be true on the 3DS, because I'd expect cartridge manufacturing to be expensive, with data squeezed as much as possible into a few standard sizes (not sure), and they use crypto, so they already pay that preprocessing price. Encryption ruins compression too (and, unfortunately for romhacking, binary patching).

There are other reasons to compress, of course (such as projects like RetroArch wanting checksums and taking them directly from precalculated size headers), or consoles like the Dreamcast that zero-filled immense parts of the CD, etc.

I myself am using external filesystem compression because it's oriented to be more 'random access' than archive compression, and this way I don't depend on emulator support, though I pay the price in RetroArch scanning slowing down immensely because it falls back to computing crc32/sha1/md5 on the files themselves.

If I actually had to choose a file compression format, I'm quite taken with the .chd format for HDDs and CD-ROMs, and I wish a dumping project like Redump took it up as a standard. It uses FLAC by default on wav data, gets rid of zero fills quite nicely and compresses well on the Dreamcast set. With a few specializations for each console, such as getting rid of ecd on PS1 for a function, or getting rid of the noise on Wii ISOs, I can see it as quite a great lossy-but-not-really format for all consoles, or nearly. For HDDs, I suspect they may have copy-on-write at the byte level in the future, which is super great for granularity and keeping the 'rom' safe.

@Sanaki A freshly dumped game file from a cartridge usually contains a massive amount of zero/padding bytes at the end. This is because game cartridges only come in a few capacities, usually powers of 2, and most game dumper software, for the sake of keeping the data original, will dump all the padding bytes even though they do nothing.

I would bet that Bravely Second has ~1.4GiB of zero bytes. If you simply remove these 1.4GiB from the end, the game still runs perfectly, and now you have a nice uncompressed 2.6GiB game file. Every compression algorithm will give you the same result, because the rest of the data is encrypted and incompressible, so comparing compression algorithms here is meaningless.
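
To check how much trailing padding a dump carries, a naive scan from the end of the file is enough. This is a sketch, not how real trimming tools necessarily work: the function name `trimmed_size` is made up, and the pad byte is a parameter because the zero default only mirrors the wording above and a given dump may be padded with a different value.

```python
import os


def trimmed_size(path: str, pad: int = 0x00, chunk: int = 1 << 20) -> int:
    """Return the file size after ignoring the trailing run of `pad` bytes.

    Reads backwards in `chunk`-sized pieces so a multi-GiB dump never has
    to be loaded into memory at once.
    """
    pad_byte = bytes([pad])
    with open(path, "rb") as f:
        end = f.seek(0, os.SEEK_END)
        while end > 0:
            start = max(0, end - chunk)
            f.seek(start)
            block = f.read(end - start)
            stripped = block.rstrip(pad_byte)
            if stripped:
                # First non-pad byte found, counting from the end.
                return start + len(stripped)
            end = start
    return 0  # empty file, or nothing but padding
```

Comparing `trimmed_size(path)` with `os.path.getsize(path)` gives the amount of padding. A scan like this cannot tell padding from real data that happens to end in the pad byte, so it is only suitable for estimating, not for deciding where to cut.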

If you run Bravely Second through a decryptor, you end up with a 2.6 GB file so I suspect you are correct. 7zip with ultra LZMA2 compression can get that down to 2.2 GB. So maybe the savings aren't as drastic as I would have hoped. I believe CHD and 7zip have libraries you can bring for support.

This would benefit people who have padded/unencrypted roms more and less for people with decrypted roms.

You generally get nearly identical sizes between 7z and chd (with chd slightly larger on average), since chd compresses each hunk individually with whichever compression method is most efficient for it. I'm aware of the zero padding, though I wasn't aware that 3DS games can potentially function with it trimmed. The decrypted ROM compressing down to 2.2 GB does confirm that the encryption is what results in the large uncompressed block. I'll decrypt mine later to provide an updated comparison, since that'd be far more useful.
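
The per-hunk selection described above can be illustrated roughly as follows. This is a sketch of the general technique only: the codecs are Python standard-library stand-ins rather than chd's actual codec set, and `pack_hunk` is a made-up name.

```python
import bz2
import lzma
import zlib

# Stand-in codecs from the Python standard library (assumption: chosen for
# illustration only, they are not the codecs chd itself uses).
CODECS = {
    "zlib": zlib.compress,
    "lzma": lzma.compress,
    "bz2": bz2.compress,
}


def pack_hunk(hunk: bytes):
    """Return (method, payload) for the smallest encoding of one hunk,
    falling back to the raw bytes when no codec makes it smaller."""
    best = ("raw", hunk)
    for name, compress in CODECS.items():
        candidate = compress(hunk)
        if len(candidate) < len(best[1]):
            best = (name, candidate)
    return best
```

A hunk of encrypted or already-compressed data ends up stored raw, which is the kind of outcome the "Uncompressed" row in the breakdowns reflects.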

Creating sparse files that get filled with the real part of the roms might be possible in some (linux) filesystems if you want to optimize both size and latency.
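
As a rough illustration of the sparse-file idea, the sketch below writes the real data and then extends the file to its full logical size without writing the padding. Whether the hole actually takes no disk space depends on the filesystem, and the function names are made up for this example.

```python
import os


def create_sparse(path: str, logical_size: int, payload: bytes) -> None:
    """Write `payload` at the start of a file whose logical size is
    `logical_size`; the rest is left as a hole where the filesystem
    supports sparse files."""
    if logical_size < len(payload):
        raise ValueError("logical_size must be at least len(payload)")
    with open(path, "wb") as f:
        f.write(payload)
        f.truncate(logical_size)  # extend without writing the padding


def disk_usage(path: str) -> int:
    """Bytes actually allocated on disk (POSIX only: st_blocks is counted
    in 512-byte units and is not available on Windows)."""
    return os.stat(path).st_blocks * 512
```

On a filesystem with sparse-file support, `disk_usage(path)` should come out well below `os.path.getsize(path)` for a file created this way, while readers still see zero bytes in the hole.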

Personally I never compress the more modern consoles' ROM formats unless they're unencrypted, because it's otherwise a waste of time. Encryption 'as a standard' really fucked up ROM compression, as was probably intended. If the filesystem inside the ROM is also compressed, it also tends to be useless.

Right, got it decrypted. New stats for that (decrypted No-Intro verified Bravely Second - End Layer (USA) (En,Fr,Es)):

4.0GiB (4294967296 bytes) - Uncompressed
2.2GiB (2302623720 bytes) - Max compression 7z
2.3GiB (2440564011 bytes) - HDD format chd

Definitely not a game that compresses well in general, regardless of method (other than the padding of course, which as pointed out previously is only relevant to people who meticulously keep perfect No-Intro sets).

Compression breakdown:

     Hunks  Percent  Name
----------  -------  ------------------------------------
   143,945    13.7%  Uncompressed
   389,619    37.2%  Copy from self
   147,356    14.1%  LZMA
    33,878     3.2%  Deflate
   333,146    31.8%  Huffman
       632     0.1%  FLAC

Is there any reason why gzip wouldn't work? PCSX2 uses gzip with a separate index to minimise random access times, which bypasses the issue with 7zip not being seekable.

Also, I see no value in keeping ROMs encrypted once they've been dumped, so it doesn't matter how poorly encrypted files compress. Surely the point of compression over trimming is to eliminate padding in a reversible manner?

Encryption is only part of the story. The game companies that make the games use compressed filesystems and compressed file formats (textures and meshes...) in the game engine, and compressing already compressed data is 'almost' as bad as compressing encrypted data. That's why compressing an unencrypted ROM above didn't give major gains over compressing the encrypted ROM.

That said, it's completely worthwhile to try to remove the 0-pads at the end of the ROMs, but I suspect a more or less equivalent effect (+/- 100 MB) could be had in most cases by using a simple zip in the 'non-compressed' setting that only makes 0-sequences minimize to 2 bytes or something.

I wish there was a simple way to make a complex file sparse in ext4 or something like that, so the extra format on top was unnecessary.

Yeah, compressing the 0-padding is the real benefit here. Dolphin does something like this with its RVZ file format, which is basically just Deflate compression applied to the garbage data, while the game files that are already compressed are left alone.

Also, there's no need for the archive to even be seekable if Citra just unzips them before running them and then deletes the file again afterwards. This is how Project64 handles zip files. I just think this topic has been dismissed prematurely.
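
The extract-then-delete approach could look roughly like this. It is a sketch using Python's standard `zipfile` and `tempfile` modules, not Project64's or Citra's code, and `extracted_rom` is a made-up name.

```python
import contextlib
import tempfile
import zipfile


@contextlib.contextmanager
def extracted_rom(archive_path: str, member: str):
    """Extract one archive member to a temporary directory, yield its
    path, and delete it again when the caller is done."""
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(archive_path) as zf:
            path = zf.extract(member, tmp)
        yield path
    # The temporary directory and the extracted file are removed here.
```

Usage would be `with extracted_rom("game.zip", "game.3ds") as rom_path:` followed by whatever loads the ROM (both file names are placeholders). The whole member is written to disk on every launch, which is the cost of this approach.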

> Also, there's no need for the archive to even be seekable if Citra just unzips them before running them and then deletes the file again afterwards.

I disagree on this point. Extraction is a functional solution, but distinctly worse than being directly seekable. It comes with a moderate to severe slowdown on -every- launch (especially for games that approach 4GiB), contributes to HDD or especially SSD wear for anyone not using a ramdisk as a cache location, and can be entirely unworkable on limited memory systems. By no means am I saying it shouldn't be an option, but there are distinct advantages to directly seekable compressed formats for a large number of users.

If extraction isn't considered acceptable, then wouldn't something like nicoboss's NSZ format do the job? It uses Zstandard to compress 256 KB chunks while uncompressible chunks are stored without compression, making fast random reads possible. This is very much along the same lines as Dolphin's RVZ format and seems to have been designed with emulator support in mind.
