Pcsx2: [Request] Support games compressed in .xz format

Created on 7 Jun 2017 · 19 Comments · Source: PCSX2/pcsx2

PCSX2 has a shiny new .xz module. Any chance of users being able to shrink down their game library because of it?

Core Enhancement / Feature Request


I would love it.

Some info for future implementers:

  • index information is built into the xz format
  • the xz-utils code can be used as an example of how to get the index information
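
Because the index is part of the format, a reader can find it from the end of the file. Per the xz file format specification, a Stream ends with a 12-byte Stream Footer (CRC32, Backward Size, Stream Flags, magic "YZ"), and Backward Size, stored as (real size / 4) - 1, is the size of the Index that sits right before the footer. The sketch below is a minimal illustration under stated assumptions: a single-stream file with no Stream Padding, no CRC32 verification, and hypothetical helper names (`read_le32`, `xz_index_offset`); it is not liblzma code.

```c
#include <stdint.h>

/* Stream Footer: CRC32 (4) + Backward Size (4) + Stream Flags (2) + "YZ" (2). */
#define XZ_FOOTER_SIZE 12

/* Read a 32-bit little-endian integer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Given the last 12 bytes of a .xz file and the file size, return the file
 * offset where the Index starts, or -1 on a bad footer.
 * Assumes a single Stream and no Stream Padding; the footer CRC32 is NOT checked.
 */
static int64_t xz_index_offset(const uint8_t footer[XZ_FOOTER_SIZE], uint64_t file_size)
{
    if (file_size < XZ_FOOTER_SIZE || footer[10] != 0x59 || footer[11] != 0x5A)
        return -1;                                   /* footer magic must be "YZ" */

    /* Backward Size is stored as (real_size / 4) - 1. */
    uint64_t backward_size = ((uint64_t)read_le32(footer + 4) + 1) * 4;
    uint64_t footer_pos = file_size - XZ_FOOTER_SIZE;
    if (backward_size > footer_pos)
        return -1;                                   /* Index cannot start before the file */

    return (int64_t)(footer_pos - backward_size);
}
```

A real reader would also verify the CRC32 and the Stream Flags against the Stream Header, which is the redundancy the xz format provides for corruption checking.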

The upcoming xz version (5.3) adds a new API to parse block information: lzma_file_info_decoder. This function decodes all the header data and creates a block list structure (of type lzma_index). Note: xz calls the block list the "index".

Then you can create an iterator with lzma_index_iter_init. (The next element can be obtained with lzma_index_iter_next, but I think it is unnecessary here.)

You can jump directly to the right block with lzma_index_iter_locate, which positions the iterator on the block containing the uncompressed address you want to decode.

Summary:

  • lzma_file_info_decoder(&stream, &index, ...): outputs the index
  • lzma_index_iter_init(&iter, index): outputs the iterator
  • lzma_index_iter_locate(&iter, uncompressed_address): positions the iterator on the right block
  • You can then use iter.block.compressed_file_offset, iter.block.uncompressed_file_offset and iter.block.uncompressed_size
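
To make the locate step less of a black box, here is a rough pure-C picture of the information lzma_index_iter_locate provides for a single-stream file. Block offsets are prefix sums over the index records (each block's on-disk size is its unpadded size rounded up to a multiple of four, and the first block starts after the 12-byte Stream Header), and locating an uncompressed address is a binary search over those sums. This is a hypothetical re-implementation for illustration only: `xz_record`, `xz_block_pos`, `xz_build_positions` and `xz_locate` are made-up names, not liblzma internals, and unlike liblzma's 1-based `number_in_file` it numbers blocks from 0.

```c
#include <stddef.h>
#include <stdint.h>

#define XZ_STREAM_HEADER_SIZE 12

/* One Index Record as stored in the file. */
typedef struct {
    uint64_t unpadded_size;      /* Block Header + Compressed Data + Check */
    uint64_t uncompressed_size;
} xz_record;

/* Subset of what lzma_index_iter.block reports (single stream only). */
typedef struct {
    size_t   number;             /* 0-based block number */
    uint64_t compressed_file_offset;
    uint64_t uncompressed_file_offset;
    uint64_t uncompressed_size;
} xz_block_pos;

/* Blocks are padded to a multiple of four bytes on disk. */
static uint64_t ceil4(uint64_t n) { return (n + 3) & ~(uint64_t)3; }

/* Turn the records into absolute offsets (prefix sums). */
static void xz_build_positions(const xz_record *rec, size_t count, xz_block_pos *pos)
{
    uint64_t comp = XZ_STREAM_HEADER_SIZE, uncomp = 0;
    for (size_t i = 0; i < count; i++) {
        pos[i].number = i;
        pos[i].compressed_file_offset = comp;
        pos[i].uncompressed_file_offset = uncomp;
        pos[i].uncompressed_size = rec[i].uncompressed_size;
        comp += ceil4(rec[i].unpadded_size);
        uncomp += rec[i].uncompressed_size;
    }
}

/*
 * Binary-search the block containing uncompressed byte `target`.
 * Returns NULL when target is past the end (lzma_index_iter_locate()
 * returns true in that case instead).
 */
static const xz_block_pos *xz_locate(const xz_block_pos *pos, size_t count, uint64_t target)
{
    size_t lo = 0, hi = count;   /* search in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (target < pos[mid].uncompressed_file_offset)
            hi = mid;
        else if (target >= pos[mid].uncompressed_file_offset + pos[mid].uncompressed_size)
            lo = mid + 1;
        else
            return &pos[mid];
    }
    return NULL;
}
```

For example, with records {100, 1000}, {201, 1000} and {50, 500}, uncompressed byte 1500 falls in the second block, which starts at compressed file offset 112 (12 + 100).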

The API is not clear to me. There are lzma_block, lzma_index and lzma_index_iter objects. It seems that the iterator above should be used to decode the block header info.

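lzma_index_iter iter;
lzma_index_iter_init(&iter, index);
while (!lzma_index_iter_next(&iter, LZMA_INDEX_ITER_BLOCK)) {
    uint8_t header[LZMA_BLOCK_HEADER_SIZE_MAX];
    lzma_filter filters[LZMA_FILTERS_MAX + 1];
    lzma_block block = { .version = 1, .filters = filters };
    block.check = iter.stream.flags->check;     // check type comes from the Stream Flags
    // The first byte of the Block Header encodes its total size
    fseek(file, (long)iter.block.compressed_file_offset, SEEK_SET);
    fread(header, 1, 1, file);
    block.header_size = lzma_block_header_size_decode(header[0]);
    fread(header + 1, 1, block.header_size - 1, file);
    if (lzma_block_header_decode(&block, NULL, header) != LZMA_OK)
        break;                                  // corrupt or unsupported header
    all_info_struct.push(block);                // placeholder store; filters[] is local, copy/free options
}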

With the block and file info, you can directly call lzma_block_buffer_decode:

lzma_ret lzma_block_buffer_decode(lzma_block *block, const lzma_allocator *allocator,
        const uint8_t *in, size_t *in_pos, size_t in_size,
        uint8_t *out, size_t *out_pos, size_t out_size);
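
For this call, `in` should point just past the Block Header (the header is decoded separately with lzma_block_header_decode), and the decoder then consumes the Compressed Data, Block Padding and Check fields. With the iterator fields (`iter.block.compressed_file_offset`, `iter.block.total_size`, which covers the whole block including its header, and `iter.block.uncompressed_size`) plus the decoded `block.header_size`, the byte range to read and the output size follow from simple arithmetic. A minimal sketch; `xz_block_io` and `xz_block_input` are hypothetical helper names.

```c
#include <stdint.h>

/* Where the input for lzma_block_buffer_decode() lives in the .xz file. */
typedef struct {
    uint64_t in_offset;   /* file offset of the Compressed Data field */
    uint64_t in_size;     /* Compressed Data + Block Padding + Check */
    uint64_t out_size;    /* bytes the decoder should produce */
} xz_block_io;

static xz_block_io xz_block_input(uint64_t compressed_file_offset,
                                  uint32_t header_size,
                                  uint64_t total_size,
                                  uint64_t uncompressed_size)
{
    xz_block_io io;
    io.in_offset = compressed_file_offset + header_size;  /* skip the Block Header */
    io.in_size = total_size - header_size;                /* rest of the padded block */
    io.out_size = uncompressed_size;
    return io;
}
```

In the real call, `in` would be a buffer holding `in_size` bytes read from `in_offset`, with `*in_pos` and `*out_pos` starting at 0.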

So I read the xz file format specification. Some information is duplicated for redundancy/corruption checking.

So the full story is:

  • A block contains a header plus data. The header contains the header size and may (depending on the block flags) contain the compressed/uncompressed size of the block.
  • The index is a list of block records. Each record contains the unpadded size and the uncompressed size.

NOTE: I checked an .xz file on my computer and the sizes aren't present in the block header.

Conclusion: we first need to decode the index to get the offsets of the various blocks. The iterator allows us to iterate over the block records.
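
Decoding the index by hand is also not much code. Per the specification, the Index starts with a 0x00 Index Indicator, followed by the Number of Records and then one (Unpadded Size, Uncompressed Size) pair per block, all stored as multibyte integers (7 bits per byte, least significant group first, high bit set on every byte except the last, at most 9 bytes). A minimal sketch, assuming the whole Index is already in memory; it ignores Index Padding, does not verify the CRC32, and the names (`xz_record`, `xz_vli_decode`, `xz_parse_index`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One Index Record: sizes of one Block. */
typedef struct {
    uint64_t unpadded_size;      /* Block Header + Compressed Data + Check, no padding */
    uint64_t uncompressed_size;
} xz_record;

/*
 * Decode one xz multibyte integer from buf (at most `size` bytes).
 * Returns the number of bytes consumed, or 0 if the encoding is
 * truncated, longer than 9 bytes, or not minimal.
 */
static size_t xz_vli_decode(const uint8_t *buf, size_t size, uint64_t *out)
{
    uint64_t value = 0;
    for (size_t i = 0; i < size && i < 9; i++) {
        value |= (uint64_t)(buf[i] & 0x7F) << (7 * i);
        if ((buf[i] & 0x80) == 0) {
            if (i != 0 && buf[i] == 0x00)
                return 0;        /* e.g. 0x80 0x00 is a non-minimal zero */
            *out = value;
            return i + 1;
        }
    }
    return 0;
}

/*
 * Parse the records of an Index that starts at buf[0].
 * Returns the number of records stored in rec[], or -1 on error.
 * Index Padding and the CRC32 after the records are not checked.
 */
static long xz_parse_index(const uint8_t *buf, size_t size,
                           xz_record *rec, size_t max_records)
{
    uint64_t count;
    size_t pos = 1, n;

    if (size == 0 || buf[0] != 0x00)            /* Index Indicator */
        return -1;
    n = xz_vli_decode(buf + pos, size - pos, &count);
    if (n == 0 || count > max_records)
        return -1;
    pos += n;

    for (uint64_t i = 0; i < count; i++) {
        n = xz_vli_decode(buf + pos, size - pos, &rec[i].unpadded_size);
        if (n == 0)
            return -1;
        pos += n;
        n = xz_vli_decode(buf + pos, size - pos, &rec[i].uncompressed_size);
        if (n == 0)
            return -1;
        pos += n;
    }
    return (long)count;
}
```

For example, the bytes 00 02 64 E8 07 C9 01 E8 07 describe two blocks with unpadded sizes 100 and 201, each holding 1000 uncompressed bytes.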

@turtleli
I would need a newer, currently unreleased, version of xz to do some tests. How can I sync https://github.com/PCSX2/xz.git with the latest upstream git?

Edit: actually, don't bother; I pulled something into a local branch. It should be enough.

While xz is definitely popular, if possible I'd suggest also examining/playing with newer compression formats like zstd and brotli, or maybe some of the LZ* family. In my experience with the gzip implementation, random access decompression speed is _the_ key to avoiding a lot of pitfalls, workarounds and caches.

Also, it's best to avoid creating our own index, and instead stick to formats/configurations which provide their own index as part of the standard, and require users to use only those configurations (I don't know how far this holds for the formats I mentioned).

So I'm rather close to a working prototype (based on the latest xz git). I managed to decompress a couple of blocks, and the CDVD format seems to be detected correctly, but it fails later. Maybe an issue with block boundaries. I need to double-check the logic.

Good news: I managed to boot a game. The issue was in the blocksize/blockcount management. Honestly, the logic should be moved into the base class. Anyway, the XZ stuff is done 👍
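
Blocksize/blockcount bugs typically show up when a read (for example a multi-sector disc read) straddles a block boundary. Assuming the image was compressed with a fixed block size (for instance with `xz --block-size=SIZE`, where every block except possibly the last holds exactly SIZE uncompressed bytes; this is an assumption about how the images are created), splitting a read into per-block pieces is plain arithmetic. A minimal sketch with hypothetical names (`read_piece`, `split_read`); with variable-size blocks you would locate each piece through the index instead.

```c
#include <stddef.h>
#include <stdint.h>

/* One piece of a read that lies entirely inside a single block. */
typedef struct {
    uint64_t block;             /* 0-based block index */
    uint64_t offset_in_block;   /* where the piece starts inside the block */
    uint64_t length;            /* bytes to copy from this block */
} read_piece;

/*
 * Split the uncompressed range [offset, offset + length) into per-block
 * pieces, assuming every block holds block_size bytes (block_size > 0).
 * Returns the number of pieces written (at most max_pieces).
 */
static size_t split_read(uint64_t offset, uint64_t length, uint64_t block_size,
                         read_piece *out, size_t max_pieces)
{
    size_t n = 0;
    while (length > 0 && n < max_pieces) {
        uint64_t in_block = offset % block_size;
        uint64_t chunk = block_size - in_block;   /* bytes left in this block */
        if (chunk > length)
            chunk = length;
        out[n].block = offset / block_size;
        out[n].offset_in_block = in_block;
        out[n].length = chunk;
        n++;
        offset += chunk;
        length -= chunk;
    }
    return n;
}
```

For example, reading 2000 bytes at offset 1500 with 1000-byte blocks yields three pieces: 500 bytes from block 1, all of block 2, and 500 bytes from block 3. The caller still has to clamp the final piece against the real size of the last block in the file.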

As a side note, xz could also be a neat replacement for savestate compression. I saved 30% by repacking a savestate.

Is this waiting on a new XZ release?

Yeah, a new XZ release would help. We would need to release 1.6 too. I don't want to require an alpha release of XZ for our release. I'm also waiting until I have free time to merge the code.

Any news regarding it?

Nope, everything you see in the pull requests section is what's currently being worked on.

I hope that it gets added soon.

I've just been checking for new XZ releases. This is the biggest gap between releases they've had in a long while, so hopefully it's soon.

It looks like an alpha build of XZ Utils was published with the lzma_file_info_decoder() API:

https://git.tukaani.org/?p=xz.git;a=blob_plain;f=NEWS;hb=114cab97af766b21e0fc8620479202fb1e7a5e41

What's happening (or happened) with this? I've just done a ton of tests compressing my PS2 library, and out of gz, zip, 7z, rar, cso and xz, xz had the lowest file size and did it _suspiciously_ fast: 10 MB/s with the strongest compression. Gz was doing about 2 MB/s (fewer cores used). I'm using the one built into the current 7z.

Space savings are roughly 8-15% over gz. That's several hundred GB saved for larger collections.
Theoretically~ if we keep implementing newer, better compression every few years, in 200 years or so a massive PS2 collection will be under 10KB

Nothing has really happened, I'm afraid. xz was added to PCSX2 for making GS dumps, but there's no support for loading games yet; it hasn't really been much of a priority.

Theoretically~ if we keep implementing newer, better compression every few years, in 200 years or so a massive PS2 collection will be under 10KB

That's big brain right there, but I don't think that's how compression works.

Btw, I found this codec recently. It promises a compression ratio similar to LZMA's, but much faster decompression.

https://github.com/richgel999/lzham_codec

However, I don't know whether we can chunk the bitstream for random access.

This is the most promising I've found.

https://github.com/aaru-dps/Aaru
