I'm implementing a decoder in Go (very much a work in progress; see https://github.com/mvdan/zstd). Something that I've been wondering about is how I could reasonably test whether the decoder is fully featured or not, compared to the main implementation here.
Is there a set of test cases somewhere that I could reuse? I know that this repository contains a lot of tests (including fuzz tests), but I haven't seen any table-driven test cases. For example, it would be useful to have lots of pairs of the form "here are the zstd input bytes, here are the expected decompressed output bytes".
A similar set of inputs could be maintained for errors, such as those involving corrupted blocks or those that tend to be problematic (e.g. window size of 1TB).
If there already is something like that somewhere, then perhaps it should be documented better. For example, a document with tips and helpful links for those trying to implement their own decoders and/or encoders.
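As an illustration of the table-driven cases described above, here is a minimal Go sketch. Everything in it is hypothetical: `decodeFunc` stands in for whatever entry point the decoder under test exposes, and `magicOnly` is a toy that only checks the 4-byte frame magic number. The 13-byte "empty frame" input is what I believe the reference encoder emits for empty input with a checksum; treat it as an assumption and regenerate it with the `zstd` CLI before relying on it.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// decodeFunc is a hypothetical stand-in for the decoder under test.
type decodeFunc func(src []byte) ([]byte, error)

// testCase pairs a zstd input with either its expected output or an expected failure.
type testCase struct {
	name    string
	input   []byte
	want    []byte
	wantErr bool
}

// runCases runs every case through decode and returns one message per failing case.
func runCases(decode decodeFunc, cases []testCase) []string {
	var failures []string
	for _, tc := range cases {
		got, err := decode(tc.input)
		switch {
		case tc.wantErr && err == nil:
			failures = append(failures, fmt.Sprintf("%s: expected an error, got none", tc.name))
		case !tc.wantErr && err != nil:
			failures = append(failures, fmt.Sprintf("%s: unexpected error: %v", tc.name, err))
		case !tc.wantErr && !bytes.Equal(got, tc.want):
			failures = append(failures, fmt.Sprintf("%s: got %q, want %q", tc.name, got, tc.want))
		}
	}
	return failures
}

var errBadMagic = errors.New("bad magic number")

// magicOnly is a toy decoder used only to exercise the harness: it checks the
// zstd frame magic number (0xFD2FB528, little-endian) and returns no data.
func magicOnly(src []byte) ([]byte, error) {
	if !bytes.HasPrefix(src, []byte{0x28, 0xB5, 0x2F, 0xFD}) {
		return nil, errBadMagic
	}
	return nil, nil
}

func main() {
	cases := []testCase{
		{
			name: "empty frame (assumed bytes, see note above)",
			input: []byte{0x28, 0xB5, 0x2F, 0xFD, 0x24, 0x00, 0x01,
				0x00, 0x00, 0x99, 0xE9, 0xD8, 0x51},
			want: nil,
		},
		{name: "bad magic", input: []byte("not zstd"), wantErr: true},
	}
	failures := runCases(magicOnly, cases)
	for _, f := range failures {
		fmt.Println("FAIL", f)
	}
	fmt.Printf("%d of %d cases failed\n", len(failures), len(cases))
}
```

Keeping valid and invalid inputs in the same table means a single loop covers both the "expected output" and the "expected error" cases; in a real project the loop would normally live in a `_test.go` file and report through `t.Errorf`.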
Very nice project, @mvdan!
And yes, that's a very good point. As it happens, we do have such a tool.
It's decodecorpus, in the /tests directory.
Documentation is here: https://github.com/facebook/zstd/tree/dev/tests#decodecorpus---tool-to-generate-zstandard-frames-for-decoder-testing
though it's limited to generating a bunch of _valid_ frames.
For invalid frames, we tend to use fuzzer tools.
The goal is less to test whether the decoder can detect invalid frames (that's actually the job of the checksum), and more to ensure that no invalid data can trigger side effects such as out-of-bounds reads. The fuzzer is generally seeded with a list of known problematic data patterns, and then searches for new cases automatically.
There is also a regression test which is run regularly to check the decoder against a curated list of bad inputs. The regression test is in https://github.com/facebook/zstd/tree/dev/tests/fuzz, though unfortunately it's currently hard-wired to work with zstd, so using it for your version will require a bit of work modifying the Python script.
Maybe @terrelln can tell us more about it.
Thanks for the pointers! I had not noticed that there was a tool to generate valid frames and their expected output.
I still think it would be useful to have a document somewhere with useful links and tools for those trying to implement zstd, as otherwise one might just look at the compression format spec.
Agreed!
There was a hint of this feature in the educational_decoder's README, but it was clearly not enough since you didn't find it.
Let's make the documentation clearer...
Amazing - I skimmed past this in two different READMEs :) I was looking for static data, such as arrays in a C file or lots of files somewhere in the git repo, so I think it's a case of not knowing what to look for.
/doc/README updated