Zstd: ZSTD_compressStream2 can't stop with wrong ZSTD_EndDirective argument

Created on 8 Sep 2020  路  4Comments  路  Source: facebook/zstd

ZSTD_EndDirective enum's value is 0 (continue), 1 (flush), 2 (end).
If pass -1 or 3 to ZSTD_compressStream2() function, the function can't stop.

Seems an infinite loop in ZSTD_compressStream_generic():
https://github.com/facebook/zstd/blob/dev/lib/compress/zstd_compress.c#L3808.

Test code:

#include <stdio.h>     // printf
#include <zstd.h>      // presumes zstd library is installed

void compress()
{
    char buffer[256];
    ZSTD_inBuffer in;
    ZSTD_outBuffer out;
    ZSTD_CCtx *cctx;

    in.src = &in;
    in.size = 0;
    in.pos = 0;

    out.dst = buffer;
    out.size = sizeof(buffer);
    out.pos = 0;

    cctx = ZSTD_createCCtx();

    printf("ZSTD_compressStream2 started.\n");
    ZSTD_compressStream2(cctx, &out, &in, 3);  // can't stop other than 0,1,2
    printf("ZSTD_compressStream2 finished.\n");
}

int main(int argc, const char** argv)
{
    compress();
    return 0;
}
enhancement

Most helpful comment

Passing a value other than a valid ZSTD_EndDirective is undefined behavior. It would be nice to report an error instead of infinite looping, but it isn't a requirement of the API.

All 4 comments

Passing a value other than a valid ZSTD_EndDirective is undefined behavior. It would be nice to report an error instead of infinite looping, but it isn't a requirement of the API.

Passing a value other than a valid ZSTD_EndDirective is undefined behavior. It would be nice to report an error instead of infinite looping, but it isn't a requirement of the API.

Yes, downstream code can easily handle this problem.

Considering any value beyond the valid range of ZSTD_EndDirective as UB (Undefined Behavior) on reaching ZSTD_compressStream_generic() is the correct thing to do to, since it's an internal function, and everything should be clean on reaching this point.

In contrast, ZSTD_compressStream2() is a public function, and while it's reasonable to require endDirective to be valid, it's also a place where we have no direct control; the user is entirely in charge, and could mess up.

A similar scenario is providing an input where in.pos > in.size, which is obviously invalid.
We could also invoke UB, but have also decided to actively check that the structure is valid, and return an error code when it's not.

This is in part because this check is trivial and costs (proportionally) almost nothing.
Checking the validity of endDirective seems almost as simple. Therefore, we could add an equivalent check.

As a general principle, relying solely on UB, hence transferring the full responsibility of respecting interface's requirements to the user, is the right thing to do when checking these requirements is too difficult or too costly. When when such a cost is minimal, it's generally better to reduce UB territory.

It would be nice to not add any overhead.
Since the user of zstd are almost developer, maybe can verify the parameter with assert().

Basically, it is not easy to make mistakes when using endDirective argument.

Was this page helpful?
0 / 5 - 0 ratings