Zstd: Proposal for a new Streaming API

Created on 1 Aug 2016 · 25Comments · Source: facebook/zstd

A new streaming API is in preparation, planned to be released with v0.8.1.

_Background_ : Currently, there are 2 streaming API :

ZSTD_compressContinue() : a synchronous, bufferless streaming API, which gives full control over buffer management to users. Alas, control also means responsibility and complexity, so it's not a recommended API, only exposed for experts.
ZBUFF_compressContinue() : this is a much more flexible API, freeing users from any condition on input / output buffer. The price to pay is that ZBUFF allocates and manages its own internal buffers, making it consume memory and is _slightly_ slower.

Retrospectively, the better streaming API is ZBUFF, due to its ease of use.
But it's exposed into its own zbuff.h, and is therefore regularly missed by would-be users.

The objective is to merge ZBUFF API into zstd.h, and in the process makes it swap its ZBUFF_ prefix for a ZSTD_ one.
And while at it, improves the prototypes it contains.

In a nutshell, here is the current streaming API proposition :

typedef struct ZSTD_readCursor_s {
  const void* ptr;
  size_t size;
} ZSTD_readCursor;

typedef struct ZSTD_writeCursor_s {
  void* ptr;
  size_t size;
  size_t nbBytesWritten;
} ZSTD_writeCursor;

/* streaming compression */

typedef struct ZSTD_CStream_s ZSTD_CStream;
ZSTD_CStream* ZSTD_createCStream(void);
void ZSTD_freeCStream(ZSTD_CStream* zcs);

size_t ZSTD_initCStream(ZSTD_CStream* zcs, int compressionLevel);
size_t ZSTD_compressStream(ZSTD_CStream* zcs, ZSTD_writeCursor* output, ZSTD_readCursor* input);
size_t ZSTD_flushStream(ZSTD_CStream* zcs, ZSTD_writeCursor* output);
size_t ZSTD_endStream(ZSTD_CStream* zcs, ZSTD_writeCursor* output);


/* streaming decompression */

typedef struct ZSTD_DStream_s ZSTD_DStream;
ZSTD_DStream* ZSTD_createDStream(void);
void ZSTD_freeDStream(ZSTD_DStream* zds);

size_t ZSTD_initDStream(ZSTD_DStream* zds);
size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_writeCursor* output, ZSTD_readCursor* input);

An advantage of this construction is that cursor structures are auto-updated at each invocation,
meaning it's enough to re-present them "as is" on next function call if there is some remaining work to do,
freeing user from pointer arithmetic.
Here is an example usage :

static void compressFile(const char* fname, const char* outName, int cLevel)
{
    FILE* fin = fopen(fname, "rb");
    FILE* fout = fopon(outName, "wb");
    ZSTD_CStream* cstream = ZSTD_createCStream();
    void*  buffIn  = malloc(ZSTD_recommendedBuffSize_CIn());
    size_t buffOutSize = ZSTD_recommendedBuffSize_COut();
    void*  buffOut = malloc(buffOutSize);
    size_t read, toRead = ZSTD_recommendedBuffSize_CIn;

    ZSTD_initStream(cstream, cLevel);

    while( read = fread(buffIn, 1, toRead, fin) ) {
        ZSTD_readCursor cursin = { buffIn, read };
        while (cin.size) {
            ZSTD_writeCursor cursout = { buffOut, buffOutSize, 0 };
            toRead = ZSTD_compressStream(cstream, &cursin , &cursout);
            fwrite(buffOut, 1, cursout.nbBytesWritten, fout);
        }
    }

    ZSTD_writeCursor cursout = { buffOut, buffOutSize, 0 };
    ZSTD_endStream(cstream, &cursout);
    fwrite(buffOut, 1, cursout.nbBytesWritten, fout);

    fclose(fout);
    fclose(fin);
    free(buffIn);
    free(buffOut);
}

The inner loop is pretty tight and expressive.
New API looks good from C and C++ user space.

However, I've less experience in bindings from other languages (Python, C#, Java, Rust, Ada, etc.), and since this API proposes "structures by reference" as parameters, and I would like to know if this is good enough or a source of concerns for binders.

cc : @luben , @jrmarino , @KrzysFR , @Gyscos, @sergey-dryabzhinsky, @chipturner, @jmoiron

question

Source

Cyan4973

👍1

All 25 comments

Yes, this will significantly reduce the complexity. Do you have it in the dev branch? I want to try it for the JVM bindings before setting it in stone.

luben on 1 Aug 2016

👍1

@luben :

Do you have it in the dev branch? I want to try it for the JVM bindings before setting it in stone.

Not yet, I'll drop a word here when it's ready ... ;)

Cyan4973 on 1 Aug 2016

@luben : the new streaming API is available in branch "newStreamingAPI" :
https://github.com/Cyan4973/zstd/tree/newStreamingAPI

It can be found in the experimental section of zstd.h

This early version is lacking some comments to help out. Don't hesitate to ask when needed.

Cyan4973 on 12 Aug 2016

Looks great. I am trying it now following the examples. Will come back if I have any difficulties

luben on 13 Aug 2016

👍1

I am currently looking at the stream compression example. It assumes pull-based reads (the compressor initiates the reads of the input stream) but if you want to make it work for streams it should be push-based (i.e. the writes in the input buffer are externally initiated) so the cursors should be heap allocated as they have to survive between function invocations as you may not have enough data in the input buffer to complete the write. It is doable with the proposed API but we have 2 additional heap allocations for the read and write cursors and have to handle their lifetime / de-allocation properly. Currently I am experimenting with allocating them as one struct that will contain the ZSTD_CStream pointer + ZSTD_rCursor + ZSTD_wCursor. But then why not include the cursors in the ZSTD_CStream and skip the pointer dereferencing? As far as I see there is 1-to-1 mapping of cursors and buffers and compression/decompression stream context. What are the advantages of having them as separate entities?

luben on 14 Aug 2016

It's an interesting point,
I will add @chipturner to this discussion,
as he is the creator of the cursor structures.

A few reasons that come to mind :

ZSTD_CStream is an incomplete type. It cannot "contain" anything.
- though it looks possible to create an "higher level" structure, containing ZSTD_CStream pointer and both cursors, as you suggest
ZSTD_rCursor is not present in ZSTD_flushStream() and ZSTD_endStream().
- but I guess it would be possible to nonetheless provide it and simply not read nor update it.

the cursors should be heap allocated as they have to survive between function invocations

I'm not sure to follow this part, though my experience is limited to C, and this could be a consequence of JNI interface.
In C at least, it's possible to create cursors on stack, and provide their pointer to the function.
Cursors are then checked, to ensure everything happened properly (such as : was the input entirely consumed). If not, additional action are taken, such as re-presenting input data not yet consumed.

I suspect it must be different in Java, and just don't see why.

Cyan4973 on 14 Aug 2016

Yes, I was intending to make a higher level structure for the ZSTD_CStream pointer and the cursors (I am not sure I will need them both there).

On the other point, I think you are right I may go with allocating them on the stack and just preserving the toRead outside the method / function.

luben on 14 Aug 2016

Regarding the heap allocations of the cursors, yes it is because of limitation of the JNI interface but of another nature (we can still stack allocate). In the JNI functions if you claim some buffers (not to be touched from the GC) you can't call other JNI functions, so no way to call back to java to write the outBuff. This limits the compression JNI function to something like:

size_t compress(ZSRT_CStream cstream, jobj inBuff, int inSize, jobj outBuff, int outSize);

that has just to compress the inBuff into outBuff and communicate some information to the caller: the size of the compressed chunk to be written and preserve the cursors as many invocations of the function may be needed because the while loop (from the example) is driven from outside, from the java world, that is also writing the results.

P.S. Looks even harder as the buffers can be moved by the GC and invalidate the cursors.

luben on 14 Aug 2016

👍1

OK thanks,
seems there are difficulties, seems there are some possible work-around ,
so to summarize :

Is currently proposed streaming API usable ?
Is it worse or better than previous one (from a JNI perspective) ?
Could it be improved ? If yes, how ?

Cyan4973 on 14 Aug 2016

Looked around a little bit more. I think it's huge simplification from the users perspective and it can be applicable in a lot of cases and runtimes. It can also work for the JNI input/output streams but it will need one more memory copy, so making it 2 more copies over the currently used compression/decompression implementation:

copy the input byte array in the input buffer (because the input array can be moved by the GC and invalidate the read cursor and the input buffer will be allocated with malloc, outside the GC reach)
then another copy can happen in the ZSTD_(compress|decompress)Stream
after compression/decompression it should copy cursout.nbBytesWritten from output buffer into an byte array to be returned (the output buffer will be allocated by malloc so that it is not touched by the GC)

My current implementation uses the low-level API.

For compression:

it copies the input into aligned input buffer
when one block (128K) is filled it calls the compressContinue (it does not copy buffers) and the whole compressed/decompressed block is available in the output buffer / array
the produced output is directly usable to pass to the next stage.

And for decompression:

we read from the source until will fill the input buffer up to the requested size.
we decompress it
we copy the result from the output buffer to the result array.

I think the extra copies with the new streaming API (because the GC intrinsic and JNI limitations) will pose a penalty on the performance. I will take a look and compare with the ZBUFF variants as they may be more suitable for my case because they will do the buffer management themselves.

luben on 14 Aug 2016

👍1

copy the input byte array in the input buffer (because the input array can be moved by the GC and invalidate the read cursor

OK. Indeed, it makes for additional memory copy.
Let's assume that we want to get rid of this memory copy, by providing directly "byte array" to ZSTD_compressStream().

I'll take the following assumptions (might be wrong) :

byte array can be moved by GC, but not _during_ a JNI call (confirmed ?)
it may happen any time after the JNI call (hence cursin.ptr is no longer correct)

But I'm missing an important element :

could byte array be moved by GC before the JNI call, and specifically _between_ setting cursin and ZSTD_compressStream() invocation ?
In which case, it would be a real issue, since provided cursin.ptr would be false.

It would require to send "starting position of byte array" directly as a parameter of the JNI invocation.

Cyan4973 on 14 Aug 2016

It can't be moved during the JNI call after a critical lock is held over it. The other problem is that while you hold the lock you can't invoke other java functions (e.g. invoke the next "write" handler with the current output buffer) before releasing the lock. But the GC is free to move the buffer if it is not locked.

This limits the JNI function to return the buffer (after releasing the lock) so that the result can be passed to the next "write" handler but during this time the input and output buffers may be moved and invalidate the cursors.

I am experimenting now with the ZBUFF variants. I will report later but it looks quite good as it moves all the input copies in the zstd code and I don't have to manage them manually.

luben on 14 Aug 2016

I'm a bit puzzled,
ZSTD_compressStream() and ZBUFF_compressContinue() are basically doing the same thing, that is they copy input into their internal buffer before starting compression.
So both API should behave the same...

Cyan4973 on 16 Aug 2016

Yes, but in the ZBUFF case you manage the pointer to the source externally (it's not persisted in the cursor) and so the buffer can be moved (by the GC) between invocations to ZBUFF_compressContinue.

BTW, I did the compression side and it falls nicely:

here is the C code: https://github.com/luben/zstd-jni/blob/zbuff/src/main/native/jni_outputstream_zstd.c#L59

here is the Java code: https://github.com/luben/zstd-jni/blob/zbuff/src/main/java/com/github/luben/zstd/ZstdOutputStream.java#L60

and it performs at the same level as the previous version that was using ZSTD_compressContinue and managing the buffers on the java side. I like the new version better as I don't have to deal with the zstd internals (block sizes, buffer sizes, etc)

luben on 16 Aug 2016

So, my understanding is,
you can't rely on cursor.ptr,
because the buffer it relates to may be moved to another place by the Garbage Collector.
cursor.ptr must be manually reset before each JNI invocation,
to be sure it's still inside the valid buffer region.

What you can rely on is the nb of bytes read or written during JNI invocation,
which is valid even if src/dst buffers are moved around.

There is already a cursout.nbBytesWritten field which provides what its name implies.
What about adding a cursin.nbBytesRead ?
With both fields, it seems it would provide you with the same information as ZBUFF_* ?

Cyan4973 on 16 Aug 2016

Yes, that can work: I can reset the the pointers on each invocation.

One question: do the *ptr-s will be moving to then start of next unconsumed data like it is done now or they will always point to the beginning of the buffers and the start of the next data to read be ptr + nbBytesRead?

luben on 16 Aug 2016

Indeed, that's a question I had in mind.

The current behavior is to move ptr to its next position, hence the name, cursor.

Now, thinking about it, a close enough construction could change ptr into src (or dst), which would always point at the beginning of the buffer instead. ZSTD_compressStream() would automatically deduct ptr from src + pos.

A subtle difference is that it's no longer possible to use nbBytesWritten as a total accumulator.
It will be replaced by pos, which is necessarily a position relative to dst.
Hence, if totalNbBytesWritten is necessary, it must be a separate accumulator.

That's a minor downside though.
If it feels preferable to update src (beginning of buffer) rather than ptr (beginning of buffer + nbBytesRead), it looks to me a good enough reason to make such a change.

The new object would no longer be a cursor then, but something like this (for discussion) :

typedef struct {
  const void* src;
  size_t pos;
  size_t remainingSize;   /**< or totalBufferSize ? */
} ZSTD_rBuff;

typedef struct {
  void* dst;
  size_t pos;
  size_t remainingSize;   /**< or totalBufferSize ? */
} ZSTD_wBuff;

remainingSize is a direct equivalent of today's size, that is it shrinks as pos increases.
It was logical in combination with ptr.
Checking the end then is just checking size==0.

Now, with a totalBufferSize instead, end condition would rather be buff.pos == buff.totalBufferSize.
Maybe that's more logical than a shrinking remainingSize ?

_Update :_

Maybe something like this :

typedef struct {
  const void* src;   /* beginning of buffer */
  size_t size;    /* total size of buffer */
  size_t pos;     /* current cursor position, necessarily <= size */
} ZSTD_rBuff;     /* is this a good enough name ? */

Cyan4973 on 16 Aug 2016

I think both variants will work and are equivalent. In the second case I have the size of the buffer in the java code so I don't have to communicate it back from the C code, so my check will be buff.pos == theSizeOfBuffer

One question: if the src is declared const can I update it to point to possibly moved buffer?

And one suggestion: we can use one type for both the src and dst cursors.

luben on 16 Aug 2016

One question: if the src is declared const can I update it to point to possibly moved buffer?

I think so.
The const in const void* is a promise by ZSTD_compressStream() that it will not modify the _content_ of the buffer.
Therefore the pointer itself can be moved anywhere as long as it is in a valid memory region.

_PS_ : not to be confused with void* const ptr, in which case, ptr value can no longer be changed.

we can use one type for both the src and dst cursors.

Almost, indeed, the only difference is the const property of the pointer.
I tend to consider such property as important, hence I made 2 types to preserve it.

Note that, in case of "const transitivity check" (ex: -Wcast-qual on gcc), a function receiving a const void* argument will generate a warning if it tries to assign it (or cast it) to a void* pointer.

Cyan4973 on 16 Aug 2016

yes, makes sense. and we can keep the 2 pointer names more descriptive.

luben on 16 Aug 2016

new API has been pushed into branch newStreamingAPIv2 :
https://github.com/Cyan4973/zstd/tree/newStreamingAPIv2

It proposes the following structures :

typedef struct ZSTD_inBuffer_s {
  const void* src;    /**< start of input buffer */
  size_t size;        /**< size of input buffer */
  size_t pos;         /**< position where reading stopped. Will be updated. Necessarily 0 <= pos <= size */
} ZSTD_inBuffer;

typedef struct ZSTD_outBuffer_s {
  void*  dst;         /**< start of output buffer */
  size_t size;        /**< size of output buffer */
  size_t pos;         /**< position where writing stopped. Will be updated. Necessarily 0 <= pos <= size */
} ZSTD_outBuffer;

Cyan4973 on 17 Aug 2016

Thanks. Looks it will work. I will try it.

luben on 17 Aug 2016

👍1

I have moved the compression part over the new API and it works great. I have to only preserve only the positions inside the buffers and the cursors can be reconstructed on the stack from the known params.

Here is how it looks the C part: https://github.com/luben/zstd-jni/blob/zstream/src/main/native/jni_outputstream_zstd.c#L58

And here is the Java part: https://github.com/luben/zstd-jni/blob/zstream/src/main/java/com/github/luben/zstd/ZstdOutputStream.java#L59

Regards

luben on 17 Aug 2016

👍1

New streaming API merged into "dev" branch

Cyan4973 on 17 Aug 2016

Added in v0.8.1

Cyan4973 on 18 Aug 2016

Was this page helpful?

0 / 5 - 0 ratings