Aspnetcore: Smarter output buffering

Created on 9 Apr 2017 · 20 comments · Source: dotnet/aspnetcore

In order to improve application performance, we might be able to do some clever buffering that won't affect most applications. The theory is that we can buffer all output until there's no more input in the pipe. Even though there's no correlation between input and output, this shouldn't introduce bad buffering behavior in most cases and will improve pipelined requests dramatically. We can still respect the IHttpBufferingFeature feature interface if there are cases where the application isn't reading the body and is writing to the output (this is bound to break somebody).

We can bury this behind a flag and enable it only in the benchmarks, in case we see issues with it. Also, respecting IHttpBufferingFeature gives applications a way to get the immediate-write behavior back.
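For reference, the escape hatch mentioned above would look something like this from middleware (a sketch only, assuming the `IHttpBufferingFeature` shape in `Microsoft.AspNetCore.Http.Features` at the time; whether the new buffering honors it is exactly the open question here):

```csharp
// Sketch: an app that writes to the output without reading the body
// could opt out of output buffering via the feature interface.
public async Task InvokeAsync(HttpContext context)
{
    var buffering = context.Features.Get<IHttpBufferingFeature>();

    // The feature may be null if the server doesn't expose it.
    buffering?.DisableResponseBuffering();

    await context.Response.WriteAsync("event: ping\n\n");
    // Without the call above, this write could sit in the output
    // buffer until the input pipe drains.
}
```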

affected-very-few area-servers enhancement servers-kestrel severity-nice-to-have

Most helpful comment

Would want to switch it off for upgraded streams (websockets)

All 20 comments

Or buffer until (packet full (~1400 bytes) || no more input). Layer 7 (application) Nagle rather than TCP's Nagle; TCP doesn't know whether more data is coming (bad), whereas the application does (good).

Would want to switch it off for upgraded streams (websockets)

And make the size bigger for tls :)

Saw 50% gain when experimenting with this before https://github.com/aspnet/KestrelHttpServer/pull/1236

The output buffering helps a lot for the plaintext benchmark. The test itself seems very artificial (100% load + all synchronous responses).
Does it happen often that output buffers can be stitched together on their way to the kernel?

@tmds TCP has Nagle, which is on by default for sockets, to do this; it's switched off by default in Kestrel because it interacts badly with delayed acks, which are a harder thing to control, and it also introduces transmission delays.

At the TCP layer the kernel isn't aware of what the application code will do next, whereas at the application layer there is more knowledge; so this mostly reintroduces the benefits of Nagle, but with added context to remove most of the disadvantages.

Here the application layer has control by flushing when a meaningful amount of data is available (e.g. http response). In the Nagle case, most of the small (1 byte) packets were not meaningful on their own.
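For context, the TCP-level knob being discussed maps to `TCP_NODELAY`, which .NET exposes on the socket (a sketch using the standard `System.Net.Sockets` API):

```csharp
using System.Net.Sockets;

var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);

// NoDelay = true disables Nagle's algorithm: small writes go out
// immediately instead of waiting to coalesce (Kestrel's default).
socket.NoDelay = true;

// NoDelay = false re-enables Nagle: the kernel batches small segments,
// at the cost of interacting badly with delayed ACKs.
```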

An interesting observation is the libuv thread already delays writes. For the plaintext benchmark this coalesces the outputs of the pipelined requests. The delay on the libuv thread depends on the load of the thread which makes it somewhat self regulating. When there is more load, the delay is higher and the chances of aggregating become higher too. When the load is low, there is less aggregating (but there is also more capacity to do separate sends).

Aggregating will increase latency.

@pakrym This is lower priority than everything else, and if we can't get it done it's fine.

Add another method to pipe? SignalAsync where flushing optional for the receiver vs FlushAsync where it is compulsory?

So Commit => Signal(+Commit) => Flush(+Commit+Signal)

or add a param to FlushAsync

FlushAsync(bool allowBuffering = false)

TryFlush

I think TryFlush is too ambiguous in meaning? It would need to be TryFlushAsync, as it may block.

That's contrary to TryRead, which means don't block.

Actually don't you have commit vs flush? At the moment I commit the multiple messages in a "flight" and flush on the last

3 ways to "end" writing could become a mess

3 levels of granularity; each one would also do the one above

 |  Commit -> make data available
 |  Signal -> tell data is available              (backpressure + schedule)
 v  Flush  -> tell to do something with the data  (backpressure + schedule)

So what happens if you call Signal and you are full (need to apply backpressure)? Does it basically become a Flush?

Yeah; that's kind of the point. Optional flush vs compulsory flush.
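Putting the three levels together, the proposed surface might look something like this (purely hypothetical; SignalAsync is not an existing pipe member, and the names are taken from the thread):

```csharp
public interface IBufferedPipeWriter
{
    // Commit: make written data available to the reader; no wake-up.
    void Commit();

    // Signal: Commit + tell the reader data is available; the receiver
    // may defer the actual flush. If backpressure applies, it
    // effectively degrades to a full flush.
    ValueTask SignalAsync();

    // Flush: Commit + Signal + force the data onward (e.g. to the socket).
    ValueTask FlushAsync();
}
```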

So loop could be something like

while (true)
{
    var result = await Input.ReadAsync();
    var buffer = result.Buffer;

    while (!buffer.IsEmpty)
    {
        while (hasWork) // pseudo-condition: more work for this input
        {
            // Do work
            while (hasSubWork) // pseudo-condition: more sub-work
            {
                // Do sub work
                Output.Advance(bytes);
            }
            Output.Commit();
        }

        Input.Advance(buffer.End);

        // Check for more data
        if (Input.TryRead(out result))
        {
            buffer = result.Buffer;
            // Signal data is ready before processing more
            await Output.SignalAsync();
        }
    }

    // Flush before blocking for more input
    await Output.FlushAsync();

    if (result.IsCompleted)
    {
        break;
    }
}

Output.Complete();

And for a simple user scenario where you aren't too concerned about unblocking all the flows, you can just drop Commit, SignalAsync and TryRead, so something like

while (true)
{
    var result = await Input.ReadAsync();
    var buffer = result.Buffer;

    if (!buffer.IsEmpty)
    {
        // Do work
        Output.Advance(bytes);

        Input.Advance(buffer.End);

        // Flush before blocking for more input
        await Output.FlushAsync();
    }

    if (result.IsCompleted)
    {
        break;
    }
}

Output.Complete();

I like it, but... you need Signal and Flush on the PipeWriter... kinda what I am doing, without the Signal.

It's pseudo-code, in the non-compiling dialect they use in academic papers 😉

Talked to @davidfowl; moving to 2.1.

/cc @muratg
