Runtime: API Proposal: SequenceReader<T>.TryPeek overloads to read a specified number of elements from any position

Created on 6 Sep 2019 · 48Comments · Source: dotnet/runtime

The use case is to return a contiguous ReadOnlySpan<byte> or a ReadOnlySequence<byte> from the specified position.

```C#
public partial ref struct SequenceReader
{
bool TryPeek(int skip, int count, out ReadOnlySpan value);
bool TryPeek(int skip, int count, out ReadOnlySequence value);
bool TryCopyTo(int skip, Span destination);

// And possibly (but would require flipping the skip/count on the other API):
bool TryPeek(int count, out ReadOnlySpan<T> value);
bool TryPeek(int count, out ReadOnlySequence<T> value);

}
```

api-needs-work area-System.Buffers

Source

davidfowl

Most helpful comment

I think it will be important to have proper documentation that informs about good and bad use of the TryPeek(Span) API and illustrates the allocation behavior.

The probability of reallocation can be calculated as follows:

P(alloc) = (length - 1) / segmentLength  |> for (length <= segmentLength), otherwise 100%

Some examples for the case of the default segment size (4k):

Peek length  |  Probability of allocation
-----------------------------------------
> 4k         |  100.000 %
4k           |   99.999 %
200          |    4.858 %
50           |    1.196 %
20           |    0.463 %
4            |    0.073 %
1            |    0.000 %

Those figures show that the allocation penalty is pretty much acceptable for many use cases.

It should only be warned about that this is not the right method for reading a larger range of data.

softworkz on 10 Sep 2019

👍2

All 48 comments

Perhaps int skip would be clearer? I had to think this through a couple of times when I first saw it.

JeremyKuhne on 6 Sep 2019

dotnet/runtime#30771 is the single element version of this.

JeremyKuhne on 6 Sep 2019

I have mixed feelings about taking a start of SequencePosition. My first inclination is to have it also take int skip. Trying to position from the start of the sequence is potentially really expensive and it introduces another possible error state (the SequencePosition not belonging to the Sequence).

JeremyKuhne on 6 Sep 2019

Also note that we already have TryCopyTo which does effectively this. We should consider adding the matching API.

C# public partial ref struct SequenceReader<T> { bool TryCopyTo(int skip, Span<T> destination); }

JeremyKuhne on 6 Sep 2019

We can review this, but I really am hesitant to introduce complete random access to the reader. Skipping ahead a bit makes sense to me, but arbitrary positioning is expensive and confuses the prevailing model here.

JeremyKuhne on 7 Sep 2019

Also note that we already have TryCopyTo which does effectively this. We should consider adding the matching API.
public partial ref struct SequenceReader<T>
{
    bool TryCopyTo(int skip, Span<T> destination);
}

TryCopy does copying. The purpose of this API is to go without copying data (except when crossing segment boundaries).

softworkz on 7 Sep 2019

TryCopy does copying. The purpose of this API is to go without copying data (except when crossing segment boundaries).

Sure, I'm suggesting that we add the same "skip" concept to TryCopyTo as well. The proposal I'm happy with is:

``` C#
public partial ref struct SequenceReader
{
bool TryPeek(int skip, int count, out ReadOnlySpan value);
bool TryPeek(int skip, int count, out ReadOnlySequence value);
bool TryCopyTo(int skip, Span destination);

// And possibly (but would require flipping the skip/count on the other API):
bool TryPeek(int count, out ReadOnlySpan<T> value);
bool TryPeek(int count, out ReadOnlySequence<T> value);

}
```

JeremyKuhne on 7 Sep 2019

@JeremyKuhne - I think we're coming from totally difference experiences in using this API.
Would you mind reading my conversation with David here?

I would make the same suggestion to you as I had written there:

How about if you would try to create some kind of parsing example based on SequenceReader?

I think some basic JSON or XML parser would be a good idea. It would take a char sequence instead of a byte sequence, but it would be easier to understand than some kind of binary format because everybody knows those formats.
Of course I don't mean to create a complete and usable xml parser - just a simple example, parsing elements and their attributes as string values.

I think this would help getting a better understanding about the requirements when trying to parse content using the SequenceReader.

softworkz on 7 Sep 2019

How about if you would try to create some kind of parsing example based on SequenceReader?

I actually wrote the reader with @ahsonkhan writing the JsonReader on top of it. We iterated for months to squeeze as much perf as we could out of the abstraction.

The JsonReader didn't use it ultimately as we wanted to get the absolute fastest performance for it, and any abstraction will add some overhead. Just making a method call was enough to impact the sorts of perf goals we were trying to reach.

JeremyKuhne on 7 Sep 2019

My project is completed but I would still be interested in that code how you managed to do that using SequenceReader. Was it a char or a byte sequence and how did parse values without copying? How did you determine the length of string values to parse?

softworkz on 7 Sep 2019

I would still be interested in that code how you managed to do that using SequenceReader

@ahsonkhan can you speak to @softworkz questions about parsing JSON with the SequenceReader? Perhaps point to an iteration where we still had the SequenceReader actively in use?

JeremyKuhne on 7 Sep 2019

Updated the proposal.

davidfowl on 7 Sep 2019

We should consider converging on the parameter names. One of: start/offset/skip and length rather than count, particularly since it applies to the Length of the ROSequence or ROSpan being returned.

ahsonkhan on 10 Sep 2019

Video

Approved: dotnet/corefx#40962 (comment):

``` C#
// Equivalent "Peek" versions. They need a offset as peeking doesn't advance the reader and rewinding is super expensive.
public readonly bool TryPeek(long offset, int length, out ReadOnlySpan value);
public readonly bool TryPeek(long offset, long length, out ReadOnlySequence value);

    // Pairs with existing TryCopyTo(Span<T> destination), which does not advance the reader (neither does this)
    public readonly bool TryCopyTo(long offset, Span<T> destination);

```

JeremyKuhne on 10 Sep 2019

I think it will be important to have proper documentation that informs about good and bad use of the TryPeek(Span) API and illustrates the allocation behavior.

The probability of reallocation can be calculated as follows:

P(alloc) = (length - 1) / segmentLength  |> for (length <= segmentLength), otherwise 100%

Some examples for the case of the default segment size (4k):

Peek length  |  Probability of allocation
-----------------------------------------
> 4k         |  100.000 %
4k           |   99.999 %
200          |    4.858 %
50           |    1.196 %
20           |    0.463 %
4            |    0.073 %
1            |    0.000 %

Those figures show that the allocation penalty is pretty much acceptable for many use cases.

It should only be warned about that this is not the right method for reading a larger range of data.

softworkz on 10 Sep 2019

👍2

I've been thinking about this proposal, and the allocating part for spans bothers me a bit.
I'd like to propose the following non-allocating alternative:

````csharp
public partial ref struct SequenceReader
{
bool TryPeek(int skip, Span readBuffer, out ReadOnlySpan value);

bool TryPeek(Span<T> readBuffer, out ReadOnlySpan<T> value);

}
````

The idea would be to have the developer preallocate a read buffer of the appropriate size. (e.g. from the stack, from a pool, …)
Then, the method would

Return a span of length readBuffer.Length from the original sequence when possible
Fill the provided buffer with non-contiguous data from the sequence (i.e. TryCopyTo), and return this one

This would allow reusing the same buffer continuously during parsing, thus enabling better control of memory allocations by the developer.
The obvious downside being that it makes the API a little bit more complicated.

GoldenCrystal on 22 Sep 2019

👍1

I think it could be fine as an overload but it shouldn’t replace this API (and the other APIs that already allocate)

davidfowl on 22 Sep 2019

I think it would be better when the SequenceReader would do this under-the-hood. I can imagine a number of ways how it could be done. Eventually maybe exposing only a .ClearCache() method, which wouldn't deallocate but tell the reader that it can reuse the memory it has allocated so far.

softworkz on 22 Sep 2019

Doing anything under the hood would be dangerous. I’d rather the more complex overload as nobody has to answer lifetime questions or buffer ownership questions.

davidfowl on 22 Sep 2019

Let's stick to the original proposal:

The SequenceReader will have to do allocations
The lifetime would usually have to match the lifetime of the SequuenceReader
(otherwise what you say would be true: there would be questions about lifetime)

That's the situation we'll have anyway.
What I'm suggesting on top of this is a method to tell the reader that you're no longer interested in the allocated data and that it can re-use previously allocated data.
I think that's pretty transparent. And there's no requirement to call the method to invalidate the cache - that's just an optional optimization that the consumer can make use of.

This makes more sense in combination with the following:
. At the first time, when the reader needs to make an allocation that is - let's say - smaller than the segment size, it allocates a larger range of memory, e.g. 4 * SegmentSize - let's call it 'cache'

For subsequent calls requiring new memory, it can continuously use memory from the 'cache'
When the 'cache' is full (or the required memory is too large) it will allocate an additional range of 'cache' memory
Calling .ClearCache() would allow this memory to be re-used

In parsing scenarios, you are typically using loop constructs in code, there it would allow writing nice code where you would simply call .ClearCache() once in a loop.

Actually this is similar to what @GoldenCrystal is suggesting, just in a much more convenient way.

softworkz on 22 Sep 2019

I disagree but let’s not pollute this already accepted proposal. New APIs can always be proposed but I personally not complicate this but you’re free to make a new proposal that will be discussed on API review.

davidfowl on 22 Sep 2019

I don't think this is off-topic - it's rather about the implementation details of the accepted api proposal (which I'm fully commited to).

Having a .ClearCache method would be an (optional) optimization on top of it, not something separate.

softworkz on 22 Sep 2019

Having a .ClearCache method would be an (optional) optimization on top of it, not something separate.

PS: Think of cases where the SequenceReader has a long lifetime

softworkz on 22 Sep 2019

Sounds good, can you make another API proposal?

davidfowl on 4 Oct 2019

Sure, will do.

I'm just a bit undecided about the naming - ClearCache or InvalidateCache were the initial ideas, but I'm not even sure whether it's right to call it 'Cache'..

What's your thoughts about this?

softworkz on 4 Oct 2019

I wouldn’t worry about the name, pick clear cache. I would focus on the semantics and potential ways it can be misused and how it interacts with all other APIs.

davidfowl on 4 Oct 2019

Just added: https://github.com/dotnet/corefx/issues/41536

I hope it makes sense..

softworkz on 4 Oct 2019

The SequenceReader will have to do allocations

The lifetime would usually have to match the lifetime of the SequuenceReader
(otherwise what you say would be true: there would be questions about lifetime)
...
This makes more sense in combination with the following:
. At the first time, when the reader needs to make an allocation that is - let's say - smaller than the segment size, it allocates a larger range of memory, e.g. 4 * SegmentSize - let's call it 'cache'

For subsequent calls requiring new memory, it can continuously use memory from the 'cache'

When the 'cache' is full (or the required memory is too large) it will allocate an additional range of 'cache' memory

Calling .ClearCache() would allow this memory to be re-used

In parsing scenarios, you are typically using loop constructs in code, there it would allow writing nice code where you would simply call .ClearCache() once in a loop.

Does this mean that memory usage will accumulate infinitely unless ClearCache is called?

Adding this API adds an entirely new dimension to SequenceReader. This type now allocates and owns memory. Previously, it merely accessed memory that was provided.

Is this fairly fundamental change warranted by the value of this new API?

GSPP on 14 Oct 2019

👍1

It's not a fundamental change. The API already performs allocations in certain cases even in its current state.

softworkz on 14 Oct 2019

It is a fundamental change to which component owns the buffers. The caller is now responsible for telling the reader when to clear is internal buffer (which doesn't exist today). If you really want to go with this approach (I'm not sold), then you should spike it to figure out how wide of an impact it has on both the proposed API and existing APIs.

davidfowl on 14 Oct 2019

👍1

You sound like there would be a choice.
Let's look at this for example:

bool TryPeek(int skip, int count, out ReadOnlySpan<T> value);

The API returns a ReadOnlySpan<T>. It's impossible to transfer ownership this way.
This would require returning Memory<T> instead - which we don't want either because in most cases there won't be any allocation and we return a span pointing directly to the ReadOnlySequence's buffer.
It doesn't make a difference whether the reader uses a single buffer and return spans from it or whether it allocates memory for the returned span in each individual case.

Or am I missing something here?

softworkz on 14 Oct 2019

Based on having recently ratified guidance for working with spans, the API as reviewed needs discussion again; because we feel that when returning spans it should always be caller-provided data.

So bool TryPeek(int skip, int count, Span<T> destinationIfAcrossSegments, out ReadOnlySpan<T> value) is a better fit, the returned (via out) span is either came from the ctor (a slice of CurrentSpan) or a slice of value. But nothing is ever "made up" then returned via the span.

bartonjs on 17 Oct 2019

This API is kinda terrible to use though. I'd much prefer an allocating API than one that is that awkward to use. We have other existing API on this type that have the existing signature and allocate, are we going to change those as well?

davidfowl on 17 Oct 2019

Is it API that is returning a span where that span is not an input parameter (via either the method or a ctor)? If yes, and it hasn't shipped, then it could/should also be re-evaluated.

It's entirely possible that the output of a re-review is leave it as-is; but it got noticed in passing as being in conflict with the draft guidance for buffers.

bartonjs on 17 Oct 2019

Is it API that is returning a span where that span is not an input parameter (via either the method or a ctor)? If yes, and it hasn't shipped, then it could/should also be re-evaluated.

This method sure but there are other ones that did ship with the same pattern.

It's entirely possible that the output of a re-review is leave it as-is; but it got noticed in passing as being in conflict with the draft guidance for buffers.

Sure, but I think the existing API shape is good for the 90% case.

davidfowl on 17 Oct 2019

This method sure but there are other ones that _did_ ship with the same pattern.

I just ran a tool that dumps all properties/methods that return spans. The scope is .NET Core 3.0 + Platform Extensions, which includes ASP.NET Core too.

Results

@ahsonkhan and I just did a quick scan. So far most of them look fine in that none of them allocated memory. The only exception is maybe the JsonEncodedText but one could argue that newing up that type is that allocation. And this API in SignalR, which allocates on cache miss:

C# namespace Microsoft.AspNetCore.SignalR.Protocol { public static class HandshakeProtocol { public static ReadOnlySpan<byte> GetSuccessfulHandshake(IHubProtocol protocol); } }

terrajobst on 17 Oct 2019

👍1

none of them allocated memory

What about e.g. SequenceReader.TryReadTo, which may call ToArray?

stephentoub on 17 Oct 2019

What about e.g. SequenceReader.TryReadTo, which may call ToArray?

Hmm, good point.

terrajobst on 17 Oct 2019

bool TryPeek(int skip, int count, Span<T> destinationIfAcrossSegments, out ReadOnlySpan<T> value)

How about something like this instead?

bool TryPeek(int skip, int count, Span<T> destinationCopy);
bool TryPeek(int skip, int count, out ReadOnlySpan<T> underlyingSpan);

The second overload would have to be documented as potentially returning less than count bytes, but could be an option for those trying to avoid copying.
Of course, for user-allocated spans, the first overload will still works as expected and copies to the destination.

FiniteReality on 17 Oct 2019

I very much agree with David: The SequenceReader<T> should allow for convenient API access to ReadOnlySequennce<T>. When you want to do things manually with your own buffer management, you can already do that accessing ReadOnlySequence<T> directly.

But in many use cases, you will have to do your own manual implementation of something similar as what is being proposed here for the SequenceReader<T>.
And Needing to do that manually is a showstopper for many from using that whole stuff at all - imo.

softworkz on 17 Oct 2019

I'm not sure why this is any harder to use:

```C#
SequenceReader reader = ...;
// However big the biggest value to TryPeek is
Span tmp = stackalloc byte[512];
ReadOnlySpan next;
...

if (reader.TryPeek(25, 38, tmp, out next))
{
// "next" is either from the data provided to reader in the ctor, or from tmp, but it's not important which it is.
}

The only "hardness" seems to be from when calling TryPeek with arbitrarily large values... but that already feels contraindicated by the current proposal.  For large values it would want an array (vs a stackalloc) and probably want to delay creation, and deal with growth... which I'd probably write my own helper for.

```C#
internal static bool TryPeekLarge<T>(ref SequenceReader<T> reader, int skip, int count, ref T[] scratch, out ReadOnlySpan<T> value)
{
    if (reader.CurrentSpan.Length - skip >= count)
    {
        value = reader.CurrentSpan.Slice(skip, count);
        return true;
    }

    if (scratch == null || scratch.Length < length)
    {
        scratch = GetABiggerArray();
    }

    return reader.TryPeek(skip, count, scratch, out value);
}

But, again, maybe what's proposed is fine. It violates guidance we recently decided on, but the type also violates longstanding guidance against mutable structs, and it's a ref struct (which is its own can of worms).

bartonjs on 17 Oct 2019

@bartonjs - Your suggestion would introduce even more unexpected semantics: The result from one of those API methods would only be guaranteed to be valid until the next call you make to SequenceReader<T>.
That would mean you would never be able to make two subsequent calls and compare the results - for example.

softworkz on 17 Oct 2019

Generally I wouldn't want to talk down the idea of delegating the allocation of "cache" memory to something outside of SequenceReader<T>. There are multiple ways to do that.
E.g. SequenceReader<T> could have a constructor overload that allows passing in some kind of allocation handler. There could be a default implementation for this, that a consumer of the API could easily create and pass in the constructor.

This could fix the mentioned 'violations' in case it would be decided that they would apply to this case.

Though, I'm not aiming to suggest adding even more complexity to the API, I'm rather up to illustrate this:

No matter where the memory "is coming from" that SequenceReader<T> is using to stage and return inter-segment data:

You will want that memory to be re-used.
And for this, you'll need to be able to tell SequenceReader<T> when it is allowed to do so.

That's the motivation behind the .ClearCache() proposal.(https://github.com/dotnet/corefx/issues/41536) that had originally re-triggered this discussion.
The .ClearCache() proposal itself doesn't actually require the allocation to be done by SequenceReader<T> internally.

softworkz on 17 Oct 2019

When I commented https://github.com/dotnet/corefx/issues/40871#issuecomment-533878095, I assumed that:

APIs returning ReadOnlySequence<T> (e.g. PipeReader) are advanced API designed with the intent of avoiding pointless allocations. (e.g. by maximizing reuse of allocated memory)
The difficulty of working with ReadOnlySequence<> (which I can personally attest 😀) prompted the creation of SequenceReader<>, trying to address common scenarios in parsing a stream of data, at the expense of slight performance degradation.
While making it easier to work with ReadOnlySequence<>, SequenceReader<> was still an advanced API designed for developers wanting to avoid memory allocations. (At the cost of a slight performance overhead ?)
I might be half wrong here, since HttpRequest.BodyReader was publicly introduced in ASP.NET Core. But non-expert developers can still use the simpler HttpRequest.Body stream, so… ¯\_(ツ)_/¯

I get that the possible allocations discussed here may not happen that often, but I think that letting the developer keeping the control of the allocations should still be a concern.

What about e.g. SequenceReader.TryReadTo, which may call ToArray?

I was a bit sad as I (re)discovered this, but I guess that makes a point towards the previously approved API, if only for consistency. (https://github.com/dotnet/corefx/issues/40871#issuecomment-530070064)

I'd still argue about the need for equivalent non-allocating APIs, but I can make a separate proposal for this.

When you want to do things manually with your own buffer management, you can already do that accessing ReadOnlySequence<T> directly.

I disagree: To me, the point of ReadOnlySequence<T> is much more about abstracting the bookkeeping required to sequentially read ReadOnlySequence<T>, than "managing" buffers for you.

Your suggestion would introduce even more unexpected semantics: The result from one of those API methods would only be guaranteed to be valid until the next call you make to SequenceReader<T>.

I don't think it would be unreasonable to expect developers to understand how the writable Span<T> they passed to the API will be used.

You do however raise a valid point in the fact that the non-allocating APIs proposed here don't provide a way to know if the Span<T> was actually written to. (But I'm not sure it really matters; see below)

Also, not saying that it is wrong, but I seriously question the proposed scenario:

One of the common use cases for ReadOnlySequence<T> is to read non-completed data from a PipeReader. As such, you'd need to handle the cases where you don't have enough data to read ahead. So, you'd have to manage buffers by storing the comparison reference value elsewhere.
The currently proposed API doesn't/can't do that for you, because you don't own the memory wrapped by ReadOnlySequence<T>.
In that case, simply reading the data in your own preallocated buffer with the proposed TryCopyTo would be the easiest thing to do. (More expensive, but simpler code)
It seems that you'd have to want to implement some kind of complex LR parser to be concerned by this, and I'd argue that ReadOnlySequence<T> is not be the right API to do this.

In the end:

But, again, maybe what's proposed is fine. It violates guidance we recently decided on, but the type also violates longstanding guidance against mutable structs, and it's a ref struct (which is its own can of worms).

👍

GoldenCrystal on 17 Oct 2019

SequenceReader could have a constructor overload that allows passing in some kind of allocation handler. There could be a default implementation for this, that a consumer of the API could easily create and pass in the constructor.

FWIW, that is actually what I had proposed originally. That we introduce a delegate Span<T> Allocator<T>(int length) or some such. I think that is the simplest way to do things- rather than add a lot of individual complicated APIs.

JeremyKuhne on 17 Oct 2019

SequenceReader could have a constructor overload that allows passing in some kind of allocation handler. There could be a default implementation for this, that a consumer of the API could easily create and pass in the constructor.

FWIW, that is actually what I had proposed originally. That we introduce a delegate Span<T> Allocator<T>(int length) or some such. I think that is the simplest way to do things- rather than add a lot of individual complicated APIs.

The question here would be whether to call this for each required allocation with the individually required length or whether to allocate a larger range of memory which will be sequentially filled for each request requiring allocation. Until it's full - then allocate another one. And start reusing once .ClearCache() is called.

softworkz on 17 Oct 2019

I disagree: To me, the point of ReadOnlySequence<T> is much more about abstracting the bookkeeping required to sequentially read ReadOnlySequence<T>, than "managing" buffers for you.

It's still about some very basic functionality.
For example:

Read a 4 byte integer at 200 bytes from the current position
Get a 20 char span from the current position without advancing (and without copying if possible)

Currently, this cannot be done in an easy way.

One of the common use cases for ReadOnlySequence<T> is to read non-completed data from a PipeReader. As such, you'd need to handle the cases where you don't have enough data to read ahead. So, you'd have to _manage buffers_ by storing the comparison reference value elsewhere.

No. When you know that you need to look ahead n bytes, you would pause processing until n bytes are available.

It seems that you'd have to want to implement some kind of complex LR parser to be concerned by this

Not at all. See examples above.

softworkz on 18 Oct 2019

Video

The two TryPeek APIs that out a span need more work because they silently allocate. Either allow for the caller to pass in a pre-allocated span or make it clear in the name that it will only return true if the data is in contiguous memory.
The TryPeek overloads involving sequences look fine, except that count changes position due to skip. We should either remove the one that doesn't take skip (callers would have to pass in 0) or we should change order of skip which might be weird.
TryCopyTo seems fine as proposed, but it seems we might want to change it based on what we do for TryPeek.