Runtime: Add Span<T> Binary Reader/Writer APIs

Created on 19 Sep 2017 · 68Comments · Source: dotnet/runtime

The API allows for reading and writing binary representation of primitve types (bit blitting) from/to spans of bytes. The API is used by SignalR.

A prototype of the API is available in corfxlab: https://github.com/dotnet/corefxlab/tree/master/src/System.Binary

Part of dotnet/corefx#24174

Usage is quite simple:
```c#
var span = stackalloc byte[4];
span.Write(Int32.MaxValue);
var value = span.Read();

The LittleEndian and BigEndian versions assume/specify specific endianness. The Write/Read versions assume current machine endianness. Try versions return false if the buffer is too small to read/write the specified data type.

API Design:
```C#
// System.Memory.dll
namespace System.Buffers.Binary {
    public static class BinaryPrimitives {
        public static short ReadInt16BigEndian(ReadOnlySpan<byte> buffer);
        public static short ReadInt16LittleEndian(ReadOnlySpan<byte> buffer);
        public static int ReadInt32BigEndian(ReadOnlySpan<byte> buffer);
        public static int ReadInt32LittleEndian(ReadOnlySpan<byte> buffer);
        public static long ReadInt64BigEndian(ReadOnlySpan<byte> buffer);
        public static long ReadInt64LittleEndian(ReadOnlySpan<byte> buffer);
        public static T ReadMachineEndian<T>(ReadOnlySpan<byte> buffer) where T : struct;
        public static ushort ReadUInt16BigEndian(ReadOnlySpan<byte> buffer);
        public static ushort ReadUInt16LittleEndian(ReadOnlySpan<byte> buffer);
        public static uint ReadUInt32BigEndian(ReadOnlySpan<byte> buffer);
        public static uint ReadUInt32LittleEndian(ReadOnlySpan<byte> buffer);
        public static ulong ReadUInt64BigEndian(ReadOnlySpan<byte> buffer);
        public static ulong ReadUInt64LittleEndian(ReadOnlySpan<byte> buffer);
        public static byte ReverseEndianness(byte value);
        public static short ReverseEndianness(short value);
        public static int ReverseEndianness(int value);
        public static long ReverseEndianness(long value);
        public static sbyte ReverseEndianness(sbyte value);
        public static ushort ReverseEndianness(ushort value);
        public static uint ReverseEndianness(uint value);
        public static ulong ReverseEndianness(ulong value);
        public static bool TryReadInt16BigEndian(ReadOnlySpan<byte> buffer, out short value);
        public static bool TryReadInt16LittleEndian(ReadOnlySpan<byte> buffer, out short value);
        public static bool TryReadInt32BigEndian(ReadOnlySpan<byte> buffer, out int value);
        public static bool TryReadInt32LittleEndian(ReadOnlySpan<byte> buffer, out int value);
        public static bool TryReadInt64BigEndian(ReadOnlySpan<byte> buffer, out long value);
        public static bool TryReadInt64LittleEndian(ReadOnlySpan<byte> buffer, out long value);
        public static bool TryReadMachineEndian<T>(ReadOnlySpan<byte> buffer, out T value) where T : struct;
        public static bool TryReadUInt16BigEndian(ReadOnlySpan<byte> buffer, out ushort value);
        public static bool TryReadUInt16LittleEndian(ReadOnlySpan<byte> buffer, out ushort value);
        public static bool TryReadUInt32BigEndian(ReadOnlySpan<byte> buffer, out uint value);
        public static bool TryReadUInt32LittleEndian(ReadOnlySpan<byte> buffer, out uint value);
        public static bool TryReadUInt64BigEndian(ReadOnlySpan<byte> buffer, out ulong value);
        public static bool TryReadUInt64LittleEndian(ReadOnlySpan<byte> buffer, out ulong value);
        public static bool TryWriteInt16BigEndian(Span<byte> buffer, short value);
        public static bool TryWriteInt16LittleEndian(Span<byte> buffer, short value);
        public static bool TryWriteInt32BigEndian(Span<byte> buffer, int value);
        public static bool TryWriteInt32LittleEndian(Span<byte> buffer, int value);
        public static bool TryWriteInt64BigEndian(Span<byte> buffer, long value);
        public static bool TryWriteInt64LittleEndian(Span<byte> buffer, long value);
        public static bool TryWriteMachineEndian<T>(Span<byte> buffer, ref T value) where T : struct;
        public static bool TryWriteUInt16BigEndian(Span<byte> buffer, ushort value);
        public static bool TryWriteUInt16LittleEndian(Span<byte> buffer, ushort value);
        public static bool TryWriteUInt32BigEndian(Span<byte> buffer, uint value);
        public static bool TryWriteUInt32LittleEndian(Span<byte> buffer, uint value);
        public static bool TryWriteUInt64BigEndian(Span<byte> buffer, ulong value);
        public static bool TryWriteUInt64LittleEndian(Span<byte> buffer, ulong value);
        public static void WriteInt16BigEndian(Span<byte> buffer, short value);
        public static void WriteInt16LittleEndian(Span<byte> buffer, short value);
        public static void WriteInt32BigEndian(Span<byte> buffer, int value);
        public static void WriteInt32LittleEndian(Span<byte> buffer, int value);
        public static void WriteInt64BigEndian(Span<byte> buffer, long value);
        public static void WriteInt64LittleEndian(Span<byte> buffer, long value);
        public static void WriteMachineEndian<T>(Span<byte> buffer, ref T value) where T : struct;
        public static void WriteUInt16BigEndian(Span<byte> buffer, ushort value);
        public static void WriteUInt16LittleEndian(Span<byte> buffer, ushort value);
        public static void WriteUInt32BigEndian(Span<byte> buffer, uint value);
        public static void WriteUInt32LittleEndian(Span<byte> buffer, uint value);
        public static void WriteUInt64BigEndian(Span<byte> buffer, ulong value);
        public static void WriteUInt64LittleEndian(Span<byte> buffer, ulong value);
    }
}

Original proposals (click to expand)

c# // System.Memory.dll namespace System.Buffers.Binary { public static class BinaryPrimitives { public static short ReadInt16BigEndian(ReadOnlySpan<byte> buffer); public static short ReadInt16LittleEndian(ReadOnlySpan<byte> buffer); public static int ReadInt32BigEndian(ReadOnlySpan<byte> buffer); public static int ReadInt32LittleEndian(ReadOnlySpan<byte> buffer); public static long ReadInt64BigEndian(ReadOnlySpan<byte> buffer); public static long ReadInt64LittleEndian(ReadOnlySpan<byte> buffer); public static T ReadMachineEndian<T>(ReadOnlySpan<byte> buffer) where T : struct; public static ushort ReadUInt16BigEndian(ReadOnlySpan<byte> buffer); public static ushort ReadUInt16LittleEndian(ReadOnlySpan<byte> buffer); public static uint ReadUInt32BigEndian(ReadOnlySpan<byte> buffer); public static uint ReadUInt32LittleEndian(ReadOnlySpan<byte> buffer); public static ulong ReadUInt64BigEndian(ReadOnlySpan<byte> buffer); public static ulong ReadUInt64LittleEndian(ReadOnlySpan<byte> buffer); public static byte ReverseEndianness(byte value); public static short ReverseEndianness(short value); public static int ReverseEndianness(int value); public static long ReverseEndianness(long value); public static sbyte ReverseEndianness(sbyte value); public static ushort ReverseEndianness(ushort value); public static uint ReverseEndianness(uint value); public static ulong ReverseEndianness(ulong value); public static bool TryReadInt16BigEndian(ReadOnlySpan<byte> buffer, out short value); public static bool TryReadInt16LittleEndian(ReadOnlySpan<byte> buffer, out short value); public static bool TryReadInt32BigEndian(ReadOnlySpan<byte> buffer, out int value); public static bool TryReadInt32LittleEndian(ReadOnlySpan<byte> buffer, out int value); public static bool TryReadInt64BigEndian(ReadOnlySpan<byte> buffer, out long value); public static bool TryReadInt64LittleEndian(ReadOnlySpan<byte> buffer, out long value); public static bool TryReadMachineEndian<T>(ReadOnlySpan<byte> buffer, out T value) where T : struct; public static bool TryReadUInt16BigEndian(ReadOnlySpan<byte> buffer, out ushort value); public static bool TryReadUInt16LittleEndian(ReadOnlySpan<byte> buffer, out ushort value); public static bool TryReadUInt32BigEndian(ReadOnlySpan<byte> buffer, out uint value); public static bool TryReadUInt32LittleEndian(ReadOnlySpan<byte> buffer, out uint value); public static bool TryReadUInt64BigEndian(ReadOnlySpan<byte> buffer, out ulong value); public static bool TryReadUInt64LittleEndian(ReadOnlySpan<byte> buffer, out ulong value); public static bool TryWriteInt16BigEndian(Span<byte> buffer, short value); public static bool TryWriteInt16LittleEndian(Span<byte> buffer, short value); public static bool TryWriteInt32BigEndian(Span<byte> buffer, int value); public static bool TryWriteInt32LittleEndian(Span<byte> buffer, int value); public static bool TryWriteInt64BigEndian(Span<byte> buffer, long value); public static bool TryWriteInt64LittleEndian(Span<byte> buffer, long value); public static bool TryWriteMachineEndian<T>(Span<byte> buffer, ref T value) where T : struct; public static bool TryWriteUInt16BigEndian(Span<byte> buffer, ushort value); public static bool TryWriteUInt16LittleEndian(Span<byte> buffer, ushort value); public static bool TryWriteUInt32BigEndian(Span<byte> buffer, uint value); public static bool TryWriteUInt32LittleEndian(Span<byte> buffer, uint value); public static bool TryWriteUInt64BigEndian(Span<byte> buffer, ulong value); public static bool TryWriteUInt64LittleEndian(Span<byte> buffer, ulong value); public static void WriteInt16BigEndian(Span<byte> buffer, short value); public static void WriteInt16LittleEndian(Span<byte> buffer, short value); public static void WriteInt32BigEndian(Span<byte> buffer, int value); public static void WriteInt32LittleEndian(Span<byte> buffer, int value); public static void WriteInt64BigEndian(Span<byte> buffer, long value); public static void WriteInt64LittleEndian(Span<byte> buffer, long value); public static void WriteMachineEndian<T>(Span<byte> buffer, ref T value) where T : struct; public static void WriteUInt16BigEndian(Span<byte> buffer, ushort value); public static void WriteUInt16LittleEndian(Span<byte> buffer, ushort value); public static void WriteUInt32BigEndian(Span<byte> buffer, uint value); public static void WriteUInt32LittleEndian(Span<byte> buffer, uint value); public static void WriteUInt64BigEndian(Span<byte> buffer, ulong value); public static void WriteUInt64LittleEndian(Span<byte> buffer, ulong value); } }

api-approved area-System.Buffers

Source

KrzysztofCwalina

Most helpful comment

We'd discussed a number of similar concerns yesterday as well:

Generically reading T is problematic, especially when talking about reading big/little endian, as that's likely not what's going to do the intended operation for anything but int/long/etc., e.g. Guid would be incorrect.
Especially as extension methods, Write/Read look like they advance but don't.
We have these basic read/write operations for span on BitConverter, and these operations essentially duplicate them; I prefer it when these separate packages provide the more advanced functionality rather than duplicating what's in the core.

I'd be happy with just putting an API like this in System.Memory.dll or wherever:
```C#
namespace System.Buffers
{
public ref struct SpanReader
{
public SpanReader(ReadOnlySpan span);

    public int Length { get; }
    public int Position { get; set; }
    public ReadOnlySpan<byte> Remaining { get; }

    public bool TryReadInt32LittleEndian(out int result);
    public bool TryReadInt32BigEndian(out int result);

    public bool TryReadInt64LittleEndian(out long result);
    public bool TryReadInt64BigEndian(out long result);

    ... // repeated for each primitive type we care about
}

public ref struct SpanWriter
{
    public SpanWriter(Span<byte> span);

    public int Length { get; }
    public int Position { get; set; }
    public Span<byte> Remaining { get; }

    public bool TryWriteInt32LittleEndian(int result);
    public bool TryWriteInt32BigEndian(int result);

    public bool TryWriteInt64LittleEndian(long result);
    public bool TryWriteInt64BigEndian(long result);

    ... // repeated for each primitive type we care about
}

}
```
We wouldn't have just the unadorned TryReadInt32 as it's duplicative: if you don't care about endianness because you're just reading/writing on the same system, then you can use either the little or big methods, as long as you're consistent, and if you do care about endianness, then you can use the one you need. I didn't include the non-Try methods above, because if you're writing this kind of code, it seems you do care about the potential expense of exceptions, in which case you either want to use the Try methods to avoid exceptions, or you know for certain that the operation will succeed, in which case you can just use the Try variant; that said, if folks feel strongly about having the non-Try, I don't think it's a big deal, it just seems overkill to me to have both.

This avoids concerns about arbitrary Ts and endianness (plus needing primitive checks), it avoids concerns about duplication with BitConverter as it's clearly providing more advanced functionality than just the primitive read/write BitConverter does, it avoids concerns about not advancing because it advances, it fits well with the general .NET notion of XxReader/Writer, etc.

stephentoub on 20 Sep 2017

👍7

All 68 comments

Api review feedback:

public static bool TryReadBigEndiann(this ReadOnlySpan buffer, out T value) where T : struct;

TryReadBigEndiann -> TryReadBigEndian

The T should be constrained to blittable types, currently the compiler doesn't support that so it will throw in cases it cannot support.
BigEndian/LittleEndian the T should be constrained to our single value primitive types, perhaps just use an overload for each of those instead of a T.
Consider making the Read/Write Apis use the default endianness and just have a flip endianness method for just the single value primitives.
How do these relate to BitConverter APIs?
We need a better name then Binary for this type. Perhaps SpanBinaryExtensions.
Will anyone ever call LittleEndian APIs over the Read/Write APIs? As they would essentially be the same until we run on a BigEndian architecture.

weshaggard on 19 Sep 2017

Additional questions:

We've been calling things TryWriteBytes, but here it's just TryWrite. Is there a reason for the difference? Should we rename TryWriteBytes -> TryWrite or TryWrite -> TryWriteBytes?
What is the behavior of the non-Try Read/Write... do they throw in cases where the Try would return false?

stephentoub on 19 Sep 2017

I'm concerned that a common mistake will be to try to use this API to write multiple values to a single Span<byte> like this:

```c#
var span = stackalloc byte[8];
span.Write(Int32.MinValue);
span.Write(Int32.MaxValue);

Similar code works for other `Write` methods in the framework (including `BinaryWriter`, `Stream` and `TextWriter`). Is there a way to change the API so that this mistake is avoided? (Maybe by using a different name?)

Also, how should that code be written correctly? Do I have to manually use `Slice()`?

```c#
var span = stackalloc byte[8]; 
span.Slice(0).Write(Int32.MinValue);
span.Slice(4).Write(Int32.MaxValue);

(The first Slice is unnecessary, but I added it for symmetry.)

svick on 20 Sep 2017

System.Buffers.dll

Why System.Buffers and not System.Memory where Span and friends live?

just use an overload for each of those instead of a T.

jkotas on 20 Sep 2017

Why System.Buffers and not System.Memory where Span and friends live?

This was brought up in the review. The rationale is that we only have the span/memory primitive type and APIs/extension methods closely related to them in System.Memory (IndexOf, AsBytes, etc). Operations like Read/Write, transformation APIs like text encode, and text parse/format would live in System.Buffers.

Also, how should that code be written correctly? Do I have to manually use Slice()?

Yes.

What is the behavior of the non-Try Read/Write... do they throw in cases where the Try would return false?

Yes. Are there any concerns with that?

We've been calling things TryWriteBytes, but here it's just TryWrite. Is there a reason for the difference? Should we rename TryWriteBytes -> TryWrite or TryWrite -> TryWriteBytes?

One approach could be to remain consistent with TryRead and since we don't have TryReadBytes, change TryWriteBytes elsewhere to TryWrite and keep this as TryWrite as well.

ahsonkhan on 20 Sep 2017

The rationale is that we only have the span/memory primitive type and APIs/extension methods closely related to them in System.Memory

Well, it is more complex than for .NET Core: Span/Memory primitive types are in System.Runtime, SpanExtension are in System.Memory, and now you are proposing to add extension methods to read/write bytes from Span to System.Buffers that has nothing to do with Span so far. It feels like we are trying to overthink the layering again. It has been proven that we do not know how to do that well. We should preffer larger assemblies that have all related stuff lumped together.

jkotas on 20 Sep 2017

👍1

I think we should remove the generics and instead have the type being written in the method name e.g.

span.ReadInt32()
span.TryReadInt32()

It's too easy to mess up the generic method. Here's some MQTT parser code using these APIs:

https://github.com/aspnet/SignalR/blob/6151af7e2610d422bba2097c8326de7e5d545803/samples/SocketsSample/EndPoints/MqttEndPoint.cs#L48-L59

davidfowl on 20 Sep 2017

👍3

add extension methods to read/write bytes from Span to System.Buffers that has nothing to do with Span so far

We need a better name then Binary for this type. Perhaps SpanBinaryExtensions.

We could change them so they are no longer extension methods on Span (and take Span as a parameter, similar to the other System.Text.Primitive APIs for example bool TryParseBoolean(ReadOnlySpan<byte> text, out bool value)). However, in that case, a short name like Binary would be preferable over something like SpanBinaryExtensions.
Used like:
C# var span = stackalloc byte[4]; Binary.Write(span, Int32.MaxValue); var value = Binary.Read<int>(span);

ahsonkhan on 20 Sep 2017

Also this problem that @svick brings up is another big usability issue with these APIs and span:

Also, how should that code be written correctly? Do I have to manually use Slice()?

Since the early days of pipelines and span @benaadams, @Drawaes, @mgravell and I have been discussing this scenario. We should also think about a mutable ref like struct that moves forward as the data is written.

C# var writer = new BinaryWriter(span); writer.WriteInt32(10); // writes and moves to span + sizeof(int32) writer.WriteInt64(20);

davidfowl on 20 Sep 2017

👍5

Funny over the last couple of days I have been writing a fast PE/IL rewriter and while the code is dogs breakfast due to me deep diving the topic I have had to write exactly those methods again.

https://github.com/Drawaes/dotnetXperiments/blob/master/PEQuick/PEQuick/MetaData/MetaDataReader.cs#L60

A struct wrapper for a span. And then extension methods that read from (to be replaced with your methods of course mine are a hack) the span. Then slice the span and return it. Then the struct wrapper uses this to keep its internal span up to date.

Seeing as every app I write that uses spans heavily ends up with this same pattern it seems a good candidate for a framework item.

Drawaes on 20 Sep 2017

We'd discussed a number of similar concerns yesterday as well:

Generically reading T is problematic, especially when talking about reading big/little endian, as that's likely not what's going to do the intended operation for anything but int/long/etc., e.g. Guid would be incorrect.
Especially as extension methods, Write/Read look like they advance but don't.
We have these basic read/write operations for span on BitConverter, and these operations essentially duplicate them; I prefer it when these separate packages provide the more advanced functionality rather than duplicating what's in the core.

I'd be happy with just putting an API like this in System.Memory.dll or wherever:
```C#
namespace System.Buffers
{
public ref struct SpanReader
{
public SpanReader(ReadOnlySpan span);

    public int Length { get; }
    public int Position { get; set; }
    public ReadOnlySpan<byte> Remaining { get; }

    public bool TryReadInt32LittleEndian(out int result);
    public bool TryReadInt32BigEndian(out int result);

    public bool TryReadInt64LittleEndian(out long result);
    public bool TryReadInt64BigEndian(out long result);

    ... // repeated for each primitive type we care about
}

public ref struct SpanWriter
{
    public SpanWriter(Span<byte> span);

    public int Length { get; }
    public int Position { get; set; }
    public Span<byte> Remaining { get; }

    public bool TryWriteInt32LittleEndian(int result);
    public bool TryWriteInt32BigEndian(int result);

    public bool TryWriteInt64LittleEndian(long result);
    public bool TryWriteInt64BigEndian(long result);

    ... // repeated for each primitive type we care about
}

stephentoub on 20 Sep 2017

👍7

That works except when I need to read an int24 ;)

Drawaes on 20 Sep 2017

👍2

I like the idea or span reader/writer, but I think it should be added in addition, not instead of the proposed APIs. the reader/writer is a necessarily more heavy API.

KrzysztofCwalina on 20 Sep 2017

but I think it should be added in addition, not instead of the proposed APIs. the reader/writer is a necessarily more heavy API.

Why? It sounds like the reader/writer support is the more common need (to avoid needing to manually slice), and if you really do want the individual operation, you can do:
```C#
new SpanReader(span).TryReadInt32LittleEndian(out int result);

instead of:
```C#
SpanReader.TryReadInt32LittleEndian(span, out int result);

which is barely more characters and saves doubling the number of methods we need to expose, including those that duplicate BitConverter functionality.

stephentoub on 20 Sep 2017

I've recently become a fan of... "working" with the interesting things the type system + Jit does so I'd go for something more like:

public interface IEndianness {}

public struct LittleEndian : IEndianness {}
public struct BigEndian : IEndianness {}

public ref struct SpanWriter
{
    public SpanWriter(Span<byte> span);

    public int Length { get; }
    public int Position { get; set; }
    public Span<byte> Remaining { get; }

    public bool TryWriteInt32(long value);
    public bool TryWriteInt32<TEndianness>(long value) where TEndianness : IEndianness;

    public bool TryWriteInt64(long value);
    public bool TryWriteInt64<TEndianness>(long value) where TEndianness : IEndianness;

    // ... repeated for each primitive type we care about
}

benaadams on 20 Sep 2017

public bool TryWriteInt32<TEndianness>(long value) where TEndianness : IEndianness
{
    if (typeof(TEndianness) == typeof(LittleEndian))
    {
       // do LittleEndian
    }
    else if (typeof(TEndianness) == typeof(BigEndian))
    {
       // do BigEndian
    }
    else
    {
        throw WTF();
    }
}

Types being structs should compile it to an optimized method for that Endianness

Usage

// read LittleEndian
new SpanReader(span).TryReadInt32<LittleEndian>(out int result);

// read BigEndian 
new SpanReader(span).TryReadInt32<BigEndian>(out int result);

// read native
new SpanReader(span).TryReadInt32(out int result);

benaadams on 20 Sep 2017

Assuming BitConverter.IsLittleEndian is recognized as an intrinsic then the native version can just be

bool TryWriteInt32(long value)
    => BitConverter.IsLittleEndian ? 
        TryWriteInt32<LittleEndian>(value) : 
        TryWriteInt32<BigEndian>(value);

benaadams on 20 Sep 2017

👍2

Now we are cooking with gas ;)

Drawaes on 20 Sep 2017

then the native version

For what situation is a native version needed?

And if it is needed, why would it be done differently from the little/big endian support? i.e. why have LittleEndian and BigEndian structs but not PlatformEndian or whatever?

Also, with the structs, what does it mean to call these methods with some other implementation of the interface?

stephentoub on 20 Sep 2017

It means it it throws right ? I think that is why there was a "wtf" exception.

Drawaes on 20 Sep 2017

It means it it throws right ?

Sure... it's just an odd design from that perspective: you allow parameterizing based on an interface anyone can implement, but then fail if provided with something other than one of a few built-in blessed implementations. I understand why it's being suggested, it's just a bit strange.

stephentoub on 20 Sep 2017

👍1

For what situation is a native version needed?

When you don't care about endianness because you are always reading and writing to the same architecture

benaadams on 20 Sep 2017

When you don't care about endianness because you are always reading and writing to the same architecture

In which case why not just use little or big consistently? I guess the concern is that you might force unnecessary reversals if, for example, you picked big but were on a little platform? Does that come up in the scenarios where this is used? I thought these APIs were primarily about interacting with data over the network.

stephentoub on 20 Sep 2017

In which case why not just use little or big consistently?

Is wordy; and I still have to look up what endianness intel is on the rare occasions endianness is important; so anecdotally I'd guess a regular user would also need to if they wanted to be native to the platform (rather than always reversing the input) by making the wrong initial choice and then just copy pasting

benaadams on 20 Sep 2017

👍2

To be fair @marcgravel and I circled around this issue several times and cane to the conclusion that even if you "don't care" it should be explicitly little or big. Reason you might not care today. But say you save the data and load it on say a bigendian Xbox you get in trouble.

So there should be no "native" option.

Drawaes on 20 Sep 2017

Suppose it doesn't matter, could always be an extension ;-)

benaadams on 20 Sep 2017

but I think it should be added in addition, not instead of the proposed APIs. the reader/writer is a necessarily more heavy API.

Why? It sounds like the reader/writer support is the more common need (to avoid needing to manually slice)

Our customers need APIs to read sigle values. The reader/writer is a good idea, but a feature creep. It needs to be designed separately. The reader/writer APIs proposed are a good start, but they are not complete and not tested on real scenarios. In real scenarios involving readers there is a need for more APIs to search, skip, etc.

In addition, as I said the reader/writer will be more costly when reading a single value, which is what our customers do.

KrzysztofCwalina on 20 Sep 2017

Our customers need APIs to read sigle values.

All examples shown on this thread have been for reading multiple values, and as I noted, these APIs still let you read single values when you need to.

In real scenarios involving readers there is a need for more APIs to search, skip, etc.

These would work for any scenario where someone was doing the manual advance thing. If additional APIs are needed, they can be provided subsequently. And if we're worried about them not being right, then let's take the time to design the right thing rather than rushing something in. The feedback I've seen thus far on this thread has been that the originally proposed APIs are not the right thing.

as I said the reader/writer will be more costly when reading a single value

Can you provide the scenario where a single value needs to be read/written and share how much more cost there is by requiring the struct to be instantiated? It should be close to negligible, as all the constructor should need to do is store the span into a field, and the ctor should get inlined, and if it's not, that seems like something we need to address in the JIT.

stephentoub on 20 Sep 2017

I have used spans in 2 real world scenarios (not my garbage github examples) on binary protocols and both times a reader and writer struct similar to above had to be written.

Drawaes on 20 Sep 2017

I have used spans in 2 real world scenarios (not my garbage github examples) on binary protocols and both times a reader and writer struct similar to above had to be written.

I totally agree we need such reader and writer. I just think it's a separate feature. It would be great if somebody opened an issue about it, if not I will do it when I find some time.

KrzysztofCwalina on 20 Sep 2017

Okay I added an issue, mostly to start a discussion

dotnet/corefx#24180

Drawaes on 20 Sep 2017

I just think it's a separate feature

And to me, it's _the_ feature, i.e. with that new issue opened, this one should be closed.

stephentoub on 20 Sep 2017

👍3

The only way the methods above are potentially useful, is if they have a return type and an out (this is the other pattern which I have used, but recently abandoned for the reader/writer in code)

(dropping the bigendian/littleendian thing for brevity)

public static Span<byte> WriteInt(this Span<byte> self, int value)
{
   //Do your write
   return self.Slice(sizeof(int));
}

//use

span = span.Write(value);

Also note... ref extension methods would help here ;) which means vb could do

```
public static void AdvancingWrite(this ref Span self, int value)
````

Drawaes on 21 Sep 2017

As a side note the extension method by ref discussion on this exact method was discussed between

@benaadams @davidfowl @mgravell and myself about 12 months this month ago for exact reasons like this in fact if I can find the gitter archives it might be insightful :)

Drawaes on 21 Sep 2017

Yeah ref extensions would avoid needing an extra writer/reader struct to hold the state by just modifing the original

public static bool TryWriteInt(ref this Span<byte> self, int value)
{
   //Do write
    if (success)
    {
        self = self.Slice(sizeof(int));
    }
}

benaadams on 21 Sep 2017

annnnd, I will say it again "VB can do it, so why don't we trust C# devs to do it?"

Drawaes on 21 Sep 2017

And to me, it's the feature, i.e. with that new issue opened, this one should be closed.

The current SignalR code is reading single BigEndian ints. The reader would be a completely unnecessary from overhead and and syntax complexity perspective.

KrzysztofCwalina on 21 Sep 2017

Sorry, did not mean to close. I disagree with

And to me, it's the feature, i.e. with that new issue opened, this one should be closed.

Real code (e.g. SignalR) reads single BigeEndian ints, and using a reader for it would be completely unnecessary form perf and syntax perspective.

KrzysztofCwalina on 21 Sep 2017

Real code (e.g. SignalR) reads single BigeEndian ints

Links to examples?

The real SignalR code linked above is reading sequence of ints and bytes, and it has to Slice the Span by the right amount after each read. E.g. here:

https://github.com/aspnet/SignalR/blob/6151af7e2610d422bba2097c8326de7e5d545803/samples/SocketsSample/EndPoints/MqttEndPoint.cs#L313

The SpanReader would be a net improvement for it.

jkotas on 21 Sep 2017

https://github.com/aspnet/SignalR/blob/dev/src/Microsoft.AspNetCore.SignalR.Common/Internal/Formatters/BinaryMessageParser.cs#L22

https://github.com/aspnet/SignalR/blob/dev/src/Microsoft.AspNetCore.SignalR.Common/Internal/Formatters/BinaryMessageFormatter.cs#L18

Yes, as I said we need both. We need a reader, but reader needs to be designed and it would internally use the simple APIs. We should not feature creep here.

KrzysztofCwalina on 21 Sep 2017

https://github.com/aspnet/SignalR/blob/dev/src/Microsoft.AspNetCore.SignalR.Common/Internal/Formatters/BinaryMessageParser.cs#L30

From your first one, they slice it a couple of lines down ;)

Admittedly they slice the buffer, but then this operation is really on a "Buffer" not the span (I think there should be direct read/write methods on buffer FYI )

Drawaes on 21 Sep 2017

Also that second example you gave, the code rents an 8 byte buffer, to write a long and then sends it to the stream, and then rents a payload buffer to send it. Which just seems... wasteful? So my PR to rent the length + payload and write in one go

https://github.com/aspnet/SignalR/pull/921/files

Now could happily use the writer wrapper (as it writes and then slices)

Drawaes on 21 Sep 2017

👍2

@stephentoub

I didn't include the non-Try methods above, because if you're writing this kind of code, it seems you do care about the potential expense of exceptions, in which case you either want to use the Try methods to avoid exceptions, or you know for certain that the operation will succeed, in which case you can just use the Try variant; that said, if folks feel strongly about having the non-Try, I don't think it's a big deal, it just seems overkill to me to have both.

Personally, I'd prefer to have the non-Try methods included. If I have a bug in my code, I prefer that it throws (and subsequently crashes the process), rather than it silently doing the wrong thing. In that case, the expense of exceptions doesn't matter, you're never going to catch them.

svick on 21 Sep 2017

👍1

I agree with that. If you "know" you have the right space then the try methods are just added clutter.

And if you "know" but are wrong then you want an exception because it's exception. And as @svick says perf doesn't matter as much in a true error case.

Drawaes on 21 Sep 2017

👍2

non-Try

Ok, that's fair.

stephentoub on 21 Sep 2017

We should not feature creep here.

This isn't feature creep. This is designing the thing the code actually needs. The examples provided would mostly (all?) be better off with a reader/writer, or would not suffer from the few extra instructions required to pass the span in via a field on a ref struct rather than as a parameter.

stephentoub on 21 Sep 2017

I tend to agree I have been racking my brain for a time you do a single op on a span and am failing to come up with one.

The examples above the first slices the buffer only so it's close. The second example was odd code and I changed it to be correct and now it slices. It might be worth fleshing out the design in the other issue and see if it can be done in a reasonable way. If it has blocking points or seems too hard for now then this issue could be returned to?

Drawaes on 21 Sep 2017

This isn't feature creep. This is designing the thing the code actually needs. The examples provided would mostly (all?) be better off with a reader/writer, or would not suffer from the few extra instructions required to pass the span in via a field on a ref struct rather than as a parameter.

I still don't see how the code samples would be better with a reader/writer. The first code sample reads one int (length) and then does nothing else with binary. The other is overly complicated. Should probably just write the int to the Stream and then the payload.

Also, to implement reader/writer we need the low level APIs.

KrzysztofCwalina on 21 Sep 2017

to implement reader/writer we need the low level APIs.

The low-level APIs are Unsafe.Read/WriteUnaligned. We have those.

stephentoub on 21 Sep 2017

Right. I won't be surprised if we find that the Binary.Read/Write are not great APIs to implement the Span reader/writer because of they incur unnecessary overhead (redundant range checks and/or need to move both pointer and count at the same time).

jkotas on 21 Sep 2017

Great, then we will use the Unsafe APIs to implement reader/writer.

KrzysztofCwalina on 21 Sep 2017

I updated the APIs (at the top of the issue) after incorporating feedback from. We are still discussing if we rename Read to Get and Write to Put/Set.

KrzysztofCwalina on 3 Oct 2017

SpanReaderand SpanWritercould have an endianness constructor argument. I don't know about the performance implications, but it would shorten the method names and make it easy to changes endianness .

tmds on 3 Oct 2017

@tmds Take a look at dotnet/corefx#24180

Drawaes on 3 Oct 2017

Updated the assembly where we will add the APIs (at the top of this issue). The current POR is to use System.Memory.dll.

We decided we cannot use System.Buffers.dll as it was in-boxed in .NET Core 2.0

KrzysztofCwalina on 4 Oct 2017

FYI: The API review discussion was recorded - see https://youtu.be/m4BUM3nJZRw?t=38 (70 min duration)
For detailed notes see System.Binary notes PR.

karelz on 4 Oct 2017

Floating point numbers are not included here. Can they be included?
FYI, Wikipedia on endianness floating point:

Floating point[edit]
Although the ubiquitous x86 processors of today use little-endian storage for all types of data (integer, floating point, BCD), there are a number of hardware architectures where floating-point numbers are represented in big-endian form while integers are represented in little-endian form.[17] There are ARM processors that have half little-endian, half big-endian floating-point representation for double-precision numbers: both 32-bit words are stored in little-endian like integer registers, but the most significant one first. Because there have been many floating-point formats with no "network" standard representation for them, the XDR standard uses big-endian IEEE 754 as its representation. It may therefore appear strange that the widespread IEEE 754 floating-point standard does not specify endianness.[18] Theoretically, this means that even standard IEEE floating-point data written by one machine might not be readable by another. However, on modern standard computers (i.e., implementing IEEE 754), one may in practice safely assume that the endianness is the same for floating-point numbers as for integers, making the conversion straightforward regardless of data type. (Small embedded systems using special floating-point formats may be another matter however.)

Perhaps the method names can be shortened by using "LE"/"BE" instead of "LittleEndian/BigEndian"?

From what I want to do with this, I'd like a higher-level SpanReader/SpanWriter api as requested by @svick and @Drawaes (https://github.com/dotnet/corefx/issues/24180).
nit: perhaps change the title of this issue from Reader/Writer to Read/Write.

I wonder: can't we get rid of the overloads via generic ReadBE<T>/WriteBE<T>/ReadLE<T>/WriteLE<T>?

tmds on 5 Oct 2017

👍1

As discussed in the API review there is no current way of limiting T to a primative hence no generic.

As for floating point the text above says why I don't think there should be a "standard" floating point op because there really is no standard to speak of. I would say if there are floating point ops they should be off on their own or if they truly are just byte flipped you could just read a unit or using and blit it in a simple method ;)

Drawaes on 5 Oct 2017

As discussed in the API review there is no current way of limiting T to a primative hence no generic.

I wondered if the reason was perf. The methods could throw for non-primitive types.
The generic method has the advantage of being generically callable.
e.g. consider how this affects a BigEndianWriter implementation.

As for floating point the text above says why I don't think there should be a "standard" floating point op because there really is no standard to speak of. I would say if there are floating point ops they should be off on their own or if they truly are just byte flipped you could just read a unit or using and blit it in a simple method ;)

Indeed. Lack of a standard may be a reason to not to include these. Or you can go by _one may in practice safely assume that the endianness is the same for floating-point numbers as for integers_.

tmds on 5 Oct 2017

So I read the whole thread...

I think the explicitness of the type being written/read is a good reason to go for the non-generic version (even for the 'machine' endianness).
buffer.WriteBigEndian(i); vs buffer.WriteBigEndian((byte)i); vs buffer.WriteUInt8BE((byte)i)

Also, my use-case is for SpanReader/SpanWriter. It seems that that is the common use-cases. And as shown by @stephentoub a single value can easily read/written using such api's too. This raises the question whether there should be a public API for dealing with a single value.

tmds on 5 Oct 2017

@ahsonkhan

Do you have a rough ETA for this being available for CoreFX, there are some places I would like to use it inside SslStream in the next refactor around the handshake and framing.

Cheers
Tim

Drawaes on 10 Oct 2017

@Drawaes, which functionality do you need? I'd have thought BitConverter would be sufficient. Or you need everything read/written big endian?

Don't count on being able to use these APIs.

stephentoub on 10 Oct 2017

I don't need it per se. But the frames do a bigendian read on the size field. Just would have been nice as I prototype this all moving to span / memory to use the official method. It's certainly not going to hold me up. Just wondering.

Drawaes on 10 Oct 2017

@Drawaes we have active discussions on which APIs should be out-of-box vs. in the platform. I believe we are leaning towards having this one out of box, which means nothing in platform can depend on it. At least not now. Please don't take dependency on it, unless you really have to.

karelz on 10 Oct 2017

Cool good to know. I don't really need it was just going to use it if it was the new one true way ;)

Drawaes on 10 Oct 2017

This is now done. dotnet/corefx#24400.
Thanks for all the feedback!

KrzysztofCwalina on 13 Oct 2017

@KrzysztofCwalina I think you meant "now done", right?

karelz on 13 Oct 2017

Corrected :-)

KrzysztofCwalina on 14 Oct 2017

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Default hashcode for struct with double or float field is wrong

jkotas · 3Comments

Generic interface method not being called

v0l · 3Comments

IAsyncDisposable TransactionScope

Timovzl · 3Comments

'Type' does not contain a definition for 'IsEnum'

aggieben · 3Comments

LLVM backend

bencz · 3Comments