Runtime: Proposal: Endianness Enum and Extensions for BinaryReader an BinaryWriter

Created on 23 Jul 2018  路  21Comments  路  Source: dotnet/runtime

Proposal

Add an enum that handles the endiannes

    /// <summary>
    /// Endianness refers to the sequential order in which bytes are arranged into larger numerical values when stored in memory or when transmitted over digital links.
    /// </summary>
    public enum Endianness {

        /// <summary>
        /// The least significant byte (LSB) value, is at the lowest address. 
        /// </summary>
        LittleEndian,

        /// <summary>
        /// The most significant byte (MSB) value, is at the lowest address.
        /// </summary>
        BigEndian

    }

Also add extensions to BinaryReader like:

public static ushort ReadUInt16(this BinaryReader binaryReader, Endianness endianness);
public static short ReadInt16(this BinaryReader binaryReader, Endianness endianness);
public static uint ReadUInt32(this BinaryReader binaryReader, Endianness endianness);
public static int ReadInt32(this BinaryReader binaryReader, Endianness endianness);
public static ulong ReadUInt64(this BinaryReader binaryReader, Endianness endianness);
public static long ReadInt64(this BinaryReader binaryReader, Endianness endianness);
public static float ReadFloat(this BinaryReader binaryReader, Endianness endianness);
public static double ReadDouble(this BinaryReader binaryReader, Endianness endianness);

And for BinaryWriter:

public static void Write(this BinaryWriter binaryWriter, ushort value, Endianness endianness);
public static void Write(this BinaryWriter binaryWriter, short value, Endianness endianness);
public static void Write(this BinaryWriter binaryWriter, uint value, Endianness endianness);
public static void Write(this BinaryWriter binaryWriter, int value, Endianness endianness);
public static void Write(this BinaryWriter binaryWriter, ulong value, Endianness endianness);
public static void Write(this BinaryWriter binaryWriter, long value, Endianness endianness);
public static void Write(this BinaryWriter binaryWriter, float value, Endianness endianness);
public static void Write(this BinaryWriter binaryWriter, double value, Endianness endianness);

Rationale

BinaryReader and BinaryWriter works really well for accessing Streams but they seems to lack the support for byte-order. And the usage of an enum for representing the Endianness seems more clear to me than using the BitConverter.IsLittleEndian bool.

api-suggestion area-System.IO

Most helpful comment

Triage: There are enough scenarios that we see the value in doing this.

All 21 comments

Sounds like it would overlap quite a bit with BinaryPrimitives?

@khellang Yes, I have implemented this by myself and i have done it with help of BinaryPrimitives.ReverseEndianness();
But as I feel it, seems more clear to use BinaryRW when accessing streams than the way it is done via BinaryPrimitives

I've not worked with many data streams that need to read different endianness data in the same stream.

BinaryReader seems to always read data in little endian format. Some other considerations.

  1. What about "I want to use the endianness of whatever the architecture is"?

  2. Should this be a class-level thing instead of per-method? It might be tedious and error prone to keep having to pass around the right endianness in different methods.

  3. Does it make sense to implement an entirely different type? BigEndianStreamReader or something along those lines since the current one appears to always use little?

Hi @vcsjones I have never worked on something that needs to change the Endianness in the same stream, but it could be a possible (I cant imagine a case like this).

I'm not completely sure, but I think that BinaryReader takes by default the Endianness of the architecture by default (Most arch are little endian).

1.- As I said, I think its the default (What is actualy implemented). But I agree that would be great to have a static property in some class that represents the system information that handles the Endianess. Just like BitConverter.IsLittleEndian but returning the actual Endianness enum value.

2.- I would prefere the API I have described, but I don't think its a bad idea what you said. Something like set the default endianness of the BinaryRW on the constructor. I think that we can have the two things at the same time without cost.

3.- As I said, I think that it gets the default of the architecture, so your case can be handled in the example of 2, putting the default endiannes on the constructor or something like that.

I'm not completely sure, but I think that BinaryReader takes by default the Endianness of the architecture by default (Most arch are little endian).

As far as I'm aware, the .NET CLR itself is little-endian, and the BinaryReader/Writer implementations are built with this in mind, swapping bytes as necessary when running on a big-endian machine. This simplifies running code on the CLR, as endianness is completely taken out of the equation.

@Halofreak1990 If somebody from CoreCLR could provide us a bit more of help

I'm not opposed to having additional members, but I would want to see more feedback to justify the value here. It is relatively trivial to derive from BinaryReader and create an EndianReader and your code would be portable.

@JeremyKuhne For my experience, most of the times I use the BinaryReader and BinaryWriter I need to enforce an endianess. In fact, I have never used the BinaryReader and BinaryWriter without forcing a specific endiannes.

Hi @vcsjones I have never worked on something that needs to change the Endianness in the same stream, but it could be a possible (I cant imagine a case like this).

Just a data point on this, parts of the TDS protocol used in SqlClient require specific and opposite endian-ness

For example, if you have a game data file that needs to be written using different endianness depending on the platform.

If reading/writing the file for Windows, then you may use LittleEndian, and instantiate the BinaryReader/Writer as so. Whereas for the PlayStation 3, the reader/writer would be instantiated with BigEndian.

@Ryder25 that is assuming you'd output your assets in a different endianness for each specific platform, which is something I'd personally never do. Why complicate development, if the framework automagically fixes things for you? .NET, as it currently works, allows one to use little-endian assets, and in code never care about that, unless you need to fiddle with specific bits. For example, a little endian long value in a file will produce the expected value in code, regardless of the endianness of the platform. I'd be inclined to say that, if you really need control of the endiannes on a case by case basis, you really need to re-evaluate what you're doing, because something's out of whack.

I'm not saying I would do it, but I've seen it done before with several games. I assume it is with performance reasons in mind. I can see a case where the game itself is written in C++, and there could be tooling done in C#. At that point, you can't only focus on what's happening in the .Net bcl.

Triage: There are enough scenarios that we see the value in doing this.

@carlossanlop What is required for making this API real?

@deinok in the main description with the API proposal, instead of etc..., can you please list all the extension APIs that you think should receive an endianness argument?

@carlossanlop Done

Passing an extra Endianness argument for each call looks pretty cumbersome.

I agree with @JeremyKuhne that it is trivial to derive from BinaryReader (or include these as extension methods in your project) if you need a reader like this. I do not think that this API belongs to the framework.

There are multiple ways to do this. Here are some different ideas explained by usage examples:

  1. BinaryReader/Writer remains as the LittleEndian version, and BigEndianBinaryReader/Writer are added with BinaryReader/Writer as the base class.
public class BigEndianBinaryReader : BinaryReader { ... }
int Example(Stream in)
{
    // Assume first byte in stream denotes whether the data is little or big endian
    var endianness = in.ReadByte(); // 0 - means little, 1+ means big
    var reader = endianness == 0 ? new BinaryReader(in) : new BigEndianBinaryReader(in);
    using (reader)
    {
        return reader.ReadInt32();
    }
}
  1. BinaryReader/Writer get constructor(s) with a bool isLittleEndian. Then internally it will read and write correctly based on this setting.
int Example(Stream in)
{
    // Assume first byte in stream denotes whether the data is little or big endian
    var endianness = in.ReadByte(); // 0 - means little, 1+ means big
    bool isLittleEndian = endianness == 0;
    using (BinaryReader reader = new BinaryReader(in, isLittleEndian))
    {
        return reader.ReadInt32();
    }
}
  1. A mix of options one and two. Instead of adding isLittleEndian to the constructor of BinaryReader/Writer, another new class is created called EndianBinaryReader/Writer which takes the bool, and internally uses the little or big endian version of the binary readers depending on the setting.

These implementations should add little overhead in terms of creating work to maintain the API. This also prevents someone from having to create their own utility library for this kind of functionality, or from having to copy that code from project to project. Overall this will make it really convenient to read and write binary data for various architectures.

Of course, we'll also have to think about performance when it comes to this API which may change things. I just want to get the ball rolling with some ideas.

I'd have thought that if we add a flag for this it would hit performance as an extra if condition would be needed for every read/write. Would it be better to simply provide a second set of methods for big endian with a BE suffix. Something like this?

public static ushort ReadUInt16(this BinaryReader binaryReader);
public static ushort ReadUInt16Be(this BinaryReader binaryReader);

Missed that fact. I supose that for big frames that if condition would cause a big overhead

A couple observations I would add as someone who's implemented big endian by derivation:

  • BinaryReader has ambiguous endianness, most obviously in that consumers of methods which read multiple bytes have no way of querying for endianness of the bytes they're retrieving. While other read types are little endian by default, this doesn't hold for characters or strings if Encoding.BigEndianUnicode or some other derived, big-endian encoding is passed to the constructor. I suspect these considerations underlie some of what @deinok was referring to about enforcing endianness.
  • Both BinaryReader and BinaryPrimitives provide chatty APIs. From a performance perspective, an efficient implementation is more likely to pass blocks of bytes through Ssse3.Shuffle(Vector128<byte>, Vector128<byte>) or similar and would therefore probably expose methods such as ReadInt32Array(int length). While this amortizes the cost of endianness if statements or vtable lookups it seems more significant BinaryReader and BinaryPrimitives lack such methods.

Given these considerations, my design endpoint here is BinaryReader is primarily helpful as a base for quick big endian implementations but, if one's writing new infrastructure, it's probably preferable to implement an entirely separate class which interacts with a Stream directly.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

btecu picture btecu  路  3Comments

bencz picture bencz  路  3Comments

omajid picture omajid  路  3Comments

matty-hall picture matty-hall  路  3Comments

chunseoklee picture chunseoklee  路  3Comments