Runtime: Proposal: Expose Bit Manipulation functions

Created on 13 Sep 2018  路  102Comments  路  Source: dotnet/runtime

Bit manipulation routines are common enough that we should expose a subset as platform primitives.
While some of them may be simple to write, it's much harder to achieve the requisite performance desired, especially since they tend to be used within tight loops.

The aim of this proposal is to scope & design a minimal set of functions, to be implemented with a bias towards performance. Per @tannergooding

The point of these APIs is to provide a general-purpose API that works on all platforms (which means providing a software fallback) and is generally-usable. Hardware Intrinsics are for performance oriented scenarios where you require hardware acceleration and need more direct control of the code that is emitted.

Note that even though some of the formula may be simple, relevant callsites are more self-documenting when using the intrinsics (should the dev _choose_ to use them).

Scope

  • [x] Consolidate existing callsites in CoreCLR (https://github.com/dotnet/coreclr/pull/22584)
  • [x] Units pass in CoreFX(https://github.com/dotnet/corefx/pull/35193)
  • [x] Expose System.Numerics.BitOperations as a public ref assembly in CoreFX (https://github.com/dotnet/corefx/issues/35419)
  • [ ] Consolidate existing callsites in CoreFX (https://github.com/dotnet/corefx/pull/34917)
  • [ ] Implement proposed methods (this issue)

Rationale and Usage

The proposed functions are already implemented throughout the stack, often with different algorithms, performance characteristics and test coverage.
Existing callsites below: https://github.com/dotnet/corefx/issues/32269#issuecomment-457689128
(There is likely to be more; the initial search was timeboxed to ~1 hour)

Some of the implementation have suboptimal performance or bugs. Something like ExtractBit is trivial to implement, but PopCount is more complex and thus prone to logic and performance issues. Hiding these complex formulae behind friendly signatures makes using them more approachable.

Here's an example of a function (BTC) whose signature is simple but the algebra is easy to get wrong.

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool ComplementBit(ref uint value, int bitOffset)
{
    uint mask = 1u << bitOffset;
    bool btc = (value & mask) != 0;

    value = ~(~mask ^ value);

    return btc;
}

However making a call to it meets our goal of abstraction and performance:

uint value = 123;
bool previouslyTrue = BitOperations.ComplementBit(ref value, 6);

Proposed API

The proposed API is purposefully kept lean. We can add more methods in later design iterations. We should view this as an opportunity to get simple, base functionality out the door and not stray into the dangerous territory of adding every bit twiddling hack that exists.

Assume all methods are decorated with [MethodImpl(MethodImplOptions.AggressiveInlining)]

public static class BitOperations
{
    // BT
    bool ExtractBit(byte value, int bitOffset); // Could name this BitTest or TestBit
    bool ExtractBit(uint value, int bitOffset);
    bool ExtractBit(int value, int bitOffset);

    // BTS (scalar)
    byte InsertBit(byte value, int bitOffset); // BitSet or SetBit
    uint InsertBit(uint value, int bitOffset);
    int InsertBit(int value, int bitOffset);

    // True BTS (returns original value)
    bool InsertBit(ref byte value, int bitOffset);
    bool InsertBit(ref uint value, int bitOffset);
    bool InsertBit(ref int value, int bitOffset);

    // BTR
    byte ClearBit(byte value, int bitOffset); // BitReset or ResetBit
    uint ClearBit(uint value, int bitOffset);
    int ClearBit(int value, int bitOffset);

    bool ClearBit(ref byte value, int bitOffset);
    bool ClearBit(ref uint value, int bitOffset);
    bool ClearBit(ref int value, int bitOffset);

    // BTC
    byte ComplementBit(byte value, int bitOffset);
    uint ComplementBit(uint value, int bitOffset);
    int ComplementBit(int value, int bitOffset);

    bool ComplementBit(ref byte value, int bitOffset);
    bool ComplementBit(ref uint value, int bitOffset);
    bool ComplementBit(ref int value, int bitOffset);

    // on ? BTS : BTR
    byte WriteBit(byte value, int bitOffset, bool on);
    uint WriteBit(uint value, int bitOffset, bool on);
    int WriteBit(int value, int bitOffset, bool on);

    bool WriteBit(ref byte value, int bitOffset, bool on);
    bool WriteBit(ref uint value, int bitOffset, bool on);
    bool WriteBit(ref int value, int bitOffset, bool on);
}

Details

  • The focus will be on performance.
  • Ultimately, most of the code should be branchless and leverage intrinsics where possible.
  • Somewhat of an overlap in functionality between InsertBit/ClearBit and WriteBit where the latter conditionally executes the equivalent of either former (using twiddling to do so without branching). But there's enough twiddling in Write to avoid any branching that it's maybe worth keeping both variants.
  • Since these functions are used in performance-sensitive applications, we avoid input checking and exceptions. The current design tries to dispense with the requirement by sharpening input types, contractual assumptions (eg out-of-bounds offset will use mod n in some functions or result in a no-op in others) and specific design choices.

Questions

  • What additional sizes & signs of integers do we support, and in which methods? It looks like int, uint and byte are commonly used. Anything else?

Decisions

  • What namespace this should be in. Decision: namespace System.Numerics.
  • The class name of BitOps is self-documenting, terse (it might be specified frequently in a callsite, if not aliased with using) and both terms are well-known names or abbreviations. Decision: BitOperations
  • Decision: We favor well-known names over the alternatives. For example, PopCount could be called CountSetBits but that's not the common lingo used by twiddlers. Furthermore, intrinsics already expose the well-known names, eg Popcnt.PopCount.
  • Likewise, method names should also be as concise as possible, for example, TrailingZeroCount may be more concisely described as TrailingZeros. Decision: TrailingZeroCount already chosen by a previous PR.
  • Decision: Count-oriented methods such as PopCount return (idiomatic) int, not uint
  • Should offset or position parameters be int or uint. Latter is preferred since negatives not permitted regardless. Decision: int is an idiomatic input/output type in C#.
  • Log(0) is mathematically undefined. Should it return 0 or -1? Decision: Returns 0
  • Do we care about endianness (ie do we need BE and LE variants of relevant methods). The current proposal is only LE, and is shaped in such a way that we are not future-proof wrt supporting BE. Discussion proposes an alternative. For example, LeadingZeros _might_ need a different algorithm on BE/LE platforms. Decision: Endianess only matters if we are reinterpreting integers.

Sample call sites

The following samples are taken from the linked units, from the method BitOps_Samples.
The code chooses values that are easy to eyeball for correctness. The real units cover many more boundaries & conditions.

// ExtractBit: Reads whether the specified bit in a mask is set.
Assert.True(BitOps.ExtractBit((byte)0b0001_0000, 4));
Assert.False(BitOps.ExtractBit((byte)0b0001_0000, 7));

// InsertBit: Sets the specified bit in a mask and returns the new value.
byte dest = 0b0000_1001;
Assert.Equal(0b0010_1001, BitOps.InsertBit(dest, 5));

// InsertBit(ref): Sets the specified bit in a mask and returns whether it was originally set.
Assert.False(BitOps.InsertBit(ref dest, 5));
Assert.Equal(0b0010_1001, dest);

// ClearBit: Clears the specified bit in a mask and returns the new value.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.ClearBit(dest, 3));
// ClearBit(ref): Clears the specified bit in a mask and returns whether it was originally set.
Assert.True(BitOps.ClearBit(ref dest, 3)); 
Assert.Equal(0b0000_0001, dest);

// ComplementBit: Complements the specified bit in a mask and returns the new value.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.ComplementBit(dest, 3));
// ComplementBit(ref): Complements the specified bit in a mask and returns whether it was originally set.
Assert.True(BitOps.ComplementBit(ref dest, 3));
Assert.Equal(0b0000_0001, dest);

// WriteBit: Writes the specified bit in a mask and returns the new value. Does not branch.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.WriteBit(dest, 3, on: false));
// WriteBit(ref): Writes the specified bit in a mask and returns whether it was originally set. Does not branch.
Assert.True(BitOps.WriteBit(ref dest, 3, on: false));
Assert.Equal(0b0000_0001, dest);

Updates

  • Original issue authored by @mburbea here: Proposal: Add a BitManipulation class
  • Initial proposal submitted
  • Methods such as InsertBit accepted a bool that determined whether it set or cleared the bit in question. Such methods have now been refactored in two, InsertBit and ClearBit.
  • Some name changes based on suggestions (eg FlipBit became ComplementBit).
  • TestAndSet methods have been refactored into simple scalar functions, for performance.
  • The offset parameter is int instead of byte. All POC units pass.
  • Added WriteBit(value, offset, bool on) overloads that conditionally set/clear the specified bit.
  • Added ExtractByte and friends
  • Added Evaluate(bool) (used internally, so may as well expose it)
  • Removed all Span<T> overloads per @tannergooding advice. Will maybe submit in separate proposal
  • Moved TrailingOnes and LeadingOnes into to do later proposal in comments
  • Added Log2
  • Removed redundant overloads
  • Removed doc-comments so spec is easier to read
  • Added byte overloads to all crud methods based on code analysis of existing callsites. eg eg bool ExtractBit(byte value, int bitOffset)
  • Added Scope section at top of spec
  • Separated calls into existing and proposed sections
  • Added task list, worded spec in a more concise manner
  • Moved RotateByte, etc into to do later proposal in comments
api-needs-work area-System.Numerics

Most helpful comment

Motivation and duplicate implementations of the proposed methods.
Initial search was timeboxed to 1 hour, so there is likely to be more.

OTHER
Found quite a few more callsites whilst implementing the PR.
See the code for more details.

POPCNT
https://github.com/dotnet/corefx/blob/master/src/System.Reflection.Metadata/src/System/Reflection/Internal/Utilities/BitArithmetic.cs
https://github.com/dotnet/corert/blob/cf5dc501e870b1efe8cba3d6990752538d174773/src/System.Private.CoreLib/shared/System/Buffers/Text/FormattingHelpers.CountDigits.cs
https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L1002
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1277

LZCNT
https://github.com/dotnet/corefx/blob/62c70143cfbb08bbf03b5b8aad60c2add84a0d9e/src/Common/src/CoreLib/System/Number.BigInteger.cs#L700
https://github.com/dotnet/coreclr/pull/22100
https://github.com/dotnet/coreclr/pull/19006
https://github.com/dotnet/corefx/blob/151a6065fa8184feb1ac4a55c89752342ab7c3bb/src/Common/src/CoreLib/System/Decimal.DecCalc.cs#L728
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1254 (LeadingOnes)

TZCNT
https://github.com/dotnet/coreclr/pull/22118
https://github.com/dotnet/corert/blob/db8723b1787e18f0124e94c387542d0ef3aac128/src/System.Private.CoreLib/shared/System/BitOps.cs#L16

ROTL/ROTR
https://github.com/dotnet/coreclr/issues/21583
https://github.com/dotnet/coreclr/pull/1830
https://github.com/dotnet/corert/blob/87e58839d6629b5f90777f886a2f52d7a99c076f/src/System.Private.CoreLib/src/System/Marvin.cs#L120-L124
https://github.com/dotnet/coreclr/issues/1619
https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L972
https://github.com/dotnet/corert/blob/1bb85b989d9b01563df9eb83dce1c6a3415ce182/src/ILCompiler.ReadyToRun/src/Compiler/ReadyToRunHashCode.cs#L214
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/Common/src/Internal/NativeFormat/TypeHashingAlgorithms.cs#L20
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/Common/src/Internal/Text/Utf8String.cs#L47
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/Internal/Runtime/CompilerServices/OpenMethodResolver.cs#L178
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/System/RuntimeMethodHandle.cs#L67
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/ILCompiler.Compiler/src/Compiler/DependencyAnalysis/NodeFactory.NativeLayout.cs#L345
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.TypeLoader/src/Internal/Runtime/TypeLoader/TypeLoaderEnvironment.NamedTypeLookup.cs#L121
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/Internal/Runtime/CompilerServices/OpenMethodResolver.cs#L178
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/System/RuntimeFieldHandle.cs#L47
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.TypeLoader/src/Internal/Runtime/TypeLoader/TypeLoaderEnvironment.NamedTypeLookup.cs#L121

InsertBit
https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L1013
https://github.com/dotnet/corefx/blob/bd414c68872c4e4c6e8b1a585675a8383b3a9555/src/System.DirectoryServices.AccountManagement/src/System/DirectoryServices/AccountManagement/Utils.cs#L51
https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L51
https://github.com/dotnet/wpf/blob/2cbb1ad9759c32dc575c7537057a29ee7da2e1b2/src/Microsoft.DotNet.Wpf/src/System.Xaml/System/Xaml/Schema/Reflector.cs#L363

ClearBit
https://github.com/dotnet/corefx/blob/bd414c68872c4e4c6e8b1a585675a8383b3a9555/src/System.DirectoryServices.AccountManagement/src/System/DirectoryServices/AccountManagement/Utils.cs#L46
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1314
https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L45
https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L1020

ExtractBit
https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L57
https://github.com/dotnet/wpf/blob/2cbb1ad9759c32dc575c7537057a29ee7da2e1b2/src/Microsoft.DotNet.Wpf/src/System.Xaml/System/Xaml/Schema/Reflector.cs#L334
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1294

ComplementBit
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1317
https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L196

All 102 comments

cc @mburbea, @tannergooding , @CarolEidt

Please drop the in modifier from the parameters. These are all operating on primitives and passing them around by reference is going to be generally less efficient than just passing them by-value (which should happen in register).

Additional Comments/Thoughts:

  • Whether or not these methods take a ref and return the original value of the bit or just return the modified value will likely need more discussion (possibly just in the API design review).
  • I'm not sure I like the methods taking a bool.

    • InsertBit(value, offset, on) might be better as InsertBit(value, offset) and ResetBit(value, offset) (or ClearBit, ZeroBit, etc).

    • Likewise, LeadingCount and TrailingCount are likely better split into LeadingZeroCount, TrailingZeroCount, LeadingOneCount, and TrailingOneCount

  • I think NegateBit may be more "better" than FlipBit (or ComplementBit)
  • System.BitOps seems appropriate as a namespace (and lets this live next to BitConverter). Especially since several of these may get used in the Runtime/Framework itself
  • I imagine we will want to just provide overloads for the signed types

I'm not sure I like the methods taking a bool.

I went back and forth on that one. The bool makes the api tighter. But it means that ultimately the implementation has branching instructions, which means perf suffers. So I am happy to refactor.

Whether or not these methods take a ref and return the original value of the bit or just return the modified value will likely need more discussion (possibly just in the API design review).

I designed this based on a previous suggestion of yours, but I am happy to revert. Perhaps you meant ExchangeBit (BTS/BTR,BTC) to be a separate additional methods? Otherwise we could return a tuple but that has more overhead.

I think NegateBit may be more "better" than FlipBit (or ComplementBit)

I will make that change

@grant-d
public static bool InsertBit(ref uint value, byte offset);

Will JIT be able to optimize it properly if the value is an enregistered local?
Like in this code:
C# int M(uint value) { return InsertBit(in value, 3) ? 5 :6; }

It needs to be a ref, ie:

private static int M(uint value) => InsertBit(ref value, 3, true) ? 5 : 6; 

I don't have enough knowledge of what the optimizer would do, someone else may be able to advise us.

Um, ExtractBit takes byte and ushort, but neither of those are present for InsertBit/ClearBit (instead there's int and long)?

These need to be passed by value. Passing things around by ref or in will make it more difficult to optimize in the long run. If you need a variant that both modifies the value and returns whether a bit was set, it should be a different variant, and the "normal" form of InsertBit should just return the modified value.

Sounds good, I have made the change. @tannergooding, let me know if you think the BTS/BTR/BTC variants need to be in the first deliverable.

Will JIT be able to optimize it properly

It will now - I have changed all ref type methods to be scalar functions; see notes in the spec.

API review probably won't like the contraction BitOps. BitOperations is not so bad or possibly Bits even...

I like Bits, I have added all of these to the naming suggestions.

If you need a variant that both modifies the value and returns whether a bit was set, it should be a different variant.

Done - both variants exist.

Note to self:

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

This is pointless unless you also the attribute for AggressiveInlining as the function call overhead will eat your gains otherwise. Therefore, AggressiveInlining should be contractual on these.

AggressiveInlining should be contractual

The POC already does so. I will add a note to the spec

This is pointless unless you also the attribute for AggressiveInlining as the function call overhead will eat your gains otherwise.

Many of these functions will be treated as intrinsic and will be specially handled by the JIT to emit a single instruction, regardless of what the actual method body contains for the software fallback.

The JIT is also very good at inlining small functions, regardless of whether or not the AggressiveInlining attribute exists.

InsertBit(value, offset, on) is needed because sometimes the on value is dynamic. There should be separate SetBit and ClearBit methods.

@GSPP can this not be handled from the callsite.
if (on) SetBit(i) else ClearBit(i)
I am trying to keep the API surface as concise as possible, let me know if you have a compelling reason

@grant-d: InsertBit can be implemented branchlessly on any reasonably CPU.

InsertBit is already implemented without branching, see POC code.

The question was whether there could be a mutator that accepts a bool, then conditionally inserts or extracts. My preference is to externalize branching (at the callsite).

And I'm saying the dynamic set-clear version can be implemented w/o branching.

ulong mask = 1 << i;
ulong bit = (ulong)on << i;
return (value & ~mask) | bit;

Got you. Thanks for the gist. I still am curious whether the 'dynamic' use case is compelling enough to have a somewhat overlapping API?

@jhudsoncedaron, just because you can implement that in software, doesn't necessarily mean it is (necessarily) a good API to expose here or that it will be efficient.

Many of these functions will be treated as intrinsic, which means that they will will compile down to a single CPU instruction (rather than 5+ instructions). As an example, we can use the BSF (Bit Scan Forward), BSR (Bit Scan Reverse), BT (Bit Test), BTC (Bit Test and Complement), BTR (Bit Test and Reset), BTS (Bit Test and Set), LZCNT (Count the Number of Leading Zero Bits), and POPCNT (Return the Count of Number of Bits Set to 1) instructions on x86/x64 to make these super efficient.

Also, I don't think you can convert a bool to ulong. You will get error CS0030: Cannot convert type 'bool' to 'ulong'

I only ever use the dynamic version of InsertBit. That might not be valid C# but the algorithm is correct anyway. bools are stored as either a 1 or a 0 so it's just a matter of convincing the compiler to emit the right code. The harder this is to do, the stronger the argument for putting it in the library.

I tend to agree with @jhudsoncedaron that the dynamic version is useful. For this particular case if we really wanted to minimize API surface area I might vote for just the dynamic case, since I would expect the value to be either clearly a constant or not. And it would be easier for the jit to make that optimization than to recognize a sequence with branches.

[Edited]

I assumed the gist was pseudocode; in C# we can do one of the following.
Micro benchmarking shows better perf (0.6x) than the idiomatic expression.

return condition ? 37 : 0; // 1.0x (idiomatic)

return (*(byte*)&condition) * 37; // 0.95x (cannot be inlined)
return Unsafe.As<bool, byte>(condition) * 37; // 0.60x
return new UnionStruct { Bool = condition }.Value * 37; // 0.76x (ExplicitLayout, etc)

This relies on internal knowledge of bool but I guess that's no different to the rest of twiddling where we rely on internal knowledge of bit layouts? I recall (perhaps wrongly) something about VB (maybe v6, maybe .Net) using -1 for false. But since this is ultimately CLI, I guess that's not a concern.

That being said. I started the spec out with the dynamic signature, and followed a suggestion to move to explicit. I don't mind one or the other (or both) but we need a decision here. Maybe I should just add both and we can see which one gets downvoted?

VB .NET did in fact use -1 for booleans; however this technique would reveal 1 anyway. It really wrote 1 to the underlying type to avoid driving the framework code bonkers. (Go ahead, use this unsafe to make a bool with a value of 3 and watch things fail in amusing ways.)

[Edited]
OK, added conditional overrides to the spec.
Here's the POC implementation.

  • Let's wait for review & voting to see what we keep. There's enough twiddling to avoid branching that I would tend to agree it's a useful overload.
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static byte WriteBit(byte value, int offset, bool on)
        {
            int shft = offset & 7;
            uint mask = 1U << shft;

            uint onn = Unsafe.As<bool, uint>(on); // true ? 1 : 0
            onn <<= shft;

            return (byte)((value & ~mask) | onn);
        }

Added ExtractByte and friends

@tannergooding

Many of these functions will be treated as intrinsic and will be specially handled by the JIT to emit a single instruction, regardless of what the actual method body contains for the software fallback.

True, but you are assuming a single JIT (Microsoft) and also that the caller function didn't trip on a particular (warranted or not) limit to inlining. There is actually no shortcoming for contractually ensure Bits methods must be inlined; as implementing 90%+ of those methods on a loop without inlining would kill you performance wise.

There's an ambiguity in some of these signatures that I am trying to get my head around.
It results in corrupt data.

TLDR
Some methods are not safe under implicit upcast operations. This can lead to ambiguity and corruption.
For example foo(123u) != foo((byte)123).
Suggestion is to name all such methods explicitly to avoid said issue:
foo8(), foo16(), foo32(), foo64()

Detail
For example, I want to rotate a byte value left. I have a byte in hand, and pass it to the function:

byte foo = 0b_1000_0001;
var result = RotateLeft(foo, 2); // = 0000_0110

Yay, that worked. A few milestones later, but this time I still have a byte value but I happen to have it stored in a uint. Maybe it was a parameter originally declared as byte and someone else changed it. Or maybe that's how it came off the wire or protobuf. Or maybe I am being careful about alignment. Either way I know it's a byte and thus expect byte semantics.

uint foo = 0b_1000_0001; // Same byte value, upcast to uint
// Implicit upcast - my callsite did not change:
var result = RotateLeft(foo, 2); // = 0000_0010_0000_0100

Different answer. Ultimately silent corruption.

Of course the easy workaround is:

uint foo = 0b_1000_0001;
var result = RotateLeft((byte)foo, 2); // = 0000_0110

But that means I will need to use this pattern at _all_ my callsites. That is untenable.

It seems the shape of this kind of method makes it easy to call the wrong overload, despite the niceties of having a method group. (Note that all 4 overloads are necessary because they behave differently, and furthermore we are trying to avoid throwing exceptions in guard clauses.)

Suggestion: Maybe we should distinguish such methods explicitly, something like:

uint foo = 0b_1000_0001; // Unsigned integer of some bitsize
var result1 = RotateLeft8(foo, 2); // May support signed overloads too, so suffix is `8` not `UInt8`
var result2 = RotateLeft16(foo, 2); // A nice side effect is terser names
var result3 = RotateLeft32(foo, 2); // See next code block
var result4 = RotateLeft64(foo, 2);

This actually woks nicely should we decide to support signed integers:

int foo = 0b_1000_0001; // Signed integer of some bitsize
var result1 = RotateLeft8(foo, 2); // Suffix of '8' works nicely for byte|sbyte
var result2 = RotateLeft16(foo, 2);
var result3 = RotateLeft32(foo, 2);
var result4 = RotateLeft64(foo, 2);

Note that this does _not_ apply to all method groups. For example:

byte foo = 0b_1000_0001;
var result1 = PopCount(foo); // = 2
var result2 = PopCount((ulong)foo); // = 2

Implicit upcast is safe here; we get the same answer both ways.

Here's a different example. Once again I have a byte in hand:

byte foo = 0b_1000_0000;
// By convention, shift operators are mod-bitsize, so 15 is exactly equivalent to 15 % 8 == 7
var result = ExtractBit(foo, 15); // == 1
// In other words this is exactly equivalent to calling:
result = ExtractBit(foo, 7); // == 1

Now the parameter changes or whatever, and I now have a byte by happenstance stored in a uint:

uint foo = 0b_1000_0000;
// Here, 15 is exactly equivalent to 15 % 32 == 15
var result = ExtractBit(foo, 15); // == 0

We get a different answer.
Per previous suggestion, we'd need to distinguish these methods too:

uint foo = 0b_1000_0000;
var result1 = ExtractBit8(foo, 15); // May support signed overloads too, so suffix is 8 not UInt8
var result2 = ExtractBit16(foo, 15);
var result3 = ExtractBit32(foo, 15);
var result4 = ExtractBit64(foo, 15);

Considering the question on Big/Little Endian. If we ship this api shape as-is (which is only LE), we are not future proof in consideration of one day deciding to support BE. An alternative to the proposed design is to follow the pattern employed in classes like Encoding where we have an abstract base class that contains (in this case) two singletons, ostensibly named LE and BE:

public abstract class BitOps
{
    public static BitOps BE { get; }
    public static BitOps LE { get; }

    protected BitOps()
    { }
}

Since the number of overloads is so high (explosion due to byte, ushort, uint, ulong as well as Span<T> combinatorials), perhaps we can extend this idea even further:

public abstract class BitOps<T> where T : unmanaged
{
    public static BitOps<byte> ByteBE { get; }
    public static BitOps<ushort> UInt16BE { get; }
    public static BitOps<uint> UInt32BE { get; }
    public static BitOps<ulong> UInt64BE { get; }

    public static BitOps<sbyte> SByteBE { get; }
    public static BitOps<short> Int16BE { get; }
    public static BitOps<int> Int32BE { get; }
    public static BitOps<long> Int64BE { get; }
}

@grant-d : I don't get it. Aren't all of these operating on integers of the appropriate size? Shouldn't those already be in the correct endian?

[Edited]
My concern (perhaps unwarranted, please correct me) is that when inserting an int into, say, a Span<byte> then LE/BE may be an issue?
Also, for functions like LeadingZeros, does BE/LE not affect the algorithm since assumptions about the bit layout are perhaps invalid.

I see it now. Because the JIT can't inline function calls it can't statically resolve, I think we'd be much better off with

public static BitOps<byte> ByteBE { get; }
public static BitOps<ushort> UInt16BE { get; }
public static BitOps<uint> UInt32BE { get; }
public static BitOps<ulong> UInt64BE { get; }

public static BitOps<sbyte> SByteBE { get; }
public static BitOps<short> Int16BE { get; }
public static BitOps<int> Int32BE { get; }
public static BitOps<long> Int64BE { get; }

and use #if at compile time of .NET Core to pick the correct implementation.

(sorry; I copied the wrong code block; this was intended to be the ExtractInt16, InsertInt16, etc. functions)

I am leaning that way too. Though I am not sure about the #if.
Is there a use-case for being on a LE platform but wanting to do BE operations. Perhaps because it's network format, or transpilation, or ...

Correct; when messing with file formats we want the file format's bit order. Lots of file formats are in BE order now because classically we had htons/l/q to ensure BE order but no functions to ensure LE order.

Almost no operation out there in the framework is Endian aware AFAIK with the exception of a few HostToNetwork usually in networking classes.
Encoding the endianness into the method is unseen on the framework as far as I can remember. (That being said, I wouldn't start now if it would be my decision). IMHO it is the 'implementors' burden to deal with differences in endianess... because as you dont touch the disk or/the network there is really no difference for you. So it is an interoperation thing only.

@redknightlois : I guess you don't remember the BitConverter madness when mono was ported to Alpha. It turned out almost all the calls to BitConverter in application code wanted a specific endian rather than machine endian.

Yes, endianess would be a concern. However, only the Span<T> operations really have to worry about it, which probably means it should be a parameter?

On that note:

Suggestion: Maybe we should distinguish such methods explicitly, something like:

uint foo = 0b_1000_0001;
var result1 = RotateLeft8(foo, 2); // May support signed overloads too, so suffix is 8 not UInt8
var result2 = RotateLeft16(foo, 2); // A nice side effect is terser names
var result3 = RotateLeft32(foo, 2);
var result4 = RotateLeft64(foo, 2);

The problem I have for this is that it doesn't completely solve your problem, it just spins a new one: what happens if there are bits set in the upper registers? There's a discoverability issue - the method may sound like it rotates bits by 8, with an offset. Also, my immediate reaction is, "why is it only ever the lowest n?" - I think I want something like:

uint RotateWindowLeft(uint value, int bits, int start_index = 0, int window_length = 32);



Some of this is a reaction to:

Yay, that worked. A few milestones later, but this time I still have a byte value but I happen to have it stored in a uint. Maybe it was a parameter originally declared as byte and someone else changed it. Or maybe that's how it came off the wire or protobuf. Or maybe I am being careful about alignment. Either way I know it's a byte and thus expect byte semantics.

uint foo = 0b_1000_0001; // Same byte value, upcast to uint
var result = RotateLeft(foo, 2); // = 0000_0010_0000_0100

Different answer. _Ultimately silent corruption._
(Emphasis mine)
...uh, shouldn't your unit tests catch that kind of thing?

Of your mentioned reasons for the change, only stored alignment strikes me as particularly relevant - and at that, that you're overriding/ignoring the native alignment behavior of the framework. At which point, I'm more likely to want to be casting on each operation anyways, since any values outside the 'real' size should be invalid, and should be truncated.

If this is going to go to API review, it needs to stop being modified. The larger the proposal is, the less likely it will get reviewed in normal triage and the more likely it will get pushed back to a special triage session (which can take much longer to schedule/review/approve/etc).

I would say that, the following should be considered out of scope and should be handled in a new/separate proposal:

  • Evaluate
  • IsPowerOf2
  • Parity
  • Anything that takes a Span<T>, rather than a primitive

This leaves:

  • ExtractBit
  • InsertBit
  • ClearBit
  • ResetBit
  • ComplementBit
  • ReadBit
  • WriteBit
  • RotateLeft
  • RotateRight
  • PopCount
  • LeadingZeros
  • TrailingZeros
  • LeadingOnes
  • TrailingOnes

This is still a fairly large proposal and covers a number of items that we currently know have use-cases both in and outside the framework.
These, for the most part, are "intrinsic" operations that most (modern) platforms support as a single operation or can be sufficiently accelerated.
These do not have to worry about any special semantics, such as whether it is big/little endian or how it operates on larger data inputs

Thanks @tannergooding , that gives me relief. The surface has been growing and I was not sure how to scope it back down. I will trim the spec.
[Edit] Done

For posterity, here's the trimmed-out spec that we can maybe submit it in a later proposal.
Note that Evaluate (Edit: now called If) is used internally, so it's useful to expose for external use too.

public static partial class BitOps // .Span and sundry
{
    /// <summary>
    /// Count the number of leading one bits in a mask.
    /// </summary>
    /// <param name="value">The mask.</param>
    byte LeadingOnes(byte value);
        // For brevity, ushort|uint|ulong overloads not shown

    /// <summary>
    /// Count the number of trailing one bits in a mask.
    /// </summary>
    /// <param name="value">The mask.</param>
    byte TrailingOnes(byte value);
        // For brevity, ushort|uint|ulong overloads not shown

    /// <summary>
    /// Converts a bool to an integer value, without branching.
    /// Returns <paramref name="trueValue"/> if True, else returns <paramref name="falseValue"/>.
    /// </summary>
    /// <param name="condition">The value to convert.</param>
    /// <param name="trueValue">The value to return if True.</param>
    /// <param name="falseValue">The value to return if False.</param>
    uint Iff(bool condition, uint trueValue, uint falseValue);
    uint Iff(bool condition, uint trueValue);
        // For brevity, int|long|ulong overloads not shown
        // byte and ushort overloads are redundant
        // Benchmark: 0.64 cost of idiomatic bool expression (condition ? 1 : 0)
    byte AsByte(bool condition); // Normalized (see notes). Naming TBD
    byte AsByte(ref bool condition);

    /// <summary>
    /// Returns 1 if <paramref name="value"/> is non-zero, else returns 0.
    /// Does not incur branching.
    /// Logically equivalent to CMOVNZ.
    /// </summary>
    /// <param name="value">The value to inspect.</param>
    public static byte Any(ushort value);
        // For brevity, short|int|uint|long|ulong overloads not shown

    /// <summary>
    /// Returns 1 if <paramref name="value"/> is zero, else returns 0.
    /// Does not incur branching.
    /// Logically equivalent to CMOVZ.
    /// </summary>
    /// <param name="value">The value to inspect.</param>
    public static byte NotAny(ushort value);
        // For brevity, short|int|uint|long|ulong overloads not shown

    /// <summary>
    /// Returns 1 if the bit count is odd, else 0.
    /// Logically equivalent to PopCount mod 2.
    /// </summary>
    /// <param name="value">The value.</param>
    int Parity(uint value);
    int Parity(ulong value);
        // byte and ushort overloads are redundant

    /// <summary>
    /// Returns True if the value is a power of 2, else False.
    /// </summary>
    /// <param name="value">The value.</param>
    bool IsPowerOf2(uint value);
    bool IsPowerOf2(ulong value);
        // byte and ushort overloads are redundant

    /// <summary>
    /// Extracts a byte value from a longer integer.
    /// </summary>
    /// <param name="value">The bit mask.</param>
    /// <param name="bitOffset">The ordinal position to read.</param>
    byte ExtractByte(ushort value, int bitOffset);
    byte ExtractByte(uint value, int bitOffset);
    byte ExtractByte(ulong value, int bitOffset);

    /// <summary>
    /// Inserts a byte value into a longer integer.
    /// </summary>
    /// <param name="value">The bit mask.</param>
    /// <param name="bitOffset">The ordinal position to write.</param>
    /// <param name="insert">The value to insert.</param>
    ushort InsertByte(ushort value, int bitOffset, byte insert);
    uint InsertByte(uint value, int bitOffset, byte insert);
    ulong InsertByte(ulong value, int bitOffset, byte insert);

    ushort ExtractUInt16(uint value, int bitOffset);
    ushort ExtractUInt16(ulong value, int bitOffset);

    uint InsertUInt16(uint value, int bitOffset, ushort insert);
    ulong InsertUInt16(ulong value, int bitOffset, ushort insert);

    uint ExtractUInt32(ulong value, int bitOffset);
    ulong InsertUInt32(ulong value, int bitOffset, uint insert);

    /// <summary>
    /// Reads whether the specified bit in a mask is set.
    /// </summary>
    /// <param name="value">The mask.</param>
    /// <param name="offset">The ordinal position of the bit to read.</param>
    bool ExtractBit(ReadOnlySpan<byte> value, uint offset);
        // For brevity, ushort|uint|ulong overloads not shown

    /// <summary>
    /// Sets the specified bit in a mask and returns whether the bit was originally set.
    /// </summary>
    /// <param name="value">The mask.</param>
    /// <param name="bitOffset">The ordinal position of the bit to set.</param>
    bool InsertBit(Span<byte> value, uint bitOffset);
        // For brevity, ushort|uint|ulong overloads not shown

    /// <summary>
    /// Clears the specified bit in a mask and returns whether the bit was originally set.
    /// </summary>
    /// <param name="value">The mask.</param>
    /// <param name="bitOffset">The ordinal position of the bit to clear.</param>
    bool ClearBit(Span<byte> value, uint bitOffset);
        // For brevity, ushort|uint|ulong overloads not shown

    /// <summary>
    /// Complements the specified bit in a mask and returns whether the bit was originally set.
    /// </summary>
    /// <param name="value">The mask.</param>
    /// <param name="bitOffset">The ordinal position of the bit to complement.</param>
    bool ComplementBit(Span<byte> value, uint bitOffset);
        // For brevity, ushort|uint|ulong overloads not shown

    /// <summary>
    /// Reads the specified byte from a span, given the bit offset.
    /// </summary>
    /// <param name="span">The span.</param>
    /// <param name="bitOffset">The ordinal position to read.</param>
    byte ExtractByte(ReadOnlySpan<byte> span, int bitOffset);
    byte ExtractByte(ReadOnlySpan<ushort> span, int bitOffset);
    byte ExtractByte(ReadOnlySpan<uint> span, int bitOffset);
    byte ExtractByte(ReadOnlySpan<ulong> span, int bitOffset);

    ushort ExtractUInt16(ReadOnlySpan<byte> span, int bitOffset);
        // For brevity, ushort|uint|ulong overloads not shown

    uint ExtractUInt32(ReadOnlySpan<byte> span, int bitOffset);
        // For brevity, ushort|uint|ulong overloads not shown

    ulong ExtractUInt64(ReadOnlySpan<byte> span, int bitOffset);
        // For brevity, ushort|uint|ulong overloads not shown

    /// <summary>
    /// Writes the specified value to a span, given the bit offset, and returns the original value.
    /// </summary>
    /// <param name="span">The span.</param>
    /// <param name="bitOffset">The ordinal position to write.</param>
    /// <param name="value">The value to write.</param>
    byte InsertByte(Span<byte> span, int bitOffset, byte value);
    byte InsertByte(Span<ushort> span, int bitOffset, byte value);
    byte InsertByte(Span<uint> span, int bitOffset, byte value);
    byte InsertByte(Span<ulong> span, int bitOffset, byte value);

    byte InsertUInt16(Span<byte> span, int bitOffset, ushort value);
        // For brevity, ushort|uint|ulong overloads not shown

    byte InsertUInt32(Span<byte> span, int bitOffset, uint value);
        // For brevity, ushort|uint|ulong overloads not shown

    byte InsertUInt64(Span<byte> span, int bitOffset, ulong value);
        // For brevity, ushort|uint|ulong overloads not shown

    // Alternative pattern that mitigates throwing OutOfRange exceptions
    bool TryInsertByte(Span<byte> span, int bitOffset, byte value);
    bool TryExtractByte(ReadOnlySpan<byte> span, int bitOffset, out byte value);

    // Since operand size DOES affect result, it would be a breaking change to add these later.
    // eg ROTL32(123u) != ROTL8((byte)123)
    // Unless we (later) add them with a distinguished name, eg RotateLeft8, RotateLeft16

    byte RotateLeft(byte value, int bitOffset);
    ushort RotateLeft(ushort value, int bitOffset);

    byte RotateRight(byte value, int bitOffset);
    ushort RotateRight(ushort value, int bitOffset);

    int LeadingZeroCount(byte value);
    int LeadingZeroCount(ushort value);
}

Questions

  • Support ReadOnlySpan for getters, and Span for mutators. Not sure if we need to explicitly support Span for getters too; I would hope not.
  • Should we support the TryRead/Write pattern. Versus the current shape which is Read/Write but then has to throw for OutOfRange conditions. In other words, since this is a perf-sensitive feature, using Try helps avoid exceptions.
  • For Span overloads, is there a need for overloads that work atomically, eg LeadingZeros(new byte[50]{ }) == 400. Versus overloads that index into a specific byte. Suggestion: No.
  • See conversation item regarding methods like RotateLeft being ambiguous.
  • The following style of method group may have redundancy, since the caller can convert the latter 3 to Span<byte> first?
    byte ExtractByte(ReadOnlySpan<byte> span, int bitOffset);
    // Redundant?
    // Caller can use MemoryMarshal.AsBytes(span) first, though they then have
    // to take care of the offset math themselves. Unless we provide an offset helper.
    byte ExtractByte(ReadOnlySpan<ushort> span, int bitOffset);
    byte ExtractByte(ReadOnlySpan<uint> span, int bitOffset);
    byte ExtractByte(ReadOnlySpan<ulong> span, int bitOffset);
  • BitConverter has already been span-ified, and as such has an overlapping api surface. For example, bool TryWriteBytes(Span<byte> destination, long value) is similar to signatures here.

...shouldn't your unit tests catch that kind of thing?

I agree with the some of your feedback, but the above is a bad developer experience. If I call Math.Max and incur an unexpected implicit upcast, the result typically doesn't change. But with this specific set of functions, it does. Unfortunately, not all consumers of .Net write units.
That design is not the pit of success.

...what implicit upcast? Each of the primitive types has its own copy of RotateLeft, so it would return the same type as was passed in. If you're worried about the cast when assigning back to the same variable, Math.Max would run into the same problem.

No I am not worried about the re-assignment. I am worried that in terms of overload resolution, the result feels a little non-deterministic. I will try another example:

void Foo(byte n) // <-- Note byte param
{
   // a mile of code
   var x = RotateLeft(n, 17); // Produces 123
   var y = Math.Min(n, 17); // Produces 17
}

Sometime later, someone else modifies the signature:

void Foo(ulong n) // <-- Note byte became uint or ulong
{
   // a mile of code 
   var x = RotateLeft(n, 17); // Produces 456 <-- Different result
   var y = Math.Min(n, 17); // Produces 17 <-- Same result
}

Calling RotateLeft32(n, 2) would have produced the same correct result deterministically.

Encoding the size of the type on the signature in cases where they lead to inconsistent results is not unheard of. IMHO it is a sensible tradeoff.

I would say that we leave the APIs as is (all called Rotate) and that it can be discussed further during the design-review.

Changing the signature of an API (from uint to ulong, for example) can naturally cause downstream breaks and BitOps would be, by far, not the only thing that has a potential for different behavior.

OK, I will leave the signatures as-is.

Changing the signature of an API (from uint to ulong, for example) can naturally cause downstream breaks and BitOps would be, by far, not the only thing that has a potential for different behavior.

Oops, bad example. My concern is with _implicit_ integer upcasts. Changed my code sample above to use byte instead of uint, since it's less of a problem between uint and ulong (for which I agree with your statement that's it's an established paradigm)

FYI. Following is the POC implementation of Iff (_was_ called Evaluate).
It's already used in internal code; it may be useful to expose as public?

[Edited]

// Converts a boolean to a byte value without branching.
// Assume AggressiveInlining
public static byte AsByte(ref bool condition)
{
    int val = Unsafe.As<bool, byte>(ref condition); // CLR permits 0..255

    // Normalize bool's underlying value to 0|1
    val = -val; // Negation will set sign-bit iff non-zero
    val >>= 31; // Send sign-bit to lsb (all other bits will be thus zero'd)

    return (byte)val; // 0|1
}

public static uint Iff(bool condition, uint trueValue)
{
    uint val = AsByte(ref condition);
    return val * trueValue;

    // FYI this is slower
    //uint mask = AsByte(ref condition) - 1u; // 0x00000000 | 0xFFFFFFFF
    //return ~mask & trueValue;
}

public static uint Iff(bool condition, uint trueValue, uint falseValue)
{
    uint val = AsByte(ref condition);
    return (val * trueValue) + (1 - val) * falseValue;
}

// For brevity, int|long|ulong overloads not shown

Usage:

var foo = BitOps.Iff(n > 0, 33); // Returns 33 if true, else 0

Example of internal usage:

public static ushort WriteBit(ushort value, int bitOffset, bool on)
{
    int shft = bitOffset & 15;
    uint mask = 1U << shft;

    uint onn = AsByte(ref on); // <-- Note, need to guarantee this is 0|1
    onn <<= shft;

    return (ushort)((value & ~mask) | onn);
}

Benchmark results (post bool-normalization change)

      Method |     Mean |     Error |    StdDev |   Median | Scaled |
------------ |---------:|----------:|----------:|---------:|-------:|
      Branch | 4.449 ns | 0.0890 ns | 0.2297 ns | 4.367 ns |   0.98 | x
  UnsafeCode | 5.645 ns | 0.0117 ns | 0.0091 ns | 5.645 ns |   1.24 |
    UnsafeAs | 4.552 ns | 0.0186 ns | 0.0156 ns | 4.552 ns |   1.00 | x Same perf as Branch
 UnsafeAsRef | 2.529 ns | 0.0157 ns | 0.0131 ns | 2.522 ns |   0.56 | xx Looking at this too
 UnionStruct | 5.646 ns | 0.0155 ns | 0.0137 ns | 5.644 ns |   1.24 |

Benchmark results (pre bool-normalization change)

      Method |     Mean |     Error |    StdDev | Scaled |
------------ |---------:|----------:|----------:|-------:|
      Branch | 3.923 ns | 0.0884 ns | 0.1322 ns |   1.50 | !
  UnsafeCode | 3.431 ns | 0.0358 ns | 0.0334 ns |   1.31 |
    UnsafeAs | 2.629 ns | 0.1356 ns | 0.1392 ns |   1.00 | x Much faster than Branch
 UnsafeAsRef | 2.329 ns | 0.0178 ns | 0.0149 ns |   0.89 | xx
 UnionStruct | 3.505 ns | 0.1408 ns | 0.1446 ns |   1.34 |

@grant-d this If would not support boolean values bigger than 1. Are they simply not supported? They are legal on the CLR. The C# specification ignores their existence.

One could fix it by doing !!condition with unclear performance cost.

If could also be:

var mask = (uint)AsByte(ref condition) - 1; //0x00000000 or 0xFFFFFFFF;
return (mask & falseValue) | (~mask & trueValue);

@GSPP: If you stuff a 2 into a bool, either the C# compiler or the Jitter (not sure which) generates incorrect code for the xor operator or select statement on bools. Ergo bool must be either 0 or 1.

Linky: https://stackoverflow.com/questions/48344107/create-bool-value-which-is-neither-true-or-false

The runtime (which includes the JIT) specifies that a bool 1-byte and is true (any non-zero bit pattern) or false (a zero bit pattern). You can see III.1.1.2 in ECMA 335 for more info.

The C# language specifies that a bool is 1-byte and is true (1) or false (0). You can see https://github.com/dotnet/roslyn/issues/24652 for more information.

OK, I have updated the code sample above.
AsByte now normalizes the bool's underlying byte value using the fastest routine I could think of.
Maybe there's an intrinsic for that.

Unfortunately that extra work means that it's about the same performance as branching now. Not sure if that still makes it worth doing, or if we should just incur the actual branch.

@tannergooding I am not sure what the C# contractual obligations are w.r.t. interop with other languages/platforms. Meaning the normalization code may or may not be required?

There's a faster way to normalize if you can have a JIT intrinsic.

or al, al
cmovnz al, 1

cmov isn't a branch and won't stall the pipeline.

@tannergooding, any movement on getting this approved?

This was briefly discussed in the last API review meeting and it was determined that we want to have a more in-depth discussion. We also want to have some more real-world scenarios and example usages for these APIs listed.

Example usages

Sure, would units suffice? My fork is close to 100% covered.

Real world usages

Based on your previous comment (below), and since I picked this up from up-for-grabs, I expected this was already well-known?

This is still a fairly large proposal and covers a number of items that we currently know have use-cases both in and outside the framework.

I expected this was already well-known?

Yes, it is just a matter of ensuring the information is collected and presented as part of the API review.

OK. Here are the implementation & units:

BitOps.cs (in corefx, I expect it will be in coreclr in actual PR)
BitOpsTests.cs (in the wrong folder, will fix in actual PR) - see first method BitOps_Samples for TLDR

Added sample callsites to spec.
@tannergooding will you be handling the evidence requirements?

I noticed a couple of method names were inconsistent in the spec vs discussion. Fixed.

will you be handling the evidence requirements?

It is on my backlog, but there are some higher priority items on my plate that need to be handled first.

@tannergooding / @jhudsoncedaron, I have a design question.

For functions like ExtractBit which ostensibly return a bool telling you whether the bit is set or not, would it be more useful in general to return a byte containing 1|0? The caller can easily convert that to a bool, but not the other way around. And a byte is more easily composable with other twiddling functions, without having to 'de-bool' it.

And for methods like PopCount, LeadingZeros, etc maybe a return type of byte would be better than int, since the caller can then use any receiving int type/width they prefer?

@grant-d : All my examples would be ExtractBit() wanting to return a bool and SetBit() wanting to pass a bool. The only other functions I really care about are RotateLeft() and RotateRight() which I ended up having to write myself as extension methods because they're easy to get wrong.

Taking/returning a bool is more desirable and will likely be better handled by the JIT (as, I think, will result ? 1 : 0).

Updated POC code to call into intrinsics where possible, eg

        public static byte PopCount(uint value)
        {
            if (System.Runtime.Intrinsics.X86.Popcnt.IsSupported)
            {
                return (byte)System.Runtime.Intrinsics.X86.Popcnt.PopCount(value);
            }

            // Software fallback...
        }

By the time (if) this ships, hopefully the instrinsics code will be non-preview.
Also see upcoming changes in: https://github.com/dotnet/coreclr/issues/20062

@jkotas pointed out that https://github.com/dotnet/coreclr/issues/21583 was a perf scenario that would be improved by having an intrinsic rotate, ie, something as proposed in this issue, plus JIT work to recognize it.

Note the naming convention used in the related PR, which should be considered in this issue.
TrailingZeroCount is used there, vs TrailingZeros proposed here.
https://github.com/dotnet/coreclr/pull/22118

namespace System
{
    internal static class BitOps // Note internal
    {
        public static int TrailingZeroCount(int matches)

@tannergooding is it not possible to stage this PR: just justify/merge the functions that don't need much justification, such as the rotates, and perhaps the balance of the trailing/leading overloads?

@grant-d Is that the right issue? It seems unrelated.

@GSPP fixed, thanks

is it not possible to stage this PR: just justify/merge the functions that don't need much justification, such as the rotates, and perhaps the balance of the trailing/leading overloads?

Not really, in all cases the APIs must be reviewed and approved before being made part of the public surface area. There is a chance that some smaller subsection of the APIs could be approved before the rest; but that still requires the same work to be done.

If someone wanted to help push this along, they could find the existing usages of proposed APIs above and move them to the currently internal BitOps class. This would help identify all the places we are currently using these APIs and how they are being used.

Motivation and duplicate implementations of the proposed methods.
Initial search was timeboxed to 1 hour, so there is likely to be more.

OTHER
Found quite a few more callsites whilst implementing the PR.
See the code for more details.

POPCNT
https://github.com/dotnet/corefx/blob/master/src/System.Reflection.Metadata/src/System/Reflection/Internal/Utilities/BitArithmetic.cs
https://github.com/dotnet/corert/blob/cf5dc501e870b1efe8cba3d6990752538d174773/src/System.Private.CoreLib/shared/System/Buffers/Text/FormattingHelpers.CountDigits.cs
https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L1002
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1277

LZCNT
https://github.com/dotnet/corefx/blob/62c70143cfbb08bbf03b5b8aad60c2add84a0d9e/src/Common/src/CoreLib/System/Number.BigInteger.cs#L700
https://github.com/dotnet/coreclr/pull/22100
https://github.com/dotnet/coreclr/pull/19006
https://github.com/dotnet/corefx/blob/151a6065fa8184feb1ac4a55c89752342ab7c3bb/src/Common/src/CoreLib/System/Decimal.DecCalc.cs#L728
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1254 (LeadingOnes)

TZCNT
https://github.com/dotnet/coreclr/pull/22118
https://github.com/dotnet/corert/blob/db8723b1787e18f0124e94c387542d0ef3aac128/src/System.Private.CoreLib/shared/System/BitOps.cs#L16

ROTL/ROTR
https://github.com/dotnet/coreclr/issues/21583
https://github.com/dotnet/coreclr/pull/1830
https://github.com/dotnet/corert/blob/87e58839d6629b5f90777f886a2f52d7a99c076f/src/System.Private.CoreLib/src/System/Marvin.cs#L120-L124
https://github.com/dotnet/coreclr/issues/1619
https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L972
https://github.com/dotnet/corert/blob/1bb85b989d9b01563df9eb83dce1c6a3415ce182/src/ILCompiler.ReadyToRun/src/Compiler/ReadyToRunHashCode.cs#L214
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/Common/src/Internal/NativeFormat/TypeHashingAlgorithms.cs#L20
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/Common/src/Internal/Text/Utf8String.cs#L47
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/Internal/Runtime/CompilerServices/OpenMethodResolver.cs#L178
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/System/RuntimeMethodHandle.cs#L67
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/ILCompiler.Compiler/src/Compiler/DependencyAnalysis/NodeFactory.NativeLayout.cs#L345
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.TypeLoader/src/Internal/Runtime/TypeLoader/TypeLoaderEnvironment.NamedTypeLookup.cs#L121
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/Internal/Runtime/CompilerServices/OpenMethodResolver.cs#L178
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/System/RuntimeFieldHandle.cs#L47
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.TypeLoader/src/Internal/Runtime/TypeLoader/TypeLoaderEnvironment.NamedTypeLookup.cs#L121

InsertBit
https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L1013
https://github.com/dotnet/corefx/blob/bd414c68872c4e4c6e8b1a585675a8383b3a9555/src/System.DirectoryServices.AccountManagement/src/System/DirectoryServices/AccountManagement/Utils.cs#L51
https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L51
https://github.com/dotnet/wpf/blob/2cbb1ad9759c32dc575c7537057a29ee7da2e1b2/src/Microsoft.DotNet.Wpf/src/System.Xaml/System/Xaml/Schema/Reflector.cs#L363

ClearBit
https://github.com/dotnet/corefx/blob/bd414c68872c4e4c6e8b1a585675a8383b3a9555/src/System.DirectoryServices.AccountManagement/src/System/DirectoryServices/AccountManagement/Utils.cs#L46
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1314
https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L45
https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L1020

ExtractBit
https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L57
https://github.com/dotnet/wpf/blob/2cbb1ad9759c32dc575c7537057a29ee7da2e1b2/src/Microsoft.DotNet.Wpf/src/System.Xaml/System/Xaml/Schema/Reflector.cs#L334
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1294

ComplementBit
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1317
https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L196

they could find the existing usages of proposed APIs

@tannergooding I have tried to find several examples of each, searching across all dotnet C# repos, eg: popcount language:C# user:dotnet.

Any other examples you're aware of would be welcome.

and move them to the currently internal BitOps class...

Did you mean I should create a PR (specifically keeping the class internal)?
Add the BitOps method, then update the callsite(s)

Did you mean I should create a PR (specifically keeping the class internal)?
Add the BitOps method, then update the callsite(s)

Yes, I think that would be reasonable. It should not only help centralize the code but should give us a clearer picture of who is using it and how.

Yes, I think that would be reasonable. It should not only help centralize the code but should give us a clearer picture of who is using it and how.

PR created: https://github.com/dotnet/coreclr/pull/22225

The first commit shows the unedited code copied _verbatim_ from the known call-sites.
It has not been changed, optimized or consolidated since it is meant to demonstrate duplication.

Subsequent commits clean up and consolidate the code. Units already written and will be merged later.

Trimmed down and cleaned-up the spec based on the above analysis.
Still some questions to answer (such as signed overloads or not) but it's now a much tighter surface area.

@grant-d, the original post gas gotten a bit large. Could you also provide a summary of the changes you made here (removals, additions, etc)

Per the changelog:

  • Moved TrailingOnes and LeadingOnes into 'to do later' proposal in comments
  • Added Log2
  • Removed redundant overloads

The last item is related to the new // comments through the spec regarding overloads for byte and ushort. For example, see RotateLeft in the spec where I emphasize the need for such overloads, but I removed such overloads from (eg) PopCount.
Also added the two new related sections in the spec; Operand Size & Sign.

Per discussion in https://github.com/dotnet/coreclr/pull/22333, changed return type of count-oriented methods (such as PopCount) to int instead of uint.
Also updated the POC code

@tannergooding I completed a set of coreclr PRs to:

  • Centralized & standardized implementation of several intrinsics (LZCNT, LOG2, TZCNT, ROTL/R, POPCNT)
  • Optimized software fallbacks
  • Updated all known callsites in coreclr
  • Added units tests for all BitOps methods
  • Note that BTR was _not_ approved, pending api design

I would like to update downstream callsites in corefx and corert as well as externals such as roslyn.
And of course expose them for public consumption.

But I _think_ I need a reference assembly for all those cases, right?
Please advise me of the next steps?

Updated spec to match findings of code analysis discussed above. There were several existing callsites that used byte overloads.

  • Added required byte overloads to all crud (extract, insert, clear, complement) methods.
    eg bool ExtractBit(byte value, int bitOffset)
  • Removed doc-comments so spec is easier to read
  • Added Scope section at top of spec
  • Separated calls into existing and proposed sections

@grant-d I would scope this issue down to just what we have in CoreLib and that is clearly useful. It will make the API review on this easy and it will allow us to make the most useful APIs public in .NET Core 3.0.

I would split the bit manipulation APIs into a separate issue that will be looked at and discussed separately, on its own schedule.

That's a great idea Jan. I had already split this into 2 asks in the summary, but I will create two separate issues instead

Lets get the subset from dotnet/corefx#35419 through first.

Tagging subscribers to this area: @tannergooding
Notify danmosemsft if you want to be subscribed.

I'd like to move forward and mark this as ready-for-review. However, at the very least BitOps needs to become BitOperations, as that is the name of the already approved class. CC @grant-d

It's worth noting that of the variants being exposed, byte doesn't have hardware support. The x86 hardware only accelerates 16, 32, and 64-bit variants. The proposal, as given, isn't exposing short, ushort, long, or ulong (nor sbyte), which I think is based on @jkotas' request to just trim this down to what S.P.Corelib is using, correct?

The native intrinsics return unsigned char (effectively bool) and take the first parameter by pointer, so I think taking by ref as a variant makes sense and fits in with how these are expected to be used (which is likely in a if (InsertBit(ref _value, 4)) like scenario). @CarolEidt, you raised some concerns around taking ref above, is that still the case given the non mutating variants?

  • If it is, what is the proposed mechanism to handle that _value is mutated and the original value needs to be returned, is it a value tuple like (int newValue, bool originalBitValue)?

which I think is based on @jkotas' request to just trim this down to what S.P.Corelib is using, correct?

The original proposal had methods like PopCount that were already approved and that we had no intrinsic patterns for.

I do not see value in having method for bit tests, etc. I believe that the existing pattern for bit tests like x & (1 << 5) are very concise and easy to read for anybody deals with bits.

The fact that MSVC C++ introduced an intrinsic for bit tests sounds like over-engineering to me. They could have picked that up as pattern match just fine.

They could have picked that up as pattern match just fine.

And I believe they do, but its the same fundamental problem as with RotateLeft and RotateRight, which is that there are a plethora of possible patterns and those may not be obvious, especially to novice/beginner programmers.

On the other hand, a method provides an explicit way to opt into the functionality, know that it will work (there probably isn't some edge case they are forgetting around bit manipulation), and that it will most likely be efficient for the given platform.
I believe this is why we have 13 functions across dotnet (which grant-d found, at least) covering these "concise and easy to read" patterns: https://github.com/dotnet/runtime/issues/27382#issuecomment-457718507. That is, they exist because the API is that much more concise, easy to read, and allows logic updates without finding everywhere you wrote the pattern.

It also solves the problem of matching a more complex pattern where you want to simultaneously branch based off the existing value of a bit while simultaneously setting it to a new value.

RotateLeft and RotateRight are fine with me. The pattern to write that using language operators is not concise, and there are multiple ways to write that.

I believe this is why we have 13 functions across dotnet (which grant-d found, at least) covering these "concise and easy to read" patterns

The most common ones like RotateLeft / RotateRight, PopCount, etc. are addressed already.

The remaining ones are the bit tests: InsertBit, ClearBit, etc. They show a lot less occurences. I have done a few searches for patterns like & (1 <<) or | (1 <<. Based on these searches, I believe that it is about 100x more common for people to use the these direct patterns than to wrap it in the helper method.

I'd like to move forward and mark this as ready-for-review. However, at the very least BitOps needs to become BitOperations, as that is the name of the already approved class. CC @grant-d

Thanks for the ping Tanner, updated the spec's name to BitOperations

FYI, here's my previous research on duplicate functions for each operation.
A year+ old so may be more (or less) now:
https://github.com/dotnet/corefx/issues/32269#issuecomment-457689128

eg, in ImmutableHashMap:
c# [Pure] private static uint InsertBit(int position, uint bits) { Debug.Assert(0 == (bits & (1u << position))); return bits | (1u << position); }

This is a good example: It asserts and sets the bit. We would not be able to replace it with the proposed method, without losing the debug validation.

A question: could be possible to include optimized array/span based versions? Use case is bitmask of arbitrary length backed by array of uints (likely, like BigInteger but this class isnt specially optimized for bitmasking and doesnt expose setting bits by index/position).

If theres no extra gains (like even more vectorization) then forget my question, it will be better to simply implement BitMask on top of current API proposal.

@CarolEidt, you raised some concerns around taking ref above, is that still the case given the non mutating variants? If it is, what is the proposed mechanism to handle that _value is mutated and the original value needs to be returned, is it a value tuple like (int newValue, bool originalBitValue)?

IMO the best API for these cases is a value tuple. However, admittedly I haven't yet had the time to get my PR https://github.com/dotnet/runtime/pull/37928 working, which will enable the best codegen for these cases.

If the ref API is preferred, the JIT can still take one of two avenues to achieve the same result:

  • Implement an internal intrinsic that returns a value tuple, and have the ref version call that.
  • Implement the ref API by effectively creating a value tuple in the JIT. However, this approach requires either creating a new struct type in the JIT, which is likely to be significantly more complicated.

could be possible to include optimized array/span based versions?

I would think that would be doable fully in C#, if it was considered desirable.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

matty-hall picture matty-hall  路  3Comments

jzabroski picture jzabroski  路  3Comments

GitAntoinee picture GitAntoinee  路  3Comments

yahorsi picture yahorsi  路  3Comments

jchannon picture jchannon  路  3Comments