Bit manipulation routines are common enough that we should expose a subset as platform primitives.
While some of them may be simple to write, it's much harder to achieve the requisite performance desired, especially since they tend to be used within tight loops.
The aim of this proposal is to scope & design a minimal set of functions, to be implemented with a bias towards performance. Per @tannergooding
The point of these APIs is to provide a general-purpose API that works on all platforms (which means providing a software fallback) and is generally-usable. Hardware Intrinsics are for performance oriented scenarios where you require hardware acceleration and need more direct control of the code that is emitted.
Note that even though some of the formula may be simple, relevant callsites are more self-documenting when using the intrinsics (should the dev _choose_ to use them).
CoreCLR
(https://github.com/dotnet/coreclr/pull/22584)CoreFX
(https://github.com/dotnet/corefx/pull/35193)System.Numerics.BitOperations
as a public
ref assembly in CoreFX
(https://github.com/dotnet/corefx/issues/35419)CoreFX
(https://github.com/dotnet/corefx/pull/34917)The proposed functions are already implemented throughout the stack, often with different algorithms, performance characteristics and test coverage.
Existing callsites below: https://github.com/dotnet/corefx/issues/32269#issuecomment-457689128
(There is likely to be more; the initial search was timeboxed to ~1 hour)
Some of the implementation have suboptimal performance or bugs. Something like ExtractBit
is trivial to implement, but PopCount
is more complex and thus prone to logic and performance issues. Hiding these complex formulae behind friendly signatures makes using them more approachable.
Here's an example of a function (BTC
) whose signature is simple but the algebra is easy to get wrong.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool ComplementBit(ref uint value, int bitOffset)
{
uint mask = 1u << bitOffset;
bool btc = (value & mask) != 0;
value = ~(~mask ^ value);
return btc;
}
However making a call to it meets our goal of abstraction and performance:
uint value = 123;
bool previouslyTrue = BitOperations.ComplementBit(ref value, 6);
The proposed API is purposefully kept lean. We can add more methods in later design iterations. We should view this as an opportunity to get simple, base functionality out the door and not stray into the dangerous territory of adding every bit twiddling hack that exists.
Assume all methods are decorated with [MethodImpl(MethodImplOptions.AggressiveInlining)]
public static class BitOperations
{
// BT
bool ExtractBit(byte value, int bitOffset); // Could name this BitTest or TestBit
bool ExtractBit(uint value, int bitOffset);
bool ExtractBit(int value, int bitOffset);
// BTS (scalar)
byte InsertBit(byte value, int bitOffset); // BitSet or SetBit
uint InsertBit(uint value, int bitOffset);
int InsertBit(int value, int bitOffset);
// True BTS (returns original value)
bool InsertBit(ref byte value, int bitOffset);
bool InsertBit(ref uint value, int bitOffset);
bool InsertBit(ref int value, int bitOffset);
// BTR
byte ClearBit(byte value, int bitOffset); // BitReset or ResetBit
uint ClearBit(uint value, int bitOffset);
int ClearBit(int value, int bitOffset);
bool ClearBit(ref byte value, int bitOffset);
bool ClearBit(ref uint value, int bitOffset);
bool ClearBit(ref int value, int bitOffset);
// BTC
byte ComplementBit(byte value, int bitOffset);
uint ComplementBit(uint value, int bitOffset);
int ComplementBit(int value, int bitOffset);
bool ComplementBit(ref byte value, int bitOffset);
bool ComplementBit(ref uint value, int bitOffset);
bool ComplementBit(ref int value, int bitOffset);
// on ? BTS : BTR
byte WriteBit(byte value, int bitOffset, bool on);
uint WriteBit(uint value, int bitOffset, bool on);
int WriteBit(int value, int bitOffset, bool on);
bool WriteBit(ref byte value, int bitOffset, bool on);
bool WriteBit(ref uint value, int bitOffset, bool on);
bool WriteBit(ref int value, int bitOffset, bool on);
}
InsertBit
/ClearBit
and WriteBit
where the latter conditionally executes the equivalent of either former (using twiddling to do so without branching). But there's enough twiddling in Write
to avoid any branching that it's maybe worth keeping both variants.exception
s. The current design tries to dispense with the requirement by sharpening input types, contractual assumptions (eg out-of-bounds offset
will use mod n
in some functions or result in a no-op
in others) and specific design choices.int
, uint
and byte
are commonly used. Anything else?namespace
this should be in. Decision: namespace System.Numerics
.BitOps
is self-documenting, terse (it might be specified frequently in a callsite, if not aliased with using
) and both terms are well-known names or abbreviations. Decision: BitOperations
PopCount
could be called CountSetBits
but that's not the common lingo used by twiddlers. Furthermore, intrinsics already expose the well-known names, eg Popcnt.PopCount
.TrailingZeroCount
may be more concisely described as TrailingZeros
. Decision: TrailingZeroCount
already chosen by a previous PR.PopCount
return (idiomatic) int
, not uint
offset
or position
parameters be int
or uint
. Latter is preferred since negatives not permitted regardless. Decision: int is an idiomatic input/output type in C#.Log(0)
is mathematically undefined. Should it return 0
or -1
? Decision: Returns 0LeadingZeros
_might_ need a different algorithm on BE/LE platforms. Decision: Endianess only matters if we are reinterpreting integers.The following samples are taken from the linked units, from the method BitOps_Samples
.
The code chooses values that are easy to eyeball for correctness. The real units cover many more boundaries & conditions.
// ExtractBit: Reads whether the specified bit in a mask is set.
Assert.True(BitOps.ExtractBit((byte)0b0001_0000, 4));
Assert.False(BitOps.ExtractBit((byte)0b0001_0000, 7));
// InsertBit: Sets the specified bit in a mask and returns the new value.
byte dest = 0b0000_1001;
Assert.Equal(0b0010_1001, BitOps.InsertBit(dest, 5));
// InsertBit(ref): Sets the specified bit in a mask and returns whether it was originally set.
Assert.False(BitOps.InsertBit(ref dest, 5));
Assert.Equal(0b0010_1001, dest);
// ClearBit: Clears the specified bit in a mask and returns the new value.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.ClearBit(dest, 3));
// ClearBit(ref): Clears the specified bit in a mask and returns whether it was originally set.
Assert.True(BitOps.ClearBit(ref dest, 3));
Assert.Equal(0b0000_0001, dest);
// ComplementBit: Complements the specified bit in a mask and returns the new value.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.ComplementBit(dest, 3));
// ComplementBit(ref): Complements the specified bit in a mask and returns whether it was originally set.
Assert.True(BitOps.ComplementBit(ref dest, 3));
Assert.Equal(0b0000_0001, dest);
// WriteBit: Writes the specified bit in a mask and returns the new value. Does not branch.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.WriteBit(dest, 3, on: false));
// WriteBit(ref): Writes the specified bit in a mask and returns whether it was originally set. Does not branch.
Assert.True(BitOps.WriteBit(ref dest, 3, on: false));
Assert.Equal(0b0000_0001, dest);
InsertBit
accepted a bool that determined whether it set or cleared the bit in question. Such methods have now been refactored in two, InsertBit
and ClearBit
.FlipBit
became ComplementBit
).int
instead of byte
. All POC units pass.WriteBit(value, offset, bool on)
overloads that conditionally set/clear the specified bit.ExtractByte
and friendsEvaluate(bool)
(used internally, so may as well expose it)Span<T>
overloads per @tannergooding advice. Will maybe submit in separate proposalTrailingOnes
and LeadingOnes
into to do later proposal in commentsLog2
RotateByte
, etc into to do later proposal in commentscc @mburbea, @tannergooding , @CarolEidt
Please drop the in
modifier from the parameters. These are all operating on primitives and passing them around by reference is going to be generally less efficient than just passing them by-value (which should happen in register
).
Additional Comments/Thoughts:
InsertBit(value, offset, on)
might be better as InsertBit(value, offset)
and ResetBit(value, offset)
(or ClearBit
, ZeroBit
, etc).LeadingCount
and TrailingCount
are likely better split into LeadingZeroCount
, TrailingZeroCount
, LeadingOneCount
, and TrailingOneCount
NegateBit
may be more "better" than FlipBit
(or ComplementBit
)System.BitOps
seems appropriate as a namespace (and lets this live next to BitConverter
). Especially since several of these may get used in the Runtime/Framework itselfI'm not sure I like the methods taking a bool.
I went back and forth on that one. The bool makes the api tighter. But it means that ultimately the implementation has branching instructions, which means perf suffers. So I am happy to refactor.
Whether or not these methods take a ref and return the original value of the bit or just return the modified value will likely need more discussion (possibly just in the API design review).
I designed this based on a previous suggestion of yours, but I am happy to revert. Perhaps you meant ExchangeBit
(BTS/BTR,BTC) to be a separate additional methods? Otherwise we could return a tuple but that has more overhead.
I think NegateBit may be more "better" than FlipBit (or ComplementBit)
I will make that change
@grant-d
public static bool InsertBit(ref uint value, byte offset);
Will JIT be able to optimize it properly if the value is an enregistered local?
Like in this code:
C#
int M(uint value)
{
return InsertBit(in value, 3) ? 5 :6;
}
It needs to be a ref
, ie:
private static int M(uint value) => InsertBit(ref value, 3, true) ? 5 : 6;
I don't have enough knowledge of what the optimizer would do, someone else may be able to advise us.
Um, ExtractBit
takes byte
and ushort
, but neither of those are present for InsertBit
/ClearBit
(instead there's int
and long
)?
These need to be passed by value. Passing things around by ref
or in
will make it more difficult to optimize in the long run. If you need a variant that both modifies the value and returns whether a bit was set, it should be a different variant, and the "normal" form of InsertBit
should just return the modified value.
Sounds good, I have made the change. @tannergooding, let me know if you think the BTS
/BTR
/BTC
variants need to be in the first deliverable.
Will JIT be able to optimize it properly
It will now - I have changed all ref
type methods to be scalar functions; see notes in the spec.
API review probably won't like the contraction BitOps
. BitOperations
is not so bad or possibly Bits
even...
I like Bits
, I have added all of these to the naming suggestions.
If you need a variant that both modifies the value and returns whether a bit was set, it should be a different variant.
Done - both variants exist.
Note to self:
Span
overloads: https://github.com/dotnet/corefx/issues/32184using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
This is pointless unless you also the attribute for AggressiveInlining as the function call overhead will eat your gains otherwise. Therefore, AggressiveInlining should be contractual on these.
AggressiveInlining should be contractual
The POC already does so. I will add a note to the spec
This is pointless unless you also the attribute for AggressiveInlining as the function call overhead will eat your gains otherwise.
Many of these functions will be treated as intrinsic and will be specially handled by the JIT to emit a single instruction, regardless of what the actual method body contains for the software fallback.
The JIT is also very good at inlining small functions, regardless of whether or not the AggressiveInlining
attribute exists.
InsertBit(value, offset, on)
is needed because sometimes the on
value is dynamic. There should be separate SetBit
and ClearBit
methods.
@GSPP can this not be handled from the callsite.
if (on) SetBit(i) else ClearBit(i)
I am trying to keep the API surface as concise as possible, let me know if you have a compelling reason
@grant-d: InsertBit can be implemented branchlessly on any reasonably CPU.
InsertBit is already implemented without branching, see POC code.
The question was whether there could be a mutator that accepts a bool, then conditionally inserts or extracts. My preference is to externalize branching (at the callsite).
And I'm saying the dynamic set-clear version can be implemented w/o branching.
ulong mask = 1 << i;
ulong bit = (ulong)on << i;
return (value & ~mask) | bit;
Got you. Thanks for the gist. I still am curious whether the 'dynamic' use case is compelling enough to have a somewhat overlapping API?
@jhudsoncedaron, just because you can implement that in software, doesn't necessarily mean it is (necessarily) a good API to expose here or that it will be efficient.
Many of these functions will be treated as intrinsic, which means that they will will compile down to a single CPU instruction (rather than 5+ instructions). As an example, we can use the BSF
(Bit Scan Forward), BSR
(Bit Scan Reverse), BT
(Bit Test), BTC
(Bit Test and Complement), BTR
(Bit Test and Reset), BTS
(Bit Test and Set), LZCNT
(Count the Number of Leading Zero Bits), and POPCNT
(Return the Count of Number of Bits Set to 1) instructions on x86/x64 to make these super efficient.
Also, I don't think you can convert a bool
to ulong
. You will get error CS0030: Cannot convert type 'bool' to 'ulong'
I only ever use the dynamic version of InsertBit. That might not be valid C# but the algorithm is correct anyway. bools are stored as either a 1 or a 0 so it's just a matter of convincing the compiler to emit the right code. The harder this is to do, the stronger the argument for putting it in the library.
I tend to agree with @jhudsoncedaron that the dynamic version is useful. For this particular case if we really wanted to minimize API surface area I might vote for just the dynamic case, since I would expect the value to be either clearly a constant or not. And it would be easier for the jit to make that optimization than to recognize a sequence with branches.
[Edited]
I assumed the gist was pseudocode; in C# we can do one of the following.
Micro benchmarking shows better perf (0.6x) than the idiomatic expression.
return condition ? 37 : 0; // 1.0x (idiomatic)
return (*(byte*)&condition) * 37; // 0.95x (cannot be inlined)
return Unsafe.As<bool, byte>(condition) * 37; // 0.60x
return new UnionStruct { Bool = condition }.Value * 37; // 0.76x (ExplicitLayout, etc)
This relies on internal knowledge of bool
but I guess that's no different to the rest of twiddling where we rely on internal knowledge of bit layouts? I recall (perhaps wrongly) something about VB (maybe v6, maybe .Net) using -1
for false
. But since this is ultimately CLI, I guess that's not a concern.
That being said. I started the spec out with the dynamic
signature, and followed a suggestion to move to explicit
. I don't mind one or the other (or both) but we need a decision here. Maybe I should just add both and we can see which one gets downvoted?
VB .NET did in fact use -1 for booleans; however this technique would reveal 1 anyway. It really wrote 1 to the underlying type to avoid driving the framework code bonkers. (Go ahead, use this unsafe to make a bool with a value of 3 and watch things fail in amusing ways.)
[Edited]
OK, added conditional overrides to the spec.
Here's the POC implementation.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte WriteBit(byte value, int offset, bool on)
{
int shft = offset & 7;
uint mask = 1U << shft;
uint onn = Unsafe.As<bool, uint>(on); // true ? 1 : 0
onn <<= shft;
return (byte)((value & ~mask) | onn);
}
Added ExtractByte
and friends
@tannergooding
Many of these functions will be treated as intrinsic and will be specially handled by the JIT to emit a single instruction, regardless of what the actual method body contains for the software fallback.
True, but you are assuming a single JIT (Microsoft) and also that the caller function didn't trip on a particular (warranted or not) limit to inlining. There is actually no shortcoming for contractually ensure Bits
methods must be inlined; as implementing 90%+ of those methods on a loop without inlining would kill you performance wise.
There's an ambiguity in some of these signatures that I am trying to get my head around.
It results in corrupt data.
TLDR
Some methods are not safe under implicit upcast operations. This can lead to ambiguity and corruption.
For example foo(123u) != foo((byte)123)
.
Suggestion is to name all such methods explicitly to avoid said issue:
foo8(), foo16(), foo32(), foo64()
Detail
For example, I want to rotate
a byte
value left
. I have a byte
in hand, and pass it to the function:
byte foo = 0b_1000_0001;
var result = RotateLeft(foo, 2); // = 0000_0110
Yay, that worked. A few milestones later, but this time I still have a byte
value but I happen to have it stored in a uint
. Maybe it was a parameter originally declared as byte
and someone else changed it. Or maybe that's how it came off the wire or protobuf. Or maybe I am being careful about alignment. Either way I know it's a byte and thus expect byte semantics.
uint foo = 0b_1000_0001; // Same byte value, upcast to uint
// Implicit upcast - my callsite did not change:
var result = RotateLeft(foo, 2); // = 0000_0010_0000_0100
Different answer. Ultimately silent corruption.
Of course the easy workaround is:
uint foo = 0b_1000_0001;
var result = RotateLeft((byte)foo, 2); // = 0000_0110
But that means I will need to use this pattern at _all_ my callsites. That is untenable.
It seems the shape of this kind of method makes it easy to call the wrong overload, despite the niceties of having a method group. (Note that all 4 overloads are necessary because they behave differently, and furthermore we are trying to avoid throw
ing exceptions in guard clauses.)
Suggestion: Maybe we should distinguish such methods explicitly, something like:
uint foo = 0b_1000_0001; // Unsigned integer of some bitsize
var result1 = RotateLeft8(foo, 2); // May support signed overloads too, so suffix is `8` not `UInt8`
var result2 = RotateLeft16(foo, 2); // A nice side effect is terser names
var result3 = RotateLeft32(foo, 2); // See next code block
var result4 = RotateLeft64(foo, 2);
This actually woks nicely should we decide to support signed integers:
int foo = 0b_1000_0001; // Signed integer of some bitsize
var result1 = RotateLeft8(foo, 2); // Suffix of '8' works nicely for byte|sbyte
var result2 = RotateLeft16(foo, 2);
var result3 = RotateLeft32(foo, 2);
var result4 = RotateLeft64(foo, 2);
Note that this does _not_ apply to all method groups. For example:
byte foo = 0b_1000_0001;
var result1 = PopCount(foo); // = 2
var result2 = PopCount((ulong)foo); // = 2
Implicit upcast is safe here; we get the same answer both ways.
Here's a different example. Once again I have a byte
in hand:
byte foo = 0b_1000_0000;
// By convention, shift operators are mod-bitsize, so 15 is exactly equivalent to 15 % 8 == 7
var result = ExtractBit(foo, 15); // == 1
// In other words this is exactly equivalent to calling:
result = ExtractBit(foo, 7); // == 1
Now the parameter changes or whatever, and I now have a byte
by happenstance stored in a uint
:
uint foo = 0b_1000_0000;
// Here, 15 is exactly equivalent to 15 % 32 == 15
var result = ExtractBit(foo, 15); // == 0
We get a different answer.
Per previous suggestion, we'd need to distinguish these methods too:
uint foo = 0b_1000_0000;
var result1 = ExtractBit8(foo, 15); // May support signed overloads too, so suffix is 8 not UInt8
var result2 = ExtractBit16(foo, 15);
var result3 = ExtractBit32(foo, 15);
var result4 = ExtractBit64(foo, 15);
Considering the question on Big/Little Endian. If we ship this api shape as-is (which is only LE), we are not future proof in consideration of one day deciding to support BE. An alternative to the proposed design is to follow the pattern employed in classes like Encoding
where we have an abstract
base class that contains (in this case) two singletons, ostensibly named LE
and BE
:
public abstract class BitOps
{
public static BitOps BE { get; }
public static BitOps LE { get; }
protected BitOps()
{ }
}
Since the number of overloads is so high (explosion due to byte
, ushort
, uint
, ulong
as well as Span<T>
combinatorials), perhaps we can extend this idea even further:
public abstract class BitOps<T> where T : unmanaged
{
public static BitOps<byte> ByteBE { get; }
public static BitOps<ushort> UInt16BE { get; }
public static BitOps<uint> UInt32BE { get; }
public static BitOps<ulong> UInt64BE { get; }
public static BitOps<sbyte> SByteBE { get; }
public static BitOps<short> Int16BE { get; }
public static BitOps<int> Int32BE { get; }
public static BitOps<long> Int64BE { get; }
}
@grant-d : I don't get it. Aren't all of these operating on integers of the appropriate size? Shouldn't those already be in the correct endian?
[Edited]
My concern (perhaps unwarranted, please correct me) is that when inserting an int
into, say, a Span<byte>
then LE/BE may be an issue?
Also, for functions like LeadingZeros
, does BE/LE not affect the algorithm since assumptions about the bit layout are perhaps invalid.
I see it now. Because the JIT can't inline function calls it can't statically resolve, I think we'd be much better off with
public static BitOps<byte> ByteBE { get; }
public static BitOps<ushort> UInt16BE { get; }
public static BitOps<uint> UInt32BE { get; }
public static BitOps<ulong> UInt64BE { get; }
public static BitOps<sbyte> SByteBE { get; }
public static BitOps<short> Int16BE { get; }
public static BitOps<int> Int32BE { get; }
public static BitOps<long> Int64BE { get; }
and use #if
at compile time of .NET Core to pick the correct implementation.
(sorry; I copied the wrong code block; this was intended to be the ExtractInt16, InsertInt16, etc. functions)
I am leaning that way too. Though I am not sure about the #if
.
Is there a use-case for being on a LE platform but wanting to do BE operations. Perhaps because it's network format, or transpilation, or ...
Correct; when messing with file formats we want the file format's bit order. Lots of file formats are in BE order now because classically we had htons/l/q to ensure BE order but no functions to ensure LE order.
Almost no operation out there in the framework is Endian aware AFAIK with the exception of a few HostToNetwork usually in networking classes.
Encoding the endianness into the method is unseen on the framework as far as I can remember. (That being said, I wouldn't start now if it would be my decision). IMHO it is the 'implementors' burden to deal with differences in endianess... because as you dont touch the disk or/the network there is really no difference for you. So it is an interoperation thing only.
@redknightlois : I guess you don't remember the BitConverter madness when mono was ported to Alpha. It turned out almost all the calls to BitConverter in application code wanted a specific endian rather than machine endian.
Yes, endianess would be a concern. However, only the Span<T>
operations really have to worry about it, which probably means it should be a parameter?
On that note:
Suggestion: Maybe we should distinguish such methods explicitly, something like:
uint foo = 0b_1000_0001; var result1 = RotateLeft8(foo, 2); // May support signed overloads too, so suffix is 8 not UInt8 var result2 = RotateLeft16(foo, 2); // A nice side effect is terser names var result3 = RotateLeft32(foo, 2); var result4 = RotateLeft64(foo, 2);
The problem I have for this is that it doesn't completely solve your problem, it just spins a new one: what happens if there are bits set in the upper registers? There's a discoverability issue - the method may sound like it rotates bits by 8, with an offset. Also, my immediate reaction is, "why is it only ever the lowest n?" - I think I want something like:
uint RotateWindowLeft(uint value, int bits, int start_index = 0, int window_length = 32);
Yay, that worked. A few milestones later, but this time I still have a byte value but I happen to have it stored in a uint. Maybe it was a parameter originally declared as byte and someone else changed it. Or maybe that's how it came off the wire or protobuf. Or maybe I am being careful about alignment. Either way I know it's a byte and thus expect byte semantics.
uint foo = 0b_1000_0001; // Same byte value, upcast to uint var result = RotateLeft(foo, 2); // = 0000_0010_0000_0100
Different answer. _Ultimately silent corruption._
(Emphasis mine)
...uh, shouldn't your unit tests catch that kind of thing?
Of your mentioned reasons for the change, only stored alignment strikes me as particularly relevant - and at that, that you're overriding/ignoring the native alignment behavior of the framework. At which point, I'm more likely to want to be casting on each operation anyways, since any values outside the 'real' size should be invalid, and should be truncated.
If this is going to go to API review, it needs to stop being modified. The larger the proposal is, the less likely it will get reviewed in normal triage and the more likely it will get pushed back to a special triage session (which can take much longer to schedule/review/approve/etc).
I would say that, the following should be considered out of scope and should be handled in a new/separate proposal:
Evaluate
IsPowerOf2
Parity
Span<T>
, rather than a primitiveThis leaves:
This is still a fairly large proposal and covers a number of items that we currently know have use-cases both in and outside the framework.
These, for the most part, are "intrinsic" operations that most (modern) platforms support as a single operation or can be sufficiently accelerated.
These do not have to worry about any special semantics, such as whether it is big/little endian or how it operates on larger data inputs
Thanks @tannergooding , that gives me relief. The surface has been growing and I was not sure how to scope it back down. I will trim the spec.
[Edit] Done
For posterity, here's the trimmed-out spec that we can maybe submit it in a later proposal.
Note that Evaluate
(Edit: now called If
) is used internally, so it's useful to expose for external use too.
public static partial class BitOps // .Span and sundry
{
/// <summary>
/// Count the number of leading one bits in a mask.
/// </summary>
/// <param name="value">The mask.</param>
byte LeadingOnes(byte value);
// For brevity, ushort|uint|ulong overloads not shown
/// <summary>
/// Count the number of trailing one bits in a mask.
/// </summary>
/// <param name="value">The mask.</param>
byte TrailingOnes(byte value);
// For brevity, ushort|uint|ulong overloads not shown
/// <summary>
/// Converts a bool to an integer value, without branching.
/// Returns <paramref name="trueValue"/> if True, else returns <paramref name="falseValue"/>.
/// </summary>
/// <param name="condition">The value to convert.</param>
/// <param name="trueValue">The value to return if True.</param>
/// <param name="falseValue">The value to return if False.</param>
uint Iff(bool condition, uint trueValue, uint falseValue);
uint Iff(bool condition, uint trueValue);
// For brevity, int|long|ulong overloads not shown
// byte and ushort overloads are redundant
// Benchmark: 0.64 cost of idiomatic bool expression (condition ? 1 : 0)
byte AsByte(bool condition); // Normalized (see notes). Naming TBD
byte AsByte(ref bool condition);
/// <summary>
/// Returns 1 if <paramref name="value"/> is non-zero, else returns 0.
/// Does not incur branching.
/// Logically equivalent to CMOVNZ.
/// </summary>
/// <param name="value">The value to inspect.</param>
public static byte Any(ushort value);
// For brevity, short|int|uint|long|ulong overloads not shown
/// <summary>
/// Returns 1 if <paramref name="value"/> is zero, else returns 0.
/// Does not incur branching.
/// Logically equivalent to CMOVZ.
/// </summary>
/// <param name="value">The value to inspect.</param>
public static byte NotAny(ushort value);
// For brevity, short|int|uint|long|ulong overloads not shown
/// <summary>
/// Returns 1 if the bit count is odd, else 0.
/// Logically equivalent to PopCount mod 2.
/// </summary>
/// <param name="value">The value.</param>
int Parity(uint value);
int Parity(ulong value);
// byte and ushort overloads are redundant
/// <summary>
/// Returns True if the value is a power of 2, else False.
/// </summary>
/// <param name="value">The value.</param>
bool IsPowerOf2(uint value);
bool IsPowerOf2(ulong value);
// byte and ushort overloads are redundant
/// <summary>
/// Extracts a byte value from a longer integer.
/// </summary>
/// <param name="value">The bit mask.</param>
/// <param name="bitOffset">The ordinal position to read.</param>
byte ExtractByte(ushort value, int bitOffset);
byte ExtractByte(uint value, int bitOffset);
byte ExtractByte(ulong value, int bitOffset);
/// <summary>
/// Inserts a byte value into a longer integer.
/// </summary>
/// <param name="value">The bit mask.</param>
/// <param name="bitOffset">The ordinal position to write.</param>
/// <param name="insert">The value to insert.</param>
ushort InsertByte(ushort value, int bitOffset, byte insert);
uint InsertByte(uint value, int bitOffset, byte insert);
ulong InsertByte(ulong value, int bitOffset, byte insert);
ushort ExtractUInt16(uint value, int bitOffset);
ushort ExtractUInt16(ulong value, int bitOffset);
uint InsertUInt16(uint value, int bitOffset, ushort insert);
ulong InsertUInt16(ulong value, int bitOffset, ushort insert);
uint ExtractUInt32(ulong value, int bitOffset);
ulong InsertUInt32(ulong value, int bitOffset, uint insert);
/// <summary>
/// Reads whether the specified bit in a mask is set.
/// </summary>
/// <param name="value">The mask.</param>
/// <param name="offset">The ordinal position of the bit to read.</param>
bool ExtractBit(ReadOnlySpan<byte> value, uint offset);
// For brevity, ushort|uint|ulong overloads not shown
/// <summary>
/// Sets the specified bit in a mask and returns whether the bit was originally set.
/// </summary>
/// <param name="value">The mask.</param>
/// <param name="bitOffset">The ordinal position of the bit to set.</param>
bool InsertBit(Span<byte> value, uint bitOffset);
// For brevity, ushort|uint|ulong overloads not shown
/// <summary>
/// Clears the specified bit in a mask and returns whether the bit was originally set.
/// </summary>
/// <param name="value">The mask.</param>
/// <param name="bitOffset">The ordinal position of the bit to clear.</param>
bool ClearBit(Span<byte> value, uint bitOffset);
// For brevity, ushort|uint|ulong overloads not shown
/// <summary>
/// Complements the specified bit in a mask and returns whether the bit was originally set.
/// </summary>
/// <param name="value">The mask.</param>
/// <param name="bitOffset">The ordinal position of the bit to complement.</param>
bool ComplementBit(Span<byte> value, uint bitOffset);
// For brevity, ushort|uint|ulong overloads not shown
/// <summary>
/// Reads the specified byte from a span, given the bit offset.
/// </summary>
/// <param name="span">The span.</param>
/// <param name="bitOffset">The ordinal position to read.</param>
byte ExtractByte(ReadOnlySpan<byte> span, int bitOffset);
byte ExtractByte(ReadOnlySpan<ushort> span, int bitOffset);
byte ExtractByte(ReadOnlySpan<uint> span, int bitOffset);
byte ExtractByte(ReadOnlySpan<ulong> span, int bitOffset);
ushort ExtractUInt16(ReadOnlySpan<byte> span, int bitOffset);
// For brevity, ushort|uint|ulong overloads not shown
uint ExtractUInt32(ReadOnlySpan<byte> span, int bitOffset);
// For brevity, ushort|uint|ulong overloads not shown
ulong ExtractUInt64(ReadOnlySpan<byte> span, int bitOffset);
// For brevity, ushort|uint|ulong overloads not shown
/// <summary>
/// Writes the specified value to a span, given the bit offset, and returns the original value.
/// </summary>
/// <param name="span">The span.</param>
/// <param name="bitOffset">The ordinal position to write.</param>
/// <param name="value">The value to write.</param>
byte InsertByte(Span<byte> span, int bitOffset, byte value);
byte InsertByte(Span<ushort> span, int bitOffset, byte value);
byte InsertByte(Span<uint> span, int bitOffset, byte value);
byte InsertByte(Span<ulong> span, int bitOffset, byte value);
byte InsertUInt16(Span<byte> span, int bitOffset, ushort value);
// For brevity, ushort|uint|ulong overloads not shown
byte InsertUInt32(Span<byte> span, int bitOffset, uint value);
// For brevity, ushort|uint|ulong overloads not shown
byte InsertUInt64(Span<byte> span, int bitOffset, ulong value);
// For brevity, ushort|uint|ulong overloads not shown
// Alternative pattern that mitigates throwing OutOfRange exceptions
bool TryInsertByte(Span<byte> span, int bitOffset, byte value);
bool TryExtractByte(ReadOnlySpan<byte> span, int bitOffset, out byte value);
// Since operand size DOES affect result, it would be a breaking change to add these later.
// eg ROTL32(123u) != ROTL8((byte)123)
// Unless we (later) add them with a distinguished name, eg RotateLeft8, RotateLeft16
byte RotateLeft(byte value, int bitOffset);
ushort RotateLeft(ushort value, int bitOffset);
byte RotateRight(byte value, int bitOffset);
ushort RotateRight(ushort value, int bitOffset);
int LeadingZeroCount(byte value);
int LeadingZeroCount(ushort value);
}
ReadOnlySpan
for getters, and Span
for mutators. Not sure if we need to explicitly support Span
for getters too; I would hope not.TryRead/Write
pattern. Versus the current shape which is Read/Write
but then has to throw
for OutOfRange conditions. In other words, since this is a perf-sensitive feature, using Try
helps avoid exception
s.LeadingZeros(new byte[50]{ }) == 400
. Versus overloads that index into a specific byte. Suggestion: No.RotateLeft
being ambiguous.Span<byte>
first? byte ExtractByte(ReadOnlySpan<byte> span, int bitOffset);
// Redundant?
// Caller can use MemoryMarshal.AsBytes(span) first, though they then have
// to take care of the offset math themselves. Unless we provide an offset helper.
byte ExtractByte(ReadOnlySpan<ushort> span, int bitOffset);
byte ExtractByte(ReadOnlySpan<uint> span, int bitOffset);
byte ExtractByte(ReadOnlySpan<ulong> span, int bitOffset);
BitConverter
has already been span-ified, and as such has an overlapping api surface. For example, bool TryWriteBytes(Span<byte> destination, long value)
is similar to signatures here....shouldn't your unit tests catch that kind of thing?
I agree with the some of your feedback, but the above is a bad developer experience. If I call Math.Max
and incur an unexpected implicit upcast, the result typically doesn't change. But with this specific set of functions, it does. Unfortunately, not all consumers of .Net write units.
That design is not the pit of success.
...what implicit upcast? Each of the primitive types has its own copy of RotateLeft
, so it would return the same type as was passed in. If you're worried about the cast when assigning back to the same variable, Math.Max
would run into the same problem.
No I am not worried about the re-assignment. I am worried that in terms of overload resolution, the result feels a little non-deterministic. I will try another example:
void Foo(byte n) // <-- Note byte param
{
// a mile of code
var x = RotateLeft(n, 17); // Produces 123
var y = Math.Min(n, 17); // Produces 17
}
Sometime later, someone else modifies the signature:
void Foo(ulong n) // <-- Note byte became uint or ulong
{
// a mile of code
var x = RotateLeft(n, 17); // Produces 456 <-- Different result
var y = Math.Min(n, 17); // Produces 17 <-- Same result
}
Calling RotateLeft32(n, 2)
would have produced the same correct result deterministically.
Encoding the size of the type on the signature in cases where they lead to inconsistent results is not unheard of. IMHO it is a sensible tradeoff.
I would say that we leave the APIs as is (all called Rotate
) and that it can be discussed further during the design-review.
Changing the signature of an API (from uint
to ulong
, for example) can naturally cause downstream breaks and BitOps
would be, by far, not the only thing that has a potential for different behavior.
OK, I will leave the signatures as-is.
Changing the signature of an API (from uint to ulong, for example) can naturally cause downstream breaks and BitOps would be, by far, not the only thing that has a potential for different behavior.
Oops, bad example. My concern is with _implicit_ integer upcasts. Changed my code sample above to use byte
instead of uint
, since it's less of a problem between uint
and ulong
(for which I agree with your statement that's it's an established paradigm)
FYI. Following is the POC implementation of Iff
(_was_ called Evaluate
).
It's already used in internal
code; it may be useful to expose as public
?
[Edited]
// Converts a boolean to a byte value without branching.
// Assume AggressiveInlining
public static byte AsByte(ref bool condition)
{
int val = Unsafe.As<bool, byte>(ref condition); // CLR permits 0..255
// Normalize bool's underlying value to 0|1
val = -val; // Negation will set sign-bit iff non-zero
val >>= 31; // Send sign-bit to lsb (all other bits will be thus zero'd)
return (byte)val; // 0|1
}
public static uint Iff(bool condition, uint trueValue)
{
uint val = AsByte(ref condition);
return val * trueValue;
// FYI this is slower
//uint mask = AsByte(ref condition) - 1u; // 0x00000000 | 0xFFFFFFFF
//return ~mask & trueValue;
}
public static uint Iff(bool condition, uint trueValue, uint falseValue)
{
uint val = AsByte(ref condition);
return (val * trueValue) + (1 - val) * falseValue;
}
// For brevity, int|long|ulong overloads not shown
Usage:
var foo = BitOps.Iff(n > 0, 33); // Returns 33 if true, else 0
Example of internal
usage:
public static ushort WriteBit(ushort value, int bitOffset, bool on)
{
int shft = bitOffset & 15;
uint mask = 1U << shft;
uint onn = AsByte(ref on); // <-- Note, need to guarantee this is 0|1
onn <<= shft;
return (ushort)((value & ~mask) | onn);
}
Benchmark results (post bool-normalization change)
Method | Mean | Error | StdDev | Median | Scaled |
------------ |---------:|----------:|----------:|---------:|-------:|
Branch | 4.449 ns | 0.0890 ns | 0.2297 ns | 4.367 ns | 0.98 | x
UnsafeCode | 5.645 ns | 0.0117 ns | 0.0091 ns | 5.645 ns | 1.24 |
UnsafeAs | 4.552 ns | 0.0186 ns | 0.0156 ns | 4.552 ns | 1.00 | x Same perf as Branch
UnsafeAsRef | 2.529 ns | 0.0157 ns | 0.0131 ns | 2.522 ns | 0.56 | xx Looking at this too
UnionStruct | 5.646 ns | 0.0155 ns | 0.0137 ns | 5.644 ns | 1.24 |
Benchmark results (pre bool-normalization change)
Method | Mean | Error | StdDev | Scaled |
------------ |---------:|----------:|----------:|-------:|
Branch | 3.923 ns | 0.0884 ns | 0.1322 ns | 1.50 | !
UnsafeCode | 3.431 ns | 0.0358 ns | 0.0334 ns | 1.31 |
UnsafeAs | 2.629 ns | 0.1356 ns | 0.1392 ns | 1.00 | x Much faster than Branch
UnsafeAsRef | 2.329 ns | 0.0178 ns | 0.0149 ns | 0.89 | xx
UnionStruct | 3.505 ns | 0.1408 ns | 0.1446 ns | 1.34 |
@grant-d this If
would not support boolean values bigger than 1. Are they simply not supported? They are legal on the CLR. The C# specification ignores their existence.
One could fix it by doing !!condition
with unclear performance cost.
If could also be:
var mask = (uint)AsByte(ref condition) - 1; //0x00000000 or 0xFFFFFFFF;
return (mask & falseValue) | (~mask & trueValue);
@GSPP: If you stuff a 2 into a bool, either the C# compiler or the Jitter (not sure which) generates incorrect code for the xor operator or select statement on bools. Ergo bool must be either 0 or 1.
Linky: https://stackoverflow.com/questions/48344107/create-bool-value-which-is-neither-true-or-false
The runtime (which includes the JIT) specifies that a bool 1-byte and is true (any non-zero bit pattern) or false (a zero bit pattern). You can see III.1.1.2 in ECMA 335 for more info.
The C# language specifies that a bool is 1-byte and is true (1) or false (0). You can see https://github.com/dotnet/roslyn/issues/24652 for more information.
OK, I have updated the code sample above.
AsByte
now normalizes the bool
's underlying byte
value using the fastest routine I could think of.
Maybe there's an intrinsic for that.
Unfortunately that extra work means that it's about the same performance as branching now. Not sure if that still makes it worth doing, or if we should just incur the actual branch.
@tannergooding I am not sure what the C# contractual obligations are w.r.t. interop with other languages/platforms. Meaning the normalization code may or may not be required?
There's a faster way to normalize if you can have a JIT intrinsic.
or al, al
cmovnz al, 1
cmov isn't a branch and won't stall the pipeline.
@tannergooding, any movement on getting this approved?
This was briefly discussed in the last API review meeting and it was determined that we want to have a more in-depth discussion. We also want to have some more real-world scenarios and example usages for these APIs listed.
Example usages
Sure, would units suffice? My fork is close to 100% covered.
Real world usages
Based on your previous comment (below), and since I picked this up from up-for-grabs, I expected this was already well-known?
This is still a fairly large proposal and covers a number of items that we currently know have use-cases both in and outside the framework.
I expected this was already well-known?
Yes, it is just a matter of ensuring the information is collected and presented as part of the API review.
OK. Here are the implementation & units:
BitOps.cs (in corefx
, I expect it will be in coreclr
in actual PR)
BitOpsTests.cs (in the wrong folder, will fix in actual PR) - see first method BitOps_Samples
for TLDR
Added sample callsites to spec.
@tannergooding will you be handling the evidence requirements?
I noticed a couple of method names were inconsistent in the spec vs discussion. Fixed.
will you be handling the evidence requirements?
It is on my backlog, but there are some higher priority items on my plate that need to be handled first.
@tannergooding / @jhudsoncedaron, I have a design question.
For functions like ExtractBit
which ostensibly return a bool
telling you whether the bit is set or not, would it be more useful in general to return a byte
containing 1|0
? The caller can easily convert that to a bool
, but not the other way around. And a byte
is more easily composable with other twiddling functions, without having to 'de-bool' it.
And for methods like PopCount
, LeadingZeros
, etc maybe a return type of byte
would be better than int
, since the caller can then use any receiving int type/width they prefer?
@grant-d : All my examples would be ExtractBit() wanting to return a bool and SetBit() wanting to pass a bool. The only other functions I really care about are RotateLeft() and RotateRight() which I ended up having to write myself as extension methods because they're easy to get wrong.
Taking/returning a bool
is more desirable and will likely be better handled by the JIT (as, I think, will result ? 1 : 0
).
Updated POC code to call into intrinsics
where possible, eg
public static byte PopCount(uint value)
{
if (System.Runtime.Intrinsics.X86.Popcnt.IsSupported)
{
return (byte)System.Runtime.Intrinsics.X86.Popcnt.PopCount(value);
}
// Software fallback...
}
By the time (if) this ships, hopefully the instrinsics code will be non-preview.
Also see upcoming changes in: https://github.com/dotnet/coreclr/issues/20062
@jkotas pointed out that https://github.com/dotnet/coreclr/issues/21583 was a perf scenario that would be improved by having an intrinsic rotate, ie, something as proposed in this issue, plus JIT work to recognize it.
Note the naming convention used in the related PR, which should be considered in this issue.
TrailingZeroCount
is used there, vs TrailingZeros
proposed here.
https://github.com/dotnet/coreclr/pull/22118
namespace System
{
internal static class BitOps // Note internal
{
public static int TrailingZeroCount(int matches)
@tannergooding is it not possible to stage this PR: just justify/merge the functions that don't need much justification, such as the rotates
, and perhaps the balance of the trailing
/leading
overloads?
See new use case for LZCNT in https://github.com/dotnet/coreclr/pull/22100
Also
https://github.com/dotnet/coreclr/pull/19006
@grant-d Is that the right issue? It seems unrelated.
@GSPP fixed, thanks
is it not possible to stage this PR: just justify/merge the functions that don't need much justification, such as the rotates, and perhaps the balance of the trailing/leading overloads?
Not really, in all cases the APIs must be reviewed and approved before being made part of the public surface area. There is a chance that some smaller subsection of the APIs could be approved before the rest; but that still requires the same work to be done.
If someone wanted to help push this along, they could find the existing usages of proposed APIs above and move them to the currently internal BitOps
class. This would help identify all the places we are currently using these APIs and how they are being used.
Motivation and duplicate implementations of the proposed methods.
Initial search was timeboxed to 1 hour, so there is likely to be more.
OTHER
Found quite a few more callsites whilst implementing the PR.
See the code for more details.
POPCNT
https://github.com/dotnet/corefx/blob/master/src/System.Reflection.Metadata/src/System/Reflection/Internal/Utilities/BitArithmetic.cs
https://github.com/dotnet/corert/blob/cf5dc501e870b1efe8cba3d6990752538d174773/src/System.Private.CoreLib/shared/System/Buffers/Text/FormattingHelpers.CountDigits.cs
https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L1002
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1277
LZCNT
https://github.com/dotnet/corefx/blob/62c70143cfbb08bbf03b5b8aad60c2add84a0d9e/src/Common/src/CoreLib/System/Number.BigInteger.cs#L700
https://github.com/dotnet/coreclr/pull/22100
https://github.com/dotnet/coreclr/pull/19006
https://github.com/dotnet/corefx/blob/151a6065fa8184feb1ac4a55c89752342ab7c3bb/src/Common/src/CoreLib/System/Decimal.DecCalc.cs#L728
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1254 (LeadingOnes)
TZCNT
https://github.com/dotnet/coreclr/pull/22118
https://github.com/dotnet/corert/blob/db8723b1787e18f0124e94c387542d0ef3aac128/src/System.Private.CoreLib/shared/System/BitOps.cs#L16
ROTL/ROTR
https://github.com/dotnet/coreclr/issues/21583
https://github.com/dotnet/coreclr/pull/1830
https://github.com/dotnet/corert/blob/87e58839d6629b5f90777f886a2f52d7a99c076f/src/System.Private.CoreLib/src/System/Marvin.cs#L120-L124
https://github.com/dotnet/coreclr/issues/1619
https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L972
https://github.com/dotnet/corert/blob/1bb85b989d9b01563df9eb83dce1c6a3415ce182/src/ILCompiler.ReadyToRun/src/Compiler/ReadyToRunHashCode.cs#L214
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/Common/src/Internal/NativeFormat/TypeHashingAlgorithms.cs#L20
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/Common/src/Internal/Text/Utf8String.cs#L47
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/Internal/Runtime/CompilerServices/OpenMethodResolver.cs#L178
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/System/RuntimeMethodHandle.cs#L67
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/ILCompiler.Compiler/src/Compiler/DependencyAnalysis/NodeFactory.NativeLayout.cs#L345
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.TypeLoader/src/Internal/Runtime/TypeLoader/TypeLoaderEnvironment.NamedTypeLookup.cs#L121
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/Internal/Runtime/CompilerServices/OpenMethodResolver.cs#L178
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/System/RuntimeFieldHandle.cs#L47
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.TypeLoader/src/Internal/Runtime/TypeLoader/TypeLoaderEnvironment.NamedTypeLookup.cs#L121
InsertBit
https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L1013
https://github.com/dotnet/corefx/blob/bd414c68872c4e4c6e8b1a585675a8383b3a9555/src/System.DirectoryServices.AccountManagement/src/System/DirectoryServices/AccountManagement/Utils.cs#L51
https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L51
https://github.com/dotnet/wpf/blob/2cbb1ad9759c32dc575c7537057a29ee7da2e1b2/src/Microsoft.DotNet.Wpf/src/System.Xaml/System/Xaml/Schema/Reflector.cs#L363
ClearBit
https://github.com/dotnet/corefx/blob/bd414c68872c4e4c6e8b1a585675a8383b3a9555/src/System.DirectoryServices.AccountManagement/src/System/DirectoryServices/AccountManagement/Utils.cs#L46
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1314
https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L45
https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L1020
ExtractBit
https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L57
https://github.com/dotnet/wpf/blob/2cbb1ad9759c32dc575c7537057a29ee7da2e1b2/src/Microsoft.DotNet.Wpf/src/System.Xaml/System/Xaml/Schema/Reflector.cs#L334
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1294
ComplementBit
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1317
https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L196
they could find the existing usages of proposed APIs
@tannergooding I have tried to find several examples of each, searching across all dotnet
C# repos, eg: popcount language:C# user:dotnet
.
Any other examples you're aware of would be welcome.
and move them to the currently internal BitOps class...
Did you mean I should create a PR (specifically keeping the class internal
)?
Add the BitOps
method, then update the callsite(s)
Did you mean I should create a PR (specifically keeping the class internal)?
Add the BitOps method, then update the callsite(s)
Yes, I think that would be reasonable. It should not only help centralize the code but should give us a clearer picture of who is using it and how.
Yes, I think that would be reasonable. It should not only help centralize the code but should give us a clearer picture of who is using it and how.
PR created: https://github.com/dotnet/coreclr/pull/22225
The first commit shows the unedited code copied _verbatim_ from the known call-sites.
It has not been changed, optimized or consolidated since it is meant to demonstrate duplication.
Subsequent commits clean up and consolidate the code. Units already written and will be merged later.
Trimmed down and cleaned-up the spec based on the above analysis.
Still some questions to answer (such as signed overloads or not) but it's now a much tighter surface area.
@grant-d, the original post gas gotten a bit large. Could you also provide a summary of the changes you made here (removals, additions, etc)
Per the changelog:
The last item is related to the new // comments
through the spec regarding overloads for byte
and ushort
. For example, see RotateLeft
in the spec where I emphasize the need for such overloads, but I removed such overloads from (eg) PopCount
.
Also added the two new related sections in the spec; Operand Size & Sign.
Per discussion in https://github.com/dotnet/coreclr/pull/22333, changed return type of count-oriented methods (such as PopCount
) to int
instead of uint
.
Also updated the POC code
@tannergooding I completed a set of coreclr
PRs to:
coreclr
BitOps
methodsI would like to update downstream callsites in corefx
and corert
as well as externals such as roslyn
.
And of course expose them for public consumption.
But I _think_ I need a reference assembly for all those cases, right?
Please advise me of the next steps?
Updated spec to match findings of code analysis discussed above. There were several existing callsites that used byte
overloads.
byte
overloads to all crud (extract
, insert
, clear
, complement
) methods.bool ExtractBit(byte value, int bitOffset)
@grant-d I would scope this issue down to just what we have in CoreLib and that is clearly useful. It will make the API review on this easy and it will allow us to make the most useful APIs public in .NET Core 3.0.
I would split the bit manipulation APIs into a separate issue that will be looked at and discussed separately, on its own schedule.
That's a great idea Jan. I had already split this into 2 asks in the summary, but I will create two separate issues instead
Lets get the subset from dotnet/corefx#35419 through first.
Tagging subscribers to this area: @tannergooding
Notify danmosemsft if you want to be subscribed.
I'd like to move forward and mark this as ready-for-review
. However, at the very least BitOps
needs to become BitOperations
, as that is the name of the already approved class. CC @grant-d
It's worth noting that of the variants being exposed, byte
doesn't have hardware support. The x86 hardware only accelerates 16, 32, and 64-bit variants. The proposal, as given, isn't exposing short
, ushort
, long
, or ulong
(nor sbyte
), which I think is based on @jkotas' request to just trim this down to what S.P.Corelib is using, correct?
The native intrinsics return unsigned char
(effectively bool
) and take the first parameter by pointer, so I think taking by ref
as a variant makes sense and fits in with how these are expected to be used (which is likely in a if (InsertBit(ref _value, 4))
like scenario). @CarolEidt, you raised some concerns around taking ref
above, is that still the case given the non mutating variants?
_value
is mutated and the original value needs to be returned, is it a value tuple like (int newValue, bool originalBitValue)
?Meant to link to the native intrinsics above: https://docs.microsoft.com/en-us/cpp/intrinsics/bittest-bittest64?view=vs-2019
which I think is based on @jkotas' request to just trim this down to what S.P.Corelib is using, correct?
The original proposal had methods like PopCount that were already approved and that we had no intrinsic patterns for.
I do not see value in having method for bit tests, etc. I believe that the existing pattern for bit tests like x & (1 << 5)
are very concise and easy to read for anybody deals with bits.
The fact that MSVC C++ introduced an intrinsic for bit tests sounds like over-engineering to me. They could have picked that up as pattern match just fine.
They could have picked that up as pattern match just fine.
And I believe they do, but its the same fundamental problem as with RotateLeft
and RotateRight
, which is that there are a plethora of possible patterns and those may not be obvious, especially to novice/beginner programmers.
On the other hand, a method provides an explicit way to opt into the functionality, know that it will work (there probably isn't some edge case they are forgetting around bit manipulation), and that it will most likely be efficient for the given platform.
I believe this is why we have 13 functions across dotnet (which grant-d found, at least) covering these "concise and easy to read" patterns: https://github.com/dotnet/runtime/issues/27382#issuecomment-457718507. That is, they exist because the API is that much more concise, easy to read, and allows logic updates without finding everywhere you wrote the pattern.
It also solves the problem of matching a more complex pattern where you want to simultaneously branch based off the existing value of a bit while simultaneously setting it to a new value.
RotateLeft and RotateRight are fine with me. The pattern to write that using language operators is not concise, and there are multiple ways to write that.
I believe this is why we have 13 functions across dotnet (which grant-d found, at least) covering these "concise and easy to read" patterns
The most common ones like RotateLeft / RotateRight, PopCount, etc. are addressed already.
The remaining ones are the bit tests: InsertBit, ClearBit, etc. They show a lot less occurences. I have done a few searches for patterns like & (1 <<)
or | (1 <<
. Based on these searches, I believe that it is about 100x more common for people to use the these direct patterns than to wrap it in the helper method.
I'd like to move forward and mark this as
ready-for-review
. However, at the very leastBitOps
needs to becomeBitOperations
, as that is the name of the already approved class. CC @grant-dThanks for the ping Tanner, updated the spec's name to
BitOperations
FYI, here's my previous research on duplicate functions for each operation.
A year+ old so may be more (or less) now:
https://github.com/dotnet/corefx/issues/32269#issuecomment-457689128
eg, in ImmutableHashMap
:
c#
[Pure]
private static uint InsertBit(int position, uint bits)
{
Debug.Assert(0 == (bits & (1u << position)));
return bits | (1u << position);
}
This is a good example: It asserts and sets the bit. We would not be able to replace it with the proposed method, without losing the debug validation.
A question: could be possible to include optimized array/span based versions? Use case is bitmask of arbitrary length backed by array of uints (likely, like BigInteger
but this class isnt specially optimized for bitmasking and doesnt expose setting bits by index/position).
If theres no extra gains (like even more vectorization) then forget my question, it will be better to simply implement BitMask
on top of current API proposal.
@CarolEidt, you raised some concerns around taking ref above, is that still the case given the non mutating variants? If it is, what is the proposed mechanism to handle that _value is mutated and the original value needs to be returned, is it a value tuple like (int newValue, bool originalBitValue)?
IMO the best API for these cases is a value tuple. However, admittedly I haven't yet had the time to get my PR https://github.com/dotnet/runtime/pull/37928 working, which will enable the best codegen for these cases.
If the ref API is preferred, the JIT can still take one of two avenues to achieve the same result:
could be possible to include optimized array/span based versions?
I would think that would be doable fully in C#, if it was considered desirable.
Most helpful comment
Motivation and duplicate implementations of the proposed methods.
Initial search was timeboxed to 1 hour, so there is likely to be more.
OTHER
Found quite a few more callsites whilst implementing the PR.
See the code for more details.
POPCNT
https://github.com/dotnet/corefx/blob/master/src/System.Reflection.Metadata/src/System/Reflection/Internal/Utilities/BitArithmetic.cs
https://github.com/dotnet/corert/blob/cf5dc501e870b1efe8cba3d6990752538d174773/src/System.Private.CoreLib/shared/System/Buffers/Text/FormattingHelpers.CountDigits.cs
https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L1002
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1277
LZCNT
https://github.com/dotnet/corefx/blob/62c70143cfbb08bbf03b5b8aad60c2add84a0d9e/src/Common/src/CoreLib/System/Number.BigInteger.cs#L700
https://github.com/dotnet/coreclr/pull/22100
https://github.com/dotnet/coreclr/pull/19006
https://github.com/dotnet/corefx/blob/151a6065fa8184feb1ac4a55c89752342ab7c3bb/src/Common/src/CoreLib/System/Decimal.DecCalc.cs#L728
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1254 (LeadingOnes)
TZCNT
https://github.com/dotnet/coreclr/pull/22118
https://github.com/dotnet/corert/blob/db8723b1787e18f0124e94c387542d0ef3aac128/src/System.Private.CoreLib/shared/System/BitOps.cs#L16
ROTL/ROTR
https://github.com/dotnet/coreclr/issues/21583
https://github.com/dotnet/coreclr/pull/1830
https://github.com/dotnet/corert/blob/87e58839d6629b5f90777f886a2f52d7a99c076f/src/System.Private.CoreLib/src/System/Marvin.cs#L120-L124
https://github.com/dotnet/coreclr/issues/1619
https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L972
https://github.com/dotnet/corert/blob/1bb85b989d9b01563df9eb83dce1c6a3415ce182/src/ILCompiler.ReadyToRun/src/Compiler/ReadyToRunHashCode.cs#L214
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/Common/src/Internal/NativeFormat/TypeHashingAlgorithms.cs#L20
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/Common/src/Internal/Text/Utf8String.cs#L47
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/Internal/Runtime/CompilerServices/OpenMethodResolver.cs#L178
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/System/RuntimeMethodHandle.cs#L67
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/ILCompiler.Compiler/src/Compiler/DependencyAnalysis/NodeFactory.NativeLayout.cs#L345
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.TypeLoader/src/Internal/Runtime/TypeLoader/TypeLoaderEnvironment.NamedTypeLookup.cs#L121
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/Internal/Runtime/CompilerServices/OpenMethodResolver.cs#L178
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/System/RuntimeFieldHandle.cs#L47
https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.TypeLoader/src/Internal/Runtime/TypeLoader/TypeLoaderEnvironment.NamedTypeLookup.cs#L121
InsertBit
https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L1013
https://github.com/dotnet/corefx/blob/bd414c68872c4e4c6e8b1a585675a8383b3a9555/src/System.DirectoryServices.AccountManagement/src/System/DirectoryServices/AccountManagement/Utils.cs#L51
https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L51
https://github.com/dotnet/wpf/blob/2cbb1ad9759c32dc575c7537057a29ee7da2e1b2/src/Microsoft.DotNet.Wpf/src/System.Xaml/System/Xaml/Schema/Reflector.cs#L363
ClearBit
https://github.com/dotnet/corefx/blob/bd414c68872c4e4c6e8b1a585675a8383b3a9555/src/System.DirectoryServices.AccountManagement/src/System/DirectoryServices/AccountManagement/Utils.cs#L46
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1314
https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L45
https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L1020
ExtractBit
https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L57
https://github.com/dotnet/wpf/blob/2cbb1ad9759c32dc575c7537057a29ee7da2e1b2/src/Microsoft.DotNet.Wpf/src/System.Xaml/System/Xaml/Schema/Reflector.cs#L334
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1294
ComplementBit
https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1317
https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L196