Runtime: Proposal: Add a BitManipulation class

Created on 6 Oct 2016  ·  101Comments  ·  Source: dotnet/runtime

I propose adding a new class to CoreFx, which will contain many common bit manipulation techniques.
Part of having a class like this is that the Jit can target these methods class for intrinsics, similar to what was done for Vector<T>.
There is still some open discussion as to what the shape of this API should be, and even the class name.

Class Name

There are several class names being thrown around and none are particularly winning everyone over.

  • Bits
  • BitOperations or BitOps
  • BitUtility
  • BitTwiddler
  • BitView (see the next section)

Two Classes?

@benaadams, correctly notes that there seems to be two different APIs here.
A low level view allowing you to manipulate a numeric type to extract or inject a bit (like a bit vector), byte, short, or int value.
These methods are the equivalent of the following but safer (and possibly faster)

public static unsafe int ExtractInt(ulong u,int pos)=>((int*)&u)[pos];
public static unsafe ulong InjectInt(ulong u, int pos, int toSet) {
    ((int*)&u)[pos] = toSet;
     return u;
}

And another set of utility exposing methods for treating an integer register type like a unit of data allowing you to manipulate it or introspect it.

Does it make sense to keep these APIs in one class?

Method Names

Another point of contention. What should we call these methods? For the "view" members, there is some dislike of the naming convention of Get/Set even though there is prior art like BitVector32 & SqlDataRecord. Everyone seems to like Read/Write more, but while Read is fine. Write isn't neccessarily the operation being done. I'm still looking for some verbiage to note that this really takes a value, modifies it a bit (no pun intended), and spit out a new one.

PopCount/HammingWeight/ CountSetBits - We can't decide on this name. I personally like PopCount as it is a well known name for the algorithm. However, for someone who does not CPU intrinsics or bit tricks this name might mean nothing to you. the .net api is split where common or well known algorithms are simply called that (e.g. Rijndael) and sometimes descriptive for new users. I think that this class in general is fairly low-level so even a novice should be expected to do a quick google search in the subject area.

And even naming the methods as actions (e.g. CountTrailingZeroes) or properties (e.g. TrailingZeroesCount).

Hardware Acceleration / Detection

@benaadams, suggested adding a simple flag to determine if the class has hardware acceleration. I personally suggest going a step further and adding an enum to describe which methods are accelerated (not in the PoC).
Unfortunately, the enum based approach does raise the question if the jit could do branch elimination on a comparison like the following if it could replace AcceleratedOperations as a runtime constant. (unknown)

// 
if (Bits.AcceleratedOperations == BitOps.PopCount){
   // use popcount
}
else{
  // work around it?
}

I admittedly question what you would do differently if it isn't accelerated. This isn't like Vector<T> where you might switch to a different approach and use ulong like smaller vectors. The methods should be pretty good solutions to these problems and outside of switching to some native code I don't see us doing better.
Methods that could be accelerated (in AMD64 at least)::
PopCount => POPCNT
RotateRight => ROR (already accelerated!)
RotateLeft => ROL (already accelerated!)
CountLeadingZeroes => LZCNT (in ABM compliant hardware) or (BSR followed by xor 63)
CountTrailingZeroes => TZCNT / BSF
ReadBit => BT
WriteBit => BTS/BTC (maybe?)
ReadByte/ReadInt16/ReadInt32 => BEXTR (possibly)

Updated spec:

public static class Bits{
       public static int PopCount(ulong value);
       public static int PopCount(long value);
       public static int PopCount(uint value);
       public static int PopCount(int value);

       public static ulong RotateRight(ulong value, int offset);
       public static uint  RotateRight(uint value, int offset);

       public static ulong RotateLeft(ulong value, int offset);
       public static uint  RotateLeft(uint value, int offset);

       public static int CountLeadingZeros(ulong value);
       public static int CountLeadingZeros(long value);
       public static int CountLeadingZeros(uint value);
       public static int CountLeadingZeros(int value);

       public static int CountTrailingZeroes(ulong value);
       public static int CountTrailingZeroes(long value);
       public static int CountTrailingZeroes(uint value);
       public static int CountTrailingZeroes(int value);

       public static  bool ReadBit(ulong value, int offset);
       public static  bool ReadBit(long value, int offset);
       public static  bool ReadBit(uint value, int offset);
       public static  bool ReadBit(int value, int offset);

       public static ulong WriteBit(ulong value, int offset, bool toSet);
       public static long WriteBit(long value, int offset, bool toSet);
       public static uint WriteBit(uint value, int offset, bool toSet);
       public static int WriteBit(int value, int offset, bool toSet);

       public static byte ReadByte(ulong value, int offset);
       public static byte ReadByte(long value, int offset);
       public static byte ReadByte(uint value, int offset);
       public static byte ReadByte(int value, int offset);

       public static short ReadInt16(ulong value, int offset);
       public static short ReadInt16(long value, int offset);
       public static short ReadInt16(uint value, int offset);
       public static short ReadInt16(int value, int offset);

       public static int ReadInt32(ulong value,  int offset);
       public static int ReadInt32(long value,  int offset);

       public static ulong WriteByte(ulong value, int offset, byte toSet);
       public static long WriteByte(long value, int offset, byte toSet);
       public static uint WriteByte(uint value, int offset, byte toSet);
       public static int WriteByte(int value, int offset, byte toSet);

       public static ulong WriteInt16(ulong value, int offset, short toSet);
       public static long WriteInt16(long value, int offset, short toSet);
       public static uint WriteInt16(uint value, int offset, short toSet);
       public static int WriteInt16(int value, int offset, short toSet);

       public static ulong WriteInt32(ulong value, int position, int toSet);
       public static long WriteInt32(long value, int position, int toSet);
}

Edit: POC class
https://gist.github.com/mburbea/c9a71ac1b1a25762c38c9fee7de0ddc2

More updates! Removed signed rotate operators.

api-needs-work area-System.Numerics

All 101 comments

c# public int IsBitSet(long value, int position);

Should this return bool instead of int?

@nguerrera I recall you had mentioned some internal helpers that Roslyn was using that were related to this sort of functionality. Any input here?

CountBits is the main one. Search for BitArithmetic in Roslyn and corefx. I've long wanted a nice class to hide the scariness of https://graphics.stanford.edu/~seander/bithacks.html

Yes that should return bool. If this seems like something that would be liked I'd happily submit a pr for little endian architecture.

This issue, at least partially, overlaps with https://github.com/dotnet/coreclr/issues/6906

Shouldn't the methods be declared as static?
And these have variable name conflicts and a missing comma on SetInt16.

public void SetBit(long value, int position, bool value); public void SetByte(long value, int position, byte value); public void SetInt16(long value, int position. short value); public void SetInt32(long value, int position, int value);

corrected
public static void SetBit(long value, int position, bool to); public static void SetByte(long value, int position, byte to); public static void SetInt16(long value, int position, short to); public static void SetInt32(long value, int position, int to);

All and all I like the idea of a central location for common bit manipulating functions.

Yes, I fixed the error thanks @SunnyWar .
@Tornhoof ,Yes it does, for some of the intrinsics.

@mburbea All the Set methods need a return value or the value as an out param.

Very, very much +1 for this idea. Could contain other useful methods such as

public int CountBits(int value);
public int RoundUpToPowerOfTwo(int value);
public int IsPowerOfTwo(int value);

to prevent people from having their own hand-rolled, inefficient implementations.

IMO, we may also want to name it something like BitUtility instead of BitManipulation. The latter feels a bit verbose, and is more prone to typos.

Isn't CountBits just pop count?

@mburbea Yes, indeed. I believe the JIT does not currently generate that instruction yet though (related issue: dotnet/runtime#14781) because the code pattern is too hard to recognize, so this might be a good entry point to expose it.

Yeah, @jamesqo, all I was saying that the method was covered. I wouldn't be opposed to calling it something else, I went with the name because it was suggested by @mellinoe .
I think these are good additions and once we have some of these intrinsics could be implemented quite easily.

public static long IsPowerOfTwo(long value)=> x == (x&-x) && x!=0;
public static long RoundUpToPowerOfTwo(long value)=> 1 << (64 -CountLeadingZeroes(value - 1));

@mburbea

Yeah, @jamesqo, all I was saying that the method was covered. I wouldn't be opposed to calling it something else, I went with the name because it was suggested by @mellinoe .

Ah, ok, I see.

Question-- do you think it's necessary to have methods like IsBitSet, GetByte, SetByte, etc? I mean most people with basic knowledge of bit manipulation would know they could just write (value & (1 << position)) != 0 instead of IsBitSet, (value & 0xFF00) >> 8 instead of GetByte(value, 1), etc. I think the most useful methods in this class would be the ones that aren't very obvious to implement.

Also, there is somewhat of an ambiguity-- people don't know if the position parameter starts from the msb or lsb.

@jamesqo I think it's easier to _read_ a single method call with a descriptive name than an expression with three operators and two magic constants (even if they are just 0 and 1). Especially when it comes to negation, I think it's relatively easy to confuse (value & (1 << position)) != 0 and (value & (1 << position)) == 0, but you aren't likely to confuse IsBitSet(value, position) with !IsBitSet(value, position).

You are free to keep writing it the way you're used to, but I think having those methods in the base library as an option would be useful.

I'm not opposed to removing them if they're considered a hindrance to api acceptance, but I like having convience methods. None of those methods are hard to write, but like most C# developers most of my code is higher level, so I rarely twiddle bits. When I need to, I often end up googling these things, so centralizing them made sense to me.

@mburbea @jamesqo I write a lot of bit twiddling code and still would appreciate to be able to get a proper API I could reason about instead of having my entire codebase filled with magic constants, shifts and binary operations. Always as long as the actual code generated is as efficient (guaranteed) as the code generated using them.

@redknightlois Hmm, ok. I guess popular vote wins since everyone seems to want those methods :smile: There is still the issue of ambiguity whether position starts from the msb or lsb, though.

That might be a question of endianness (as in big endian long, the msb is where the 0th bit is stored and the lsb is where the 63rd bit is stored, but in little endian it's reversed),

Intuitively, I feel its implied that its in sequential order for your hardware. e.g. SetBit(0, 0, true) should return the number 1 regardless of endian order.

@mellinoe @CarolEidt https://arxiv.org/abs/1611.07612 faster popcnt than popcnt using AVX2
Apache License 2.0 source https://github.com/CountOnes/hamming_weight

@mellinoe , @CarolEidt What else needs to be resolved for this API proposal to move forward?

@mburbea I think that this is in a good state and should be ready for our meeting next week.

For our reviewing, I think having a real implementation for all of these methods would actually be very helpful. A big part of the benefit of these API's is that they will hide the "scariness" of complicated bit manipulation from the user. It would be good if we could see just how "scary" the code is on a case-by-case basis. For example, it's not clear that the code for SetBit (above) is scary enough to justify inclusion, but I would bet that the code for PopulationCount definitely is. Having the code ready to look at would be very helpful for our review.

Additionally, can you update the original post with your finalized proposal, including any extra methods that were suggested that you believe are valuable?

@mellinoe, would these methods have intrinsic support (many of these functions have CPU level instructions that are useable/preferred in some scenarios).

@mellinoe, would these methods have intrinsic support (many of these functions have CPU level instructions that are useable/preferred in some scenarios).

It's something to consider as an optimization, but I don't think it's anything we need to block the proposal or an implementation on.

Are we still considering different names for these methods? IMO, in their current form they are kinda verbose-- for example, BitManipulation.PopulationCount is pretty long to type, and its meaning is not immediately obvious to non-assembly programmers. (It may not even be implemented with popcnt.)

I think we may want to rename some of the methods, e.g.:

  • PopulationCount => CountBits
  • CircularShiftLeft => RotateLeft
  • CircularShiftRight => RotateRight

We can also consider another name for the class, such as BitTwiddler. Compare the number of characters:

BitTwiddler.RotateLeft(i, 5);
BitManipulation.CircularShiftLeft(i, 5);

edit: In addition, GetInt32 / SetInt32 and family sound a bit too Java-like for me. (Set also implies that something is being mutated, rather than a new value being returned.) Not sure what would be a good rename, though.

IMO, C# has always been a bit more on the verbose side. I felt that was intentional as code is more often read than written. The longer names have always been explained away with the fact that most people are writing C# using an IDE which offers Intelisense.

  • I am okay with renaming the class to maybe something like BitUtility.
  • I'll rename CircularShift*=> Rotate*.
  • PopulationCount: hows about PopCount? It is a well known name for this operation, and is shorter than CountBits.

SqlDataRecord has both a GetInt32 and SetInt32, so the names are not exactly unique to Java.
As for the mutation question, would you prefer that the method takes a ref parameter and modify their input?
Edit: Any suggestion for namespace? System.Numerics?

We use Bits and Sparrow.Binary as a namespace to put all that stuff and in practice it reads quite well. https://github.com/ravendb/ravendb/blob/v4.0/src/Sparrow/Binary/Bits.cs (Sparrow is our fast library name, call that System.Binary here instead).

PopCount is a very well known name, the same with RotateLeft32, RotateRight32, RotateLeft64, RotateRight64

I wonder if this is one of the types that is likely going to be used with using static. If that's the case, the length of the class name does not matter much, since it's going to appear only once per file.

@mburbea

IMO, C# has always been a bit more on the verbose side. I felt that was intentional as code is more often read than written.

Verbose does not always equal more readable-- the meaning that's conveyed is what's important. CircularShiftLeft does not tell the reader more information than RotateLeft.

I am okay with renaming the class to maybe something like BitUtility.

I'll rename CircularShift=> Rotate.

:smile: :tada: We should consider both BitUtility (in tandem with WebUtility) and BitTwiddling for the name.

PopulationCount: hows about PopCount? It is a well known name for this operation, and is shorter than CountBits.

IMO, I think people who don't have a background in assembly deciphering what PopCount means without Googling it. CountBits is longer, but only by 1 character. I think the API reviewers should consider both.

SqlDataRecord has both a GetInt32 and SetInt32, so the names are not exactly unique to Java.

Yes, but the point is Get / Set kind of goes against the grain-- .NET has properties, so get/set-naming isn't very common in code. The example you pointed out is an object which mutates state, which is different from the scenario we're dealing with currently.

As for the mutation question, would you prefer that the method takes a ref parameter and modify their input?

ref parameters are not very flexible and do not work well when you want to do multiple operations on the int. I am OK with the returning-a-new-value-each-time scheme, I'm just trying to think of a better name.

Any suggestion for namespace? System.Numerics?

BitConverter is already in System, and it would be inconvenient if someone had to add another using every time they wanted to do bit twiddling. I think it should be System.

@svick

I wonder if this is one of the types that is likely going to be used with using static. If that's the case, the length of the class name does not matter much, since it's going to appear only once per file.

I think we should make it so people don't have to add using static in the first place. I'm not sure what you mean by "one of the types that is likely going to be used with using static;" I rarely use it because it can be very ambiguous to the reader.

Maybe some other people would be interested in this discussion? cc @justinvp, @nietras, @JonHanna

Here is my quick and dirty implementation of the class, I can set up a PR later if this looks good. I'll update the speclet later today.
https://gist.github.com/mburbea/c9a71ac1b1a25762c38c9fee7de0ddc2

As you can see pretty much everything is full of magic constants and shift operators. Suprisingly, the Get/Set were actually some of the hardest to write code (curse you sign extension!). Outside of GetBit/SetBit I couldn't really think of a good reason to use the other operations, so if those get axed I wouldn't mind.
Questions:
Should we do input validation? I think this could be a drag on the performance of these methods. The whole point of this is you know what you're doing and if you ask an invalid question, you get an invalid result.
Big endian? I don't know we should handle it, as certainly some of these methods would fail in big endian environments. There might be some implementation agnostic approach but they might be slower and might lead someone writing low level code to avoid using this class. (defeating it's purpose).

Generally in .NET is ReadX and WriteX isn't it? So ReadInt32 rather than GetInt32

I feel as though input validation on these bits defeats the purpose. If validation were to be done, I would personally want them to wrap a version that did no validation (so you would have FastOperation and ValidatedOperation, where ValidatedOperation does the check and then calls FastOperation and both APIs were publicly available).

I would also guess that big endian does not need to be explicitly supported here. If a big-endian platform was ever supported, then we have other APIs that would need big-endian versions, such as BitConverter.Int64BitsToDouble, and it was explicitly decided that the bridge would be crossed when/if it came to that.

I think that, long term, all of these APIs would be good candidates for intrinsic support, as most of them compile down to a single CPU instruction on most modern computers.

I do not personally like BitTwiddler (it is not necessarily understandable by someone who speaks/reads English as a secondary language).

As for PopCount versus CountBits, I would vote for PopCount as it is used almost universally elsewhere (CPU instructions, Compiler Intrinsics, etc). It may be worth noting that it seems BitCount is used in the few places where PopCount is not.

Anyone doing bit manipulation should likely be reading the summary of the methods they are using or would be familiar with the other versions of PopCount available in other languages.

@benaadams

Generally in .NET is ReadX and WriteX isn't it? So ReadInt32 rather than GetInt32

I like your idea. Bit-twiddling is all about low-level programming, so "read" / "write" jives with me more here than "get" / "set". In addition, they sound smoother.

@mburbea

Should we do input validation?

I don't think we should. This should be a low-level API that has minimal overhead compared to writing manual bit twiddling code.

Big endian? I don't know we should handle it, as certainly some of these methods would fail in big endian environments.

Would they? I don't see any methods where a pointer is being cast to another pointer of a narrower/wider type. How would endianness be an issue here?

@jamesqo, CountTrailingZeroes/CountLeadingZeroes relies on bit-isolation tricks that only works correctly in a little endian environment. In a big endian environment it does not isolate the correct bit.
See this comment from Kestrel where a simmilar technique was used.
https://github.com/aspnet/KestrelHttpServer/pull/1138#issuecomment-255878891

@jamesqo thanks for pinging me.

@mburbea This is definitely interesting and having this functionality at hand would be a great addition to .NET 👍

This got a bit long... just a warning ;)

Naming

Class Name

First, my thoughts on the class name below. Some are nitpicking for sure.

|Class Name | Notes |
|----------------|-------|
|BitManipulation | Not fond of using "manipulation" in the name, this implies manipulating or in other words changing or modifying something. Many methods do not manipulate anything, instead they inspect/measure bits in a value. |
|BitTwiddler | Twiddling is analoguous to manipulation, and is not a good term since it is rather esoteric (less known perhaps?)|
|BitUtility | I think this has precedence, but personally not that happy about it...|
|BitHelpers | Precedence in RuntimeHelpers, HashHelpers, but I'm not that fond of this, what code isn't "helpful"? At least all code should be.|
|BitOperations | A more general name than manipulation, its a bit shorter and still accurate, operations on bits. |
|BitOps | Shorter, still readable, has a bit of precedence in e.g. OpCodes, RuntimeOps... I think |
|Unsafe | Thought of this just before commenting, disregarding the technical issues here, these could also be added to the Unsafe class directly, I don't like it since it is probably better to group them in a separate type, but keeping the options open.|
|?? | Hope I didn't miss any?|

Any of these could and should perhaps be prefixed with Unsafe, as I agree these should have no validation and be as close to metal as possible. In fact, the documentation of these should probably say that if any input parameter is outside an expected range the result will be implementation, machine dependent. E.g. using position 65 for a 64-bit long can do pretty much anything, return a result, crash the entire machine etc.

Method Naming

I definitely think methods should use Read/Write instead of Get/Set, as in the Unsafe class. And I think all methods should start with a verb such as Read, Write, Is, Shift, Rotate, Count, Flip, Toggle, Round etc.

Regarding PopulationCount, I agree this is a well known intrincic, but is the target audience people who know intrinsics? Or any .NET developer in need of a bit operation? I think CountBits is clear in intent, and I am sure a person who knows intrinsics will think this does the same, but perhaps wonder if it is as efficient... so docs should be clear about it.

Grand Scheme

My biggest beef with this is that I have a hope of all intrinsics being handled in a unified way (say System.Intrinsics) that also ensures such methods will be available for larger types (think vector registers), such as 128-bit, 256-bit, 512-bit etc.

With Intrinsics being a fixed size compliment to the variable sized Vector<T> (rant: an annoying and very overloaded name seen from a geometry, mathematics perspective and given C++ vector<T>). We have discussed this many times but unfortunately this is probably a very long term scheme, so adding
these bit methods (in System.Runtime.CompilerServices namespace perhaps?) is a more practical step in the right direction at least.

It would be good to consider how larger types 128-bit, 256-bit etc. could be supported?

My Current Suggestions

Call this UnsafeBitOps place it in System.Runtime.CompilerServices. Although, UnsafeBitOperations works for me too.

@mburbea could your update your proposal in beginning with the latest proposal?

One reason for class splits to consider is the addition of a IsHardwareAccelerated flag and what granularity it works over (and needs to work over)

@nietras, done sorry was a bit slacking. Included my POC.

Naming

Class Name

I vote for BitUtility, Bits or BitManipulation. I definitely do not like the idea of associating unsafe with this class. Or even declaring the class unsafe. I and other developers, generally avoid unsafe code when I can, and will write slower (but safer) code to avoid it. The whole point of this class is to encourage people to use these methods, which are effecient and well tested version of algorithms instead of naive approaches

// Probably performs horribly, but def works
int PopCount(long u) => Convert.ToString(u, 2).Count(x => x == '1');

With an alterior motive, that in a future version of the runtime, some of these methods can be turned into intrinsics.
Furthermore, All of these operations are safe, they don't deal with memory, and using these methods incorrectly will not lead to access violations, or really any exception at all. I think the better concept to associate them with is unchecked.

Method Names

PopCount: I really have never heard of CountBits and naively it doesn't sound like its doing the right thing. A bit can be 0 or 1, why does 1 make something a bit, but 0 does not? I've seen this method called CountOnes and for a naive reader that certainly sounds like the right thing. (How many ones are in this thing?), but at the same point if you need this operation, you'd either code it yourself (hopefully doing a better job than my naive approach) or you'd google it and be told that it's called popcount/hemming weight/horrizontal addition from stack overflow.
Verbs for methods? Eh,, I don't know. If that's the argument we could call it GetPopCount. But not every method has to be a verb. Certainly well known algorithms are just called by their names.
Read/Write, I don't know. As I said, Get/Set is not exactly completely unfounded in C#.

Extending to Vector<T>? Yes! I'd love that. But the literature on how to write such methods is pretty slim as pretty much those methods are intrinsics.

One reason for class splits to consider is the addition of a IsHardwareAccelerated flag and what granularity it works over (and needs to work over)

It would be great to find a general mechanism that we could use to query, on a method-level basis, whether something is supported as an intrinsic by the JIT. We've had some discussions about this in the past, but I don't think there was a concrete proposal that could be implemented efficiently.

@mburbea Regarding your comment earlier

CountTrailingZeroes/CountLeadingZeroes relies on bit-isolation tricks that only works correctly in a little endian environment. In a big endian environment it does not isolate the correct bit.
See this comment from Kestrel where a simmilar technique was used.

I didn't know what a De Bruijn sequence was earlier, so I had to spend an hour making sure I had all my facts straight before I replied to your post :smile: However, I still do not think these will have any dependence on endianness.

For example, in CountTrailingZeroes, ((x ^ (x - 1)) * Debruijn64) >> 58 should always have the same result for a given x. Unless I'm misunderstanding you, it would be completely insane if arithmetic/bitshifting operations returned different results for different architectures. The comment you linked to has this code snippet

  uint8_t valueAtIndexZero = ((uint8_t*)&longValue)[0];
  uint8_t valueAtIndexTwo = ((uint8_t*)&longValue)[2];
  uint8_t valueAtIndexFive = ((uint8_t*)&longValue)[5];

which is violating strict aliasing by casting the uint64_t* to a pointer of a smaller type. I don't think it is possible to tell the difference between big/little endian otherwise. (@benaadams can chime in on this)


Regarding naming,

I vote for BitUtility, Bits or BitManipulation.

Bits sounds like a good name too, even if it is a little generic. Most bitshifting code is very terse (not a lot of words) so less characters is valuable.

Read/Write, I don't know. As I said, Get/Set is not exactly completely unfounded in C#.

Whichever you decide, you may want to change IsBitSet to ReadBit / GetBit to make it consistent with the other methods. Also would prevent weirdness like

int value = BitUtility.SetBit(val, 0, false);
BitUtility.IsBitSet(value, 0); // IsBitSet returns false, after we called SetBit

CountBits and naively it doesn't sound like its doing the right thing

@mburbea you are right! CountBits is not clear. CountOnes better, I can live with PopulationCount, although...

whole point of this class is to encourage people to use these methods, which are effecient and well tested version of algorithms

This sounds more like the target is to be safe, correct and developer friendly, rather than optimal and efficient. Perhaps both can be obtained, at least for some methods for sure, but some look like they require input validation if they are to be for the "average" .NET developer. And in that case, I would think PopulationCount to be less developer friendly for people that have never heard of popcnt.

I was hoping for the priority being "optimal and efficient" rather than "safe". If "safe" is not the case it seems in .NET one has to say either "Unsafe" or "Dangerous" to mark the class or method as being well without hands holding. ;) Not ideal, but at least specific.

Why "optimal and efficient", because that is harder to get to than safe... having them as close to "intrinsics" as possible would be best first in my view. :)

Population count is a bit weird in a high level language as "population" likely means something different to the user; there is a domain change in language with a CPU designer.

Also it is a property of the value rather than an action; so shouldn't start with a verb?

Perhaps more SetBitCount and its opposite UnsetBitCount and maybe an extension method?

Also since popcnt and bitscan will either be hardware or not together; many be group together?

e.g.

public static class BitExtensions
{
       public bool IsHardwareAccelerated {get;}

       public static int SetBitCount(this ulong value);
       public static int SetBitCount(this long value);
       public static int SetBitCount(this uint value);
       public static int SetBitCount(this int value);

       public static int UnsetBitCount(this ulong value);
       public static int UnsetBitCount(this long value);
       public static int UnsetBitCount(this uint value);
       public static int UnsetBitCount(this int value);

       public static int LeadingZeroCount(this ulong value);
       public static int LeadingZeroCount(this long value);
       public static int LeadingZeroCount(this uint value);
       public static int LeadingZeroCount(this int value);

       public static int TrailingZeroCount(this ulong value);
       public static int TrailingZeroCount(this long value);
       public static int TrailingZeroCount(this uint value);
       public static int TrailingZeroCount(this int value);
}

Usage

var hammingDiff = (val1 ^ val2).SetBitCount();

I've updated the proposal voicing my vote for Bits and renaming Get/Set to Read/Write since it seems more well liked. I also changed from IsBitSet=>ReadBit to be more consistent with simmilar methods.

@jamesqo, you might be right. I've no way to test this code in a big endian environment so it may just work.

@nietras, I somewhat get the point, but unsafe is a rather large disincentive. In a CoreCLR project, Visual Studio doesn't even have a friendly way for you to enable unsafe code, you have to update the project.json manually.
If I understood CoreCLR#1830, it seems that rotateRight/RotateLeft can be made input safe (and I think I did so in my code), so really the only methods that cannot be "safe" for any input are the bit/byte/short/int extractions/substitution methods.

@benaadams, I like the idea of a IsHardwareAccelerated method, but some architecture may only have some method accelerated, or poorly accelerated. POPCNT & LZCNT showed up in ABM but TZCNT came latter in BMI1. Before that TZCNT would probably become a bsf followed by xor 63, and would most likely be undefined on 0. 64-bit bsf is known to have been pretty slow on older hardware.

@mburbea its bigger than CPU type; its also version of Jit - i.e. there is nothing particularly in the api that means it couldn't happily be .NETStandard 1.0 - however it will only be hardware accelerated on a Jit that doesn't currently exist (so at least 4.6.3+ and Core 1.2.0+ though maybe version after...)

Also to match the action name the write functions should be

public static void WriteInt16(ref ulong value, int offset, short toSet);

Otherwise you aren't writing; you are returning a changed value

C# public static int SetBitCount(this ulong value); public static int UnsetBitCount(this ulong value);

I don't like these names because they imply that something is being modified, since they start with Set and Unset. I'm also not a fan of using extension methods like that; I think these should all be regular static methods.

they imply that something is being modified, since they start with Set

Ah yes, the ambiguity of English... 😄 But it does highlight an angle I hadn't thought of as to why PopulationCount is bad; as it would be quite a weird name for a non English speaker - is a name by convention rather than clarity.

public static int BitsSetCount(ulong value);
public static int BitsUnsetCount(ulong value);

Example usage

var hammingDiff = Bits.BitsSetCount(val1 ^ val2);

Double Bits looks a bit weird though

Two classes?

public static class Bits
{
    public bool IsHardwareAccelerated { get; }

    public static int CountSet(ulong value);
    public static int CountSet(long value);
    public static int CountSet(uint value);
    public static int CountSet(int value);

    public static int CountUnset(ulong value);
    public static int CountUnset(long value);
    public static int CountUnset(uint value);
    public static int CountUnset(int value);

    public static int CountLeadingZeros(ulong value);
    public static int CountLeadingZeros(long value);
    public static int CountLeadingZeros(uint value);
    public static int CountLeadingZeros(int value);

    public static int CountTrailingZeroes(ulong value);
    public static int CountTrailingZeroes(long value);
    public static int CountTrailingZeroes(uint value);
    public static int CountTrailingZeroes(int value);

    public static ulong RotateRight(ulong value, int offset);
    public static uint RotateRight(uint value, int offset);
    public static ulong RotateLeft(ulong value, int offset);
    public static uint RotateLeft(uint value, int offset);
}

Usage

var hammingDiff = Bits.CountSet(val1 ^ val2);
public static class BitView
{
    public static bool ReadBit(ulong value, int offset);
    public static bool ReadBit(long value, int offset);
    public static bool ReadBit(uint value, int offset);
    public static bool ReadBit(int value, int offset);

    public static void WriteBit(ref ulong value, int offset, bool toSet);
    public static void WriteBit(ref long value, int offset, bool toSet);
    public static void WriteBit(ref uint value, int offset, bool toSet);
    public static void WriteBit(ref int value, int offset, bool toSet);

    public static byte ReadByte(ulong value, int offset);
    public static byte ReadByte(long value, int offset);
    public static byte ReadByte(uint value, int offset);
    public static byte ReadByte(int value, int offset);

    public static short ReadInt16(ulong value, int offset);
    public static short ReadInt16(long value, int offset);
    public static short ReadInt16(uint value, int offset);
    public static short ReadInt16(int value, int offset);

    public static int ReadInt32(ulong value, int offset);
    public static int ReadInt32(long value, int offset);

    public static void WriteByte(ref ulong value, int offset, byte toSet);
    public static void WriteByte(ref long value, int offset, byte toSet);
    public static void WriteByte(ref uint value, int offset, byte toSet);
    public static void WriteByte(ref int value, int offset, byte toSet);

    public static void WriteInt16(ref ulong value, int offset, short toSet);
    public static void WriteInt16(ref long value, int offset, short toSet);
    public static void WriteInt16(ref uint value, int offset, short toSet);
    public static void WriteInt16(ref int value, int offset, short toSet);

    public static void WriteInt32(ref ulong value, int position, int toSet);
    public static void WriteInt32(ref long value, int position, int toSet);
}

Usage

BitView.WriteInt32(ref value, 2, 7);

"Unsafe" doesn't mean you have to enable unsafe for the compiler. It is
nothing more than a name prefix to signify to a developer that the code you
are calling can do unexpected things if you are not aware what you are
doing. Just as the "Unsafe" class that can be used in any normal safe
context, and like e. g. `DangerousGetHandle' or similar that can be used
from normal safe context.

On Dec 9, 2016 7:58 PM, "Ben Adams" notifications@github.com wrote:

Two classes?

public static class Bits
{
public bool IsHardwareAccelerated { get; }

public static int CountSet(ulong value);
public static int CountSet(long value);
public static int CountSet(uint value);
public static int CountSet(int value);

public static int CountUnset(ulong value);
public static int CountUnset(long value);
public static int CountUnset(uint value);
public static int CountUnset(int value);

public static int CountLeadingZeros(ulong value);
public static int CountLeadingZeros(long value);
public static int CountLeadingZeros(uint value);
public static int CountLeadingZeros(int value);

public static int CountTrailingZeroes(ulong value);
public static int CountTrailingZeroes(long value);
public static int CountTrailingZeroes(uint value);
public static int CountTrailingZeroes(int value);

public static ulong RotateRight(ulong value, int offset);
public static uint RotateRight(uint value, int offset);
public static ulong RotateLeft(ulong value, int offset);
public static uint RotateLeft(uint value, int offset);

}

Usage

var hammingDiff = Bits.CountSet(val1 ^ val2);

public static class BitView
{
public static bool ReadBit(ulong value, int offset);
public static bool ReadBit(long value, int offset);
public static bool ReadBit(uint value, int offset);
public static bool ReadBit(int value, int offset);

public static void WriteBit(ref ulong value, int offset, bool toSet);
public static void WriteBit(ref long value, int offset, bool toSet);
public static void WriteBit(ref uint value, int offset, bool toSet);
public static void WriteBit(ref int value, int offset, bool toSet);

public static byte ReadByte(ulong value, int offset);
public static byte ReadByte(long value, int offset);
public static byte ReadByte(uint value, int offset);
public static byte ReadByte(int value, int offset);

public static short ReadInt16(ulong value, int offset);
public static short ReadInt16(long value, int offset);
public static short ReadInt16(uint value, int offset);
public static short ReadInt16(int value, int offset);

public static int ReadInt32(ulong value, int offset);
public static int ReadInt32(long value, int offset);

public static void WriteByte(ref ulong value, int offset, byte toSet);
public static void WriteByte(ref long value, int offset, byte toSet);
public static void WriteByte(ref uint value, int offset, byte toSet);
public static void WriteByte(ref int value, int offset, byte toSet);

public static void WriteInt16(ref ulong value, int offset, short toSet);
public static void WriteInt16(ref long value, int offset, short toSet);
public static void WriteInt16(ref uint value, int offset, short toSet);
public static void WriteInt16(ref int value, int offset, short toSet);

public static void WriteInt32(ref ulong value, int position, int toSet);
public static void WriteInt32(ref long value, int position, int toSet);

}

Usage

BitView.WriteInt32(ref value, 2, 7);


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/dotnet/corefx/issues/12425#issuecomment-266092754,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AKTG73UGfQnLzGbf6B9sgYGnXrmm6Xn4ks5rGaTXgaJpZM4KQTp-
.

@benaadams, I see your point about splitting these things up and that could be interesting. That said, I think ref is kind of annoying to use and I'd prefer these methods were simple actions on values. If possible maybe we can come up with a name that somehow conveys that it takes a value and produces a new one with some modified bits.
What do you think of

[Flags]
public enum BitOps
{
      None = 0,
      RotateRight = 1,
      RotateLeft = 1,
      PopCount = 2,
      CountLeadingZeroes = 4
      CountTrailingZeroes = 8
}

public static class Bits
{
       public BitOps HardwareAcceleratedOperations {get;}
       ...
}

So old versions of the desktop would report None, current versions of CoreCLR would report RotateRight & RotateLeft, and then the other operations can be added based on your hardware & if a runtime will support it.

Btw, If PopCount really is just disliked, I'd go for CountSetBits

@mburbea need to write out the usage to get a feel, so would be

var hammingDiff = Bits.CountSetBits(val1 ^ val2);

Re acceleration it would be

if (Bits.HardwareAcceleratedOperations & BitOps.RotateLeft != BitOps.RotateLeft)
{
  // ...
}
else
{
  //
}

But the important factor is whether the Jit would eliminate the test as a constant expression and remove one of the branches?

Just a thought are Set/Unset not a bit confusing, especially given Set has many meanings. Would One/Zero not be better? E.g. CountOneBits, CountZeroBits. Well, at least the verb is first, that I like 👍

There are already two terms in use e.g. CountLeadingZeros uses Zero then there is Set this should be consolidated on one word usage e.g. Zero, One.

Regarding hardware accelerated, would BitOps.RotateLeft then mean that all variants of input types would be accelerated or? Wouldn't there be cases where e.g. int would be hardware accelerated, but long would not? ... sorry we always end up with this coarse grain vs fine grain discussion. It would be good if there was some we could capture method signature, and ask about the acceleration of that like @CarolEidt said...

Wouldn't there be cases where e.g. int would be hardware accelerated, but long would not?

public BitOps HardwareAcceleratedOperations<T> where T : struct {get;} 

😜

Ha yeah, so many good options 😆 I like BitOps for the static class name though 😉

@nietras

I like BitOps for the static class name though

I personally am not so sure about BitOps; ops is not a full word.

@jamesqo it has precedence though, but I guess we need some input from FXDC regarding naming, otherwise I think we might just keep repeating our different suggestions. I don't like Bits it is too vague and too general, its short, that is good. But .NET is not about brevity but rather readability... I think the best we can do is present some options for review. Doesn't look like we will be able to agree 😉

I do think we should try to get naming consistent though, having both Set/Unset and One/Zero in same API is not consistent. Perhaps suggest a choice between the two.

Tried to look at other precedence in .NET. All I could find easily was BitArray. It uses Get/ Set for getting/setting a bit in the array e.g. public void Set( int index, bool value) link. I think it would be confusing to have Set both mean that a bit is set i.e. 1 and having a method that sets a certain bit to 1 or 0. Set is overloaded.

But this perhaps shows that Read/Write is not necessarily the best choice... if BitArray is to be mimicked.

@mellinoe are we ready for API review with this issue? I see lots of discussion after you marked it as such ... Do we need more time to flush out the API design?

No, I don't think so.

We think it's a good idea to have a type like this, but it's a larger area so we're hesitant to approve it (for example, we'd like to understand if some of the methods should become intrinsics, what other higher-level types would benefit from it, such as BitArray etc). At the same time, we don't want to block this area from making progress by not approving it.

So it might be better to develop this idea in conjunction with an implementation in CoreFxLab.

@KrzysztofCwalina?

I've updated the spec and added all my open questions for discussion points:
@terrajobst , @mellinoe , @jamesqo , @benaadams, @nietras Any other points you'd like to discuss?

@mburbea

  • The idea of having an enum to determine hardware accelerated options is nice. It removes the ambiguity from having a single flag, and is also flexible because it will permit adding more hardware accelerated methods in the future. :+1: (I think BitOperations or BitMethods might be a better name though.

Unfortunately, the enum based approach does raise the question if the jit could do branch elimination on a comparison like the following if it could replace AcceleratedOperations as a runtime constant.

I think that any scalar type is jitted to a constant. https://github.com/dotnet/coreclr/issues/1079#issuecomment-108635719

PopCount/HammingWeight/ CountSetBits - We can't decide on this name. I personally like PopCount as it is a well known name for the algorithm. However, for someone who does not CPU intrinsics or bit tricks this name might mean nothing to you. the .net api is split where common or well known algorithms are simply called that (e.g. Rijndael) and sometimes descriptive for new users. I think that this class in general is fairly low-level so even a novice should be expected to do a quick google search in the subject area.

I am OK with PopCount and CountSetBits. I see what the ambiguity with CountBits is now ("shouldn't that always be 32?"). I would prefer CountSetBits because the rest of the methods on this class start with verbs.

Once we have the ability to allocate spans on the stackfe context, it would be great if the bit manipulation routines worked with spans.
For example,
```c#
var lotsOfBits= stackalloc new Span(20);
Bits.Set(lotsOfBits, 120);
Assert.True(Bits.IsSet(lotsOfBits, 120);

And more generally, it would be great if it was possible to create a custom struct larger than 64 bits and then use it with the Bits APIs.
```c#
struct Bits128 : IBits 
{ 
    uint bits0to63; 
    uint bits64to127;
}
public static class Bits {
    public void Set<T>(T bits, int index) where T:IBits { ... }
}
Bits128 bits = new Bits128();
Bits.Set(bits, 120);

Blocked on CoreFxLab implementation - https://github.com/dotnet/corefx/issues/12425#issuecomment-267139670

@karelz, Should I create an issue or a PR in CoreFxLab?

@KrzysztofCwalina, this could be nice to have something like CountLeading/Trailing Zeroes on Span<byte>, but currently its hard to get ideal performance without better intrinsics and compiler support. Even something like Kestrel's seek isn't much of an improvement over simply using longs. I think adding overloads that work on a Span<byte> is probably good enough.

Yes please, create a PR/issue in CoreFxLab.

Intel hardware intrinsic API proposal has been opened at dotnet/corefx#22940

This does not seem blocked on "ability to allocate spans on the stack" anymore.

@mburbea, I'd like to push this through the API review process, but would appreciate if you could make a few changes to the OP first.

Specifically, if you could organize/reformatt it to be more readable/reviewable. The API Review Process doc gives an example of a good layout here: https://github.com/dotnet/corefx/issues/271

Some thoughts of my own:

  • I personally like BitOps or BitOperations the best

    • I think it fits and would allow us to have a single class

  • PopCount

    • I think people using this will be generally familiar with it

    • x86 calls it PopCount

    • ARM exposes a vcnt instruction, not sure what their intrinsic will be named

    • If we don't do PopCount, I think CountSetBits or CountOneBits makes sense (CountOnes would match the naming convention for CountLeadingZeros)

  • I don't think we need a IsAccelerated check for these

    • but if we do have one, it needs to be a check against an enum or something similar since some may be HWAccelerated, while others are not

  • WriteBit probably needs more thought

    • We need some operation for just set this bit, I think WriteBit works for this

    • We should probably have an explicit operation for flipping a bit (maybe FlipBit or NegateBit)

    • We probably should have some method for something that sets, resets, or complements the bit and returns the original value

    • Maybe bool ExchangeBit(ref ulong value, int offset, bool toSet)

    • This makes it easier to do BTS, BTR, and BTC

  • Just a note: Extract and Insert, rather than Read/Write, are the names being used by the hardware intrinsics APIs (at least for the non-bit operations)

I am really glad this issue is moving forward 😄

Just a note: Extract and Insert, rather than Read/Write

IMO it is better for classess which are high/higher level to use standard .NET naming. In this case Write/Read.

I generally agree with @tannergooding on this:

  • My preferences is for BitOps as a namespace; there is precedent for the Ops abbreviation.
  • I think PopCount is generally familiar, but otherwise I think CountOnes would be good. If Bit is in the namespace name, then it should be clear we're talking about bits.
  • I think that for these, they shouldn't have an IsAccelerated (yes, I've changed my view on that), but the implementation can check for the various HW intrinsics and use the accelerated forms if available.
  • I prefer Extract and Insert for bits, rather than Read and Write because bits are not actual types that one can read and write.

@mburbea - any thoughts regarding the suggested changes?

@CarolEidt, their activity log doesn't show them to be very active. I may just create a new issue to track this if it doesn't get a response soon.

@tannergooding , my current role doesn't allow me to participate too much on github :( If you'd like to create a new issue on this, by all means. I think my initial API & PoC are pretty close, but obviously deficient.

I mostly agree with your points but I do feel that an enum for acceleration check would be a good idea though, software workarounds could be brutal performance hit.

I am happy to take this one, but I propose an incremental approach that starts with a simpler API and takes advantage of intrinsics but is software emulated. In later PR, we can add more functions and specific hardware optimizations:

public static partial class BitOps // .Primitive
{
       /// <summary>
       /// Reads whether the specified bit in a mask is set.
       /// </summary>
       /// <param name="value">The bit mask.</param>
       /// <param name="offset">The ordinal position of the bit to read.</param>
       public static bool ExtractBit(in byte value, in byte offset);
       public static bool ExtractBit(in ushort value, in byte offset);
       public static bool ExtractBit(in uint value, in byte offset);
       public static bool ExtractBit(in ulong value, in byte offset);

       /// <summary>
       /// Sets the specified bit in a mask and returns whether it was originally set,
       /// like BTS/BTR.
       /// </summary>
       /// <param name="value">The bit mask.</param>
       /// <param name="offset">The ordinal position of the bit to write.</param>
       /// <param name="on">True to set the bit to 1, or false to set it to 0.</param>
       public static bool InsertBit(ref uint value, in byte offset, in bool on);
       public static bool InsertBit(ref int value, in byte offset, in bool on);
       public static bool InsertBit(ref ulong value, in byte offset, in bool on);
       public static bool InsertBit(ref long value, in byte offset, in bool on);

       /// <summary>
       /// Negates the specified bit in a mask and returns whether it was originally set,
       /// like BTC.
       /// </summary>
       /// <param name="value">The bit mask.</param>
       /// <param name="offset">The ordinal position of the bit to flip.</param>
       public static bool FlipBit(ref byte value, in byte offset);
       public static bool FlipBit(ref ushort value, in byte offset);
       public static bool FlipBit(ref uint value, in byte offset);
       public static bool FlipBit(ref ulong value, in byte offset);

       /// <summary>
       /// Rotates the specified mask left by the specified number of bits.
       /// </summary>
       /// <param name="value">The value to rotate.</param>
       /// <param name="offset">The number of bits to rotate by.</param>
       /// <returns>The rotated value.</returns>
       public static byte RotateLeft(in byte value, in byte offset);
       public static byte RotateRight(in byte value, in byte offset);
       public static ushort RotateLeft(in ushort value, in byte offset);
       public static ushort RotateRight(in ushort value, in byte offset);

       // Takes advantage of existing intrinsics (https://github.com/dotnet/coreclr/pull/1830)
       public static uint RotateLeft(in uint value, in byte offset);
       public static uint RotateRight(in uint value, in byte offset);
       public static ulong RotateLeft(in ulong value, in byte offset);
       public static ulong RotateRight(in ulong value, in byte offset);

       /// <summary>
       /// Returns the population count (number of bits set) in a mask.
       /// </summary>
       /// <param name="value">The bit mask.</param>
       public static int PopCount(in byte value);
       public static int PopCount(in ushort value);
       public static int PopCount(in uint value);
       public static int PopCount(in ulong value);

       /// <summary>
       /// Count the number of leading bits in a mask.
       /// </summary>
       /// <param name="value">The mask.</param>
       /// <param name="on">True to count each 1, or false to count each 0.</param>
       public static int LeadingCount(in byte value, in bool on);
       public static int LeadingCount(in ushort value, in bool on);
       public static int LeadingCount(in uint value, in bool on);
       public static int LeadingCount(in ulong value, in bool on);

       /// <summary>
       /// Count the number of trailing bits in a mask.
       /// </summary>
       /// <param name="value">The mask.</param>
       /// <param name="on">True to count each 1, or false to count each 0.</param>
       public static int TrailingCount(in byte value, in bool on);
       public static int TrailingCount(in ushort value, in bool on);
       public static int TrailingCount(in uint value, in bool on);
       public static int TrailingCount(in ulong value, in bool on);

       // In a separate PR
       // Read/Write:byte/ushort/uint
       // Span<byte> overloads
       // FloorLog2
       // Other...
}

I will create a formal API proposal if there's any interest in doing this.
PoC code here

PopCount, LeadingOnes, LeadingZeroes, TrailingOnes, TrailingZeroes, etc could use a Span interface. In that way the API would add something not readily available on intrinsics interface

I am keeping this design minimal so that we can get the basic functionality out the door. Span overloads are a good idea and (per comments in code) planned for the next iteration.

@grant-d, creating a formal API proposal would be great. It slipped off my radar a while back and I never got around to it.

It would likely be useful to incorporate the feedback @CarolEidt and I gave here (and a couple posts down): https://github.com/dotnet/corefx/issues/12425#issuecomment-356079499

I think the primary feedback was: BitOps for the class name and Insert/Extract rather than Write/Read.

@redknightlois, the point of these APIs is to provide a general-purpose API that works on all platforms (which means providing a software fallback) and is generally-usable.

Hardware Intrinsics are for performance oriented scenarios where you require hardware acceleration and need more direct control of the code that is emitted.

Why would Span overloads prevent getting this "out of the door"? I think at absolute minimum we should design APIs as we want them long term, review the full design, and then possibly implement in stages.

I honestly don't see the use-case for the Span APIs here. Could someone elaborate on why you would want them?

I will create a formal PR. In the meantime, I have updated the class above with @tannergooding / @CarolEidt feedback (and had already incorporated previous feedback re: BTR/BTS/BTC).

@tannergooding For example, whenever you want to use Popcount is for data structures which are compressed in bitstreams (the same apply for LeadingOnes, TrailingZeroes and the like) which means that you are not doing it on a long, but on a variable size byte stream.

Furthermore, optimized Popcount is not as simple as apply popcount and sum the results, in fact depending on the architecture different implementations may be exercised for those operations. Depending on the bitstream size it would make sense to prime the L2 cache, etc... If size is big enough, the Avx2 based Harley Seal algorithm have 50% memory throughput than using the built-in version because of higher instruction parallelism, etc...

So, it makes sense that all of that should be abstracted away given that those primitives are quintessential to high performance compressed data structures and many other performance sensitive operations. Not including the batch versions while easier could counter the actual gains that could be obtained in code simplification on its intended usage scenario.

Furthermore, optimized Popcount is not as simple as apply popcount and sum the results, in fact depending on the architecture different implementations may be exercised for those operations.

This seems like a case of complicating the implementation/exposed APIs for something that may be very use-case specific and which may be better suited for exposure elsewhere.

How does it complicate APIs? I do agree it's more complex implementation, but we don't split logical overloads into separate classes/features just because implementation of some of the overloads is more tricky.

How does it complicate APIs?

Because, PopCount(uint) is a simple, general-purpose API that everyone can use and which is generally useful. It can also be used to build up more complex implementations. (and for which we have known use-cases today, even in the framework/runtime).

PopCount(Span<uint>) is a more complex API that, for the naive implementation, is just a for loop and sum (which is trivial to implement), but for an "optimized" implementation may have a lot of fine-tuning for your exact target/scenario.

I would think that PopCount(Span<uint>) is something better suited to a 3rd party library, or which should be reviewed separately if enough evidence indicating it was needed came up. The way I see it is that you are basically talking about the difference between supporting primitive types and adding APIs which support the concept of a BitVector type.

Because, PopCount(uint) is a simple, general-purpose API that everyone can use and which is generally useful. It can also be used to build up more complex implementations. (and for which we have known use-cases today, even in the framework/runtime).

Indeed PopCount is a simple primitive that just happen to have a single instruction to act on registers on harware, but the operation is defined over every bitstream of data. Say for simplication a Span<byte> to avoid having to introduce an entirely unrelated concept like BitVector.

PopCount(Span<uint>) is a more complex API that, for the naive implementation, is just a for loop and sum (which is trivial to implement), but for an "optimized" implementation may have a lot of fine-tuning for your exact target/scenario.

That's exactly what a pit of success looks like IMHO. A sensible API should give you the latter by default and not expect the user to have to deal with that, until there is no other choice.

That's exactly what a pit of success looks like IMHO. A sensible API should give you the latter by default and not expect the user to have to deal with that, until there is no other choice.

Right. But is PopCount(Span<uint>) (and similar overloads for the other BitOps methods) a common enough use-case that it warrants being part of the BCL?

For PopCount(uint) (and most of the other methods listed above), we already know they are useful. Not only have we had a number of requests for this, but we need it ourselves for various parts of the framework/runtime. However, it is still a "niche" scenario and generally targets people wanting or needing to write performance oriented code.

PopCount(Span<>) is likely even more esoteric; potentially to the point where implementing it as part of the BCL is not worthwhile.

There are many algorithms which can benefit from PopCount, but, for most inputs; it may be more desirable to intersperse the PopCount with other instructions (for better throughput, overall) or to just do a trivial for loop (because the dataset is small enough).

So, personally, I believe PopCount(Span<>) falls into the category of: "I want to see more data to support this first"

Agree with @tannergooding. I don't deny that there might be a need for Span et al, but my aim is to create a highly scoped feature set (just the primitives) in order to get it out of limbo. We can create an independent issue to track more advanced scenarios, I am more than happy to help on that too.
(I do agree there's a risk in that approach in that a staggered api might be inconsistent, we'll need diligence)

Where I come from popcount over higher than 64 bits is hardly esoteric, it is the base operation. I don't care about PopCount anymore now that I can have the intrinsic because I built the Span one myself (with AVX2). But, I know the pitfalls, I am being paid to know exactly that. Most of that knowledge is akin to dark arts in most development circles, having a single operation aka PopCount(Span<byte>) makes yet another usual pitfall not matter for the 99%.

So, personally, I believe PopCount(Span<>) falls into the category of: "I want to see more data to support this first"

I usually tend to agree, but having seen PopCount, TrailingZeroes, etc routines crop up in several places with absurdly suboptimal implementations over the years just because 'we want to see more data' makes me a bit hesitant on this front.

EDIT: And the problem is that the for and sum is one of those cases. There is 60% difference between the two in GB/sec.

Case in point: https://github.com/dotnet/corefx/issues/2209#issuecomment-139636114

@redknightlois trying to understand the requirement: would RotateLeft(Span<byte> span, int n) rotate just the specified byte or would it rotate the whole array by n bits?

Rotate is too implementation dependent, I personally haven't seen yet any use for that one on bitstreams though. I am talking about PopCount, Trailing\Leading|Ones\Zeroes which are typical operations. We could consider say Extract\Flip\Insert because they are easy to implement but they don't have obvious pitfalls that I have seen.

EDIT: If say RotateLeft is implemented on a Span (I probably would not implement it) the whole Span should be considered an atom, so rotate should rotate the whole thing left to be consistent.

I have to agree with the position that having a Span<uint> overload makes a lot of sense. I would wager that the number of places where a simple uint version would be used without it being a building block for a larger dense bitvector would be a very small percentage. Perhaps I'm looking at it too simplistically, but I can't see that it would be overly complex to support.

Here's a stab at the Span<T> signatures, should we decide it is part of the initial design.

    public static partial class BitOps // .Span
    {
        bool ExtractBit(ReadOnlySpan<byte> value, in uint offset);
        bool ExtractBit(ReadOnlySpan<uint> value, in uint offset);
        bool ExtractBit(ReadOnlySpan<ulong> value, in uint offset);
        bool ExtractBit(ReadOnlySpan<ushort> value, in uint offset);

        bool FlipBit(Span<byte> value, in uint offset);
        bool FlipBit(Span<uint> value, in uint offset);
        bool FlipBit(Span<ulong> value, in uint offset);
        bool FlipBit(Span<ushort> value, in uint offset);

        bool InsertBit(Span<byte> value, in uint offset, in bool on);
        bool InsertBit(Span<uint> value, in uint offset, in bool on);
        bool InsertBit(Span<ulong> value, in uint offset, in bool on);
        bool InsertBit(Span<ushort> value, in uint offset, in bool on);

        long PopCount(ReadOnlySpan<byte> value);
        long PopCount(ReadOnlySpan<uint> value);
        long PopCount(ReadOnlySpan<ulong> value);
        long PopCount(ReadOnlySpan<ushort> value);

        long LeadingCount(ReadOnlySpan<byte> value, in bool on);
        long LeadingCount(ReadOnlySpan<uint> value, in bool on);
        long LeadingCount(ReadOnlySpan<ulong> value, in bool on);
        long LeadingCount(ReadOnlySpan<ushort> value, in bool on);

        long TrailingCount(ReadOnlySpan<byte> value, in bool on);
        long TrailingCount(ReadOnlySpan<uint> value, in bool on);
        long TrailingCount(ReadOnlySpan<ulong> value, in bool on);
        long TrailingCount(ReadOnlySpan<ushort> value, in bool on);
    }

You shouldn't need in or ref for any of the Span<T> signatures.

I like the self-documenting nature of them, even if in the case of Span they are redundant. I have removed them for consistency's sake.

Was this page helpful?
0 / 5 - 0 ratings