bitset<N>::count()
https://github.com/microsoft/STL/blob/8e8453c67723cc28a2773fbb8eacd5821f059386/stl/inc/bitset#L341-L366
should use the same compiler magic: as <bit> popcount(x) when it is available:
https://github.com/microsoft/STL/blob/8e8453c67723cc28a2773fbb8eacd5821f059386/stl/inc/bit#L143-L150
Blocked by the same as #313 .
Disabled here:
Also tracked by DevCom-967643 and Microsoft-internal VSO-1090007 / AB#1090007.
You can directly reference the relevant code here in the issue. That helps a lot as one does not have to search for the code first
updated
DevCom-967643 - independently reported.
Well, I can imagine that compiler can optimize std::bitset<N>::count() without builtin, by just figuring out what this loop does :-) (After all, swapping bytes of a dword by shifts gets replaced by bswap)
This is dependent on a rework of std::popcount :D.
Now that we have is_constant_evaluated I'm working on using that instead of the builtins (which are hard to implement correctly in MSVC). Once that happens then maybe bitset can use it. Although there could be problems since the constexpr versions of std::popcount will only work for uint32 and uint64 (we could probably figure out how to make them work for other numbers of bits, but it would be gnarly meta-programming)
which are hard to implement correctly in MSVC
I think builtin would be better here. I imagine that by seeing __builtin_popcount in a loop could emit vectored popcount for /arch:AVX512.
And if runtime CPU detection is problematic, then can __builtin_popcount depend on /arch option?
Although there could be problems since the constexpr versions of std::popcount will only work for uint32 and uint64 (we could probably figure out how to make them work for other numbers of bits, but it would be gnarly meta-programming)
Fortunately, uint32_t and uint64_t are all you need:
https://github.com/microsoft/STL/blob/43e07f3fad934b710d2de21ffdcf4cd72267c826/stl/inc/bitset#L31
which are hard to implement correctly in MSVC
I think builtin would be better here. I imagine that by seeing
__builtin_popcountin a loop could emit vectoredpopcountfor/arch:AVX512.
And if runtime CPU detection is problematic, then can__builtin_popcountdepend on/archoption?
I suppose one could implement vectored instructions, but the folks working on the compiler have better things to do (optimizations that would help more people).
__builtin_popcount and friends have behavior that is not implemented by any other intrinsic my msvc_bit_wip branch has my progress and I hope to make that into a pull request this week.
Most helpful comment
I suppose one could implement vectored instructions, but the folks working on the compiler have better things to do (optimizations that would help more people).
__builtin_popcount and friends have behavior that is not implemented by any other intrinsic my msvc_bit_wip branch has my progress and I hope to make that into a pull request this week.