Stl: <bitset>: count() should use the same approach as std::popcount

Created on 1 Apr 2020  路  7Comments  路  Source: microsoft/STL

bitset<N>::count()

https://github.com/microsoft/STL/blob/8e8453c67723cc28a2773fbb8eacd5821f059386/stl/inc/bitset#L341-L366

should use the same compiler magic: as <bit> popcount(x) when it is available:

https://github.com/microsoft/STL/blob/8e8453c67723cc28a2773fbb8eacd5821f059386/stl/inc/bit#L143-L150

Blocked by the same as #313 .

Disabled here:

https://github.com/microsoft/STL/blob/8e8453c67723cc28a2773fbb8eacd5821f059386/stl/inc/yvals_core.h#L1091-L1096

Also tracked by DevCom-967643 and Microsoft-internal VSO-1090007 / AB#1090007.

performance

Most helpful comment

which are hard to implement correctly in MSVC

I think builtin would be better here. I imagine that by seeing __builtin_popcount in a loop could emit vectored popcount for /arch:AVX512.
And if runtime CPU detection is problematic, then can __builtin_popcount depend on /arch option?

I suppose one could implement vectored instructions, but the folks working on the compiler have better things to do (optimizations that would help more people).

__builtin_popcount and friends have behavior that is not implemented by any other intrinsic my msvc_bit_wip branch has my progress and I hope to make that into a pull request this week.

All 7 comments

You can directly reference the relevant code here in the issue. That helps a lot as one does not have to search for the code first

updated

DevCom-967643 - independently reported.
Well, I can imagine that compiler can optimize std::bitset<N>::count() without builtin, by just figuring out what this loop does :-) (After all, swapping bytes of a dword by shifts gets replaced by bswap)

This is dependent on a rework of std::popcount :D.

Now that we have is_constant_evaluated I'm working on using that instead of the builtins (which are hard to implement correctly in MSVC). Once that happens then maybe bitset can use it. Although there could be problems since the constexpr versions of std::popcount will only work for uint32 and uint64 (we could probably figure out how to make them work for other numbers of bits, but it would be gnarly meta-programming)

which are hard to implement correctly in MSVC

I think builtin would be better here. I imagine that by seeing __builtin_popcount in a loop could emit vectored popcount for /arch:AVX512.
And if runtime CPU detection is problematic, then can __builtin_popcount depend on /arch option?

Although there could be problems since the constexpr versions of std::popcount will only work for uint32 and uint64 (we could probably figure out how to make them work for other numbers of bits, but it would be gnarly meta-programming)

Fortunately, uint32_t and uint64_t are all you need:

https://github.com/microsoft/STL/blob/43e07f3fad934b710d2de21ffdcf4cd72267c826/stl/inc/bitset#L31

which are hard to implement correctly in MSVC

I think builtin would be better here. I imagine that by seeing __builtin_popcount in a loop could emit vectored popcount for /arch:AVX512.
And if runtime CPU detection is problematic, then can __builtin_popcount depend on /arch option?

I suppose one could implement vectored instructions, but the folks working on the compiler have better things to do (optimizations that would help more people).

__builtin_popcount and friends have behavior that is not implemented by any other intrinsic my msvc_bit_wip branch has my progress and I hope to make that into a pull request this week.

Was this page helpful?
0 / 5 - 0 ratings