Dealii: Don't use (un)signed char for integer data types

Created on 12 Feb 2018  路  30Comments  路  Source: dealii/dealii

We have a number of places where we use (un)signed char to denote 8-bit integer data types. This is the "traditional" way of doing things in C, but it has the disadvantage that it's awkward to print these objects -- unless the integral value of a variable happens to be in a certain range, the character you want to output just doesn't show on screen, or does something even less obvious (such as \n).

Rather, C++11 now offers std::uint8_t, which we can use to both denote the type and have it printed correctly to screen. We should really only be using char when we mean "an element of a string of text".

Enhancement

Most helpful comment

Unless I have missed an option that leaves us with the following (please edit this post if I missed something). Lets just vote and be done with it.

  1. ignore improbable portability issues (update: uint8_t is required by POSIX) and just use std::uint8_t (vote :+1:)
  2. go with std::byte (vote :laughing:)
  3. use our own custom type (vote :tada:)

All 30 comments

One can also debate if it is really necessary to deviate from (un)signed ints. I think the memory savings are often overrated.

One can also debate if it is really necessary to deviate from (un)signed ints. I think the memory savings are often overrated.

It depends on what you do. If you're memory limited, using as narrower an integer type as possible should definitely be high on the priority list. For example, certain (tuned) sparse matrix formats/implementations do that. Even though it's usually not the memory consumption but the memory transfer that makes people choose narrow types. And for general purpose code where the bottlenecks are likely somewhere else, as is the case with our way of using boundary indicators and similar things, I'd definitely confirm your point of view.

The previous comment was written by be and not Katharina Kormann - sorry for the confusion. We worked together on that computer on the other PR yesterday but today I forgot to switch back to my user... :smile:

It depends on what you do.

Of course, I agree. I would use ints for boundary ids but not for sparse matrix formats for example like you said.

Annoyingly, std::uint8_t is just a typedef to unsigned char and consequently the stupid printing rules still apply -- it is not printed as an integer, but as a character (with the effect that most values don't actually print to anything useful)...

https://stackoverflow.com/questions/19562103/uint8-t-cant-be-printed-with-cout

Annoyingly, std::uint8_t is just a typedef to unsigned char and consequently the stupid printing rules still apply.

That's a pity. But I think we should still go ahead with the more expressive types in #5894 and #5896, if nobody objects.

That's a pity. But I think we should still go ahead with the more
expressive types in #5894 https://github.com/dealii/dealii/pull/5894
and #5896 https://github.com/dealii/dealii/pull/5896, if nobody objects.

I agree. The name conveys semantics, not only syntax, and we like to
generally do that in deal.II.

Being late to the party, I realized that std::uint8_t is not required to exist. I am not sure if there exist setups suitable to compile deal.II for which actually
don't define an 8bit type. Nevertheless, I propose to err on the safe side and use required types such as std::uint_fast8_t (or std::uint_least8_t) instead.

Hm, in all of the cases I've looked at, we choose unsigned char because we want it to be exactly one byte. C++17 has std::byte, but C++11 doesn't, so we can't use that yet. I think we probably would want to use std::uint_least8_t if that is necessary, though I really can't imagine a compiler that doesn't provide std::uint8_t.

My preference would be, in this order:

  • Ignore the issue -- I can't imagine it being a problem on any compiler we may want to use.
  • Proceed with the patches already posted and merged, and in a second step change every occurrence of the type to types::uint8_t, which we're going to typedef to std::uint_least8_t. Or maybe call it types::byte. (I just find the name uint_least8_t bulky.)

What do others think?

I find std::uint_least8_t to be confusing in an interface and I would prefer char or our own types::byte.

We could do std_cxx17::byte (and provide our own implementation if necessary), but since thats not a printable type it doesn't really help with the original issue.

That would be another reason to switch to ints in all places where performance/memory doesn't matter much.

So what do we do? I don't like char because it really has nothing to do with strings. I could be talked into using std_cxx17::byte and converting all of the places I've converted in previous patches.

Just as @drwells already said, we can"t use std::byte directly for printing. We can of course just define a suitable overload for operator<< (calling to_int).

What about something like (https://godbolt.org/g/G9vVaG)
```#include

ifdef DEAL_II_WITH_CXX17

include

else

include

endif

namespace dealii
{
namespace types
{
struct ubyte
{

ifdef DEAL_II_WITH_CXX17

  byte(const unsigned char b_)
  : b(static_cast<std::byte>(b_))
  {}

  std::byte b;

else

  std::uint_least8_t b;

endif

};

}
}

namespace std
{
ostream& operator<<(ostream& os, const dealii::types::ubyte& dt)
{
os << static_cast(dt.b);
return os;
}
}
```

This would work, though I'd like it to (i) be just types::byte because the way C++17 defines std::byte is as a type that is not arithmetic, i.e., is neither signed nor unsigned (see http://en.cppreference.com/w/cpp/types/byte). It's just a collection of bits that you can dwiddle; (ii) C++17 defines the type as follows:

  enum class byte : unsigned char {} ;

Could we do the same?

(Or does the implementation using the base unsigned char for the enum require C++17 semantics for conversion to/from integers?)

This works as well

#include <iostream>
#ifdef DEAL_II_WITH_CXX17
#  include <cstddef>
#else
#  include <cstdint>
#endif

namespace dealii
{
  namespace types
  {
#ifdef DEAL_II_WITH_CXX17
    enum class byte : unsigned char
    {};
#else
    enum class byte : std::uint_least8_t b
    {};
#endif
  }
}

namespace std
{
  ostream& operator<<(ostream& os, const dealii::types::byte& dt)
  {
    os << static_cast<unsigned char>(dt);
    return os;
  }
}

int main()
{
  dealii::types::byte b {3};
  dealii::types::byte c {4};  
  std::cout << b << c << b;

  return 0;
}

Does this look reasonable to you?

Almost: I think we should use

#ifdef DEAL_II_WITH_CXX17
    using std::byte;   // std::byte is available, so just use it
#else
    enum class byte : unsigned char  // just build on something we know for sure is 8 bits
    {};
#endif

ifdef DEAL_II_WITH_CXX17 using std::byte; // std::byte is available, so just use it #else enum class byte : unsigned char // just build on something we know for sure is 8 bits {}; #endif

In this way, we would define operator<< for std::ostream and std::byte. This might conflict with an implementation that already has this overload. Since the standard doesn't define it, this is probably fine though.

I am nervous about adding such a global overload: it seems very likely that another library will ultimately add this and we will have strange linking problems.

This is getting complicated. Are we sure that we will ever run into a system that really doesn't have this type? It seems to me that we're trying to do things that will turn out to not be necessary, and that make the code more complicated than necessary.

To summarize, it sounds like we want:

  1. clear semantics: we should not use char for things that are not elements of strings
  2. sizeof(T) == 1
  3. a printable type

We can get the first two by using std::byte or std_cxx17::byte. We already have custom printing code for CellID so I don't think dealing with 3 is a big deal and we do not need a global printing overload.

We could also roll our own (which seems crazy; I am surprised C and C++ do not have a portable 1 byte integral type), something like

namespace types {struct uint8;}

std::ostream &operator<<(std::ostream &os, const types::uint8 c);

namespace types
{
  struct uint8
  {
    uint8(const int b) : b(static_cast<unsigned char>(b)) {}
    uint8(const unsigned char b) : b(b) {}
    uint8 &operator=(int c) {b = c; return *this;}

  private:
    friend std::ostream &::operator<<(std::ostream &os, const types::uint8 c);
    unsigned char b;
  };
}

std::ostream &operator<<(std::ostream &os, const types::uint8 c)
{
  os << int(c.b);
  return os;
}

for internal use.

I think we should just go with std_cxx17::byte and deal with printing as needed without any sort of global overload.

Unless I have missed an option that leaves us with the following (please edit this post if I missed something). Lets just vote and be done with it.

  1. ignore improbable portability issues (update: uint8_t is required by POSIX) and just use std::uint8_t (vote :+1:)
  2. go with std::byte (vote :laughing:)
  3. use our own custom type (vote :tada:)

This is getting complicated. Are we sure that we will ever run into a system that really doesn't have this type? It seems to me that we're trying to do things that will turn out to not be necessary, and that make the code more complicated than necessary.

I agree. This is way to much cognitive load for such a simple interface. I would have to spend 20 minutes to explain this to future students. Let's either leave everything as is or switch to unsigned int as the type.

I looked more carefully atuint8_t. This is required by POSIX, so portability is not a concern here.

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/stdint.h.html

Windows has not always been POSIX compliant. But I'm confident that any other system is compliant. So the question is: Does MS VC have std::uint8_t?

Does MS VC have std::uint8_t?

I can compile a current version (that uses already std::uint8_t at some places) using VS2017.

Let's then just take std::int8_t and tell users that they have to cast the variable for printing.

In the majority of cases, printing is not actually relevant. It's just about a compressed storage format. The only places currently relevant are the refinement flags, which I suspect few people will actually want to print.

Was this page helpful?
0 / 5 - 0 ratings