Pcl: Rationale behind int and not std::size_t

Created on 19 Sep 2019 · 11Comments · Source: PointCloudLibrary/pcl

Is there a reason for using int and not std::size_t everywhere to represent size, including size of stdlib containers?

EDIT:
I've noticed an inconsistency. Several places use std::size_t, others do not.

Expected Behavior

const auto index = container.size();

for(std::size_t i = 0; i < index; ++i)
// OR
for(decltype(index) i = 0; i < index; ++i)

Current Behavior

const int index = static_cast<int>(container.size());

for(int i = 0; i < index; ++i)

Possible Solution

Use std::size_t instead of int for indices

Using std::size_t is recommended for portability
It saves a cast (which is not a no-op on 64-bit machines) which can induce bugs (see last point)
It needs less code
Allows larger indices to work (18446744073709551615 vs 2147483647)
Prevents negative indices (negative indices can be a feature actually, see Python, but in the code I've seen, negative indices aren't considered), since the cast can potentially create negative indices

stale

Source

kunaltyagi

All 11 comments

Speaking about point indices, I assume that int was chosen because it takes less space, yet supports point clouds over 2 billion points in size (which probably felt sufficient to the original authors). However, I don't see any good reason why they chose int over unsigned int. As you have mentioned, negative indices aren't used in PCL.

Speaking of container size, could you post a link to some examples of what you mean?

taketwo on 19 Sep 2019

I'm really convinced it is/was just a byproduct of code review absence.

SergioRAgostinho on 19 Sep 2019

😄1

Speaking of container size, could you post a link to some examples of what you mean?

Take Vertex for example. It has std::vector<uint32_t> vertices and lots of places use int num_vertices = static_cast<int>(vertex.vertices)

kunaltyagi on 19 Sep 2019

Ah, OK. Yes, I fully agree with Sergio's opinion.

taketwo on 19 Sep 2019

I remember if not careful using size_t will cause an infinite loop when looping backward. int also takes less space as well.

Also if you read the google coding style here

We use int very often, for integers we know are not going to be too big, e.g., loop counters. Use plain old int for such things. You should assume that an int is at least 32 bits, but don't assume that it has more than 32 bits.

jasjuang on 20 Sep 2019

if not careful using size_t will cause an infinite loop when looping backward.

Good point. I think this issue comes while looping forwards too (while using unsigned for eg.). If we agree on converting int to something larger, maybe creating special values such as pcl::equality_limits<uint>::min or PCL_UINT_MIN_EQUAL might be sensible.

we know are not going to be too big

Sadly, I hit the limit of int while concatenating point clouds. I had my pipeline wrong (no duplicate removal or voxel filtering) but I believe I'll hit the limits as more data comes in. 60 GB of points isn't an unfathomable data anymore for me. I personally prefer fixed width int (int16_t, int32_t) over int

It's a bit hard to argue what can be "too big" and maybe we could limit use of int to things we know will never be bigger than 2 billion (such as fields in a point cloud?)

kunaltyagi on 20 Sep 2019

👍1

Just chiming in with the CppCoreGuidelines link as to the reasons why a signed index should be preferred in normal use cases. It requires having using index = std::ptrdiff_t; somewhere and casts everywhere though. The pros and cons seem clear enough, and I do see taketwo's point with never ever having negative indices. Moreover the STL itself having it "wrong" doesn't improve the situation.

aPonza on 9 Oct 2019

@aPonza Yes. That's the big issue. Signed size everywhere would make so much more sense.

As for negative indices not used anywhere, it turns out, PCL uses one special value as marker, ie: -1 when item is not found else correct positive index. A redesign by using std::optional would overcome the issue, but requires a) C++17 or Boost b) Redesign.

I've been trying to remove use of indices as much as possible and use range for loops or algorithms instead. One thing that's controversial is that I'm converting all internal index counters to std::size_t for uniformity. (That's mostly for my use-case so far only since I'm at 20-30% of the limit) Another option is using gsl instead which also opens the gate for span which I love.

I'm hoping that by the time we finish converting loops to range-for + algorithms, the remainder will be taken care of by using range-adapters among other solutions. There are only a few places where the index is required and ranges allow using adapters or rewriting the code to eliminate a lot of them

kunaltyagi on 9 Oct 2019

👍1

Hi, I made this commit where I replaced int with std::size_t for indices in the sample consensus module, then I found this issue and wanted to chime in. (See also @kunaltyagi 's RFC 0002 in the Wiki)

I am for the introduction of a index_t type, just because std::vector<index_t> is more expressive than std::vector<int> or std::vector<std::size_t>. In the index_t version, it is immediately clear that the vector holds indices and not just some arbitrary numbers.
Alternatively, using Indices consistently would also be more expressive (defined here as using Indices = std::vector<int>;).

Regarding the actual type of indices, I would argue for an unsigned type (e.g. unsigned int, std::size_t, or uint32_t) because:

a negative index is always wrong. Since there are barely any checks for negative indices, that opens the door for bugs. Unsigned indices make checks for negative indices unnecessary.
unsigned numbers have twice the range than signed ones at the same type size. So you can have twice as many points without needing a bigger datatype
the point that infinite loops can occur when looping backwards with unsigned is true, but I would say that backward loops are rather uncommon (in pcl at least)