Pcl: Rationale behind int and not std::size_t

Created on 19 Sep 2019  路  11Comments  路  Source: PointCloudLibrary/pcl

Is there a reason for using int and not std::size_t everywhere to represent size, including size of stdlib containers?

EDIT:
I've noticed an inconsistency. Several places use std::size_t, others do not.

Expected Behavior

const auto index = container.size();

for(std::size_t i = 0; i < index; ++i)
// OR
for(decltype(index) i = 0; i < index; ++i)

Current Behavior

const int index = static_cast<int>(container.size());

for(int i = 0; i < index; ++i)

Possible Solution

Use std::size_t instead of int for indices

  • Using std::size_t is recommended for portability
  • It saves a cast (which is not a no-op on 64-bit machines) which can induce bugs (see last point)
  • It needs less code
  • Allows larger indices to work (18446744073709551615 vs 2147483647)
  • Prevents negative indices (negative indices can be a feature actually, see Python, but in the code I've seen, negative indices aren't considered), since the cast can potentially create negative indices
stale

All 11 comments

Speaking about point indices, I assume that int was chosen because it takes less space, yet supports point clouds over 2 billion points in size (which probably felt sufficient to the original authors). However, I don't see any good reason why they chose int over unsigned int. As you have mentioned, negative indices aren't used in PCL.

Speaking of container size, could you post a link to some examples of what you mean?

I'm really convinced it is/was just a byproduct of code review absence.

Speaking of container size, could you post a link to some examples of what you mean?

Take Vertex for example. It has std::vector<uint32_t> vertices and lots of places use int num_vertices = static_cast<int>(vertex.vertices)

Ah, OK. Yes, I fully agree with Sergio's opinion.

I remember if not careful using size_t will cause an infinite loop when looping backward. int also takes less space as well.

Also if you read the google coding style here

We use int very often, for integers we know are not going to be too big, e.g., loop counters. Use plain old int for such things. You should assume that an int is at least 32 bits, but don't assume that it has more than 32 bits.

if not careful using size_t will cause an infinite loop when looping backward.

Good point. I think this issue comes while looping forwards too (while using unsigned for eg.). If we agree on converting int to something larger, maybe creating special values such as pcl::equality_limits<uint>::min or PCL_UINT_MIN_EQUAL might be sensible.

we know are not going to be too big

Sadly, I hit the limit of int while concatenating point clouds. I had my pipeline wrong (no duplicate removal or voxel filtering) but I believe I'll hit the limits as more data comes in. 60 GB of points isn't an unfathomable data anymore for me. I personally prefer fixed width int (int16_t, int32_t) over int

It's a bit hard to argue what can be "too big" and maybe we could limit use of int to things we know will never be bigger than 2 billion (such as fields in a point cloud?)

Just chiming in with the CppCoreGuidelines link as to the reasons why a signed index should be preferred in normal use cases. It requires having using index = std::ptrdiff_t; somewhere and casts everywhere though. The pros and cons seem clear enough, and I do see taketwo's point with never ever having negative indices. Moreover the STL itself having it "wrong" doesn't improve the situation.

@aPonza Yes. That's the big issue. Signed size everywhere would make so much more sense.

As for negative indices not used anywhere, it turns out, PCL uses one special value as marker, ie: -1 when item is not found else correct positive index. A redesign by using std::optional would overcome the issue, but requires a) C++17 or Boost b) Redesign.

I've been trying to remove use of indices as much as possible and use range for loops or algorithms instead. One thing that's controversial is that I'm converting all internal index counters to std::size_t for uniformity. (That's mostly for my use-case so far only since I'm at 20-30% of the limit) Another option is using gsl instead which also opens the gate for span which I love.

I'm hoping that by the time we finish converting loops to range-for + algorithms, the remainder will be taken care of by using range-adapters among other solutions. There are only a few places where the index is required and ranges allow using adapters or rewriting the code to eliminate a lot of them

Hi, I made this commit where I replaced int with std::size_t for indices in the sample consensus module, then I found this issue and wanted to chime in. (See also @kunaltyagi 's RFC 0002 in the Wiki)

I am for the introduction of a index_t type, just because std::vector<index_t> is more expressive than std::vector<int> or std::vector<std::size_t>. In the index_t version, it is immediately clear that the vector holds indices and not just some arbitrary numbers.
Alternatively, using Indices consistently would also be more expressive (defined here as using Indices = std::vector<int>;).

Regarding the actual type of indices, I would argue for an unsigned type (e.g. unsigned int, std::size_t, or uint32_t) because:

  • a negative index is always wrong. Since there are barely any checks for negative indices, that opens the door for bugs. Unsigned indices make checks for negative indices unnecessary.
  • unsigned numbers have twice the range than signed ones at the same type size. So you can have twice as many points without needing a bigger datatype
  • the point that infinite loops can occur when looping backwards with unsigned is true, but I would say that backward loops are rather uncommon (in pcl at least)

Marking this as stale due to 30 days of inactivity. It will be closed in 7 days if no further activity occurs.

index_t is being added slowly but surely.

Was this page helpful?
0 / 5 - 0 ratings