Cuio has a few places where we unnecessarily add/subtract 1 from a "field length" or "size" or "end pointer". This is caused by some parse implementations using <= rather than <. If we update parse utilities such that the accept begin/end pointers, offsets, or spans, we can avoid these "adjust-by-one" fixes.
With the the current code, size of input to parse utilities must be calculated as end - begin - 1, and likewise the input must be corrected (if the calling code already has correct offsets) as example(begin, end + 1).
It's possible similar logic is used in parqeut/avro, but the portions I've found so far are in csv/json.
Let's standardize on end - begin == size for all offsets, and use for(auto pos = begin; pos < end; ++pos), for example.
CSV
-
The CSV code loops incorrectly, using <= rather than <, and this logic finds it way down in to the parse utils.
https://github.com/rapidsai/cudf/blob/8bafbb3ed3c58fd6015001731faf413798921d56/cpp/src/io/csv/csv_gpu.cu#L603-L608
https://github.com/rapidsai/cudf/blob/8bafbb3ed3c58fd6015001731faf413798921d56/cpp/src/io/csv/csv_gpu.cu#L468-L477
JSON
-
The JSON code loops correctly, but adjusts by one before calling parse utilities.
https://github.com/rapidsai/cudf/blob/8bafbb3ed3c58fd6015001731faf413798921d56/cpp/src/io/json/json_gpu.cu#L330-L331
Parse Utilities
-
The parse utilities code loops incorrectly, using <= rather than <.
https://github.com/rapidsai/cudf/blob/8bafbb3ed3c58fd6015001731faf413798921d56/cpp/src/io/utilities/parsing_utils.cuh#L162-L173
@cwharris Do you think this should be addressed together with https://github.com/rapidsai/cudf/issues/6210, given that both are related to iterator use in the utilities?
@vuule sgtm
Or we just have one pr fixing both issues.
Yeah, that what I've been thinking