The range count operator ignores the range's low bound when the count is 0 and always returns 1..0. It would be better if the low bound were kept, so that the count operator could produce empty ranges with a specified low bound.
writeln(0..#0); // gets: 1..0, want: 0..-1
chpl Version 1.14.0
A corner case to look out for on this issue, which may be the reason for the current implementation:
const u = 0:uint;
writeln(u..#0);
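The wrinkle: preserving the low bound for a count of 0 naively computes u - 1, which wraps around for unsigned types. A minimal illustration of the underflow:

const u = 0: uint;
writeln(u - 1);   // wraps around: prints 18446744073709551615, i.e., max(uint)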
We currently have three tests that run into the issue @bradcray mentions above.
[Error matching program output for distributions/vass/densify-1-self]
[Error matching program output for types/range/hilde/count]
[Error matching program output for users/vass/crash3tuple]
Each of these three tries to build empty ranges starting from 0:uint.
1..#0
2..#0
3..#0
are all "an empty range", and they are all represented internally as if they were 1..0.
Of course, any bounded range with high < low is an empty range, but which empty range should the count operator use for a count of 0?
And what happens if the low bound is the minimum allowable value for the
range type (e.g. const u = 0:uint)?
This issue is suggesting that:
1..#0 == 1..0
2..#0 == 2..1
3..#0 == 3..2
The ability to specify and keep the low bound would be useful, for example, when using array.push_back(): you specify the low bound for a domain/array with a count of 0, then fill the array with push_back().
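As a concrete sketch of that use case (using the array push_back method; the final comment shows the behavior this issue complains about):

var A: [5..#0] int;   // intent: an empty array whose indices will start at 5
for i in 1..3 do
  A.push_back(i*i);
writeln(A.domain);    // today: {1..3}, because 5..#0 == 1..0; desired: {5..7}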
The 0:uint..#0 and similar cases should probably issue an error.
Question: Is there a reason why this has not been patched? Is it more of a discussion thing, or a lack-of-time thing?
If it is not intended to be supported for design reasons: when I want to create a domain from a possibly empty range, I should always just compute the high index myself to avoid this; since 99% of the time I do not know whether the domain can be empty, I should just go ahead and avoid using the count operator.
@LouisJenkinsCS: My perception on this issue is that mostly nobody has prioritized looking into it. I agree that having lo..#0 return lo..lo-1 is a more natural implementation choice than always pinning it to 1..0 (and more consistent with other things we do with range/domain logic in Chapel), and wouldn't expect much pushback against it. I think the other thing that needs to be decided is what to do if lo == min(idxType) (where perhaps having that case always result in 1..0 is reasonable?).
(I've added the user issue tag to this now that it's been reported by Louis over in #13484. I'm tempted to close that one as a duplicate, though arguably this one is about ranges and that one's about domains...).
I would rather min(idxType)..#0 result in either overflow or halt. That is, it should either be undefined behavior (also allowing the user to code around this), or should not be allowed.
That surprises me. The implication is that 0-based indexing with uint idxType would be unable to use the count operator with a 0 argument, which doesn't feel like that unusual of a case:
const lo = 0: uint;
...
var r = lo..#0;
I would've thought that defining the behavior as 1..0 or lo+1..lo would be preferable as compared to halting the program or leaving the behavior undefined (the programmer could then still choose to code around it if they wanted to, but wouldn't be forced to, or to have to deal with the surprises that would occur if the program were to halt or change behavior over time).
I also forgot to mention strided ranges until now: I'd guess that (1.. by 2)#0 should return 1..-1 by 2 rather than 1..0 by 2 (seems like a no-brainer). Which suggests that we also need to worry about what happens for (lo.. by str)#0 when lo-str < min(idxType) (the case we're discussing below is just the str==1 case).
@ronawho: I remember there were similar issues to this related to iteration over maximal ranges that you wrestled with a few years ago. Can you remind me what we ended up doing there?
Iteration over maximal ranges halts today. See chpl_checkIfRangeIterWillOverflow or https://github.com/chapel-lang/chapel/pull/384
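For reference, a minimal reproducer of that halt (the exact message is quoted later in this thread):

for i in min(uint(8))..max(uint(8)) do   // a maximal range
  writeln(i);
// halt: Iteration over a bounded range may be incorrect due to overflow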
One conclusion of that might be "if iteration over min(idxType)..max(idxType) halts today, maybe 0:uint..#0 ought to as well", but it still seems to me that 0..#0 is much more likely to come up in practice (using symbolic values, not literals) than min(idxType)..max(idxType) is (which makes it seem more concerning that it would halt...).
Ironically, if we took a lo..#0 => lo..lo-1 approach, we'd get min(idxType)..max(idxType) for lo:uint==0 (unstrided). Which could be another answer to what the result should be, but I don't know if it's a particularly useful one...
A range's bounds can distinguish one fewer range than we need: if every empty range is represented as lo..lo-1, then for uint(8) the empty range anchored at 0 collides with the maximal range, since both come out as 0..255. Any solution that tries to fit both of these into the singular idxType (uint(8)) isn't going to be satisfactory. I do agree that empty ranges would come up more frequently than full-iteration ranges, but neither is rare enough that halting or throwing would be acceptable.
Some ideas. One solution is to concede that the idxType isn't sufficient to hold both the empty range and the full-iteration range, so one of them needs to be fitted into an idxType of the next larger size (uint(16)). Since empty ranges are more common, this solution would make 0..255 the one that gets represented via uint(16) as an implementation-internal detail.
As for how to represent the empty range when its lo..lo-1 form coincides with min(idxType)..max(idxType), one solution would be to make the internal representation of ranges exclusive rather than inclusive. The empty range could then be implemented internally as the exclusive range lo..lo.
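A minimal sketch of that idea (a hypothetical record, using uint(8) for concreteness): an exclusive high bound represents every empty range exactly, at the cost of the maximal range, whose exclusive bound would have to be 256:

record ExclusiveRange {
  var low: uint(8);
  var highExcl: uint(8);   // one past the last element

  proc isEmpty : bool {
    return highExcl == low;   // e.g., low == 5, highExcl == 5 means "5..4"
  }
  proc size : int {
    return (highExcl - low): int;
  }
}

var e = new ExclusiveRange(5, 5);   // the empty range anchored at low == 5
writeln(e.isEmpty, " ", e.size);    // true 0
// note: 0..255 is not representable -- highExcl would have to be 256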
(As an aside, if the language was redesigned, ranges could have been optionals to have that extra 'empty range' value like a nilable class that has the extra 'nil' value.)
One of the solutions that we talked about when @ronawho was working on #384 was to have the range iterator test for overflow/underflow conditions using additional yield statements to take care of the last element. For example, a safe iterator for (unstrided) uint(8) ranges could essentially be written as follows (using pseudocode):
const tmpHigh = if (high == 255) then 254 else high;  // back the bound off so the increment can never wrap
c_for_loop(i=lo; i<=tmpHigh; i++) {
  yield i;
}
if high == 255 then  // yield the final element separately for the maximal case
  yield high;
The main downside to this is that it adds a certain amount of cruft to the iterator for an uncommon case which adds challenges to things like optimally zippering iterators together. We discussed potentially requiring the programmer to explicitly opt into invoking this iterator as a means of avoiding penalizing every for loop.
We also discussed simply outlawing the maximal range (at the time of its construction) and/or reserving it as a sentinel (as we're discussing here).
And then we put off the work, feeling like we had bigger fish to fry for the time being.
I'm not crazy about rounding up the idxType to the next biggest integer for the same reason I'm not crazy about making the addition or multiplication of two int(n)s result in a larger size: It seems most convenient for the common cases to make operations closed on a type rather than to promote; and it raises challenges about what to do when you're at the top type, like int(64).
I'm still chewing on the "internal representation uses exclusive ranges" idea. It feels like it ought to leave something on the floor, but I'm having trouble identifying what it would be. Maybe simply that the "maximal range" can't be represented? (which already runs us into problems, so doesn't seem like a huge deal...).
Even if we took that approach, it seems like it raises questions about what answer the user would get if they made a query like (0:uint(8)..#0).high? (i.e., I think we're just kicking the problem down the field?)
Question: this is a possibly naive solution, but is it possible to have a separate field, set at the time of construction, that denotes that the range is empty? That way, no matter what structure the user creates, or what the idxType is, we'd be able to determine whether or not it is empty. For maximal ranges that would exceed the iteration space, I'd think .high should just return max(idxType) and call it a day?
We could. I think the main hesitation is whether it's worth growing the record size to store this field for all bounded ranges, relative to other solutions. And we'd still need to determine what calling .high on an empty range should return.
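For concreteness, a rough sketch of the field-based idea (hypothetical record and field names, uint(8) for concreteness), with that open question left as a comment:

record FlaggedRange {
  var low, high: uint(8);
  var isEmptyRange: bool;   // set when construction (e.g., #0) yields no indices

  proc size : int {
    return if isEmptyRange then 0 else (high - low): int + 1;
  }
  // open question: what should .high return when isEmptyRange is true?
}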
Over on issue #13650, @npadmana ran into a bug related to this in which we try to create an empty range like 1..0 for an enumerated idxType that only defines a single element (e.g., enum color {black};), raising challenges since an empty range needs a low bound above its high bound, and a single-element enum has no second value to use.
Thinking out loud, I wonder if there's a viable implementation similar to how Rust implements Option types where the compiler can use a discriminated union or a null pointer optimization to elide/optimize-out the empty value. This would be some kind of:
union {
Empty(lo),
Range(lo..hi)
}
where only one side can be picked.
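(For what it's worth, a rough Chapel analogue of that sketch might be the following hypothetical union; Chapel unions already track which field was last assigned when checks are on, so the discriminant comes for free:)

union EmptyOrRange {
  var emptyAtLow: uint(8);   // an empty range, remembering only its low bound
  var bounds: 2*uint(8);     // (low, high) of a nonempty range
}

var r: EmptyOrRange;
r.emptyAtLow = 0;   // represents 0..#0 without losing the 0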
If I'm understanding correctly, this seems similar to Louis's suggestion above where it's not clear to me what the behavior of queries like low or high should be on such an empty range.
Question: what happens when you naively use .high without checking whether the range is empty? If I did...
var arr : [0..#0] int;
arr[arr.high] = 0;
This is undefined behavior anyway. But perhaps there are times when using range.high before checking range.size != 0 is valid; I'm just not seeing any. I.e., the high should possibly be some defined value for the index type.
For .high, any concerns with just throwing? There isn't a reasonable value, IMO, if the range is empty. The alternative is to make some of the functions like .high, and maybe even .low, return an optional<idxType>.
It could muck up naive use cases like
for i in potentiallyempty.low+3..potentiallyempty.high+3 do ...
I do have some issue with just saying this is undefined behavior, because someone will do this eventually (instead of properly using a range-translate function or something). This is partially why I think a throwing solution would bring the immediate problem to light, but then the code has to handle it, which is annoying. The optional approach would be better, but the libraries would need to be built with some smarts to handle the optional ranges or optional range ends, which would be an involved process.
I guess my concern with making .high throw or return anything other than an integer is that it dramatically complicates code that does range algebra (to me, it'd be similarly negatively impactful to making + on integers throw or return a non-integer to handle wraparound cases). I'm being cognizant both of the impact on users and of the amount of code we have internally that makes such calls. This is why I keep advocating for positions that use sentinel values, wraparound, or specified-as-undefined behavior for such cases: they keep the queries returning simple integers.
Yes, please: no throwing.
Edit: Honestly, if throwing is attached to even simple operations like querying the high of a range, essentially adding the overhead of a branch instruction (although perhaps with something like __unlikely around it as a hint to the branch predictor during code generation it may not be so terrible performance-wise), I'd recommend a new kind of implicit try! around it.
Yeah, I agree: throwing probably isn't a good idea for simple algebra. I'm having trouble rationalizing how sentinels and wraparound wouldn't effectively amount to an optional-based implementation. The sentinel would require special handling whenever you were checking for the empty range (with optionals, an empty optional would mean the range is empty), and wraparound would require special checks on the isEmpty field so a terminal value could be verified. That seems a bit clunky: if you have to check the result of .high to see whether it might be maxIdxValue and then check range.isEmpty, that amounts to the same amount of work as just checking range.isEmpty first. Maybe these details would all be hidden in the internal implementation and not exposed to users, but I suspect they wouldn't be.
Let me try to summarize the solutions that don't rely on throwing or returning an optional:
For all of the following options, I'm imagining that lo..#0 will generate lo..lo-stride for any case in which lo-stride >= min(idxType) (i.e., those in which no wraparound occurs), so am just focusing on the case where wraparound would occur (e.g., 0:uint..#0 or more generally min(idxType)..#0 or min(idxType)+stride-1 by stride..#0)
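As a concrete version of the "utility function to help check for this" mentioned under several of the options below, a hypothetical helper for the unstrided case might look like:

// hypothetical helper: would applying #0 to this (unstrided) range wrap below min(idxType)?
proc countZeroWouldWrap(r: range(?t)) : bool {
  return r.low == min(t);   // lo - 1 underflows exactly when lo == min(t)
}

writeln(countZeroWouldWrap(0:uint..5));   // true
writeln(countZeroWouldWrap(1..10));       // false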
halt: We could halt when hitting such cases.
- relies on halting which always seems unfortunate
+ no subtle surprises, only non-subtle ones
=> users would need to either
(a) avoid applying #0 to a range, or
(b) be OK with halting programs, or
(c) check for the wraparound condition before making the call (where ranges could potentially provide a utility function to help check for this)
sentinel range: We could return 1..0 for such cases:
- results in a range whose bounds have no relation to the original bounds
+ returns something that feels like an empty range (and is the canonical empty range value)
=> users would need to either
(a) avoid applying #0 to a range, or
(b) be OK with losing the original bounds, or
(c) check for this sentinel case after making the call if they wanted to reason about the original bounds (e.g., r2 = r#0; if (r2.low != r.low && r2.size == 0) then // I know it wrapped...), or
(d) check for the wraparound condition before making the call (where ranges could potentially provide a utility function to help check for this)
wraparound: We could have the high bound wrap around for such cases (e.g., 0:uint..#0 == 0:uint..max(uint))
- returns a range that seems non-empty since high > low
+ retains low bound of original range
~ might suggest changing the language to always have maximal ranges represent empty ranges (which would also arguably dodge other open questions like "how to iterate over a maximal range [efficiently]")
~ note that implementing this for signed integer idxTypes would be slightly non-trivial, since we can't rely on C wrapping them around as we'd like (a sketch of one workaround follows this list)
=> users would either need to
(a) avoid applying #0 to a range, or
(b) be OK with not using maximal ranges (if we made the language change), or
(c) check for this wrapped-around case after making the call (if we didn't), or
(d) check for the wraparound condition before making the call (where ranges could potentially provide a utility function to help check for this)
overflow: We could have the high bound wrap around for uint ranges and be undefined for int ranges.
+ would be analogous to integer arithmetic
- would make it harder to reason about ranges like min(int)..#0 (but arguably this isn't a common/important case?)
~ otherwise, similar tradeoffs to the above
=> I think the user view is generally similar to the above as well
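Regarding the note under wraparound about signed idxTypes: a hedged sketch of how lo - 1 could be computed with defined wraparound by detouring through the unsigned type (a hypothetical helper, not the actual implementation):

// compute x - 1 with two's-complement wraparound; C signed overflow is
// undefined, so the subtraction is done in the unsigned type
proc wrapDecrement(x: int(8)) : int(8) {
  return ((x: uint(8)) - 1): int(8);
}

writeln(wrapDecrement(min(int(8))));   // 127, i.e., max(int(8))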
All of the above have the advantage of being fairly easy to implement, so I didn't list that as an advantage for any of them.
I'm open to any of these four; my preference varies slightly depending on whether I'm the person implementing them or using them. I also don't imagine myself applying #0 to ranges very often, so am probably not the best target audience.
I'm in favor of overflow: it's the easiest to make sense of (integer overflow is a well-known problem and can be reasoned about, whereas special implementation-specific logic for edge cases has always felt like an easy way to add frustration when debugging), and it has zero overhead.
> I also don't imagine myself applying #0 to ranges very often, so am probably not the best target audience.
You really have never created an empty array or variable-sized array that you end up resizing later?
I did say "very often" rather than "never". :) But to be even more accurate, I should've probably said "...to min(idxType).. ranges very often..."
Given this use case and your preference for overflow, do you have concerns about the following pattern causing problems:
const D = {0:uint..#0};
var A: [D] real;
And/or what would you want to have happen? If we took the "maximal ranges are (always) empty" interpretation, then the array allocation code would need to be updated to check for maximal ranges (which is completely reasonable). If we didn't, then the user might end up getting a very large array rather than the empty one they'd hoped for.
I still have a preference for introducing a new field that explicitly denotes whether a range is empty. I'd hope that the above would result in a zero-length array despite the overflow, and that internally the range knew this was the case. I think that adding a single byte isn't that big of a deal, even if it makes the record 8 bytes larger due to padding.
That returns us to the question of what the result of .high / .low should be for such a range. Or wait, maybe you are suggesting that .high / .low return the bounds as defined in the overflow case, but that code can distinguish between empty and non-empty maximal ranges by checking the bit?
I suppose that makes sense, though I was (probably obviously) rooting for maximal ranges to always mean "the empty range" because I think it would help avoid other corner-ish cases and open questions like how to iterate over maximal ranges (efficiently). I also suspect it would require fewer code changes on the implementation side (I'm guessing). I agree that an extra byte or 8 on a range is probably not a huge deal, though I worry a little about it having a multiplicative effect (by numdims) for multidimensional rectangular domains.
> Or wait, maybe you are suggesting that .high / .low return the bounds as defined in the overflow case, but that code can distinguish between empty and non-empty maximal ranges by checking the bit?
Yes.
I'm leaning towards the empty-bit or the wraparound solutions right now.
It's unfortunate that a maximal range can't be created with the wraparound solution, but I think you could just split your maximal range into two non-maximal ranges that together cover the same values, avoiding the conflict with the empty (= maximal) range. It would work, but it's not ideal, because such code is confusing for a non-expert in Chapel to read.
None of these seem like great solutions, because they are all either painful or confusing to implement or to use; so it comes down to minimizing how often this corner case occurs.
Edit: One thought would be to provide an idiom to avoid the maximal range via type methods. Spitballing:
for i in uint(8).allValues() do ... // returns an iterator, not a range.
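A sketch of how such an iterator could avoid ever forming a maximal range (allValues is the hypothetical name from the spitball above, written here as a free function):

iter allValues(type t) where isIntegralType(t) {
  var i = min(t);
  while true {
    yield i;
    if i == max(t) then break;   // stop before i would wrap past max(t)
    i += 1;
  }
}

for v in allValues(uint(8)) do
  write(v, " ");   // 0 1 2 ... 255, with no overflow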
Edit2: The languages that use exclusive ranges have this problem too: you can have either the empty range or the maximal range, but not both. Exclusive ranges naturally support the empty range, while inclusive ranges naturally support maximal ranges. Rust examples: Ex1. Ex2. It feels a bit unfortunate that by choosing inclusive ranges we have to live with this corner case for the empty range, but I'm willing to live with it so long as there's a compiler error when someone actually tries to create a maximal range (e.g., an error on 0:uint(8)..255 and no error for 0:uint(8)..#0). (Could be a long-term goal for ease of use.)
Edit3: Or we could start adding support for exclusive ranges a la Swift and support both, depending on the range operator used. The internal representation is easiest with a maximal range == empty range solution. https://github.com/chapel-lang/chapel/issues/12988#issuecomment-495799198
> You really have never created an empty array or variable-sized array that you end up resizing later?
Grepping the test/ directory for existing cases that I can run to make sure any changes here work, I think I can now say more definitively that I've never written code that applies #0 to a range. :) (or at least, if I did, I either didn't check it in or it didn't persist).
> I'm willing to live with it so long as there's a compiler error when someone actually tries to create a maximal range (e.g., an error on 0:uint(8)..255 and no error for 0:uint(8)..#0)
I was imagining this as well, and believe it would be reasonably trivial to implement.
That said, I think if we did wraparound and stored an empty bit, we could store both empty and maximal ranges using the same bounds while differentiating them from one another. I think there's more implementation effort in that approach, but _maybe_ less confusion?
For the time being, I'm looking at implementing the wraparound approach without the empty bit to get us partway down this path and see if there are any surprises (which is why I was grepping the test system in the previous comment).
My prototype implementation of this is on PR #13673.
The PR is now in a state that it's passing testing. My current intention is to get review on it, merge if there are no objections, and fork the discussion of an "is empty" bit to a distinct issue.
I propose to follow the precedent of iterating over maximal ranges, which halts with "Iteration over a bounded range may be incorrect due to overflow". Implemented in #384, I think.
Looks like "overflow" above is the one implemented. This is fine with me. I think it's probably worth creating an error upon certain patterns of range creation; e.g., min(uint(8))..max(uint(8)) could have a useful error. But we want the implementation to be able to compute this in certain cases (mainly with #). Perhaps # could use a different range initializer, and the initializer from low/high could halt when checks are on?
@vasslitvinov: Halting was generally considered distasteful since 0:uint(8)..#0 is likely to be not that rare of a pattern. Also, the halt that currently occurs when iterating over maximal ranges is something that we've discussed providing ways to work around.
@mppf: Above, we briefly discussed generating errors for creating maximal ranges as another possible way to handle them, and the code path for creating range literals lo..hi does differ from the one used by #0, so this would be pretty easy to do. That said, if we stored the isEmpty bit in ranges, we wouldn't need the error. As mentioned on email, I'm going to fork the maximal-range aspects of this issue into a new issue of its own.
I'm considering PR #13673 to have closed this issue, though I've spawned off issue #13717 as a place to capture and continue discussions about maximal and empty ranges.