The range count operator ignores the range's low bound when the count is 0 and always returns 1..0. It would be better if the low bound were kept, so that the count operator could produce empty ranges with a specified low bound.
writeln(0..#0); // gets: 1..0, want: 0..-1
chpl Version 1.14.0
A corner case to look out for on this issue, which may be the reason for the current implementation:
const u = 0:uint;
writeln(u..#0);
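The wrinkle: preserving the low bound for a count of 0 naively computes u - 1, which wraps around for unsigned types. A minimal illustration of the underflow:

const u = 0: uint;
writeln(u - 1);   // wraps around: prints 18446744073709551615, i.e., max(uint)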
We currently have three tests that run into the issue @bradcray mentions above.
[Error matching program output for distributions/vass/densify-1-self]
[Error matching program output for types/range/hilde/count]
[Error matching program output for users/vass/crash3tuple]
Each of these three tries to build empty ranges starting from 0:uint.
1..#0
2..#0
3..#0
are all "an empty range", and they are all represented internally as if they were 1..0.
Of course, any bounded range with high < low is an empty range, but which empty range should the count operator use for a count of 0?
And what happens if the low bound is the minimum allowable value for the
range type (e.g. const u = 0:uint)?
This issue is suggesting that:
1..#0 == 1..0
2..#0 == 2..1
3..#0 == 3..2
The ability to specify and keep the low bound would be useful, for example, when using array.push_back(): you specify the low bound for a domain/array with a count of 0, then fill the array with push_back().
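As a concrete sketch of that use case (using the array push_back method; the final comment shows the behavior this issue complains about):

var A: [5..#0] int;   // intent: an empty array whose indices will start at 5
for i in 1..3 do
  A.push_back(i*i);
writeln(A.domain);    // today: {1..3}, because 5..#0 == 1..0; desired: {5..7}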
The 0:uint..#0 and similar cases should probably issue an error.
Question: Is there a reason why this has not been patched? Is it more of a discussion thing, or a lack-of-time thing?
If it is not intended to be supported for design reasons: when I want to create a domain from a possibly empty range, I should always just compute the high index myself to avoid this; since 99% of the time I do not know whether the domain can be empty, I should just go ahead and avoid using the count operator.
@LouisJenkinsCS: My perception on this issue is that mostly nobody has prioritized looking into it. I agree that having lo..#0 return lo..lo-1 is a more natural implementation choice than always pinning it to 1..0 (and more consistent with other things we do with range/domain logic in Chapel), and wouldn't expect much pushback against it. I think the other thing that needs to be decided is what to do if lo == min(idxType) (where perhaps having that case always result in 1..0 is reasonable?).
(I've added the user issue tag to this now that it's been reported by Louis over in #13484. I'm tempted to close that one as a duplicate, though arguably this one is about ranges and that one's about domains...).
I would rather min(idxType)..#0 result in either overflow or halt. That is, it should either be undefined behavior (also allowing the user to code around this), or should not be allowed.
That surprises me. The implication is that 0-based indexing with uint idxType would be unable to use the count operator with a 0 argument, which doesn't feel like that unusual of a case:
const lo = 0: uint;
...
var r = lo..#0;
I would've thought that defining the behavior as 1..0 or lo+1..lo would be preferable as compared to halting the program or leaving the behavior undefined (the programmer could then still choose to code around it if they wanted to, but wouldn't be forced to, or to have to deal with the surprises that would occur if the program were to halt or change behavior over time).
I also forgot to mention strided ranges until now: I'd guess that (1.. by 2)#0 should return 1..-1 by 2 rather than 1..0 by 2 (seems like a no-brainer). Which suggests that we also need to worry about what happens for (lo.. by str)#0 when lo-str < min(idxType) (the case we're discussing below is just the str==1 case).
@ronawho: I remember there were similar issues to this related to iteration over maximal ranges that you wrestled with a few years ago. Can you remind me what we ended up doing there?
Iteration over maximal ranges halts today. See chpl_checkIfRangeIterWillOverflow or https://github.com/chapel-lang/chapel/pull/384
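For reference, a minimal reproducer of that halt (the exact message is quoted later in this thread):

for i in min(uint(8))..max(uint(8)) do   // a maximal range
  writeln(i);
// halt: Iteration over a bounded range may be incorrect due to overflow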
One conclusion of that might be "if iteration over min(idxType)..max(idxType) halts today, maybe 0:uint..#0 ought to as well", but it still seems to me that 0..#0 is much more likely to come up in practice (using symbolic values, not literals) than min(idxType)..max(idxType) is (which makes it seem more concerning that it would halt...).
Ironically, if we took a lo..#0 => lo..lo-1 approach, we'd get min(idxType)..max(idxType) for lo:uint==0 (unstrided). Which could be another answer to what the result should be, but I don't know if it's a particularly useful one...
A range's bounds can distinguish one fewer range than we need: if every empty range is represented as lo..lo-1, then for uint(8) the empty range anchored at 0 collides with the maximal range, since both come out as 0..255. Any solution that tries to fit both of these into the singular idxType (uint(8)) isn't going to be satisfactory. I do agree that empty ranges would come up more frequently than full-iteration ranges, but neither is rare enough that halting or throwing would be acceptable.
Some ideas. One solution is to concede that the idxType isn't sufficient to hold both the empty range and the full-iteration range, so one of them needs to be fitted into an idxType of the next larger size (uint(16)). Since empty ranges are more common, this solution would make 0..255 the one that gets represented via uint(16) as an implementation-internal detail.
As for how to represent the empty range when its lo..lo-1 form coincides with min(idxType)..max(idxType), one solution would be to make the internal representation of ranges exclusive rather than inclusive. The empty range could then be implemented internally as the exclusive range lo..lo.
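A minimal sketch of that idea (a hypothetical record, using uint(8) for concreteness): an exclusive high bound represents every empty range exactly, at the cost of the maximal range, whose exclusive bound would have to be 256:

record ExclusiveRange {
  var low: uint(8);
  var highExcl: uint(8);   // one past the last element

  proc isEmpty : bool {
    return highExcl == low;   // e.g., low == 5, highExcl == 5 means "5..4"
  }
  proc size : int {
    return (highExcl - low): int;
  }
}

var e = new ExclusiveRange(5, 5);   // the empty range anchored at low == 5
writeln(e.isEmpty, " ", e.size);    // true 0
// note: 0..255 is not representable -- highExcl would have to be 256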
(As an aside, if the language was redesigned, ranges could have been optionals to have that extra 'empty range' value like a nilable class that has the extra 'nil' value.)
One of the solutions that we talked about when @ronawho was working on #384 was to have the range iterator test for overflow/underflow conditions using additional yield statements to take care of the last element. For example, a safe iterator for (unstrided) uint(8) ranges could essentially be written as follows (using pseudocode):
const tmpHigh = if (high == 255) then 254 else high;  // back the bound off so the increment can never wrap
c_for_loop(i=lo; i<=tmpHigh; i++) {
  yield i;
}
if high == 255 then  // yield the final element separately for the maximal case
  yield high;
The main downside to this is that it adds a certain amount of cruft to the iterator for an uncommon case which adds challenges to things like optimally zippering iterators together. We discussed potentially requiring the programmer to explicitly opt into invoking this iterator as a means of avoiding penalizing every for loop.
We also discussed simply outlawing the maximal range (at the time of its construction) and/or reserving it as a sentinel (as we're discussing here).
And then we put off the work, feeling like we had bigger fish to fry for the time being.
I'm not crazy about rounding up the idxType to the next biggest integer for the same reason I'm not crazy about making the addition or multiplication of two int(n)s result in a larger size: It seems most convenient for the common cases to make operations closed on a type rather than to promote; and it raises challenges about what to do when you're at the top type, like int(64).
I'm still chewing on the "internal representation uses exclusive ranges" idea. It feels like it ought to leave something on the floor, but I'm having trouble identifying what it would be. Maybe simply that the "maximal range" can't be represented? (which already runs us into problems, so doesn't seem like a huge deal...).
Even if we took that approach, it seems like it raises questions about what answer the user would get if they made a query like (0:uint(8)..#0).high? (i.e., I think we're just kicking the problem down the field?)
Question: this is a possibly naive solution, but is it possible to have a separate field, set at the time of construction, that denotes that the range is empty? That way, no matter what structure the user creates, or what the idxType is, we'd be able to determine whether or not it is empty. For maximal ranges that would exceed the iteration space, I'd think .high should just return max(idxType) and call it a day?
We could. I think the main hesitation is whether it's worth growing the record size to store this field for all bounded ranges, relative to other solutions. And we'd still need to determine what calling .high on an empty range should return.
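For concreteness, a rough sketch of the field-based idea (hypothetical record and field names, uint(8) for concreteness), with that open question left as a comment:

record FlaggedRange {
  var low, high: uint(8);
  var isEmptyRange: bool;   // set when construction (e.g., #0) yields no indices

  proc size : int {
    return if isEmptyRange then 0 else (high - low): int + 1;
  }
  // open question: what should .high return when isEmptyRange is true?
}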
Over on issue #13650, @npadmana ran into a bug related to this in which we try to create an empty range like 1..0 for an enumerated idxType that only defines a single element (e.g., enum color {black};), raising challenges since an empty range needs a low bound above its high bound, and a single-element enum has no second value to use.
Thinking out loud, I wonder if there's a viable implementation similar to how Rust implements Option types where the compiler can use a discriminated union or a null pointer optimization to elide/optimize-out the empty value. This would be some kind of:
union {
Empty(lo),
Range(lo..hi)
}
where only one side can be picked.
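(For what it's worth, a rough Chapel analogue of that sketch might be the following hypothetical union; Chapel unions already track which field was last assigned when checks are on, so the discriminant comes for free:)

union EmptyOrRange {
  var emptyAtLow: uint(8);   // an empty range, remembering only its low bound
  var bounds: 2*uint(8);     // (low, high) of a nonempty range
}

var r: EmptyOrRange;
r.emptyAtLow = 0;   // represents 0..#0 without losing the 0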
If I'm understanding correctly, this seems similar to Louis's suggestion above where it's not clear to me what the behavior of queries like low or high should be on such an empty range.
Question: what happens when you naively use .high without checking whether the range is empty? If I did...
var arr : [0..#0] int;
arr[arr.high] = 0;
This is undefined behavior anyway. But perhaps there are times when using range.high before checking range.size != 0 is valid; I'm just not seeing any. I.e., the high should possibly be some defined value for the index type.
For .high, any concerns with just throwing? There isn't a reasonable value, IMO, if the range is empty. The alternative is to make some of the functions like .high, and maybe even .low, return an optional<idxType>.
It could muck up naive use cases like
for i in potentiallyempty.low+3..potentiallyempty.high+3 do ...
I do have some issue with just saying this is undefined behavior, because someone will do this eventually (instead of properly using a range-translate function or something). This is partially why I think a throwing solution would bring the immediate problem to light, but then the code has to handle it, which is annoying. The optional approach would be better, but the libraries would need to be built with some smarts to handle the optional ranges or optional range ends, which would be an involved process.
I guess my concern with making .high throw or return anything other than an integer is that it dramatically complicates code that does range algebra (to me, it'd be similarly negatively impactful to making + on integers throw or return a non-integer to handle wraparound cases). I'm being cognizant both of the impact on users and of the amount of code we have internally that makes such calls. This is why I keep advocating for positions that use sentinel values, wraparound, or specified-as-undefined behavior for such cases: they keep the queries returning simple integers.
Yes, please: no throwing.
Edit: Honestly, if throwing is attached to even simple operations like querying the high of a range, essentially adding the overhead of a branch instruction (although perhaps with something like __unlikely around it as a hint to the branch predictor during code generation it may not be so terrible performance-wise), I'd recommend a new kind of implicit try! around it.
Yeah, I agree: throwing probably isn't a good idea for simple algebra. I'm having trouble rationalizing how sentinels and wraparound wouldn't effectively amount to an optional-based implementation. The sentinel would require special handling whenever you were checking for the empty range (with optionals, an empty optional would mean the range is empty), and wraparound would require special checks on the isEmpty field so a terminal value could be verified. That seems a bit clunky: if you have to check the result of .high to see whether it might be maxIdxValue and then check range.isEmpty, that amounts to the same amount of work as just checking range.isEmpty first. Maybe these details would all be hidden in the internal implementation and not exposed to users, but I suspect they wouldn't be.
Let me try to summarize the solutions that don't rely on throwing or returning an optional:
For all of the following options, I'm imagining that lo..#0 will generate lo..lo-stride for any case in which lo-stride >= min(idxType) (i.e., those in which no wraparound occurs), so am just focusing on the case where wraparound would occur (e.g., 0:uint..#0 or more generally min(idxType)..#0 or min(idxType)+stride-1 by stride..#0)
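As a concrete version of the "utility function to help check for this" mentioned under several of the options below, a hypothetical helper for the unstrided case might look like:

// hypothetical helper: would applying #0 to this (unstrided) range wrap below min(idxType)?
proc countZeroWouldWrap(r: range(?t)) : bool {
  return r.low == min(t);   // lo - 1 underflows exactly when lo == min(t)
}

writeln(countZeroWouldWrap(0:uint..5));   // true
writeln(countZeroWouldWrap(1..10));       // false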
halt: We could halt when hitting such cases.
- relies on halting which always seems unfortunate
+ no subtle surprises, only non-subtle ones
=> users would need to either
(a) avoid applying #0 to a range, or
(b) be OK with halting programs, or
(c) check for the wraparound condition before making the call (where ranges could potentially provide a utility function to help check for this)
sentinel range: We could return 1..0 for such cases:
- results in a range whose bounds have no relation to the original bounds
+ returns something that feels like an empty range (and is the canonical empty range value)
=> users would need to either
(a) avoid applying #0 to a range, or
(b) be OK with losing the original bounds, or
(c) check for this sentinel case after making the call if they wanted to reason about the original bounds (e.g., r2 = r#0; if (r2.low != r.low && r2.size == 0) then // I know it wrapped...), or
(d) check for the wraparound condition before making the call (where ranges could potentially provide a utility function to help check for this)
wraparound: We could have the high bound wrap around for such cases (e.g., 0:uint..#0 == 0:uint..max(uint))
- returns a range that seems non-empty since high > low
+ retains low bound of original range
~ might suggest changing the language to always have maximal ranges represent empty ranges (which would also arguably dodge other open questions like "how to iterate over a maximal range [efficiently]")
~ note that implementing this for signed integer idxTypes would be slightly non-trivial, since we can't rely on C wrapping them around as we'd like (a sketch of one workaround follows this list)
=> users would either need to
(a) avoid applying #0 to a range, or
(b) be OK with not using maximal ranges (if we made the language change), or
(c) check for this wrapped-around case after making the call (if we didn't), or
(d) check for the wraparound condition before making the call (where ranges could potentially provide a utility function to help check for this)
overflow: We could have the high bound wrap around for uint ranges and be undefined for int ranges.
+ would be analogous to integer arithmetic
- would make it harder to reason about ranges like min(int)..#0 (but arguably this isn't a common/important case?)
~ otherwise, similar tradeoffs to the above
=> I think the user view is generally similar to the above as well
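Regarding the note under wraparound about signed idxTypes: a hedged sketch of how lo - 1 could be computed with defined wraparound by detouring through the unsigned type (a hypothetical helper, not the actual implementation):

// compute x - 1 with two's-complement wraparound; C signed overflow is
// undefined, so the subtraction is done in the unsigned type
proc wrapDecrement(x: int(8)) : int(8) {
  return ((x: uint(8)) - 1): int(8);
}

writeln(wrapDecrement(min(int(8))));   // 127, i.e., max(int(8))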
All of the above have the advantage of being fairly easy to implement, so I didn't list that as an advantage for any of them.
I'm open to any of these four; my preference varies slightly depending on whether I'm the person implementing them or using them. I also don't imagine myself applying #0 to ranges very often, so am probably not the best target audience.
I'm in favor of overflow: it's the easiest to make sense of (integer overflow is a well-known problem and can be reasoned about, whereas special implementation-specific logic for edge cases has always felt like an easy way to add frustration when debugging), and it has zero overhead.
> I also don't imagine myself applying #0 to ranges very often, so am probably not the best target audience.
You really have never created an empty array or variable-sized array that you end up resizing later?
I did say "very often" rather than "never". :) But to be even more accurate, I should've probably said "...to min(idxType).. ranges very often..."
Given this use case and your preference for overflow, do you have concerns about the following pattern causing problems:
const D = {0:uint..#0};
var A: [D] real;
And/or what would you want to have happen? If we took the "maximal ranges are (always) empty" interpretation, then the array allocation code would need to be updated to check for maximal ranges (which is completely reasonable). If we didn't, then the user might end up getting a very large array rather than the empty one they'd hoped for.
I still have a preference for introducing a new field that explicitly denotes whether a range is empty. I'd hope that the above would result in a zero-length array despite the overflow, and that internally the range knew this was the case. I think that adding a single byte isn't that big of a deal, even if it makes the record 8 bytes larger due to padding.
That returns us to the question of what the result of .high / .low should be for such a range. Or wait, maybe you are suggesting that .high / .low return the bounds as defined in the overflow case, but that code can distinguish between empty and non-empty maximal ranges by checking the bit?
I suppose that makes sense, though I was (probably obviously) rooting for maximal ranges to always mean "the empty range" because I think it would help avoid other corner-ish cases and open questions like how to iterate over maximal ranges (efficiently). I also suspect it would require fewer code changes on the implementation side (I'm guessing). I agree that an extra byte or 8 on a range is probably not a huge deal, though I worry a little about it having a multiplicative effect (by numdims) for multidimensional rectangular domains.
> Or wait, maybe you are suggesting that .high / .low return the bounds as defined in the overflow case, but that code can distinguish between empty and non-empty maximal ranges by checking the bit?
Yes.
I'm leaning towards the empty-bit or the wraparound solutions right now.
It's unfortunate that a maximal range can't be created with the wraparound solution, but I think you could just split your maximal range into two non-maximal ranges that together cover the same values, avoiding the conflict with the empty (= maximal) range. It would work, but it's not ideal, because such code is confusing for a non-expert in Chapel to read.
None of these seem like great solutions, because they are all either painful or confusing to implement or to use; so it comes down to minimizing how often this corner case occurs.
Edit: One thought would be to provide an idiom to avoid the maximal range via type methods. Spitballing:
for i in uint(8).allValues() do ... // returns an iterator, not a range.
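A sketch of how such an iterator could avoid ever forming a maximal range (allValues is the hypothetical name from the spitball above, written here as a free function):

iter allValues(type t) where isIntegralType(t) {
  var i = min(t);
  while true {
    yield i;
    if i == max(t) then break;   // stop before i would wrap past max(t)
    i += 1;
  }
}

for v in allValues(uint(8)) do
  write(v, " ");   // 0 1 2 ... 255, with no overflow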
Edit2: The languages that use exclusive ranges have this problem too: you can have either the empty range or the maximal range, but not both. Exclusive ranges naturally support the empty range, while inclusive ranges naturally support maximal ranges. Rust examples: Ex1. Ex2. It feels a bit unfortunate that by choosing inclusive ranges we have to live with this corner case for the empty range, but I'm willing to live with it so long as there's a compiler error when someone actually tries to create a maximal range (e.g., an error on 0:uint(8)..255 and no error for 0:uint(8)..#0). (Could be a long-term goal for ease of use.)
Edit3: Or we could start adding support for exclusive ranges a la Swift and support both, depending on the range operator used. The internal representation is easiest with a maximal range == empty range solution. https://github.com/chapel-lang/chapel/issues/12988#issuecomment-495799198
> You really have never created an empty array or variable-sized array that you end up resizing later?
Grepping the test/ directory for existing cases that I can run to make sure any changes here work, I think I can now say more definitively that I've never written code that applies #0 to a range. :) (or at least, if I did, I either didn't check it in or it didn't persist).
> I'm willing to live with it so long as there's a compiler error when someone actually tries to create a maximal range (e.g., an error on 0:uint(8)..255 and no error for 0:uint(8)..#0)
I was imagining this as well, and believe it would be reasonably trivial to implement.
That said, I think if we did wraparound and stored an empty bit, we could store both empty and maximal ranges using the same bounds while differentiating them from one another. I think there's more implementation effort in that approach, but _maybe_ less confusion?
For the time being, I'm looking at implementing the wraparound approach without the empty bit to get us partway down this path and see if there are any surprises (which is why I was grepping the test system in the previous comment).
My prototype implementation of this is on PR #13673.
The PR is now in a state that it's passing testing. My current intention is to get review on it, merge if there are no objections, and fork the discussion of an "is empty" bit to a distinct issue.
I propose to follow the precedent of iterating over maximal ranges, which halts with "Iteration over a bounded range may be incorrect due to overflow". Implemented in #384, I think.
Looks like "overflow" above is the one implemented. This is fine with me. I think it's probably worth creating an error upon certain patterns of range creation; e.g., min(uint(8))..max(uint(8)) could have a useful error. But we want the implementation to be able to compute this in certain cases (mainly with #). Perhaps # could use a different range initializer, and the initializer from low/high could halt when checks are on?
@vasslitvinov: Halting was generally considered distasteful since 0:uint(8)..#0 is likely to be not that rare of a pattern. Also, the halt that currently occurs when iterating over maximal ranges is something that we've discussed providing ways to work around.
@mppf: Above, we briefly discussed generating errors for creating maximal ranges as another possible way to handle them, and the code path for creating range literals lo..hi does differ from the one used by #0, so this would be pretty easy to do. That said, if we stored the isEmpty bit in ranges, we wouldn't need the error. As mentioned on email, I'm going to fork the maximal-range aspects of this issue into a new issue of its own.
I'm considering PR #13673 to have closed this issue, though I've spawned off issue #13717 as a place to capture and continue discussions about maximal and empty ranges.