Zig: change switch range syntax to be more clear and perhaps also allow exclusive ranges

Created on 3 May 2017 · 31 Comments · Source: ziglang/zig

Right now, .. is the only slice operator available, and it is exclusive. Meanwhile ... (1 extra dot) is the only switch range operator available, and it is inclusive.

I believe that the difference in exclusivity of each kind of expression is appropriate based on typical use cases; however, the difference in syntax is subtle. It may be worth choosing clearer syntax to represent the status quo, or perhaps adding the exclusive ability to switch statements.

Here's an example of a switch statement where exclusive ranges are better:

switch (rng.getRandomPercent()) {
    0...30 => std.debug.warn("Choice A\n"),
    30...70 => std.debug.warn("Choice B\n"),
    70...100 => std.debug.warn("Choice C\n"),
}

Right now this would give you an error because 30 and 70 are used twice. To fix it, the code would look like this:

switch (rng.getRandomPercent()) {
    0 ... 30 - 1 => std.debug.warn("Choice A\n"),
    30 ... 70 - 1 => std.debug.warn("Choice B\n"),
    70 ... 100 - 1 => std.debug.warn("Choice C\n"),
}

It's not so bad, especially considering the -1 happens at compile-time, but this is an example of where exclusive range is desired. Another example would be enum ranges. There is no reasonable way to do "enum value" minus 1. Another example would be if they were floats instead of integers. In this case -1 doesn't make sense and you absolutely need the exclusivity ability.
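For integers, the status-quo workaround can be sketched with named constants, so that each boundary is written once and the `- 1` is folded at compile time. This is only a minimal sketch: `choose`, `b_start`, `c_start` and `range_end` are hypothetical names that do not appear in the proposal, and it does nothing for the enum and float cases mentioned above.

```zig
const std = @import("std");

// Hypothetical thresholds; the names are made up for this sketch.
const b_start = 30;
const c_start = 70;
const range_end = 100;

// Maps a percentage to a choice. Each upper bound is written as
// "next start - 1" because `...` includes both ends.
fn choose(percent: u8) u8 {
    return switch (percent) {
        0...b_start - 1 => 'A',
        b_start...c_start - 1 => 'B',
        c_start...range_end - 1 => 'C',
        else => '?', // 100 and above
    };
}

pub fn main() void {
    std.debug.print("{c}\n", .{choose(42)});
}
```

The boundaries 30 and 70 each land in exactly one arm, which is the behaviour an exclusive range operator would give directly.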

Here are some proposals:

  • Allow .. in switch as well as ... (three dots). This matches Perl - two dots is exclusive, three dots is inclusive.
  • Change .. slice syntax to :. This matches Python. Switch statements still have no exclusive range operator.
  • Change .. slice syntax to :, and allow : in switch statements as well, so that they have an exclusive range operator available.

If we have a for range syntax (See #358) then that should be taken into consideration as well.

breaking proposal


All 31 comments

Since the for over a range is under consideration, I just want to think out loud a bit.... using the two different range operators allowed in both places:

var array = [3]u8{ 0, 1, 2 };
array[0..0] == []u8{}
array[0..1] == []u8{0}
array[0...0] == []u8{0}
array[0...1] == []u8{0,1}

// With chars, the range being exclusive can get weird
switch (c) {
    'a'...'b' => {}, // inclusive
    'c'..'f', 'g' => {}, // exclusive (no f)
    'f'..'g' => {}, // OK (no f above, no g here)
}

// Ints are ultimately the same, but easier to reason about, I guess
switch (c) {
    1 .. 10 => {},
    10 .. 100 => {},
    100 .. 1000 => {},
}

// If you could do a switch on floats, inclusive would be weird
switch (f) {
    0.0 ... 1.0 => {}, // inclusive
    1.0 .. 2.0 => {}, // Not OK, 1.0 in two branches
    2.0 .. 3.0=> {}, // OK
}

If we're considering python style array slicing could we do negative indices:

var array = [5]u8{ 0, 1, 2, 3, 4 };
array[0...] == []u8 {0, 1, 2, 3, 4 }
array[0..3] == []u8 {0, 1, 2 }
array[0...-1] == []u8 {0, 1, 2, 3, 4 }
array[0...-2] == []u8 {0, 1, 2, 3}
array[0..-1] == []u8 {0, 1, 2, 3 }

Could we iterate backwards?

array[-1...] == []u8 {4, 3, 2, 1, 0 }
array[5...0] == []u8 {4, 3, 2, 1, 0 }
array[5..0] == []u8 { 4, 3, 2, 1 }

And finally, could we specify a stride/step?

array[0... : 2] == []u8 {0, 2, 4}
array[0..3 : 2 ] == []u8 {0, 2 }
array[1..5 : 3 ] == []u8 {1,4}
array[0...5 : -1] == []u8 {4, 3, 2, 1, 0} // Would this be a better way to iterate backwards?

This only really makes sense if we're able to do the same thing for the for over a range:

var values = []u8 {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
var slice = values[0 ... 10 : 4];

for (0 ... 10 : 4 ) | x, i | {
    assert(x == slice[i]);
    printf("{}, {}, {}\n", x, i, slice[i]);
} 
>> 0, 0, 0
>> 4, 1, 4
>> 8, 2, 8

I think having both .. and ... is reasonable, although I do want to encourage programmers to use the exclusive one for slices.

We couldn't do the backwards or stride, because a slice only creates a pointer and a length; it does not copy data around.
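To illustrate the "pointer and a length" point, here is a small sketch in current Zig syntax. The function name `sliceAliasesArray` is hypothetical; the idea is only that a write through the slice is visible in the backing array, because nothing was copied.

```zig
const std = @import("std");

// Writes through a slice and reports whether the backing array changed.
fn sliceAliasesArray() bool {
    var array = [_]u8{ 0, 1, 2, 3, 4 };
    const slice: []u8 = array[1..4]; // pointer to array[1], length 3
    slice[0] = 42; // no copy was made, so this writes array[1]
    return array[1] == 42 and slice.len == 3;
}

pub fn main() void {
    std.debug.print("{}\n", .{sliceAliasesArray()});
}
```

A reversed or strided view cannot be described by a start pointer and a count alone, which is why those forms would need copying or a different type.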

As for negative... it seems simpler to require a usize for the end argument of a slice and leave it to the user to figure out how to compute indexes as an offset from the length.

Another problem with negative indexes is that if the compiler doesn't know at comptime if an index is positive or negative, it would have to emit a conditional branch, which sounds like a bad idea. If there was going to be a way to index backwards, it would need to be comptime unambiguous.
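Doing the offset-from-length arithmetic by hand might look like the sketch below. `dropLast` is a hypothetical helper, and it assumes `n <= s.len`; no sign check or extra branch on the index is needed because everything stays in `usize`.

```zig
const std = @import("std");

// Returns `s` without its last `n` elements.
// Assumes n <= s.len; otherwise the subtraction overflows, which is
// a checked panic in safe build modes.
fn dropLast(s: []const u8, n: usize) []const u8 {
    return s[0 .. s.len - n];
}

pub fn main() void {
    const array = [_]u8{ 0, 1, 2, 3, 4 };
    std.debug.print("{any}\n", .{dropLast(&array, 1)});
}
```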

Yeah, the python style slicing is not appropriate. Doing step/direction in the for wouldn't require the copying around as it would just be a while loop with a counter, but the point here is to make the syntaxes consistent and it makes things less simple not more... So yeah, disregard the above...

The only other thought on this subject I wanted to share is specifying a range not with start and end, but start and number of elements...

ie:

```
var array = [5]u8{ 0, 1, 2, 3, 4 };

// Exclusive
array[0..2] == []u8{0,1}

// Inclusive
array[0...2] == []u8{0,1, 2}

// Range
array[0 : 0] == []u8{}
array[0 : 2] == []u8{0, 1}
array[2 : 2] == []u8{2, 3}

// Each in the for
for (0 .. 2 ) | x, i | { } // 0, 1
for (0 ... 2) | x, i | { } // 0, 1, 2
for (2 : 2) | x, i | { } // 2, 3

// Each in the switch
switch (c) {
'a'...'b' => {}, // inclusive
'c'..'A' => {}, // exclusive (no A)
'A':26 => {}, // Range - All capital letters
}
```

I think it's reasonable to want to have a start and a length rather than start and end. But I think there's value in the language having a single convention.
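With only the single exclusive `..` convention, a start-and-length view can still be expressed by slicing twice. A minimal sketch, where `window` is a hypothetical helper name and `start + len <= s.len` is assumed:

```zig
const std = @import("std");

// Returns `len` elements of `s` beginning at index `start`.
// Assumes start + len <= s.len.
fn window(s: []const u8, start: usize, len: usize) []const u8 {
    // First drop everything before `start`, then keep `len` elements.
    return s[start..][0..len];
}

pub fn main() void {
    const array = [_]u8{ 0, 1, 2, 3, 4 };
    std.debug.print("{any}\n", .{window(&array, 2, 2)});
}
```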

Yeah, this is purely syntactic sugar and completely unnecessary.

Consider the following:

fn printRange(a, b) {
    for (a ... b) | x, i | { }  // Fails if a > b
    for  (arr[a ... b])  | x, i | { }  // Fails if a > b OR b > arr.len OR a > arr.len
}

fn printN(a, n) {
    for  (a : n)  | x, i | { }  // Can't fail
    for  (arr[a : n])  | x, i | { }  // Fails if n > arr.len OR a + n > arr.len
}

I see your point here, but I question this assertion:

for (a ... b) | x, i | { }  // Fails if a > b

I think this would simply iterate 0 times, the same way that this would:

var i: usize = 100;
while (i < 10; i += 1) {}

As for the other one:

for  (arr[a ... b])  | x, i | { }  // Fails if a > b OR b > arr.len OR a > arr.len

Because of the transitive property, we only have to compare a <= b and then b <= arr.len.
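Both points can be sketched in current Zig syntax. The helper names `sliceInBounds` and `countIterations` are hypothetical; the first spells out the two comparisons (a <= b and b <= len together imply a <= len), the second shows a loop whose start is past its end running zero times.

```zig
const std = @import("std");

// Two comparisons are enough: a <= b and b <= len imply a <= len.
fn sliceInBounds(a: usize, b: usize, len: usize) bool {
    return a <= b and b <= len;
}

// Counts how often the loop body runs for the half-open range [start, end).
fn countIterations(start: usize, end: usize) usize {
    var count: usize = 0;
    var i = start;
    while (i < end) : (i += 1) {
        count += 1;
    }
    return count;
}

pub fn main() void {
    std.debug.print("{} {}\n", .{ sliceInBounds(1, 3, 5), countIterations(100, 10) });
}
```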

I think this would simply iterate 0 times, the same way that this would:

var i: usize = 100;
while (i < 10; i += 1)

Good point, sounds reasonable. You'd need to check the second case anyway.

Now slicing syntax is .. instead of ... (three dots). So the syntax is at least not misleading.

Looks good!

It should be said that this syntax is the exact opposite of what Ruby does: https://ruby-doc.org/core-2.1.5/Range.html

Not that Ruby should dictate Zig, but it's very unfortunate.

But is the syntax really that clear? I think it's hard to see the difference.

I had a related comment here: https://github.com/ziglang/zig/issues/358#issuecomment-408850822

I think having both .. and ... will lead to lots of bugs..

I updated the OP to clear up confusion.

Ruby's syntax is nuts. How could more dots mean less numbers in the range? The mnemonic is completely backwards!

If you visualise .. vs ...
a .. b
a ... b

In the .. case, the distance between both letters is smaller, thus b is in the range
In the ... case, the distance is bigger, thus b is not in the range

Somehow this always made sense to me, and I never actually mistyped it. But now that I try to explain it, I reckon the other way around makes just as much sense, probably much more.

Huh, alright. That's as reasonable a mnemonic as any, so I'll take back my "nuts" comment :-)
For the purposes of this proposal, I think it does make sense to avoid directly contradictory syntax with other popular languages, if possible.

i don't think stride makes sense in a switch, and it definitely doesn't make sense with floats or enums in a switch. i think stride is really only meaningful in a looping context over in #358.

Oh, right. Oops.

What about n to m?

The more I think about it, the more this syntax is growing on me:

  • a ..< b
  • a ..<= b

It's very clear what these mean without explanation, and we're not directly contradicting any other language's convention. (don't forget to consider Bash's {0..9} syntax too, which is inclusive.) I say we replace the ... in switch with ..<= and add ..< in switch.

But now the question is do we update the slicing syntax to match this? This feels a little weird to me:

  • arr[0 ..< new_len]

The problem is that there's no < comparison going on in slicing (except for the safety check). Slicing is fundamentally an arithmetic operation, not a comparison. A switch case with a range is fundamentally a comparison and not arithmetic. So using this syntax for slicing doesn't make as much sense to me. And a[0..<] looks really stupid.

So I actually think we can leave slices the way they are. We're not explicitly saying whether slicing is inclusive or exclusive at either bound, but come on. Everyone should know that upper bounds are exclusive for slices, just like everyone should know that indexes start at 0. Slicing is an arithmetic operation, and exclusive upper bounds is how you avoid doing +1/-1 nonsense.

  • arr[a .. b]
  • arr[a..]
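One way to see the "no +1/-1" argument: with an exclusive upper bound, splitting a slice at any index uses the same number on both sides. A minimal sketch, with `splitLengthsAddUp` as a hypothetical helper and `mid <= s.len` assumed:

```zig
const std = @import("std");

// Splits `s` at `mid` and checks that no element is lost or duplicated.
// Assumes mid <= s.len.
fn splitLengthsAddUp(s: []const u8, mid: usize) bool {
    const left = s[0..mid]; // indexes 0 up to, but not including, mid
    const right = s[mid..]; // indexes mid up to the end
    return left.len == mid and left.len + right.len == s.len;
}

pub fn main() void {
    const array = [_]u8{ 0, 1, 2, 3, 4 };
    std.debug.print("{}\n", .{splitLengthsAddUp(&array, 2)});
}
```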

wait, if we want to support switching on floats, we need to support exclusive lower bound too. ok new proposal for switch range syntax (slice syntax still unaffected):

  • a <=..<= b (this is the status quo a ... b)
  • a <=..< b
  • a <..<= b
  • a <..< b

and i'm thinking that for grammar purposes, the .. is a separate token from the comparison operators, so you could put spaces in like a <= .. <= b.

That looks really ugly. And I would also question the need for using floating point in switch cases. It's really finicky, because the floating point value is often not exactly what you typed because it's in binary format and not decimal. Better not go there imo.
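The finickiness is easy to reproduce with the well-known 0.1 + 0.2 case. In this sketch `sumEquals` is a hypothetical helper; it only shows that a decimal literal used as a range bound may not be the value a computed float actually lands on.

```zig
const std = @import("std");

// Compares a + b against `expected` using exact f64 equality.
fn sumEquals(a: f64, b: f64, expected: f64) bool {
    return a + b == expected;
}

pub fn main() void {
    // 0.1, 0.2 and 0.3 have no exact binary representation in f64,
    // while 0.5, 0.25 and 0.75 do.
    std.debug.print("{} {}\n", .{ sumEquals(0.1, 0.2, 0.3), sumEquals(0.5, 0.25, 0.75) });
}
```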

I like @thejoshwolfe's suggestion. The a .. b syntax could be generalized. It could be syntactic sugar for Range {.from=a, .to=b}, or some kind of special built-in tuple. But this doesn't make sense if switch uses the same syntax. Like he says, switching is doing a comparison operation, not actually iterating over a range. I think it makes a lot of sense to make those operators look more like comparison operators.

This also resolves the question of whether a .. b is inclusive or exclusive. Then a and b are just two numbers really, and it should be considered obvious that arr[a..b] is exclusive on b. If it's used elsewhere it should be made sure that it's obvious from context as well.

Switch on floats could be nice. I think only a <..< b is safe in that case. Equality on float is tricky. But if x <= b is allowed on floats then a <..<= b should be too.

Maybe it looks a bit ugly to some, and a few more characters to type, but it's easier to read unambiguously

Yesterday a different switch range use case came up: I wanted to switch on type and have a case for i0...i63 and then a different one for i64...i65535

@daurnimator you already can switch on size of integers, just like that:

switch (@typeInfo(arg).Int.bits) {
    0...63 => {}, // narrower than 64 bits
    64...65535 => {}, // 64 bits and wider
}
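A self-contained variant of the same idea is sketched below. Note the swap: it uses `@bitSizeOf` instead of `@typeInfo(arg).Int.bits`, on the assumption that the spelling of the type-info field differs between Zig releases. `isNarrow` is a hypothetical name, and unlike the `@typeInfo` version this also accepts non-integer types.

```zig
const std = @import("std");

// Reports whether T occupies fewer than 64 bits.
// Assumption: T is small enough that its bit size fits in a u16.
fn isNarrow(comptime T: type) bool {
    const bits: u16 = @bitSizeOf(T);
    return switch (bits) {
        0...63 => true,
        64...65535 => false, // the two inclusive ranges cover all of u16
    };
}

pub fn main() void {
    std.debug.print("{} {}\n", .{ isNarrow(u8), isNarrow(u64) });
}
```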

I propose this syntax:

switch (c) {
    5 -> 10 => {}, // exclusive, another variant: a ~~ b
    'a' ->+ 'z' => {}, // inclusive, another variant: a ~~+ b
}

Proposal for range syntax

A tiny suggestion: If it's decided that both .. and ... are allowed in some context, maybe it's better to have .. and .... (four dots).

I feel like the difference between two and three dots is small enough that there will be hard-to-find typo bugs, similar to the classic if (mybool);

Four dots would stand out clearly.

if we want to support switching on floats, we need to support exclusive lower bound too.

In Python you can write 1 < x <= 20, which translates to (1 < x) and (x <= 20) (except that x is only evaluated once). So you can write this:

if (0 <= x <= 10) {
    // ..
} else if (10 < x <= 20) {
    // ..
} else if (x > 20) {
    // ..
}

Which is very intuitive to understand. It's more flexible too because you can use it outside of switch statements as well. Also, in python you can chain more than one expression: ie 0 < x < y < 100 becomes (0 < x) and (x < y) and (y < 100). The expression a == b == c == d becomes (a == b) and (b == c) and (c == d) etc.
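For comparison, the same three branches written without chained comparisons need each bound spelled out with `and`. This is only a sketch of the desugared form: `bucket` and its return codes are hypothetical, and the final fallthrough is there because negative inputs match none of the branches.

```zig
const std = @import("std");

// Classifies x into the same three intervals as the chained version:
// [0, 10], (10, 20] and (20, infinity).
fn bucket(x: i32) u8 {
    if (0 <= x and x <= 10) {
        return 0;
    } else if (10 < x and x <= 20) {
        return 1;
    } else if (x > 20) {
        return 2;
    }
    return 255; // negative input: not covered by the branches above
}

pub fn main() void {
    std.debug.print("{} {} {}\n", .{ bucket(10), bucket(11), bucket(21) });
}
```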

I think it is very important to remember, that range bounds may be constants defined somewhere else, so all this +1/-1 may just confuse and make code less readable.

Adding my 2 cents to @thejoshwolfe's proposal.

switch (x) {
     0<= ... <  5 => {}, // 0, 1, 2, 3, 4
     5<= ... <=10 => {}, // 5, 6, 7, 8, 9, 10
    10<  ... < 15 => {}, // 11, 12, 13, 14
    14<  ... <=20 => {}, // 15, 16, 17, 18, 19, 20
}

maybe even

switch (getRandom(0, 20)) {
     0<= |x| <  5 => {}, // 0, 1, 2, 3, 4
     5<= |x| <=10 => {}, // 5, 6, 7, 8, 9, 10
    10<  |x| < 15 => {}, // 11, 12, 13, 14
    14<  |x| <=20 => {}, // 15, 16, 17, 18, 19, 20
}

In Odin, there were many options to go for _iff_ I wanted to unify slicing operations and ranges. However, I decided not to unify them and keep them as different concepts because they are fundamentally different ones too. The _act of slicing_ is different to _indexing with a range_, you can treat them as if they were the same, but they are actually different things conceptually.

array[lo:hi] // slicing syntax, [lo, hi)
case a ..< b:  // range syntax [a, b)
case a ..  b:  // range syntax [a, b]

If you wanted to unify these concepts, these are the possible solutions:

a .. b
a ... b

a .. b   or a ... b
a ..= b

a ..< b
a .. b or a ... b

The first approach is the most confusing for two reasons, the things are not that distinct in their appearance and they can have the opposite meanings in different languages e.g. Ruby vs Rust.

For Odin I settled on the third approach because it's probably the clearest view in my opinion.

I also wanted to give my 2 cents, I like how Raku handles this: https://docs.raku.org/type/Range
Adding a caret to either side of the .. indicates that the endpoint marked with it is excluded from the range.

switch (x) {
     0  ..^  5 => {}, // 0, 1, 2, 3, 4
     5  ..  10 => {}, // 5, 6, 7, 8, 9, 10
    10 ^..^ 15 => {}, // 11, 12, 13, 14
    14 ^..  20 => {}, // 15, 16, 17, 18, 19, 20
}

@ManDeJan In Nim, the caret is used as shorthand for "from the end".

This is the problem with choosing syntax. Every other language chooses it differently.

My proposal?

elements[.[a, b)]
elements[.[a, b]]
elements[.(a, b]]
elements[.(a, b)]

switch (x) {
    .[ 0,  5) => {}, // 0, 1, 2, 3, 4
    .[ 5, 10] => {}, // 5, 6, 7, 8, 9, 10
    .(10, 15) => {}, // 11, 12, 13, 14
    .(14, 20] => {}, // 15, 16, 17, 18, 19, 20
}

my feeling now: (>_<) oh sissy me, why not just use .. and ...? off-by-one errors are not new and will never be old, you can also mistype a < b and get a <= b instead.

so my real proposal:

a .. b   // exclusive
a ... b  // inclusive

never mind those crazy ideas about float range, enum range and enum-indexed array/tuple...
