Julia: Operator precedence of & and | is surprising as element-wise boolean operators

Created on 18 Dec 2013 · 89 comments · Source: JuliaLang/julia

I just got bit by the python-like operator precedence for the operators | and &. As bitor and bitand, it's a wonderful choice (making checking for bit flags much simpler — a common gotcha in C).

But when behaving as their element-wise boolean counterparts to || and &&, it is surprising. A < 1 || A > 2 must be written differently if A is an array: (A .< 1) | (A .> 2). Perhaps all that's needed here is a bit more documentation (i.e., putting https://github.com/JuliaLang/julia/blob/master/src/julia-parser.scm#L1-L19 into the Mathematical Operators page, or as something to mention to users coming from Matlab).
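The grouping described above can be inspected directly. This is a minimal sketch, assuming a Julia 1.x session where `Meta.parse` is available; the names `ex`, `middle`, and `fixed` are only illustrative. It parses the expression without evaluating it, so no array needs to exist.

```julia
# Parse (do not evaluate) the unparenthesized form from the report.
ex = Meta.parse("A .< 1 | A .> 2")

# `|` binds tighter than the comparisons, so the parser sees one chained
# comparison whose middle operand is `1 | A`.
middle = ex.args[3]

# With explicit parentheses the top-level operation is the call to `|`.
fixed = Meta.parse("(A .< 1) | (A .> 2)")
```

Under these assumptions, `ex` is a chained comparison rather than an "or" of two comparisons, which is the surprise the report describes.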

But as I look at the source for |(::StridedArray,::StridedArray), I see that it's actually applying the bitwise operator to all elements. As a radical alternative, what about adding .&& and .|| with similar precedence to && and || that ensures boolean elements? Functionally, & and .&& would behave the same on logical arrays, but they'd each have the precedence one would expect in each context. (Of course, there'd still be some cognitive dissonance here as the elementwise boolean operators couldn't short-circuit like their scalar equivalents).

Labels: breaking, decision, parser

There might not be a single precedence ordering perfect for all cases. The main problem here is that the result of < is seldom combined with anything; i.e. (x<1) + 2 is quite rare, so without explicit parens parses as x<(1 + 2).

I kind of suspect that our precedences could use a little tweaking, but yeah, every choice is bad in some way.

Yes, exactly. I definitely think the precedence for bitand is the right choice. That's what led to my brainstorm of the .&& operators.

Perhaps it's just my Matlab-think that causes this confusion, where & truly is the element-wise little brother to scalar &&.

+1 for .&& and .||. It seems totally consistent with .*, etc. My understanding is that Numpy only ended up using & for element-wise "logical and" because they ran out of infix operators. It's a constant source of confusion (e.g., from http://wiki.scipy.org/NumPy_for_Matlab_Users,

"Precedence: Numpy's & operator is higher precedence than logical operators like < and >; Matlab's is the reverse.
If you know you have boolean arguments, you can get away with using Numpy's bitwise operators, but be careful with parentheses, like this: z = (x > 1) & (x < 2). The absence of Numpy operator forms of logical_and and logical_or is an unfortunate consequence of Python's design".)

Also this is anecdotal, but I suspect taking the element-wise "logical and" of boolean vectors is a more common operation than bitwise operations and so if & is going to maintain its current meaning, it might be worth tweaking the precedence to be the same as .*.

I must admit that the fact that the priority order differs between |/.== and ||/== reminds me a little of the R Inferno -- something you absolutely want to avoid. Every choice is bad in some way, but inconsistent solutions are bad in all ways! ;-)

Using the same operators for so different things as bitwise and boolean operations is not ideal. Does any other language use that pattern? In Matlab and R & only does the latter, and in Numpy it only does the former.

Adding .&& and .|| might be a good and consistent solution. In that case & and | would better be eventually reserved to bitwise operations (do not offer two slightly different ways of doing the same thing).

What is the difference between bitwise and boolean?

The difference between & and && is that && is a control flow operator; it short-circuits. So .&& doesn't really make much sense.
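The short-circuit distinction can be made visible with a side effect. This is a small sketch, not taken from the thread; `calls` and `noisy` are hypothetical names introduced only to count evaluations.

```julia
calls = Ref(0)                    # counts how often the right-hand side runs
noisy(x) = (calls[] += 1; x)      # hypothetical helper with a visible side effect

r1 = false && noisy(true)         # control flow: the right side is never evaluated
after_short_circuit = calls[]

r2 = false & noisy(true)          # ordinary function call: both arguments are evaluated
after_plain_and = calls[]
```

Both results are `false`; the only difference is whether the right-hand side ran, which is why a dotted `&&` has no obvious meaning as a plain function.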

Yeah, .&& and .|| don't make any sense. I almost regret using | and & for bitwise or and and. Those are pretty uncommon operations and are arguably clearer if you have to write them in function notation – e.g. there are no precedence issues that way. Then we could have used | for piping without punning.

Those are fair points. Maybe something like .&? It would be consistent with most binary operators having a '.' version that is element-wise, and could have the expected precedence (later than .<).

@JeffBezanson IIUC, & and | have a higher priority than && and || because it's more practical for common patterns of bitwise operations. That's why having a separate element-wise boolean operator would make sense, to provide a consistent precedence order for scalar and element-wise comparison operators.

The whole thing is quite confusing because of the multiple meanings of operators...

julia> help("&&")
Base.&&(x, y)

   Boolean and

So && is not only defined as short-circuit: it also intuitively means that you expect to work with booleans only. This gives support to the idea of having an element-wise version working only with boolean arrays. One could also have & work only for scalars (and operating bitwise), and .& for arrays.

Honestly, the issue of short-circuit really seems secondary to me; of course, when working element-wise it doesn't make sense to short-circuit. But I see that as a side effect. The fact that Matlab offers the choice between two operators (though & and | do short-circuit in if!) looks like a design mistake: when do you need that? Even C does not offer this choice. The real question is: what's the appropriate precedence of bitwise and boolean operators, and do they need to be separated because of that.

@StefanKarpinski Yeah, I was thinking too that wasting & and | for bitwise operations is not that great; but no language uses different operators. Is the alternative of using functions a good solution? Matlab and R do that, there might be a good reason.

&& does not have multiple meanings. The short-circuit behavior only makes
sense because of the boolean operation it does; they cannot be separated.

A function like & but accepting only boolean arrays would be silly. The
"boolean" and "bitwise" behaviors are the same; a boolean is 1 bit.

But what about operator precedence? If they are the same, why don't & and && have the same precedence?

The thing is, short-circuiting is not secondary, but essential. If it were
not for that, they would be the same operator. Short-circuit also
immediately implies scalar boolean, since it has to make a single decision.

I guess I would consider changing the precedence of &. But the key is to
see && as primarily for control flow; it is not even a function, while & is
a normal generic function.

Regardless of whether the precedence of & changes or a third set of boolean operators is introduced specifically for element-wise boolean arrays, I think it's very important to allow syntax such as B[A .< 1 | A .> 2] without parentheses. In a language like Julia, I think that's a much more common operation than, say, checking bitfields.

+1 for what @mbauman said. I would also throw into the mix that Perl and Ruby have and and or operators that have short-circuit behavior like && and || but much lower precedence. I don't recall if that's because && and || have precedence like & and | or not, but it might be. It's quite subjective, of course, but I happen to think that and and or are _much_ more intuitive as control flow primitive because they look like keywords, not operators.

I would _love_ it if Julia had and and or as keywords that did && and ||.

But currently && and || have the lowest precedence, excepting assignment operators. So in the vast majority of cases you do not need parentheses to combine several conditions (assignment in a condition is special anyway). So what would be the point of introducing and and or with the same behavior as && and || except for precedence? Looks like a source of complexity for a very limited gain.

Apparently [1], and and or were added to Perl to allow constructs like this:

open my $fh, '<', $filename or die "A horrible death!";

I'm not sure that's what you want to encourage in Julia. ;-)

1: http://stackoverflow.com/a/15193366/2413179

Yeah, I already do that sort of thing all the time with ||. Guess where I picked it up?
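For readers unfamiliar with the idiom, the Julia counterpart of the Perl `or die` pattern might look like the following sketch. The function `checked_sqrt` is a made-up example, not something from Base or from the thread.

```julia
# Hypothetical helper: bail out early using short-circuit `||`.
function checked_sqrt(x)
    # The right-hand side only runs when the condition on the left is false.
    x >= 0 || throw(DomainError(x, "expected a non-negative value"))
    return sqrt(x)
end

ok = checked_sqrt(4.0)
```

Because `||` already has very low precedence, the guard needs no parentheses, which is the point made a few comments earlier.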

@johnmyleswhite +1 to that

To give a justification for why I like and over &&: the trouble for me with & and && is that their surface similarity makes me believe that they will be more similar than they are. This was an endless problem for me in R: I would have to look up the definitions of both operators 100% of the time I used them, because they just confuse me to this day. Julia's type errors are a big step up from this, but I still find the distinction between these operators non-intuitive.

I guess our C bias leaked through by picking && instead of and. I've been using && in C for so long that it didn't even occur to me how easy it is to confuse with &, but that should have been obvious. I don't think we can change it now.

We could lower the precedence of & and | though. While this could cause very hard-to-find bugs, I'm confident that these operators are not used that much.

Is there no mechanism for operators to be deprecated? If that happens over the course of months, it seems easy enough to replace && with and in existing code. Unlike a lot of breaking changes, this one seems like a simple search-and-replace will solve almost every use case without any careful thinking.

Agreed. I don't see why we couldn't migrate from && and || to and and or fairly easily. Steps:

  1. Deprecate the usage of and and or as identifiers.
  2. Add and and or keywords as short-circuit operators.
  3. Deprecate && and || operators.
  4. Remove && and || operators.

We could also just have both.

I'd really prefer that we not have both in the long-term future. If that were the final option, I'd prefer sticking with the unloved && over redundant operators.

Good point that unlike most changes, this can be handled by search and
replace. But it would not fix this issue; we'd still need to change the
precedence of & or the like.

Very true. Let's just open another issue for debating the change of && to and.

+1 for lowering the precedence of & and |

Precedence is often a damned if you do, damned if you don't kind of business, but I have to say that I've often wished that these operators had lower precedence.

Yeah, lowering the precedence of & and | against broadcasting comparisons will be nice for DataFrame row selection, allowing a more natural df[df[:a].==x & df[:b].>=y, :], more comparable to a scalar boolean expression a==x && b>=y.

I'd also love to see this change. Please, please, please let me get what I want.

Please let John get what he wants. Everyone has a right to one free Julia
syntax change.

Ooh, wonder what I'll use my one on...

But is there code that will break if the precedence is lowered? For example,

if FLAG == x & 4
   # do something really important
   ...
end
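To see what would change for a check like this, the two groupings can be compared by writing the parentheses out. This is an illustrative sketch with made-up values for `FLAG` and `x`; `lowered` only emulates, with explicit parentheses, what a C-style precedence would produce.

```julia
FLAG, x = 4, 6                 # made-up values; x has the 4 bit set

current = FLAG == x & 4        # current grouping: FLAG == (x & 4)
lowered = (FLAG == x) & 4      # grouping a lower precedence for & would give
```

Here `current` is `true`, while `lowered` is the integer `0`. Since Julia requires a `Bool` in an `if` condition, at least some breakage of this kind would surface as a type error rather than as a silently different branch.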

Everyone has a right to one free Julia syntax change.

7 billion syntax changes later...

@kmsquire – yes, we would have to do this in stages. Not sure it's worth it.

We have Perl?

Perl 7

Is it possible for & to have lower precedence than .== but higher than ==, so as not to break @kmsquire's example? Or would this entail worse issues in the long term?

That would be possible. It does actually seem sensible for the dotted comparisons to have higher precedence than the undotted ones. Consider a .== b == c .< d – what should this mean? The only sensible parsing is (a .== b) == (c .< d). The current behavior is to parse it as a single chained comparison.

+1 for not chaining dotted and undotted comparisons together.

+1 for the dotted comparisons to have higher precedence than the undotted ones.

It's an interesting idea to put elementwise comparisons at a higher precedence than &, but I'm not sure it'd pan out with the other infix operators. Right now, & is in the same precedence category as multiplication, which makes sense as bitand. If elementwise comparisons are at a higher precedence, that means that they're higher than _all_ the basic scalar arithmetic operators. Now, the deprecation of broadcasting + makes this more of a possibility, but * would still bind really strangely. 5*B .> A would be a bunch of 5s and 0s. You'd have to have a whole second section of precedences for elementwise operators.
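The "5s and 0s" outcome can be reproduced by forcing that grouping with parentheses. This is a sketch with made-up arrays, assuming Julia 1.x semantics for scalar-times-array multiplication.

```julia
A = [1, 1, 1]
B = [0, 1, 2]

today = 5 * B .> A        # grouped as (5 * B) .> A, a Boolean mask
tighter = 5 * (B .> A)    # the grouping if `.>` bound tighter than `*`
```

With these values `today` is `[false, true, true]`, whereas `tighter` is `[0, 0, 5]`, an integer array that is almost certainly not what the author of `5*B .> A` meant.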

But is there code that will break if the precedence is lowered?

I'm sure there are some packages that live closer to the metal and use the bitwise operators in many places, but this only required 80 changes in Base (there's actually one place where the wrong precedence rules were used). I'm actually surprised the tests passed on my first try on my computer… we'll see if Travis is happy about this. https://github.com/mbauman/julia/compare/JuliaLang:master...mbauman:bitwise-precedence

The harder problem is gracefully deprecating the precedence — it'd be nice to detect in the parser when parentheses would be needed and emit a warning. But that is not a simple task for me.

Hacky way to do it: parse it both ways and see if the result is different.

As suggested above by @nalimilan and @malmaud, I would support making .| and .& elementwise (but not broadcast) bit operators with the same precedence as || and &&.

Seems featurey. Why do we need this? Making the bitwise operators broadcast seems sufficient.

Maybe we don't. I was just leery of changing the precedence of & and |.

That said, just to make sure I wasn't missing anything, I went and compared our precedence with C and Matlab, and lo-and-behold, those operators have lower precedence than we currently have.

So I officially change my position: I think that the precedence of these could be lowered.

That's interesting. It does start to look like we should imitate C's precedence rules.

(When I think about it, I really didn't change my position. I actually thought we were following C precedence rules all along, and were about to break rank!)

I didn't realize we were so far from C convention; I think we should change the precedence of |, &, and $ on that basis. What do you think @StefanKarpinski ?

What was the reason we were originally so far? What is the basic difference between us and C?

Far be it from me to be the one to get in the way here, but the precedence level of & and | in C has been widely regarded as a mistake, and is an artifact of the early language. Dennis Ritchie had this to say in 1982.

The current precedences are somewhat similar to Python, which is one of the few languages that I know of to buck the C legacy and move the precedence of bitwise operators to be higher than comparisons. But we're still different from Python, with the precedences higher yet, equal to plus and multiplication. It does make some sense mathematically, and for checking scalar bit-flags it makes things so much simpler.

Python has elegantly extended the meaning of bitwise operators to sets: union maps naturally to |, intersect to &, etc. This makes working with sets so much easier, I think Julia should also implement them (<= is already an alias for issubset, etc). For sets a, b, c, the current precedence rules make sense, e.g. a&b <= c. That should not change.

We once had such definitions, but it was deemed that the difference between
bitwise and elementwise operations was too great. I can't remember where the
discussion was, though. I think it's probably better to use Unicode
operators for union, intersection etc.

Because of the unusual ease of overloading syntax through the addition of new types and new methods, Julia has to be extremely careful not to overload _meaning_ – it has to be crystal clear what a particular operator or function means so that you can overload it in a way that matches that meaning. In particular, when you do methods(&) you don't want to see a mix of unrelated operations like various bitwise AND operations together with a bunch of essentially unrelated intersection methods on collections.

Oh I see, thanks for answering. I knew that the meanings of methods with the same name have to be related, but I can't understand how the difference can be "too great" here, as bitwise operators are set operations on integers viewed as sets of bits. Unicode is nice for code in files, but no better than long names (union, etc.) in interactive sessions, so I will keep using my own operator aliases for sets, and still hope the current precedences remain!

I agree that these meanings can definitely be seen as related. Consider writing v & w where v and w are two vectors of integers of the same length. Does this operation take the intersection of the two vectors as collections of integers or do vectorized bitwise AND?

Also, while we would have a <= b iff a & b == a for sets, that clearly does not hold for integers.
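A quick check of this point, as a sketch with arbitrarily chosen values: for sets, being a subset is equivalent to the intersection being equal to the smaller set, but the analogous identity with `<=` and `&` fails for integers.

```julia
s, t = Set([1, 2]), Set([1, 2, 3])
sets_agree = (s ⊆ t) == ((s ∩ t) == s)   # subset test and intersection test agree

a, b = 2, 5                               # 0b010 and 0b101 share no bits
ordered = a <= b                          # numeric ordering
masked = (a & b) == a                     # "bit subset" test
```

Here `ordered` is `true` while `masked` is `false`, so `<=` on integers cannot double as the subset relation on their bits.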

I came here because I implemented "flat" sets, i.e. sets based on vectors, so your example is apropos! I understand the point: it would be too confusing to have v & w mean either vectorized bitwise AND or set intersection depending on whether the containers v and w are vectors or sets. OK, but then I think I would support introducing .| etc. for vectorized operations; if I understand the logic correctly, it is .< etc. that make it possible to have < for sets. (I know I should wait until I have more experience with Julia before discussing these things, but I can't resist! Sorry.)

No, it's a fair point and I've often felt the urge myself to write things like A & B | C where those are sets – this is very handy in Ruby. I do think it's a dangerous change – and one we don't have to make with Unicode operators: A ∩ B ∪ C.

@toivoh: so a <= b for integers should be changed to mean a & b == a ;-) More seriously, the operator ⪯ ("\preceq") is sometimes used to mean a & b == a for integers (seen as bit vectors).

I propose keeping this as-is. I agree & is *-like and | is +-like.

So I guess we bailed on @mbauman's point: "I think it's very important to allow syntax such as B[A .< 1 | A .> 2] without parentheses."?

What about making a new precedence level for dot-comparisons between these two?

I don't think we can give .< higher precedence than .+.

Perhaps .+ should be higher as well?

If .< is higher than *, then I think 5A .< 5 would be even more surprising than this issue is.

There are two distinct uses of &, bitwise and boolean-like broadcasting, and I'm afraid neither use will ever be happy with the other's precedence level. :-\

There are two distinct uses of &, bitwise and boolean-like broadcasting, and I'm afraid neither use will ever be happy with the other's precedence level. :-\

Fully agreed. And what about changing the bitwise operators? I think it's been discussed several times before. That would have the advantage of reducing the confusion, as bitwise operators and element-wise boolean operators are used in very different contexts.

Why would anything affect 5A .< 5? Shouldn't the juxtaposition multiplication be treated as having the highest of all precedence? (It can be rewritten as (5 * A) by the compiler, right?)
If that's not what is currently happening, I would suggest that it should be changed.

Juxtaposition multiplication has the precedence of unary operators I believe, which is indeed higher than most operators.

So would something like -5A be parsed as (-5) * A or -(5 * A)? Not that it probably matters in that example, but I'm just wondering: if somebody had a unary operator that really needed to work on (5 * A), say if A is an array, would that work without having to add parentheses?

julia> Meta.show_sexpr(:(-5A))
(:call, :*, -5, :A)

Ah, good catch. Right you are.

I think that changing the precedence of .< might have been a valid solution if https://github.com/JuliaLang/julia/pull/5810 had stuck.

I really like the idea of deprecating the bitwise arithmetic meaning here, and restricting & and | to Bool and AbstractArray{Bool}. Matlab uses the names bitand and bitor for bitwise arithmetic, which seem sensible.

Ah, I should have used a different example, because -5 is parsed as a number literal and _then_ the juxtaposition takes place. With a genuine unary operator it is actually the other way around (which I think is correct), i.e.:

julia> Meta.show_sexpr(:(~5A))
(:call, :~, (:call, :*, 5, :A))

I really like the idea of deprecating the bitwise arithmetic meaning here, and restricting & and | to Bool and AbstractArray{Bool}. Matlab uses the names bitand and bitor for bitwise arithmetic, which seem sensible.

+100 for this change

I really like the idea of deprecating the bitwise arithmetic meaning here, and restricting & and | to Bool and AbstractArray{Bool}. Matlab uses the names bitand and bitor for bitwise arithmetic, which seem sensible.

-1, that seems backwards to me. |= and &= are really useful syntax for doing masking operations, would be a bummer to lose those. Can someone explain why the elementwise versions of these operators aren't spelled .| and .& ?

I feel like we're really reaching the limits of what sticking dots in front of operators can do for us. #8450

&, |, and $ (as well as &=, |=, and $=) are very heavily used in the sorts of bit-twiddling programming that I frequently do. It would be very bad if they didn't work on unsigned and integer types.

Can someone explain why the elementwise versions of these operators aren't spelled .| and .& ?

+1

This is probably worth its own issue for 0.6 (or 1.0), but after listening to Guy Steele's JuliaCon talk https://www.youtube.com/watch?v=EZD3Scuv02g, the main takeaway that stuck with me, as far as things that would be realistic to do in Julia, is non-transitive operator precedence.

I'm all for non-transitive operator precedence. I believe @JeffBezanson is pretty sold on the idea too (and as the "syntax czar" of the project, his opinion on the matter is significantly significant). Seems a little tricky to implement though.

I wonder if that could allow = and && or = and || to be on the same level and be right associative so that a = b && c parses as a = (b && c) while a && b = c parses as a && (b = c).

I believe the main point of "non-transitive precedence" was to make ambiguous cases an error that requires parentheses to clarify the code, not adding special cases that let people use even fewer parentheses.

It's both – otherwise you would just require parens all the time, problem solved. Non-transitive precedence allows omitted parens in cases where there's an obvious precedence between certain operators, even if that relation can't be extended to global ordering.

I wonder if that could allow = and && or = and || to be on the same level and be right associative so that a = b && c parses as a = (b && c) while a && b = c parses as a && (b = c).

I really like this idea. IMO it is really ugly to always have to write condition(a) && (x = 2); it would be a lot clearer as condition(a) && x = 2.

Could we allow && and || to participate in dot-fusion now? Especially if the @. macro applied to them, this seems like it could be a nice solution. As I noted in the first post:

A < 1 || A > 2 must be written differently if A is an array: (A .< 1) | (A .> 2)

Are there any arguments against allowing @. A < 1 || A > 2 to fuse into broadcast(x->x < 1 || x > 2, A)? Edit: I suppose we'd have to special case A .|| B to lower to broadcast((x,y)->x||y, A, B) since || isn't a real function. And .&& does parse right now, but it evaluates to a "misplaced "&" expression" error.
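The target of the proposed fusion can already be written by hand. The following sketch assumes Julia 1.x and a made-up array `A`; it spells out the `broadcast` call that the dotted form would lower to and checks it against the parenthesized mask.

```julia
A = [0.0, 1.5, 3.0]

# What a fused `@. A < 1 || A > 2` would have to compute: a scalar
# short-circuit "or" applied to each element.
fused = broadcast(x -> x < 1 || x > 2, A)

# The parenthesized element-wise form, for comparison.
parenthesized = (A .< 1) .| (A .> 2)
```

Both produce the same mask; the fused version evaluates the second comparison only for elements where the first is false, and it avoids the intermediate Boolean arrays.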

Coming back to this, it seems pretty clear to me that (a < b) & (c < d) is way more useful than a < (b & c) < d. Reopening and marking for triage.
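A concrete case where the two readings disagree, as a sketch with values chosen only to make the difference visible:

```julia
a, b, c, d = 0, 1, 3, 2

chained = a < b & c < d          # parsed as a < (b & c) < d
explicit = (a < b) & (c < d)     # the reading most people intend
```

With these values `b & c` is `1`, so `chained` evaluates `0 < 1 < 2` and is `true`, while `explicit` is `false` because `c < d` does not hold.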

For anecdotal evidence, the current precedence of & prompts some people to write things like this:

 hflights[.&(hflights[:Month] .== 1, hflights[:DayofMonth] .== 1), :]

(Perhaps I should note that I have WIP on this front. I am prioritizing things A_mul_B though, so not certain whether I will post that work on a short timescale. Best!)

Has the plan for precedence change of & and | to allow for the expected behavior of B[A .< 1 | A .> 2] been changed?

I haven't seen any update in the 2.5 years since the deprecation.
