Let's discuss cleaning up some kinks in Julia's AST format to make things easier for macro authors.
In many cases, the parser does too much work, losing information:
:(1(2))
→ :(1 * 2)
a ? b : c
→ if a ...
if a b elseif c d end
→ if a b else (if c d end) end
f() do ...
→ f(() -> ...)
I think we should aim for round-trip-ability here and have this work be done only at lowering time. Short function definitions and broadcasting syntax are examples of this being done right.
This is more of a parser issue than an AST issue, but it's not currently possible to write for $cond body end
as the parser checks for an x = y
iteration specification; instead you must use $(Expr(:for, cond, :body)
. For the same reason, it's not possible to write @capture(ex, for cond_ body_ end)
with MacroTools.
let
and for
have the opposite ordering for declarations and body.
The existence of both kw
and =
is strange as it gives us the opposite problem to that above: we have two ways to represent the same surface syntax. Perhaps this is necessary; MacroTools normalises kw
to =
to avoid surprising failures.
We currently have two ways to represent keyword arguments in calls: f(a, b = c)
which has little processing, and f(a; b = c)
which becomes f($(Expr(:parameters, Expr(:kw, :a, :b)), a)
. The latter is one of the few places (let
is another) where the ordering/structure of the expression mistmatches that of the surface syntax. I suspect a nicer design is possible but I don't have an exact proposal. A matching pattern like f(a, xs__)
currently matches kwargs style and not the other, so at least putting the params at the end of the call consistently would help with that.
Losing information about the AST limits what macros can do, and/or adds strange edge cases to what they can do. Consider @static if
for example:
@static if foo
f()
elseif bar
g()
else
h()
end
@static if foo
f()
else
bar ? g() : h()
end
The fact that we can't tell these two examples apart means that we have to choose a surprising behaviour for one of them.
As another example, due to the parsing of 1(2)
you can't use DataFlow.jl's syntax for graphs if the graph contains literal numbers. This isn't a crucial use case, but it illustrates the kind of surprising edge cases this leads to, even for innocuous-looking transformations.
I'm not sure to what extent this can be done after 1.0. What levels of AST are part of Julia's public API? People will be relying on it in practice, although the users should be few enough and expert enough to alleviate the impact.
This adds some work for macro authors similar to that caused by short/long-form function definitions, in cases where the difference doesn't matter. This should be very manageable, as it's easy to provide utility functions that do various parts of normalisation (as MacroTools does already for longdef
and shortdef
).
Let me know if I've missed anything. Input from other macro writers welcome.
I had assumed it would be easier to write macros if parsing normalized things to some extent, but if people feel that hasn't been the case then ok.
I agree we should try to get rid of the kw
head and only use =
.
elseif
, ?
, and do
should be pretty easy to change.
I assume the alternative parsing for 1(2)
would be as a call
. However given how common call
nodes are, it seems extremely taxing to check every call to see if the first argument is a numeric literal, in which case you have to know that it will be a multiply instead. We could eliminate implicit multiplication with parens, but then we'd be left with 2(...)
as wasted syntax.
I like these being pure sugar and not having to deal with them as separate cases in macros. A tokenizer should be able to round trip these things, but macros shouldn't need to worry about what are semantically just formatting differences in their input.
@JeffBezanson glad to hear that you're on board.
it seems extremely taxing to check every call to see if the first argument is a numeric literal
Assuming you're talking about the burden on macro writers, it shouldn't be that bad. It's a trivial utility function in fact (and this generalises to all the discussed changes):
lower_muls(ex) = postwalk(x -> @capture(x, n_Number(y_)) ? :($n*$y) : x, ex)
Then just call lower_muls
once and you get the same AST you would've before. Of course, most macros won't have to do this, in the same way that most macros don't currently have to normalise function definitions or broadcast syntax.
I think it's worth emphasising that macros have to deal with these kinds of things already; this just reduces the number of awkward special cases. I don't expect this to increase the burden on macros (or I wouldn't propose it) so if anyone has concerns about that I'm happy to see examples and work it through.
macros shouldn't need to worry about what are semantically just formatting differences
They're only "semantically just formatting differences" if you aren't changing the semantics, which is exactly what things like @static if
do.
Yichao had a good point about kw
: https://github.com/JuliaLang/julia/issues/20000#issuecomment-272288503
It avoids changing the meaning of an expression based on the context it's spliced into.
if you aren't changing the semantics, which is exactly what things like
@static if
do.
@static if
doesn't change that the semantics of if
and the ternary operator are completely interchangeable - the fact that they're treated exactly the same in the AST means @static
was easier to write in a way that works on ternary expressions with no extra effort.
I would point out that the round-tripping issue could be addressed by choosing which form to print with based on properties of the expressions. A conditional with multiple statements has to be an if/else; a conditional with single-expression branches can be either one, if the expressions are short enough, we can format them as a ternary operator.
There are quite a few more ambiguous cases , e.g.
import A: b, c
vs import A.b, A.c
or
a ? (b;c) : d
vs
if a
b
c
else
d
end
A couple more things to consider:
(a...; b...) -> f(a...; b...)
gets parsed confusingly as
begin
a...
b...
end -> f(a...; b...)
type a
b
c
end
shouldn't get parsed with a begin block IMHO because b and c are not executed sequentially.
I've been using NumberedLines.jl
for working with numbered lines but I'd argue that at least some of its features should be included in base. Having a NumberedLine type (or AST node) which includes both a line number and expression would be I think both much safer and much more intuitive.
Parsing code blocks as Expr(:block)
should not be a problem here.
(a..., b...) -> ...
could be improved, but not necessarily the parsingExpr(:block)
does not imply evaluation so not executed sequentially
is not a good argument for parsing differently as long as there's no ambiguity from the context. By this argument
begin end
in quote shouldn't be parsed as Expr(:block)
since it's not evaluted, in fact nothing should parse in quote since none of them are evaluated as usual.function f() end
shouldn't be parsed as a call either since it's not evaluated.(a, b) -> 2
the argument list shouldn't be parsed as tuple since it's not evaluated.let a = 1, b = 2, c::Int end
should not use Expr(:block)
for the var list since they are not evaluated with the same rule as a normal block (in fact, this block uses a similar rule to a struct
definition).None of the cases above has any ambiguity and there's no reason to create many more expression types for people to handle in any of these cases.
One more is the arguments to for, which hopefully could be similar to the arguments to :generator
e = quote
result = 0
for i = 1:10, j = 11:20
result = result + i + j
end
(i + j for i = 1:10, j = 11:20)
end
e.args[4].args
e.args[6].args
Status of this:
elseif
and let
have been fixedkw
vs. =
1(2)
as a call, but perhaps there should be an expression head for juxtaposition??
and do
are still up for debateActual round-trip-ability is not on the table here; CSTParser handles that very well and moving fully to that is way out of scope for 1.0.
From triage: preserving do
sounds good, that's next on the list.
Please mention more specific examples of weird ASTs if anybody finds any, but otherwise I think we're done here.
Awesome. Do I take it we are not in favour of preserving ?
, or of delaying for $cond end
errors later in the pipeline?
Most helpful comment
Please mention more specific examples of weird ASTs if anybody finds any, but otherwise I think we're done here.