This is a draft text, but before it gets lost or I forget to publish it, here it is:
Hi,
I think the current macro language is too weak. It is more of a
code-template language. This is fine for simple macros that basically are code
templates, but macros that contain complex logic get extremely complicated.
I think this should be changed. Here is an example
that is difficult to implement and hard to understand:
# flatten a tuple literal
macro flatten_tuple(t)
  {%
    queue = [] of ASTNode
    res = [] of ASTNode
  %}
  {% for e in t %}
    {% queue << e %}
  {% end %}
  {% for e in queue %}
    {% if e.class_name == "TupleLiteral" %}
      {% for n in e %}
        {% queue << n %}
      {% end %}
    {% else %}
      {% res << e %}
    {% end %}
  {% end %}
  { {{*res}} }
end
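For reference, here is how the macro above is meant to be invoked. This is a sketch of the intent; whether the actual expansion works out depends on the macro interpreter's semantics, e.g. whether `{% for e in queue %}` sees elements appended to the queue while the loop is running:

```crystal
# Hypothetical call site for the flatten_tuple macro above.
# The intent is that nested tuple literals get flattened into one tuple:
x = flatten_tuple({1, {2, 3}, {4, {5}}})
# intended expansion: x = {1, 2, 3, 4, 5}
```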
Problems:
Here are some "hacks" I use in order to achieve what I need. Maybe there are
better solutions, but since the documentation is not very extensive, this is
the best I could come up with.
This is my proposal for an IMO better macro language:
Here are some examples:
def array_reverse(array)
  # regular crystal implementation
end

macro foo(lit : AST::ArrayLiteral)
  lit.to_a.reverse.inspect # returning a string
end

macro flatten_tuple(exp : AST::TupleLiteral)
  result = TupleLiteral.new
  exp.args.each do |arg|
    case arg
    when AST::TupleLiteral then result.args += flatten_tuple(arg).args
    else result.args << arg
    end
  end
  result # returning an AST node
end
macro property(t : AST::TypeDeclaration)
  <<-TEMPLATE
    def #{t.name}
      @#{t.name}
    end

    def #{t.name}=(value : #{t.type})
      @#{t.name} = value
    end
  TEMPLATE
end
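To make the intent concrete, here is what a call to the proposed property macro would expand to. This is the proposed syntax from above, not something today's compiler accepts, and Person/name are invented for illustration:

```crystal
# Hypothetical usage of the proposed macro (invented class and field names):
class Person
  property name : String
end

# The heredoc template above would return this string, which the
# compiler would parse back into:
class Person
  def name
    @name
  end

  def name=(value : String)
    @name = value
  end
end
```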
For me, calling macros from macros and letting macros return a string or AST nodes is the biggest issue. I need that for my parser combinator framework (and probably two other projects of mine). Here is how it is supposed to work:
# Variant 1
class Parser < Syntaks::Parser
  include EBNF

  # rule(root, {assignment >> /[ \t]*\n/})
  rule(root, {assignment})
  rule(assignment, id >> /\s+/ >> "=" >> /\s+/ >> value)
  rule(id, /\w+/)
  rule(value, id | /\d+/)
end

def test_acceptance
  Parser.new.call("test = 15") as Success
end
# Variant 2
class Parser < Syntaks::Parser
  rules do
    root = call
    call = "method" >> /\s+/ >> id >> param_list
    param_list = "(" >> params >> ")"
    params = param >> {"," >> param}
    param = int_lit | name_lit
    int_lit = /\d+/
    name_lit = /\w+/
    id = /\w+/
  end
end

def test_acceptance
  assert Parser.new.call("method test(banana,1337,9001)").is_a?(Success)
  assert Parser.new.call("method a(1)").is_a?(Success)
  assert Parser.new.call("method test()").is_a?(Failure)
end
That actually works. The problems begin when generating the AST. I don't want to generate a parse tree, but an AST. Also, I want to change the structure by passing blocks in the definition:
rule(:method_definition, method_head >> method_body >> inline_ws_opt >> method_end) do |head, body, _, _|
  MethodDef.new(head, body)
end
The parser combinators generate nested objects for sequences, so the block would actually receive just one node with nested nodes for the sequence parts. But I want sequences to pass their results as multiple arguments to the block. For that I need to be able to call macros from macros.
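To illustrate the problem at runtime (the names below are invented, this is not the real Syntaks API): a sequence like `a >> b >> c` naturally yields left-nested pairs, and flattening those at runtime is easy, but it erases the static arity and types of the sequence.

```crystal
# Invented illustration, not the Syntaks API: a sequence `a >> b >> c`
# yields left-nested pairs like Pair(Pair(A, B), C).
record Pair(L, R), left : L, right : R

# Runtime flattening of the nested result:
def flatten(node : Pair) : Array(String)
  flatten(node.left) + flatten(node.right)
end

def flatten(node : String) : Array(String)
  [node]
end

nested = Pair.new(Pair.new("method", "test"), "(1337)")
flatten(nested) # => ["method", "test", "(1337)"]
```

Runtime flattening like this can only give me an untyped array, not a block with one typed argument per sequence element; that is exactly why I want the splatting to happen at the macro level.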
Will follow ideas here with curiosity!
Thanks for the very detailed explanation!
We actually discuss how to enhance macros from time to time, though many times we stumble upon the same problems, some of which you mention.
Our original idea was to compile macros down to programs and then invoke them, passing pointers to AST nodes that exist in memory in the current program. This has the issue you mention: we'd first need to compile the methods defined in the program before the macro invocation, but what if that in turn needs other macros? It's kind of recursive and not doable (I'm explaining it briefly because I don't remember all the details).
It's also curious that every programming language I know of that has compile-time features or macros uses an interpreter or a VM to expand them. Ours works similarly to a `run` macro call, only that the result is a string that is parsed back (though you can of course create AST nodes and then turn them into strings).
As a separate topic, many languages that allow very powerful macros almost always advise you: "Don't use macros! Only use them when you really need them! They are dangerous!".
In my opinion, macros should be used to avoid some (not all) boilerplate. Many will probably not like what I'll say now, but I actually like it that macros are kind of limited.
The problem with macros is that when you have a problem to solve, and you have super powerful macros, you stop and think "Hmm... how can I use macros to create a super awesome DSL that will let me solve this problem in a very elegant way?". Well, my problem with that is that you suddenly forget that you had a problem to solve _at runtime_. Macros only work _at compile-time_. Maybe without using macros you could have solved the problem in 5 minutes, maybe with some duplicated code. So with dumb macros you would first think about how to solve the problem at runtime, with methods and objects, and then, at the end, see if you can find a way to reduce some boilerplate with macros.
An example of the above is JSON.mapping. The whole macro is 100 lines of code (it could be shorter, but the macro allows for a lot of configurations). But the real code that solves JSON is the lexer, the parser and the pull parser. The macro merely generates a bit of code to use the pull parser, and to avoid some boilerplate. Maybe with a powerful macro you'd be tempted to create a specific JSON parser for each macro invocation, but then you'd only cover that use case, while the runtime pull parser covers a lot more cases.
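For reference, a minimal sketch of a JSON.mapping use, as the API looked at the time of this discussion (the Point type and its fields are invented for illustration):

```crystal
require "json"

# Invented example type: JSON.mapping generates the x/y accessors plus a
# `new(JSON::PullParser)` constructor that reads from the runtime pull parser.
class Point
  JSON.mapping(
    x: Int32,
    y: Int32,
  )
end

point = Point.from_json(%({"x": 1, "y": 2}))
point.x # => 1
```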
If we look at macros in the standard library we have:
- JSON.mapping: generates the new(JSON::PullParser) constructor you could define yourself (it's really simple not to use JSON.mapping and read the pull parser manually)
- Reference#to_s and Reference#inspect: automatically inject boilerplate code that inspects an object's instance variables
- spawn(call): avoid the boilerplate of creating a proc, invoking the call inside it, and then invoking the proc with the call's arguments
- Enum.flags: avoid a bit of duplication in Flag::One | Flag::Two | Flag::Three by letting you write Flag.flags(One, Two, Three)
- ecr: this one uses macro run, to avoid manually translating template code to Crystal

In other cases we use macros to loop over some types or expressions to define similar methods on similar types.
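For instance, the Enum.flags shorthand mentioned above looks like this (the Mode enum is invented for illustration):

```crystal
# Invented enum for illustration; @[Flags] gives it bitwise-or semantics.
@[Flags]
enum Mode
  One
  Two
  Three
end

# Enum.flags is just shorthand for or-ing the listed members:
Mode.flags(One, Three) == (Mode::One | Mode::Three) # => true
```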
I'd really like macros to be used in this way, as their use is simple and they are very easy to understand. It also keeps compile times low; more complex macros need more time to execute (especially because they are interpreted). But of course the current macro language is more powerful than those use cases, though not super powerful, so it kind of limits you (but I think this is good).
I recommend watching this excellent talk about what macros can do, and why they should be avoided: https://www.youtube.com/watch?v=o69H0MXCNxw
And, if you really want to do "whatever you want" at compile time, you can always use the run macro call, where there's no need to have an interpreter or a VM, it's just Crystal code that executes like any other program (so the implementation of it is also easy, but more powerful than an interpreter).
It's also worth noting that this is just my opinion, and I know that @waj and @bcardiff would like much more powerful and flexible macros (and I'm sure many more in the community too!), though of course we don't have a clear idea of how to achieve that.
Just to chip in with my own opinionated take: I totally agree with @asterite, also thinking macros should be just strong enough to avoid boilerplate. When one starts to go into DSL territory (which of course is fine for those use cases), run macros are fine, but it would be nice to be able to use them "more transparently", avoiding the run boilerplate for making run macros ;-) Fully "self-macroable linguistics" (there's probably a common term for this) feels more like a vanity thing.
We always have macros in our backlog as something to be improved, so this issue doesn't need to remain open. If we find a way to improve this situation, we will.