Continuing the discussion started in #9266.
The current situation is a bit painful regarding formatting of AST. Macro.to_string/1 is the only thing we have that takes AST and returns a somewhat "formatted" string, and Code.format_string!/2 is the only thing we have that properly formats (according to the formatter) a string of code. We're missing a piece of the puzzle: something that takes AST and returns properly formatted code.
The problem is that the formatter needs a string input in order to gather information about the original shape of the code. For example, the formatter needs to know whether an integer was written as 0x00 or as 0, because the AST for those two is the same (0). In fact, the formatter operates on a slightly modified (but still valid) AST where there are no bare literals. All literals are wrapped inside {:__block__, meta, [literal]} tuples so that they can have metadata attached to them. The metadata stores things like the original representation of a literal, whether a block was written as do: or as do/end, and so on.
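To make the 0x00 vs 0 point concrete, here is a minimal sketch. It assumes an Elixir version where Code.string_to_quoted!/2 accepts the :literal_encoder and :token_metadata options (these were not necessarily available when this issue was opened); the variable names are only illustrative.

```elixir
# Plain AST: both spellings parse to the same bare integer,
# so the original spelling is lost.
plain_hex = Code.string_to_quoted!("0x00")
plain_dec = Code.string_to_quoted!("0")

# Formatter-style AST: ask the parser to wrap every literal in a
# :__block__ node so that metadata can be attached to it.
wrap = fn literal, meta -> {:ok, {:__block__, meta, [literal]}} end

{:__block__, meta, [value]} =
  Code.string_to_quoted!("0x00", literal_encoder: wrap, token_metadata: true)

# The :token entry keeps the text as the user originally wrote it.
original_text = Keyword.get(meta, :token)
```

With the plain AST there is nothing left to tell the two spellings apart, while the wrapped node carries the original text alongside the value.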
With this information, there are two things we can do to add the missing piece to the puzzle.
1. allow formatter to work on normal AST: the first thing that @josevalim and I discussed was allowing the formatter to work on normal AST. This would mean leaving the formatter as it is today for all "formatter AST", but adding handling of normal AST with sane defaults. For example, an integer like 0x00 would be fed to the formatter as its AST (0) and hence formatted as 0. This could work, but the problem is that if a tool wants to read code, turn it into AST, modify the AST, and then call the formatter on the AST, it would still lose a lot of the original formatting intended by the user. This could allow us to completely get rid of Macro.to_string/1 and only have Code.format_ast/2, but we're not sure yet because of the callback supported by Macro.to_string/2.
2. inject formatter metadata from Macro.to_string/1: the alternative we have is to remove the "formatting" from Macro.to_string/1 and only have Macro.to_string/1 take AST and decorate it with all the stuff that the formatter needs. This would mean turning every literal into a :__block__, adding necessary metadata, and so on.
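As a rough illustration of what "decorating" plain AST could look like, the sketch below wraps only integer literals and invents a :token entry from the integer's decimal form. DecorateSketch and wrap_integers/1 are hypothetical names made up for this example, not an existing or proposed API, and a real implementation would have to handle every literal type and much more metadata.

```elixir
defmodule DecorateSketch do
  # Hypothetical helper: wraps each integer literal found in argument
  # position in a :__block__ node with a default :token, since the
  # original spelling is not recoverable from plain AST.
  def wrap_integers(ast) do
    Macro.postwalk(ast, fn
      int when is_integer(int) ->
        {:__block__, [token: Integer.to_string(int)], [int]}

      other ->
        other
    end)
  end
end

decorated = DecorateSketch.wrap_integers(quote(do: 1 + 2))
```

Macro.postwalk/2 is used so that the freshly created :__block__ nodes are not visited again. Note that the default token is always the decimal form, which is exactly the loss of user formatting described above.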
I think we're leaning toward option 2 because it's likely simpler to implement. Both options would still have to deal with the callback in Macro.to_string/2, which might be a bit of a pain (but it can't be removed, for backwards compatibility).
One other thing to consider is performance. In the formatter, we don't really focus on performance since it's a "static" tool that runs outside of the runtime of an application. On the other hand, Macro.to_string/1,2 could be used in production code and is used internally by Elixir itself as well.
The other issue is that Macro.to_string accepts a second argument to annotate the result, and that would be very hard to achieve with the formatter. At least ExDoc needs this feature, and we would need a way to implement it without much hassle (in the HexDocs case it could be a separate pass on the formatted output using regexes).
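For reference, this is the kind of annotation the second argument allows. The callback receives each AST node together with its default string and returns the string to use. This is a minimal sketch; note that Macro.to_string/2 is deprecated in later Elixir releases, so running it there is expected to print a deprecation warning.

```elixir
# Replace the rendering of specific nodes while leaving the rest as is.
annotated =
  Macro.to_string(quote(do: 1 + 2), fn
    1, _string -> "one"
    2, _string -> "two"
    _ast, string -> string
  end)
```

Because the callback runs per node while the string is being built, reproducing it on top of the formatter would require the formatter to expose a similar per-node hook.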
Perhaps the best option is not to change Macro.to_string but phase it out while we implement Code.format_quoted.
I think that the biggest problem we should solve with Code.format_quoted/2 is to have a way for tools to read a string of code, modify it, and then write it back out formatted. For example, Credo could enforce , do: blocks or something like that (which it can't do today in an easy way). We could have the formatter accept "normal" AST, but it would still lose the original user formatting. I think the only way to do this is to somehow expose the formatter-enriched AST. Thoughts?
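A sketch of that read, modify, write round trip is below. It assumes Elixir 1.13 or later, where Code.quoted_to_algebra/2 exists and the parser options shown are the ones documented for producing formatter-friendly AST; none of this was available when this comment was written, and the rename of x to y is just a stand-in for whatever a tool would change.

```elixir
source = "x = 0x1F"

# Parser options that keep literals wrapped and preserve their original text.
formatter_opts = [
  literal_encoder: &{:ok, {:__block__, &2, [&1]}},
  token_metadata: true,
  unescape: false
]

ast = Code.string_to_quoted!(source, formatter_opts)

# Example modification: rename the variable x to y.
modified =
  Macro.postwalk(ast, fn
    {:x, meta, context} when is_atom(context) -> {:y, meta, context}
    other -> other
  end)

# Turn the enriched AST back into formatted source text.
formatted =
  modified
  |> Code.quoted_to_algebra()
  |> Inspect.Algebra.format(98)
  |> IO.iodata_to_binary()
```

The hexadecimal literal survives the round trip because its original text travels in the node metadata rather than being reconstructed from the value.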
@whatyouhide I understand but at the moment I would put it as lower priority than having format_quoted itself. I.e. I would not block the development of format_quoted on this feature.
@josevalim if we don't want to replace Macro.to_string/1,2, then what's the use case of format_quoted?
The case you mentioned is not handled by Macro.to_string either, so my point is that it should not be a blocker for adding format_quoted.
> Perhaps the best option is not to change Macro.to_string but phase it out while we implement Code.format_quoted.
ExDoc will no longer use Macro.to_string/2. So I propose to introduce either Code.format_quoted or Code.quoted_to_string and completely forget about Macro.to_string.
Macro.to_string/1 could be deprecated and removed in future releases. Alternatively, we keep only Macro.to_string/1 (as an alias for Code.quoted_to_string) and deprecate /2. The rationale is that Macro.to_string is used in many places, so it is probably not worth removing it (and keeping the alias is very cheap).
Closing this for now as we can't really remove Macro.to_string/2 and there is no one tackling this issue at the moment. Thanks!