Following the discussion in https://github.com/manastech/crystal/issues/308, I realized it would be nice if Crystal had a built-in formatting command that would apply some basic style to the code (two-space indentation, etc.)
The Go language handles this quite nicely. Quoting from Effective Go:
Formatting issues are the most contentious but the least consequential. People can adapt to different formatting styles but it's better if they don't have to, and less time is devoted to the topic if everyone adheres to the same style. The problem is how to approach this Utopia without a long prescriptive style guide.
With Go we take an unusual approach and let the machine take care of most formatting issues. The
gofmtprogram (also available asgo fmt, which operates at the package level rather than source file level) reads a Go program and emits the source in a standard style of indentation and vertical alignment, retaining and if necessary reformatting comments.
I think this would be nice to have. Many people are obsesses with correctly formatted syntax... including me :-P
More cases that come to my mind are:
1+2*3.class Foo
def initialize
end
def bar
end
end
I once wrote a D plugin for Eclispe that copied most of JDT's code. I had to adapt the Java formatter to D's language. There I learned how they do it. First they parse the file and get an AST from it. Then they create a lexer for that same file. Then they traverse the AST and lex the source at the same time, while applying formatting rules. That way you can deal with each token while knowing where you are in the AST. Very smart.
We could do the same, but right now the parser will create nodes that don't exactly reflect what was written. For example when you write this:
foo.try &.bar
The parser will actually emit something like this:
foo.try { |_arg0| _arg0.bar }
We would have to define AST nodes for those cases and later unsugar them, like we do with other things (case, &&, ||).
@asterite Isn't it easier with a CST rather than a AST?
@asterite is there any improvement on this :pray:
I really like crystalfmt :+1: to be there :smile:
No progress yet. A formatter is something that will come after 1.0, unless someone else wants to tackle this. But it's not something trivial to do, you would need to extend the Parser and as you parse the code you'd need to format it.
In Go it's much easier because the language is pretty simple in terms of grammar.
One idea would be to rip off a lot of the work that's gone into Rubocop. That could involve converting Rubocop itself and the underlying AST parsing gems (https://github.com/whitequark/parser and https://github.com/yujinakayama/astrolabe) or it could mean building a tool in Crystal that could reuse the cops. Some of the Rubocop cops wouldn't be relevant for Crystal, some would be wrong, but some could be used as is.
Why not implement this in the ast.to_s code? If "standard compile" flag - it renders code any way it wants (for macro use etc.). If "stylize", it looks at all necessary rules passed (indent=4, rowspace_after_def=0,rowspace_after_classdef=1, space_around_ops=1, etc. whatever)
Via "location", original code can be looked up for the "keep as is"-cases, without storing "original token" in AST (where there are more ways to the same AST.. [unless, etc.]).
The amount of options will of course grow daily from requests ;-) )
@ozra The problem with that is that to_s will become cluttered quickly. See http://journal.stuffwithstuff.com/2015/09/08/the-hardest-program-ive-ever-written/, an article I found on Reddit by the author of dartfmt, Dart's auto formatter. This is really something large-scale enough to belong _outside_ the compiler.
@kirbyfan64 - Yeah I pondered the alternatives, but came to the conclusion that every node's to_s would have an if-statement - one branch being the regular, the other being the "messier" one. So the plain version would still be first in every visitor. I just figured it would keep the stylizer always up to date (not implementing the super-options-renderer part still gives compiler-default formatted and keeps working).
Well, definitely a good point - it would be much clearer keeping it out. I'll take that into consideration for what I'm writing atm.
Thanks for reminding me of the line-breaking issue!!
If you want a built-in formatting tool, either open an issue describing exactly what the code formatter will do, or either submit a PR with the solution. Speaking of a hypothetical code formatter without knowing what it will do is useless.
Alternatively, I think a formatter can perfectly be done as 3rd party library.
I changed my mind :-)
I started a formatter branch. Right now it only works with some AST nodes (nil, chars, numbers, if, array literals, begin/end, parenthesis) but I'll eventually add support for all nodes.
Hey @asterite i'm curious why you changed your mind :smile:
Some reasons:
But note that the formatter I have in mind will mostly affects spaces and newlines: it won't change names or literals, etc. And gofmt works that way too. It would be really bad if the formatter somehow got in your way in a bad way.
The only "worry" I have is for big arrays, like this. But there are some heuristics that could be applied there.
Tried out the formatter as of 757bd1c . Mostly great changes.
Just a few changes jumped out to me that I didn't accept:
It'd be nice to keep the right aligned numbers
- bytes[ 0, 4].hexstring(buffer + 0)
- bytes[ 4, 2].hexstring(buffer + 9)
- bytes[ 6, 2].hexstring(buffer + 14)
- bytes[ 8, 2].hexstring(buffer + 19)
+ bytes[0, 4].hexstring(buffer + 0)
+ bytes[4, 2].hexstring(buffer + 9)
+ bytes[6, 2].hexstring(buffer + 14)
+ bytes[8, 2].hexstring(buffer + 19)
The right alignment of the *Decoder.new is maybe too much, but again the right aligned integers would be nice
# https://github.com/postgres/postgres/blob/master/src/include/catalog/pg_type.h
- register_decoder BoolDecoder.new, 16 # bool
- register_decoder ByteaDecoder.new, 17 # bytea
- register_decoder Int8Decoder.new, 20 # int8 (bigint)
- register_decoder Int2Decoder.new, 21 # int2 (smallint)
- register_decoder IntDecoder.new, 23 # int4 (integer)
- register_decoder DefaultDecoder.new, 25 # text
- register_decoder JsonDecoder.new, 114 # json
- register_decoder JsonbDecoder.new, 3802 # jsonb
- register_decoder Float32Decoder.new, 700 # float4
- register_decoder Float64Decoder.new, 701 # float8
- register_decoder DefaultDecoder.new, 705 # unknown
- register_decoder DateDecoder.new, 1082 # date
- register_decoder TimeDecoder.new, 1114 # timestamp
- register_decoder TimeDecoder.new, 1184 # timestamptz
- register_decoder UuidDecoder.new, 2950 # uuid
+ register_decoder BoolDecoder.new, 16 # bool
+ register_decoder ByteaDecoder.new, 17 # bytea
+ register_decoder Int8Decoder.new, 20 # int8 (bigint)
+ register_decoder Int2Decoder.new, 21 # int2 (smallint)
+ register_decoder IntDecoder.new, 23 # int4 (integer)
+ register_decoder DefaultDecoder.new, 25 # text
+ register_decoder JsonDecoder.new, 114 # json
+ register_decoder JsonbDecoder.new, 3802 # jsonb
+ register_decoder Float32Decoder.new, 700 # float4
+ register_decoder Float64Decoder.new, 701 # float8
+ register_decoder DefaultDecoder.new, 705 # unknown
+ register_decoder DateDecoder.new, 1082 # date
+ register_decoder TimeDecoder.new, 1114 # timestamp
+ register_decoder TimeDecoder.new, 1184 # timestamptz
+ register_decoder UuidDecoder.new, 2950 # uuid
end
My formatting for this is weird, so I don't know if it could make it into a tool
- (((((((((((((((( 0_u64
- ) | ptr[0] ) << 8
- ) | ptr[1] ) << 8
- ) | ptr[2] ) << 8
- ) | ptr[3] ) << 8
- ) | ptr[4] ) << 8
- ) | ptr[5] ) << 8
- ) | ptr[6] ) << 8
- ) | ptr[7] )
+ ((((((((((((((((0_u64) | ptr[0]) << 8) | ptr[1]) << 8) | ptr[2]) << 8) | ptr[3]) << 8) | ptr[4]) << 8) | ptr[5]) << 8) | ptr[6]) << 8) | ptr[7])
I don't understand the first suggestion, spaces will be stripped.
The second suggestion seems pretty arbitrary and kind of depends on the semantic you give the lines. The rule would be "align the last argument of a run of calls if they are number literals"? Mmm... I algo see the names are aligned, kind of really hard to guess what you want there.
The last case is a bug, the formatter should preserve newlines. I'll try to fix it.
By the way, the idea of the formatter is that it gives some kind of beauty and consistency to the code, but there will inevitably be cases you won't like. The idea is to stop worrying about these small cases and learn to live with the formatter.
I don't understand the first suggestion, spaces will be stripped.
That's because I left out a line of the diff that showed what was going on. Before the ones place in the digits and commas were aligned with the 10 line at the end
- bytes[ 0, 4].hexstring(buffer + 0)
- bytes[ 4, 2].hexstring(buffer + 9)
- bytes[ 6, 2].hexstring(buffer + 14)
- bytes[ 8, 2].hexstring(buffer + 19)
+ bytes[0, 4].hexstring(buffer + 0)
+ bytes[4, 2].hexstring(buffer + 9)
+ bytes[6, 2].hexstring(buffer + 14)
+ bytes[8, 2].hexstring(buffer + 19)
bytes[10, 6].hexstring(buffer + 24)
The idea is to stop worrying about these small cases and learn to live with the formatter.
Yes, once the formatter lands I'll go with whatever it does, so giving early feedback :)
These alignments, and especially the register_decoder example, are somewhat akin to the elastic tabstops #1682 (in a formatter, not a realtime elastic way ofc.). Alignments are a real nice feature of formatting, however of course quite a bit more complex to implement than construct-by-construct formatting.
@will I "fixed" the formatting of your long parenthesised expression, though now it ends up like this:
((((((((((((((((0_u64
) | ptr[0]) << 8
) | ptr[1]) << 8
) | ptr[2]) << 8
) | ptr[3]) << 8
) | ptr[4]) << 8
) | ptr[5]) << 8
) | ptr[6]) << 8
) | ptr[7])
The parentheses are aligned. I could try to make it nicer, but maybe this won't happen very often. I also think you might want to use a loop there, LLVM might optimize it away.
The alignment of other things like numbers is hard, it's really hard for the formatter to recognize these patterns and also to decide where to put spaces. In some obvious cases the formatter will align things, like in when and hash literals, but not in every place.
@asterite thanks. regarding optimizations, I did a lot of benchmarking with various approaches, and it seemed that llvm was able to figure out that it was byte swapping in every approach.
@will after a big change to the algorithm used by the formatter, your code with many parentheses remains the same as how you originally wrote it :-)
Yeah, LLVM is a beast.
Amazing
You can try the "formatter" branch of the Sublime plugin to have integration with the formatter :wink: https://github.com/manastech/sublime-crystal/tree/formatter
Because the current release doesn't have the formatter, you need to compile one and specify the path in the settings of the plugin, with something like:
{
"auto_format": true,
"crystal_cmd": "/path/to/crystal/bin/crystal"
}
Most helpful comment
I changed my mind :-)
I started a formatter branch. Right now it only works with some AST nodes (nil, chars, numbers, if, array literals, begin/end, parenthesis) but I'll eventually add support for all nodes.