Crystal: Built-in formatting tool

Created on 9 Dec 2014 · 21Comments · Source: crystal-lang/crystal

Following the discussion in https://github.com/manastech/crystal/issues/308, I realized it would be nice if Crystal had a built-in formatting command that would apply some basic style to the code (two-space indentation, etc.)

The Go language handles this quite nicely. Quoting from Effective Go:

Formatting issues are the most contentious but the least consequential. People can adapt to different formatting styles but it's better if they don't have to, and less time is devoted to the topic if everyone adheres to the same style. The problem is how to approach this Utopia without a long prescriptive style guide.

With Go we take an unusual approach and let the machine take care of most formatting issues. The gofmt program (also available as go fmt, which operates at the package level rather than source file level) reads a Go program and emits the source in a standard style of indentation and vertical alignment, retaining and if necessary reformatting comments.

feature

Source

sferik

👍2

Most helpful comment

I changed my mind :-)

I started a formatter branch. Right now it only works with some AST nodes (nil, chars, numbers, if, array literals, begin/end, parenthesis) but I'll eventually add support for all nodes.

asterite on 3 Oct 2015

❤3

All 21 comments

I think this would be nice to have. Many people are obsesses with correctly formatted syntax... including me :-P

More cases that come to my mind are:

No spaces between operators: 1+2*3.
Extra newlines between definitions:

class Foo

  def initialize
  end


  def bar
  end

end

I once wrote a D plugin for Eclispe that copied most of JDT's code. I had to adapt the Java formatter to D's language. There I learned how they do it. First they parse the file and get an AST from it. Then they create a lexer for that same file. Then they traverse the AST and lex the source at the same time, while applying formatting rules. That way you can deal with each token while knowing where you are in the AST. Very smart.

We could do the same, but right now the parser will create nodes that don't exactly reflect what was written. For example when you write this:

foo.try &.bar

The parser will actually emit something like this:

foo.try { |_arg0| _arg0.bar }

We would have to define AST nodes for those cases and later unsugar them, like we do with other things (case, &&, ||).

asterite on 9 Dec 2014

👍1

@asterite Isn't it easier with a CST rather than a AST?

vendethiel on 9 Dec 2014

@asterite is there any improvement on this :pray:

I really like crystalfmt :+1: to be there :smile:

sdogruyol on 15 Sep 2015

👍1

No progress yet. A formatter is something that will come after 1.0, unless someone else wants to tackle this. But it's not something trivial to do, you would need to extend the Parser and as you parse the code you'd need to format it.

In Go it's much easier because the language is pretty simple in terms of grammar.

asterite on 15 Sep 2015

One idea would be to rip off a lot of the work that's gone into Rubocop. That could involve converting Rubocop itself and the underlying AST parsing gems (https://github.com/whitequark/parser and https://github.com/yujinakayama/astrolabe) or it could mean building a tool in Crystal that could reuse the cops. Some of the Rubocop cops wouldn't be relevant for Crystal, some would be wrong, but some could be used as is.

tristil on 18 Sep 2015

Why not implement this in the ast.to_s code? If "standard compile" flag - it renders code any way it wants (for macro use etc.). If "stylize", it looks at all necessary rules passed (indent=4, rowspace_after_def=0,rowspace_after_classdef=1, space_around_ops=1, etc. whatever)
Via "location", original code can be looked up for the "keep as is"-cases, without storing "original token" in AST (where there are more ways to the same AST.. [unless, etc.]).

The amount of options will of course grow daily from requests ;-) )

ozra on 29 Sep 2015

@ozra The problem with that is that to_s will become cluttered quickly. See http://journal.stuffwithstuff.com/2015/09/08/the-hardest-program-ive-ever-written/, an article I found on Reddit by the author of dartfmt, Dart's auto formatter. This is really something large-scale enough to belong _outside_ the compiler.

refi64 on 29 Sep 2015

@kirbyfan64 - Yeah I pondered the alternatives, but came to the conclusion that every node's to_s would have an if-statement - one branch being the regular, the other being the "messier" one. So the plain version would still be first in every visitor. I just figured it would keep the stylizer always up to date (not implementing the super-options-renderer part still gives compiler-default formatted and keeps working).
Well, definitely a good point - it would be much clearer keeping it out. I'll take that into consideration for what I'm writing atm.
Thanks for reminding me of the line-breaking issue!!

ozra on 29 Sep 2015

If you want a built-in formatting tool, either open an issue describing exactly what the code formatter will do, or either submit a PR with the solution. Speaking of a hypothetical code formatter without knowing what it will do is useless.

Alternatively, I think a formatter can perfectly be done as 3rd party library.

asterite on 30 Sep 2015

I changed my mind :-)

I started a formatter branch. Right now it only works with some AST nodes (nil, chars, numbers, if, array literals, begin/end, parenthesis) but I'll eventually add support for all nodes.

asterite on 3 Oct 2015

❤3

Hey @asterite i'm curious why you changed your mind :smile:

sdogruyol on 3 Oct 2015

Some reasons:

It's fun and I like challenges :wink:
@waj likes it too, and obviously many others too
I did it in the past in the Descent plugin for Eclipse (well, I copied the code from JDT, but I learned a lot doing it), so it should be pretty easy to do
We have a coding style document but applying it manually to existing code can be tedious. It would be nice if a tool could do it for you.
It can be integrated automatically into editors so you don't have to waste time aligning small spaces, adding/removing lines, etc.
Discussions about indentation and spaces will (hopefully) disappear

But note that the formatter I have in mind will mostly affects spaces and newlines: it won't change names or literals, etc. And gofmt works that way too. It would be really bad if the formatter somehow got in your way in a bad way.

The only "worry" I have is for big arrays, like this. But there are some heuristics that could be applied there.

asterite on 3 Oct 2015

👍1

Tried out the formatter as of 757bd1c . Mostly great changes.

Just a few changes jumped out to me that I didn't accept:

It'd be nice to keep the right aligned numbers

-          bytes[ 0, 4].hexstring(buffer + 0)
-          bytes[ 4, 2].hexstring(buffer + 9)
-          bytes[ 6, 2].hexstring(buffer + 14)
-          bytes[ 8, 2].hexstring(buffer + 19)
+          bytes[0, 4].hexstring(buffer + 0)
+          bytes[4, 2].hexstring(buffer + 9)
+          bytes[6, 2].hexstring(buffer + 14)
+          bytes[8, 2].hexstring(buffer + 19)

The right alignment of the *Decoder.new is maybe too much, but again the right aligned integers would be nice

     # https://github.com/postgres/postgres/blob/master/src/include/catalog/pg_type.h
-    register_decoder    BoolDecoder.new,   16 # bool
-    register_decoder   ByteaDecoder.new,   17 # bytea
-    register_decoder    Int8Decoder.new,   20 # int8 (bigint)
-    register_decoder    Int2Decoder.new,   21 # int2 (smallint)
-    register_decoder     IntDecoder.new,   23 # int4 (integer)
-    register_decoder DefaultDecoder.new,   25 # text
-    register_decoder    JsonDecoder.new,  114 # json
-    register_decoder   JsonbDecoder.new, 3802 # jsonb
-    register_decoder Float32Decoder.new,  700 # float4
-    register_decoder Float64Decoder.new,  701 # float8
-    register_decoder DefaultDecoder.new,  705 # unknown
-    register_decoder    DateDecoder.new, 1082 # date
-    register_decoder    TimeDecoder.new, 1114 # timestamp
-    register_decoder    TimeDecoder.new, 1184 # timestamptz
-    register_decoder    UuidDecoder.new, 2950 # uuid
+    register_decoder BoolDecoder.new, 16     # bool
+    register_decoder ByteaDecoder.new, 17    # bytea
+    register_decoder Int8Decoder.new, 20     # int8 (bigint)
+    register_decoder Int2Decoder.new, 21     # int2 (smallint)
+    register_decoder IntDecoder.new, 23      # int4 (integer)
+    register_decoder DefaultDecoder.new, 25  # text
+    register_decoder JsonDecoder.new, 114    # json
+    register_decoder JsonbDecoder.new, 3802  # jsonb
+    register_decoder Float32Decoder.new, 700 # float4
+    register_decoder Float64Decoder.new, 701 # float8
+    register_decoder DefaultDecoder.new, 705 # unknown
+    register_decoder DateDecoder.new, 1082   # date
+    register_decoder TimeDecoder.new, 1114   # timestamp
+    register_decoder TimeDecoder.new, 1184   # timestamptz
+    register_decoder UuidDecoder.new, 2950   # uuid
   end

My formatting for this is weird, so I don't know if it could make it into a tool

-        (((((((((((((((( 0_u64
-          ) | ptr[0] ) << 8
-          ) | ptr[1] ) << 8
-          ) | ptr[2] ) << 8
-          ) | ptr[3] ) << 8
-          ) | ptr[4] ) << 8
-          ) | ptr[5] ) << 8
-          ) | ptr[6] ) << 8
-          ) | ptr[7] )
+        ((((((((((((((((0_u64) | ptr[0]) << 8) | ptr[1]) << 8) | ptr[2]) << 8) | ptr[3]) << 8) | ptr[4]) << 8) | ptr[5]) << 8) | ptr[6]) << 8) | ptr[7])

will on 12 Oct 2015

I don't understand the first suggestion, spaces will be stripped.

The second suggestion seems pretty arbitrary and kind of depends on the semantic you give the lines. The rule would be "align the last argument of a run of calls if they are number literals"? Mmm... I algo see the names are aligned, kind of really hard to guess what you want there.

The last case is a bug, the formatter should preserve newlines. I'll try to fix it.

By the way, the idea of the formatter is that it gives some kind of beauty and consistency to the code, but there will inevitably be cases you won't like. The idea is to stop worrying about these small cases and learn to live with the formatter.

asterite on 12 Oct 2015

I don't understand the first suggestion, spaces will be stripped.

That's because I left out a line of the diff that showed what was going on. Before the ones place in the digits and commas were aligned with the 10 line at the end

-          bytes[ 0, 4].hexstring(buffer + 0)
-          bytes[ 4, 2].hexstring(buffer + 9)
-          bytes[ 6, 2].hexstring(buffer + 14)
-          bytes[ 8, 2].hexstring(buffer + 19)
+          bytes[0, 4].hexstring(buffer + 0)
+          bytes[4, 2].hexstring(buffer + 9)
+          bytes[6, 2].hexstring(buffer + 14)
+          bytes[8, 2].hexstring(buffer + 19)
           bytes[10, 6].hexstring(buffer + 24)

The idea is to stop worrying about these small cases and learn to live with the formatter.

Yes, once the formatter lands I'll go with whatever it does, so giving early feedback :)

will on 12 Oct 2015

These alignments, and especially the register_decoder example, are somewhat akin to the elastic tabstops #1682 (in a formatter, not a realtime elastic way ofc.). Alignments are a real nice feature of formatting, however of course quite a bit more complex to implement than construct-by-construct formatting.

ozra on 12 Oct 2015

@will I "fixed" the formatting of your long parenthesised expression, though now it ends up like this:

((((((((((((((((0_u64
               ) | ptr[0]) << 8
             ) | ptr[1]) << 8
           ) | ptr[2]) << 8
         ) | ptr[3]) << 8
       ) | ptr[4]) << 8
     ) | ptr[5]) << 8
   ) | ptr[6]) << 8
 ) | ptr[7])

The parentheses are aligned. I could try to make it nicer, but maybe this won't happen very often. I also think you might want to use a loop there, LLVM might optimize it away.

The alignment of other things like numbers is hard, it's really hard for the formatter to recognize these patterns and also to decide where to put spaces. In some obvious cases the formatter will align things, like in when and hash literals, but not in every place.

asterite on 12 Oct 2015

@asterite thanks. regarding optimizations, I did a lot of benchmarking with various approaches, and it seemed that llvm was able to figure out that it was byte swapping in every approach.

will on 12 Oct 2015

@will after a big change to the algorithm used by the formatter, your code with many parentheses remains the same as how you originally wrote it :-)

Yeah, LLVM is a beast.

asterite on 13 Oct 2015

Amazing

will on 14 Oct 2015

You can try the "formatter" branch of the Sublime plugin to have integration with the formatter :wink: https://github.com/manastech/sublime-crystal/tree/formatter

Because the current release doesn't have the formatter, you need to compile one and specify the path in the settings of the plugin, with something like:

{
  "auto_format": true,
  "crystal_cmd": "/path/to/crystal/bin/crystal"
}

waj on 15 Oct 2015

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Lazy getter macro

asterite · 3Comments

Allow type annotation + initialization `@var : Type = value`

oprypin · 3Comments

Unify structs and C structs

oprypin · 3Comments

Allow accessing named tuples with strings

costajob · 3Comments

ICE Bug: `def refill_in_buffer [...]` in crystal/src/io/encoding.cr:123:13 has no type

TechMagister · 3Comments