Compiler: Proposal: Optional Commas

Created on 3 Jul 2015 · 14 Comments · Source: elm/compiler

Optional Commas Proposal

One of the core design principles that has served Elm best is this:

Make it easy to do the right thing.

Here I will make the case that the current mandatory-comma syntax makes it harder to write readable Elm code than if significant whitespace were allowed as a substitute for commas.

As such, the status quo goes against one of Elm's core design principles. It is worth taking the effort to change this because doing so would be minimally invasive to Elm users, yet it would have a significant positive impact on a large percentage of the typical Elm app's code base.

In short, adopting this proposal would make it easier to do the right thing.

Motivation

In Elm, multiline case expressions and let bindings enjoy the benefit of significant whitespace as delimiters. This makes them clean and concise, and writing them is an absolute pleasure. However, in multiline List and Record literals, as well as type alias and Union Type declarations, there is no corresponding significant whitespace support, and as such they all suffer in comparison to case and let from both a reading and a writing UX perspective.

If Elm supported this feature, here would be three ways to express the same idea in 6 lines of code:

Version 1: No commas

type alias Model = {
    viewMode     : ViewMode
    docs         : List Doc
    currentDocId : Maybe Identifier
    currentDoc   : Doc
  }

Version 2: Commas at end of line

type alias Model = {
    viewMode     : ViewMode,
    docs         : List Doc,
    currentDocId : Maybe Identifier,
    currentDoc   : Doc
  }

Version 3: Commas at start of line

type alias Model = 
  { viewMode     : ViewMode
  , docs         : List Doc
  , currentDocId : Maybe Identifier
  , currentDoc   : Doc
  }

The current recommended Elm syntax is to use Version 3 (commas at the start of the line; see for example TodoMVC). Drawbacks of this approach compared to the "no commas" approach:

  1. It is visually noisier than "no commas" without improving clarity in any way. There are extra characters to read and to type, but they do not add clarity.
  2. It is strictly harder to scan. Instead of your eye being able to jump to the first letter on the line, it must jump to the comma (or brace) and then parse past it. Based on this point and the previous one, I am comfortable making the claim that Version 3 above is demonstrably harder to read than Version 1.
  3. Any change that involves the first position (e.g. adding something at the top, or reordering it) is more time-consuming in a text editor because it has a different leading character than the other lines.

Version 2 (commas at the end of the line) is just as scannable as Version 1, but it makes the editing UX worse because the last entry works differently than the others. This has the following drawbacks:

  1. If you add a new entry to the end, you have to remember to add a comma to the preceding line, turning the addition you wanted to make into an addition plus an edit to an unrelated piece of code.
  2. Because of the previous point, your VCS diffs become noisier when using this style. (In Python, which allows trailing commas, it is a common best practice to deliberately add trailing commas in order to work around these drawbacks. I've been told by Rubyists to do this in Ruby literals as well, as Ruby also allows trailing commas.)
  3. When you forget a comma at the end of a line, the error message you get will not directly point you in the right direction...and because it's such a tiny visual difference, it's difficult to spot. I just had a coworker spend several minutes being bitten by this in a SQL query recently (which had similar mandatory-comma rules).
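For reference, this is roughly what the trailing-comma workaround mentioned above looks like in Python. The names and values are made up for illustration; only the syntax is the point.

```python
# Illustrative data only; the names are hypothetical.
docs = [
    "intro",
    "chapter-1",
    "chapter-2",  # trailing comma: appending another entry changes one line only
]

model_fields = {
    "viewMode": "ViewMode",
    "docs": "List Doc",  # a trailing comma is also legal in dict literals
}
```

Because every entry ends the same way, adding or reordering entries never touches a neighboring line, which is the property the comma-less style would give Elm without the extra characters.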

The primary drawback of supporting Version 1 seems to be uncertainty. Specifically:

  1. It's unclear how newline-implied commas would interact with commas.
  2. It's unclear what should happen if you have a really long line and want to wrap.

This proposal includes simple ways to resolve these uncertainties.

Proposed Changes

Change 1: Newlines Imply Commas

Consider the following code from a recent blog post:

view address model =
  container_
  [ stylesheet "https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css"
  , stylesheet "css/style.css"
  , node "link"
    [ A.href "http://fonts.googleapis.com/css?family=Special+Elite"
    , A.rel "stylesheet"
    ]
    []
  , img [A.src "images/joan.png"]
    []
  , h1_ "Can We Talk!?"
  , row_ [ inputControls address model ]
  , row_ [ messageList model ]
  ]

If newlines implied commas, the above code could be rewritten to the following by changing nothing but commas and whitespace:

view address model =
  container_ [
    stylesheet "https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css"
    stylesheet "css/style.css"
    node "link" [
      A.href "http://fonts.googleapis.com/css?family=Special+Elite"
      A.rel "stylesheet"
    ] []
    img [A.src "images/joan.png"] []
    h1_ "Can We Talk!?"
    row_ [ inputControls address model ]
    row_ [ messageList model ]
  ]

To me, the second version is substantially easier to read. It is also more concise and easier to edit, as each entry in each multiline List literal begins and ends with whitespace rather than delimiter characters.

Naturally, this raises the questions mentioned earlier:

  1. How should newline-implied commas interact with actual commas?
  2. What should happen if you have a really long line and want to wrap?

As to the first question, the simplest answer is that commas become optional in a multiline context. In other words, both of the above code snippets, as well as all three versions of the type alias presented earlier, would still compile and work as normal. In all of these examples, the compiler can understand the programmer's intent unambiguously regardless of whether the commas are present.

The second question breaks down into three sub-questions:

  1. What should happen if you have a line with a really long let, case, or if expression and want to wrap?
  2. What should happen if you have a really long List or Record literal and want to wrap?
  3. What should happen if you have a really long function application and want to wrap?

In the case of let, case, and if, those already work trivially in both single line and multiline contexts, so no change would be necessary.

As far as List and Record literals go, the "newline implies comma" rule nests trivially, as seen in the above example where the first argument to container_ is a multiline List literal containing another multiline List literal (the second argument to node):

container_ [
    stylesheet "https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css"
    stylesheet "css/style.css"
    node "link" [
      A.href "http://fonts.googleapis.com/css?family=Special+Elite"
      A.rel "stylesheet"
    ] []

The only remaining case is "What should happen if you have a really long function application and want to wrap?"

Currently, many of the comma-less examples above would already be read as answers to this question: outside the top level, a newline generally continues the function application from the previous line.

As such, the only breaking change required to support this syntax involves how multiline function application is handled.

Change 2: Multiline Function Application Requires Indenting

Currently if you have a function application that spans multiple lines, you do not have to indent. For example, this compiles:

add2 a b = a + b

result =
  add2
  1
  2

This proposed change would disallow the above. Instead, it would require that you indent when performing function application across multiple lines, like so:

add2 a b = a + b

result =
  add2
    1
    2

This change resolves the only remaining question from the previous change, namely "What should happen if you have a really long function application and want to wrap?"

To better assess the implications of this change in conjunction with the previous change, consider the following record literal, excerpted from a real-world pattern match:

    CurrentDocMode -> {
      sidebarHeader = lazy2 viewCurrentDocHeader model.currentDoc addresses,
      sidebarBody   = lazy3 CurrentDoc.view addresses.navigateToTitle addresses.navigateToChapterId model.currentDoc,
      sidebarFooter = lazy  viewCurrentDocFooter addresses
    }

This is pretty long, so what if we wanted to wrap? One legal approach would be:

    CurrentDocMode -> {
      sidebarHeader =
        lazy2 viewCurrentDocHeader model.currentDoc addresses,

      sidebarBody =
        lazy3 CurrentDoc.view addresses.navigateToTitle addresses.navigateToChapterId model.currentDoc,

      sidebarFooter =
        lazy viewCurrentDocFooter addresses
    }

This is better, but that second line is still pretty long. What if we wanted that one to wrap again? That could look like any of the following:

Option A

    CurrentDocMode -> {
      sidebarHeader =
        lazy2 viewCurrentDocHeader model.currentDoc addresses,

      sidebarBody =
        lazy3 CurrentDoc.view addresses.navigateToTitle
        addresses.navigateToChapterId model.currentDoc,

      sidebarFooter =
        lazy viewCurrentDocFooter addresses
    }

Option B

    CurrentDocMode -> {
      sidebarHeader =
        lazy2 viewCurrentDocHeader model.currentDoc addresses,

      sidebarBody =
        lazy3 CurrentDoc.view addresses.navigateToTitle
          addresses.navigateToChapterId model.currentDoc,

      sidebarFooter =
        lazy viewCurrentDocFooter addresses
    }

Option C

    CurrentDocMode -> {
      sidebarHeader =
        lazy2 viewCurrentDocHeader model.currentDoc addresses,

      sidebarBody =
        lazy3
          CurrentDoc.view
          addresses.navigateToTitle
          addresses.navigateToChapterId
          model.currentDoc,

      sidebarFooter =
        lazy viewCurrentDocFooter addresses
    }

To me, these seem clearly sorted from worst to best in terms of readability.

Option A puts lazy3 and addresses.navigateToChapterId at the same indentation level, suggesting they both have the same relation to the term on the previous indentation level (namely =). This is misleading, as they do not have the same relationship; lazy3 is a function call, and the remaining terms are all arguments to lazy3.

Option B is better, in that indentations suggest dependencies where they actually exist; the result of the call to lazy3 feeds into = and addresses.navigateToChapterId feeds into lazy3. However, the additional terms on the same line as lazy3 still reduce clarity. (Moving them all to the second line would fix this, but would then reintroduce the "line is too long" problem we were solving by wrapping in the first place.)

Option C seems easily the clearest. Its only drawback is that it takes up the most vertical space, and optimizing to save vertical space generally seems to reduce clarity without a comparable benefit anywhere else.

This proposed change to function application would disallow Option A, the least-clear one, while continuing to permit the other two.

In conjunction with the previous proposed change (newlines imply commas), those two could be rewritten as follows:

Option B without commas

    CurrentDocMode -> {
      sidebarHeader =
        lazy2 viewCurrentDocHeader model.currentDoc addresses

      sidebarBody =
        lazy3 CurrentDoc.view addresses.navigateToTitle
          addresses.navigateToChapterId model.currentDoc

      sidebarFooter =
        lazy viewCurrentDocFooter addresses
    }

Option C without commas

    CurrentDocMode -> {
      sidebarHeader =
        lazy2 viewCurrentDocHeader model.currentDoc addresses

      sidebarBody =
        lazy3
          CurrentDoc.view
          addresses.navigateToTitle
          addresses.navigateToChapterId
          model.currentDoc

      sidebarFooter =
        lazy viewCurrentDocFooter addresses
    }

There is no parsing ambiguity here, and these seem extremely readable. This style is nice and uncontentious in let bindings, and with this change, Record literals and List literals can get the same benefit.
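To make the two rules concrete, here is a rough Python sketch of how entries could be told apart. It is not Elm compiler code and not a claim about how the parser would be written. The function name `split_entries` is hypothetical, and the sketch assumes space-only indentation and no nested multiline literals.

```python
def split_entries(body):
    """Split the body of a multiline literal (the text between the braces
    or brackets) into its entries.

    Change 1: a line at the base indentation starts a new entry.
    Change 2: a line indented deeper continues the previous entry.
    Explicit leading or trailing commas are accepted and dropped.
    """
    lines = [ln for ln in body.splitlines() if ln.strip()]
    if not lines:
        return []

    def indent(ln):
        # Number of leading spaces (tabs are not handled in this sketch).
        return len(ln) - len(ln.lstrip(" "))

    base = min(indent(ln) for ln in lines)
    entries = []
    for ln in lines:
        text = ln.strip().strip(",").strip()
        if indent(ln) == base or not entries:
            entries.append(text)
        else:
            entries[-1] += " " + text
    return entries


record_body = """
      sidebarHeader =
        lazy2 viewCurrentDocHeader model.currentDoc addresses

      sidebarBody =
        lazy3
          CurrentDoc.view
          addresses.navigateToTitle
          addresses.navigateToChapterId
          model.currentDoc

      sidebarFooter =
        lazy viewCurrentDocFooter addresses
"""

entries = split_entries(record_body)

# Unindented continuation lines read as separate entries;
# indented ones read as a single function application.
flat = split_entries("  add2\n  1\n  2")
nested = split_entries("  add2\n    1\n    2")
```

Here `entries` holds the three record fields, `flat` comes out as three separate entries, and `nested` comes out as the single application `add2 1 2`. Putting the commas back at the ends of the lines yields the same three entries, which mirrors the idea that commas become optional rather than forbidden.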

Arguments for Prioritizing This

  1. It improves a ton of code. A lot of the time I spend on Elm code goes to working with multiline Record and List literals, as well as multiline type alias definitions. If you look at a random .elm file in Dreamwriter or in TodoMVC, it's not uncommon to see that 25-50% of the lines of code in a given file could be improved by this change. This may be a small improvement, but its coefficient is very large; in practice it will significantly improve my UX when reading and writing a very large proportion of Elm code. (Which is exactly why I put so much effort into this proposal.)
  2. It makes it easier to do the right thing. This enables a clearer style of writing Elm code than is currently possible, and makes it take less effort to use that style than any of the current alternatives because there is strictly less typing involved.
  3. It makes it harder to do the wrong thing. Requiring indentation for multiline function application seems like more of a benefit than a cost. Although that style is currently allowed, there's a separate case to be made that it shouldn't be.

Design Counterarguments

The initial counterarguments I presented above had to do with uncertainty, but as this proposal addresses that uncertainty, they no longer apply.

Here is a case a blogger made against CoffeeScript's optional commas. (Search for "optional commas" to find that section.)

His first argument is actually against CoffeeScript's object literal syntax allowing you to omit curly braces (e.g. writing foo: bar is equivalent to writing { foo: bar } in many circumstances, which is also part of Ruby syntax for Hashes), which is a feature I absolutely disagree with and would oppose being added to Elm. Without this feature, his first criticism in the comma section no longer applies.

His second criticism only applies to function invocation, which is only applicable in CoffeeScript because function application in CoffeeScript supports comma-delimited arguments. Since Elm does not support comma-delimited function arguments, that criticism also does not apply.

Finally, there is also the intrinsic counterargument that this increases the learning curve simply by virtue of being a new concept. That is always a fair criticism, but "newlines imply commas" and "indent to continue function application" seem like simple enough rules that they should not require much effort to comprehend or to remember.

Addendum on Union Types

All of the above arguments apply equally well, I think, to Union Types and the pipe character. Again, compare:

type Action
  = NoOp
  | UpdateField String
  | EditingTask Int Bool
  | UpdateTask Int String
  | Add
  | Delete Int
  | DeleteComplete
  | Check Int Bool
  | CheckAll Bool
  | ChangeVisibility String

type Action =
  NoOp
  UpdateField String
  EditingTask Int Bool
  UpdateTask Int String
  Add
  Delete Int
  DeleteComplete
  Check Int Bool
  CheckAll Bool
  ChangeVisibility String

The former is noisier without improving clarity, and also encourages putting the equals sign on the line below the declaration, which is inconsistent with how equals is typically used in the rest of Elm.

Enabling the latter syntax could be as simple as applying the same "indentation continues the previous line" concept from function application; I can't see any reason why that wouldn't work just as well here.

All 14 comments

For me personally, the 'type' example doesn't improve clarity. The '|' border provides a clear line for my eye to follow, showing where the type definition begins and ends.

The key difference between this and the records is that the braces { } enclosing the record provide a natural "start and end" for the eye to follow. Same with lists and []

I also liked it because of its connection to Backus-Naur form, reading the '|' in my head as "Or".

This could just be because I'm used to seeing it in Haskell and BNF. The indentation does provide the start/end information, so maybe it's just something we'd get used to.

For records, lists, etc. I think this is a really nice change, that also would have the side-effect of silencing some of the more vocal nay-sayers on Reddit.

I'm primarily interested in it for Lists and Records, which is why I mentioned union types as an addendum, but yeah.

@JoeyEremondi I'm curious - what are the naysayers on Reddit you mentioned?

I agree that the heterogeneousness of listing stuff with commas in between makes it harder to edit stuff in text editors. That's also the reason I started thinking in this direction once.
I don't agree with the visual noise argument. At least, for me it's helpful. I'm used to it like Joey. It feels like a serif, which can be useful.

I have a number of thoughts on this proposal. I'll argue from the position that I'd want this proposal to succeed.

  1. Please no optional stuff. Optional syntax is a bad way to go IMHO. I don't really care about backward compatibility. If we really want this it's a significant change, and if necessary we can create a little tool to change your code to the new syntax.

    1. Say you don't make it optional. Then you either lose short literals like [1,2,3] or you allow those _only_ on a single line. That sounds a little strange, but mixing the two styles will be even more confusing. The recommended way would be all new newlines, because there is a hassle with changing from commas to newlines.

  2. What about tuples? Those are also commas. It sounds kind of crazy to me to have optional commas there, but OTOH it's inconsistent not to have them there.
  3. Let's add the module export list to the possible places where commas can be removed.
  4. What about multiway ifs? Those have pipes like the union types. If you remove the if-then-else, you don't need the pipes with significant whitespace.

Given points 1 and 2 above, and that I don't mind the visual noise, I'm not a supporter of this proposal. It's interesting, but there seems to be something missing.

@Apanatshka Made a separate, simplified proposal based on your feedback: https://github.com/elm-lang/elm-compiler/issues/979

Thoughts?

Allowing trailing commas might be another alternative.

On my side, I would not remove the commas. Probably I am alone with this feeling (mathematics and Haskell background may have driven me into this) but the commas somewhat stabilize the entities for me; removing them would loosen the structure (and this is what keeps me massively from, e.g., CoffeeScript). Especially considering sets, lists or tuples. Also, totally agree on 1) and 2) from Apanatshka: making it optional would loosen it more and having different cases seems strange.

I agree that making things optional would be worse than an all-out change. Elm's approach to syntax so far has been "there is one way to do it" and I like that, because it means things fall naturally into place. Contrast Rails, where there are 10 ways to do anything and consequently everyone has conflicting opinions.

@rtfeldman given our discussions at the meetup, do you want me to write up my counter-proposal and close these in favour of that?

Actually I'm really curious to hear @evancz's thoughts on this; I believe he just got back from Prague.

For reference, the DSL from last night's discussion: https://gist.github.com/rtfeldman/ef88bac28bf05051a0ee

Mike had some interesting comments last night, as a JS/Ruby/CoffeeScript veteran but Elm newbie:

  1. Between current Todo.elm, comma-less Todo.elm and operator-based Todo.elm, he found comma-less the easiest to read of the three. The learning curve on the operator one was by far the highest for him, whereas comma-less had zero learning curve (presumably since he already knew CoffeeScript).
  2. The composability wouldn't have many real-world use cases because it only makes it easier to extend the root element. As soon as you want to extend a component in a way that affects a child of the root (very often the case), you need a wrapper function anyway.

Very interesting observations from Mike! Not at all what I expected.

After talking last night, it feels like there is a visual ambiguity between whitespace denoting calls and whitespace denoting lists. The problem #979 addresses is ugly DSLs. I agree operators conflate it with learning one-off syntax. I just wonder if there's another solution there. I'm leaning towards acceptance but also appreciate the parallel to tagged union syntax.

What would be an example of the visual ambiguity?

As an exercise: is there one in the converted TodoMVC? If not, why would other code bases be relevantly different in this regard than TodoMVC?

Closing in favor of https://github.com/elm-lang/elm-compiler/issues/979 which seems to be pretty much universally more popular.
