Dhall-haskell: Should we manipulate the header text prior to markdown parsing?

Created on 18 Jun 2020 · 18 Comments · Source: dhall-lang/dhall-haskell

CommonMark has the notion of indented code blocks. In the current implementation of #1863, suppose we have something like this in the header:

{-  Hi there

    how are you?
-}

At first glance, I expected this to generate two sibling <p> elements. During processing, the comment delimiters are removed, so it becomes the following text:

  Hi there

    how are you?

mmark will then parse it and produce the following HTML:

<p>Hi there</p>
<pre>
  <code>how are you?
  </code>
</pre>

That is the expected behaviour, since mmark is a CommonMark parser. But should we keep it that way and do no indentation-related processing? My first test used a doc header like the one in that example, and I ended up a little confused because I didn't know about that CommonMark rule.
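To make the behaviour above concrete, here is a minimal Python sketch (Python is used purely for illustration; it is not how dhall-docs or mmark is implemented). The function names `strip_block_delimiters` and `indented_code_candidates` are hypothetical. The second function is only a rough heuristic for the CommonMark rule that a line indented by four or more spaces starts an indented code block unless it continues a paragraph; it ignores lists and other containers.

```python
def strip_block_delimiters(comment):
    """Drop the leading '{-' and trailing '-}' but leave all other whitespace alone."""
    body = comment.strip()
    if not (body.startswith("{-") and body.endswith("-}")):
        raise ValueError("not a block comment")
    return body[2:-2]


def indented_code_candidates(markdown):
    """Lines a CommonMark parser may read as an indented code block:
    indented by 4+ spaces and not continuing a paragraph (rough heuristic)."""
    candidates = []
    in_paragraph = False
    for line in markdown.splitlines():
        if not line.strip():
            in_paragraph = False          # a blank line ends the paragraph
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent >= 4 and not in_paragraph:
            candidates.append(line)
        else:
            in_paragraph = True
    return candidates


header = "{-  Hi there\n\n    how are you?\n-}"
print(indented_code_candidates(strip_block_delimiters(header)))
# prints ['    how are you?']
```

Starting the text on the line after the `{-` and not indenting it leaves no candidate lines, which is why that layout renders as two paragraphs.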

The following example renders two paragraphs:

{-
Hi there

how are you?
-}

For simplicity, I think that we should do no indentation processing and encourage users to use the latter form: starting their docs on the line after the {-. But I'd like to hear your thoughts about this.

dhall-docs question

All 18 comments

Yeah, I agree that the documentation should not be indented and should start on the line after the {-. This is why most of the Prelude documentation is formatted in that way.

@Gabriel439: Should the tool still just remove the {- prefix, or remove the entire first line? Currently it does the former.

Won't we need to properly handle indented comments anyway once we start processing in-code comments too?

E.g.

let MyRecord =
      { {-| foo
            bar
            baz -}
         x : Natural
       , ...
       }

I don't see why we shouldn't support this for headers too.

It's a bit tricky of course, but we have also already established some (I assume) similar rules for multi-line strings: https://github.com/dhall-lang/dhall-lang/blob/master/standard/multiline.md#indentation

I think we should rather consider not supporting indented code blocks since they won't mix well with indented block comments.

rustdoc AFAICT seems to bypass this issue by requiring each line to start with ///, so there's never any ambiguity on how far a comment line is indented.

> I think we should rather consider not supporting indented code blocks since they won't mix well with indented block comments.

Or establish some rules about what type of comments should be used for your documented element. Like for example:

  • For headers you can use multiline or single line comments
  • For records and function type arguments just support single line.

Although I don't like it too much...

> rustdoc AFAICT seems to bypass this issue by requiring each line to start with ///, so there's never any ambiguity on how far a comment line is indented.

Javadoc does something similar, you need to start your comments with /** and each line should start with *.

We could do something similar, using | in our case, although it's not so easy to type and may end up being annoying for the user.

Or maybe we can use the first line in the code comment as our indentation guide, and subsequent lines will need to be indented using that as the base.

If so, I think that cases like:

{-|   foo
      bar
          baz

Will be ok and will generate an indented codeblock for baz, and

{-|   foo
      bar
    baz

Will be invalid, or maybe we can treat a smaller indentation as equal to the first line's indentation.

What about something like this:

{-|  foo
  |  bar
  |       baz
  |-}

-- |  foo
-- |  bar
-- |       baz

Then the indentation would be unambiguous, because it would be relative to the |

Yeah, that would probably simplify the implementation. It looks a bit onerous to type though.

What did you think about the idea of simply not supporting indented code blocks? We'd still have the ``` sort. Users who aren't aware of the limitation might find it tricky to figure out though…

On second thought I actually like Gabriel's idea. I think it would be good to discuss this with the wider community though.

> Yeah, that would probably simplify the implementation. It looks a bit onerous to type though.
>
> What did you think about the idea of simply not supporting indented code blocks? We'd still have the ``` sort. Users who aren't aware of the limitation might find it tricky to figure out though…

I don't know if we can turn off that feature in mmark, since it is in the CommonMark spec.

> On second thought I actually like Gabriel's idea. I think it would be good to discuss this with the wider community though.

I'll ask on the discourse thread

> > What did you think about the idea of simply not supporting indented code blocks? We'd still have the ``` sort. Users who aren't aware of the limitation might find it tricky to figure out though…
>
> I don't know if we can turn-off that feature on mmark,

Maybe we don't need to do anything on mmark side about this. Just stripping leading whitespace (outside of ```-code blocks) might be enough.

> since it is on the commonmark spec.

I thought we're departing from commonmark already somewhat by using mmark?

> > On second thought I actually like Gabriel's idea. I think it would be good to discuss this with the wider community though.
>
> I'll ask on the discourse thread

:+1:

> > > What did you think about the idea of simply not supporting indented code blocks? We'd still have the ``` sort. Users who aren't aware of the limitation might find it tricky to figure out though…
> >
> > I don't know if we can turn-off that feature on mmark,
>
> Maybe we don't need to do anything on mmark side about this. Just stripping leading whitespace (outside of ```-code blocks) might be enough.

But there are some places where indentation matters and stripping it badly causes trouble. For instance, nested list items.
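A small Python sketch of the nested-list concern (illustrative only; `strip_leading_whitespace` is a hypothetical helper, not part of dhall-docs or mmark). It removes leading whitespace on every line outside fenced code blocks, as suggested, and shows what that does to a nested list.

```python
FENCE = "`" * 3  # the Markdown code-fence marker


def strip_leading_whitespace(markdown):
    """Remove leading whitespace from every line outside fenced code blocks."""
    out = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence       # entering or leaving a fenced block
            out.append(line.lstrip())
        elif in_fence:
            out.append(line)              # keep code indentation verbatim
        else:
            out.append(line.lstrip())
    return "\n".join(out)


doc = "- fruit\n    - apple\n    - pear\n- vegetables"
print(strip_leading_whitespace(doc))
```

The result is four items at the same level: the two nested items have been promoted to top-level list items, so the list structure the author wrote is lost.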

> > since it is on the commonmark spec.
>
> I thought we're departing from commonmark already somewhat by using mmark?

I explained that badly. mmark mostly tries to follow CommonMark, and indented code blocks are one of the features that it actually implements. mmark is really high-level and doesn't provide a way to modify parsing directly. I think I'll file an issue on its repo to ask the author whether there is a way to _disable_ features, unlike mmark extensions, which _add_ features to parsing.

> > On second thought I actually like Gabriel's idea. I think it would be good to discuss this with the wider community though.
>
> I'll ask on the discourse thread

I've just asked the community. If they generally accept using | or a similar proposal, then we can live with mmark's way of parsing.

I'd vote for the option without the |. I felt a bit distracted by the "cognitive load" it induces. At least that was my first impression looking at the examples.

I do think we should stick to CommonMark with all its features, whether we like it or not. Supporting all of it except indented code blocks will lead to confusion.
IMHO this is a nice discussion of Markdown and the compatibility issue:
https://talk.commonmark.org/t/beyond-markdown/2787

Here's another idea:

  • Documentation comments have to be either block comments or single-line comments

    • In other words, more than one -- | in a row is not permitted
  • For block comments, the first line has to be {-| with no trailing characters and the last line has to be -}

    • The indentation is relative to the opening { character of the block comments

    In other words:

    {-|
    ↑ First column
    -}

  • For a line comment, there is a required space after the -- | that is stripped before conversion to markdown


> For block comments, the first line has to be {-| with no trailing characters and the last line has to be -}
>
>   • The indentation is relative to the opening { character of the block comments
>
>     In other words:
>
>     {-|
>     ↑ First column
>     -}

When a 2-line comment takes 4 lines, I feel that's a bit of a waste of space, but I'm not sure how best to address this.

For line comments, I think I'd rather have a convention where, in a sequence of line comments, the indentation is determined by removing the leading -- or -- | and a single space character.

So

-- | bli
-- bla
  -- blub
 --     blarg

would be interpreted as the Markdown text

bli
bla
blub
    blarg
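The convention above can be sketched in a few lines of Python (illustrative only; `line_comments_to_markdown` is a hypothetical name). One assumption not stated in the comment: the `-- |` marker is only recognised on the first line, and every later line is expected to start with a plain `--`.

```python
import re

FIRST = re.compile(r"^\s*--\s?\| ?")   # '-- |' (or '--|') plus one optional space
REST = re.compile(r"^\s*-- ?")         # '--' plus one optional space


def line_comments_to_markdown(lines):
    """Strip leading whitespace, the comment marker, and a single space from each line."""
    out = []
    for i, line in enumerate(lines):
        pattern = FIRST if i == 0 else REST
        stripped, count = pattern.subn("", line, count=1)
        if count == 0:
            raise ValueError(f"not a documentation comment line: {line!r}")
        out.append(stripped)
    return "\n".join(out)


print(line_comments_to_markdown([
    "-- | bli",
    "-- bla",
    "  -- blub",
    " --     blarg",
]))
```

Run on the four-line example, this yields `bli`, `bla`, `blub` and `blarg` indented by four spaces, matching the Markdown text shown above.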

It looks like we have a lot of ideas on the table. I'll write a summary of all the discussed ideas here, each with its (estimated) difficulty of implementation (easy, medium, hard) and its flexibility for the end user (I'll try not to be subjective).

Besides, it looks like indentation is something we definitely need to handle. Although indented code blocks are not a widely used (or widely known) Markdown feature, others like nested lists are really useful, and they rely heavily on indentation.

I'd like to apologize for any typos, for ideas I missed or misunderstood, and for the length of this comment. Please let me know which option you think is best (for me, (2 = 4) > 3 > 1).

1. Don't manipulate parsed doc for indentation

This means:

  • Documentation block comments are only permitted on headers, and indentation is the same as the beginning of the file
  • For other types of documentation comments (i.e. function arguments' types and attributes) the user should use single-line comments. Multiple lines are not allowed

Difficulty: Easy
Flexibility for end-user: not very flexible, since an attribute description may need to be several lines long

2. Use first line of text for indentation

This will use the first line of documentation in a (single-line or block) comment as the base of indentation. This doesn't necessarily mean that the first line of the comment (right after {-| or -- |) is the base; we might use the first line of _actual_ documentation.

So, something like

{-| foo
    bar
        baz
-}

and

{-|

 foo
 bar
     baz
-}

and (although I don't like it too much)

     {-|
foo
bar
    baz
-}

will be equivalent to:

foo
bar
    baz

For line comments it is kind of tricky though. The only line required to have the | after -- should be the first one, and we should parse as many consecutive comment lines as we can. Note that if there is another token between comment lines, the comment after the token is not part of the documentation.

So,

-- | foo
--   bar
--       baz

and (not sure about this actually, but if I let that behaviour in block comments then it kind of makes sense to me to have it in both type of comments)

-- |
--    foo
--    bar
--        baz

will be equivalent to the same as above:

foo
bar
    baz

One thing to note here is the indentation, in the source code, of the -- lines themselves.

-- |
  -- foo
    -- bar
      --     baz

If we align the --, then it is equivalent to the penultimate snippet. But if not, then it will produce:

foo
  bar
        baz

I'd prefer to go for the former, i.e. aligning all the -- comments.
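The difference between aligning and not aligning the `--` markers can be sketched as follows in Python (illustrative only; `line_comments_to_text` and its `align` flag are hypothetical). The markers are replaced by spaces rather than removed, so each line keeps its column, and the first non-blank line sets the base indentation.

```python
def line_comments_to_text(lines, align=True):
    """Dedent consecutive '--' comment lines relative to the first line of text.

    With align=True the source indentation before each '--' is ignored;
    with align=False it counts towards the indentation of the text.
    """
    cleaned = []
    for i, line in enumerate(lines):
        body = line.lstrip(" ") if align else line
        body = body.replace("--", "  ", 1)        # blank out the marker in place
        if i == 0:
            body = body.replace("|", " ", 1)      # only the first line carries '|'
        cleaned.append(body.rstrip())
    while cleaned and not cleaned[0]:
        cleaned.pop(0)                            # a bare '-- |' line has no text
    if not cleaned:
        return ""
    base = len(cleaned[0]) - len(cleaned[0].lstrip(" "))
    out = []
    for line in cleaned:
        indent = len(line) - len(line.lstrip(" "))
        out.append(line[min(indent, base):])
    return "\n".join(out)


staggered = ["-- |", "  -- foo", "    -- bar", "      --     baz"]
print(line_comments_to_text(staggered, align=True))
print(line_comments_to_text(staggered, align=False))
```

With `align=True` the staggered example gives the same text as the aligned snippets; with `align=False` the source indentation leaks into the output, giving `bar` two and `baz` eight leading spaces.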

Difficulty:

  • For block comments, I'd say it's _medium_. The difficult part is extracting the src position of every line. We might do this by extracting the src position of the {-| and analyzing the text to find the src location of each line.
  • For line comments, I'd say it's _medium_ if we align the --. The parser will need to collect as many -- lines as it can (starting with --|) and remove all whitespace before each line. If we don't do any aligning, then I'd say this is more complicated than the other case, but still _medium_. We might again capture all of the lines and add the extra prefix space to the lines that need it (i.e. the second and following).

User experience:

  • For block comments I think that users will be comfortable with this behaviour. The rules are really simple (in fact, there is just one)
  • For single-line comments it is also ok-ish. The only confusion might arise if the tool doesn't do -- alignment, which suggests that it should. dhall format could do that alignment as well.

3. | as base of indentation

This idea is similar to javadoc or rustdoc. In block comments, each line starts with | followed by a space, and that will be the base of indentation. So block comments like the following

{-| foo
  | bar
  |     baz
-}

and

{-|
  | foo
  | bar
  |     baz
-}

and

{-|
| foo
 | bar
  |     baz
-}

will produce:

foo
bar
    baz

For line comments, this is similar to the second idea, using the | as the alignment.

Difficulty: Easy. We just remove all of the text on each line until we find the |. The tool may show a warning when a line with no | is found.
User experience: Not so good, since you have to remember to type those |. IntelliJ automatically adds it when it detects you're writing documentation. We might update dhall-lsp-server (not sure if that is what's responsible for this) or the Dhall language support in the case of vscode.
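A Python sketch of the "remove everything up to the |" approach, including the warning for lines without one (illustrative only; `strip_to_pipe` is a hypothetical name). It assumes the marker is the first `|` on the line plus exactly one following space, and that the closing `-}` sits on a line of its own.

```python
def strip_to_pipe(comment):
    """Return (markdown_text, warnings) for a block comment whose lines start with '| '."""
    lines = comment.strip().splitlines()
    if lines and lines[-1].strip() == "-}":
        lines.pop()                       # drop the closing delimiter line
    docs, warnings = [], []
    for number, line in enumerate(lines, start=1):
        _before, pipe, after = line.partition("|")
        if not pipe:
            warnings.append(f"line {number}: no '|' found")
            continue
        # Assumption: exactly one space after '|' belongs to the marker.
        docs.append(after[1:] if after.startswith(" ") else after)
    while docs and not docs[0].strip():
        docs.pop(0)                       # a bare '{-|' line carries no text
    return "\n".join(docs), warnings


text, warnings = strip_to_pipe("{-|\n| foo\n | bar\n  |     baz\n-}")
print(text)
print(warnings)
```

Because everything to the left of the `|` is discarded, the position of the `|` in the source does not affect the result; only the spacing after it does.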

4. Indentation is guided by the { char and consecutive -- comments are not allowed

This is described here (comment). It is similar to the second idea, but indentation is determined by the { char and consecutive -- comments are not allowed.

Difficulty: It would be similar to the 2nd idea: medium, and probably easier since there are not a lot of rules.
User experience: I'd say it's fine for users. It makes the user write more consistent documentation that is easier to read and maintain. The only downside is that several -- in a row are not allowed, but you have {-.
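A Python sketch of the fourth idea for block comments (illustrative only; `dedent_relative_to_brace` is a hypothetical name). It requires the first line to be exactly `{-|` and the last to be `-}`, then removes as many columns as the opening `{` is indented. Rejecting text that starts to the left of the `{` column is an assumption; the proposal does not say how such lines should be treated.

```python
def dedent_relative_to_brace(comment):
    """Dedent the body of a block comment relative to the column of its opening '{'."""
    lines = comment.rstrip().splitlines()
    if len(lines) < 2 or lines[0].strip() != "{-|" or lines[-1].strip() != "-}":
        raise ValueError("first line must be '{-|' and last line must be '-}'")
    column = lines[0].index("{")          # column of the opening brace
    out = []
    for line in lines[1:-1]:
        # Assumption: text to the left of the brace column is an error.
        if line[:column].strip():
            raise ValueError(f"text left of the opening brace column: {line!r}")
        out.append(line[column:].rstrip())
    return "\n".join(out)


print(dedent_relative_to_brace("    {-|\n    foo\n\n        bar\n    -}"))
```

Here the comment is indented by four columns, so `foo` ends up in the first column and `bar` keeps four spaces of relative indentation.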

I personally think we should go for the simplest approach, not only because it is easier to implement, but also because it is easier for users to reason about. We can always support something more flexible (and more complicated) later if users request it, but once you support something more complicated you can't easily take it back.

I'm not really convinced that the various approaches are really all that difficult. With a few tests, I think it should be quite manageable.

@Gabriel439 has a good point though. Once we have some users we'll most likely get more interest in this issue, so we'll have a better basis to decide on more complicated approaches.
