In CommonMark, there is the notion of indented code blocks. With the current implementation of #1863, suppose we have something like this in the header:
```
{- Hi there
how are you?
-}
```
At first glance, I expected this to generate two sibling <p> elements. During processing, the comment delimiters are removed, so it becomes the following text:
```
Hi there
how are you?
```
mmark then parses it and produces the following HTML:
```html
<p>Hi there</p>
<pre>
<code>how are you?
</code>
</pre>
```
And that's the expected behaviour, since mmark is a CommonMark parser. But should we keep it that way and do no indentation-related processing? The first test I did was with a doc header like the one in that example, and I ended up a little confused; I didn't know about that CommonMark rule.
The following example renders two paragraphs:
```
{-
Hi there
how are you?
-}
```
For simplicity, I think we should do no indentation processing and encourage users to use the latter form: start their docs on the line after the `{-`. But I'd like to hear your thoughts on this.
Yeah, I agree that the documentation should not be indented and should start on the line after the {-. This is why most of the Prelude documentation is formatted in that way.
@Gabriel439: Should the tool still just remove the `{-` prefix, or remove the entire first line? Currently it does the former.
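Here is a minimal sketch that makes the two options concrete. It is written in Python purely for illustration (dhall-docs itself is Haskell). `strip_prefix_only` and `drop_first_line` are hypothetical names, not functions from the tool. The sketch assumes the whole comment is passed as one string that starts with `{-` and ends with `-}`.

```python
def _inner(comment: str) -> str:
    """Return the text between the '{-' and '-}' delimiters."""
    body = comment.strip()
    if not (body.startswith("{-") and body.endswith("-}")):
        raise ValueError("expected a block comment delimited by '{-' and '-}'")
    return body[2:-2]


def strip_prefix_only(comment: str) -> str:
    """Option 1: remove only the delimiters; text after '{-' on the first line is kept."""
    return _inner(comment)


def drop_first_line(comment: str) -> str:
    """Option 2: remove the whole first line, including any text after '{-'."""
    _, _, rest = _inner(comment).partition("\n")
    return rest


header = "{- Hi there\n   how are you?\n-}"
print(repr(strip_prefix_only(header)))  # ' Hi there\n   how are you?\n'
print(repr(drop_first_line(header)))    # '   how are you?\n'
```

With the second option, any text written directly after `{-` (like `Hi there` above) is dropped. It therefore only makes sense together with a rule that the opening line contains nothing but the delimiter.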
Won't we need to properly handle indented comments anyway once we start processing in-code comments too?
E.g.
```
let MyRecord =
{ {-| foo
bar
baz -}
x : Natural
, ...
}
```
I don't see why we shouldn't support this for headers too.
It's a bit tricky, of course, but we have also already established some (I assume) similar rules for multi-line strings: https://github.com/dhall-lang/dhall-lang/blob/master/standard/multiline.md#indentation
I think we should rather consider not supporting indented code blocks since they won't mix well with indented block comments.
rustdoc AFAICT seems to bypass this issue by requiring each line to start with ///, so there's never any ambiguity on how far a comment line is indented.
Or establish some rules about what type of comments should be used for your documented element. Like for example:
Although I don't like it too much...
Javadoc does something similar: you need to start your comments with `/**`, and each line should start with `*`.
We could do something similar, using `|` in our case, although it's not so easy to type and may end up being annoying for the user.
Or maybe we can use the first line in the code comment as our indentation guide, and subsequent lines will need to be indented using that as the base.
If so, I think that cases like:

```
{-| foo
    bar
        baz
```

will be OK and will generate an indented code block for `baz`, and

```
{-| foo
    bar
  baz
```

will be invalid, or maybe we can assume that lesser indentation is equal to the first line's indentation.
What about something like this:

```
{-| foo
| bar
| baz
|-}
```

```
-- | foo
-- | bar
-- | baz
```

Then the indentation would be unambiguous, because it would be relative to the `|`.
Yeah, that would probably simplify the implementation. It looks a bit onerous to type though.
What did you think about the idea of simply not supporting indented code blocks? We'd still have the ``` sort. Users who aren't aware of the limitation might find it tricky to figure out though…
On second thought I actually like Gabriel's idea. I think it would be good to discuss this with the wider community though.
I don't know if we can turn off that feature in mmark, since it is part of the CommonMark spec.
I'll ask on the discourse thread
Maybe we don't need to do anything on the mmark side about this. Just stripping leading whitespace (outside of ```-code blocks) might be enough.
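Here is a minimal sketch of what that stripping could look like, in Python for illustration only. `strip_outside_fences` is a hypothetical helper. It assumes the comment delimiters are already gone and that code blocks are delimited by lines starting with ```. The second call shows the catch with nested lists: the nested item loses the indentation that made it nested.

```python
def strip_outside_fences(text: str) -> str:
    """Strip leading whitespace from every line that is not inside a ``` block."""
    out = []
    in_fence = False
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("```"):
            in_fence = not in_fence  # an opening or closing fence
            out.append(stripped)
        elif in_fence:
            out.append(line)  # keep code exactly as written
        else:
            out.append(stripped)
    return "\n".join(out)


print(strip_outside_fences("  foo\n      bar\n  ```\n    x = 1\n  ```"))
print(strip_outside_fences("- a\n  - b"))  # the nested item becomes a sibling
```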
I thought we're departing from commonmark already somewhat by using mmark?
:+1:
But there are some places where indentation can be tricky if it is stripped badly, for instance nested list items.
I explained that badly. mmark mostly tries to follow CommonMark, and indented code blocks are one of the features it actually implements. mmark is really high-level and doesn't provide a way to modify parsing directly. I think I'll file an issue on its repo to ask the author if there is a way to _disable_ features, as opposed to how mmark extensions work, which _add_ features to parsing.
I've just asked the community. If they generally accept using `|` or a similar proposal, then we can live with mmark's way of parsing.
I'd vote for the option without the |. I felt a bit distracted by the "cognitive load" it induces. At least that was my first impression looking at the examples.
I do think we should stick to CommonMark with all its features, whether we like it or not. Supporting all of it except indented code blocks will lead to confusion.
IMHO this is a nice discussion of Markdown and the compatibility issue:
https://talk.commonmark.org/t/beyond-markdown/2787
Here's another idea:
- Documentation comments have to be either block comments or single-line comments
  - Multiple `-- |` in a row are not permitted
- For block comments, the first line has to be `{-|` with no trailing characters, and the last line has to be `-}`
  - The indentation is relative to the opening `{` character of the block comment
  - In other words:

```haskell
{-|
↑ First column
-}
```

- For a line comment, there is a required space after the `-- |` that is stripped before conversion to Markdown
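Here is a rough sketch of the block-comment rule, in Python for illustration. `block_doc` is a hypothetical helper. It assumes the comment is passed as its raw source lines (so the column of `{` is known) and that indentation uses spaces only. Rejecting lines indented left of the `{`, rather than guessing, is an assumption of this sketch.

```python
def block_doc(lines: list[str]) -> str:
    """Convert a '{-| ... -}' doc comment, given as raw source lines, to Markdown.

    Hypothetical helper: indentation is measured relative to the column of the
    opening '{', and only spaces are assumed to be used for indentation.
    """
    if len(lines) < 2 or lines[0].strip() != "{-|" or lines[-1].strip() != "-}":
        raise ValueError("'{-|' and '-}' must each be alone on their own line")
    col = lines[0].index("{")  # column of the opening '{'
    body = []
    for number, line in enumerate(lines[1:-1], start=2):
        if not line.strip():
            body.append("")  # blank lines stay blank
        elif line[:col].strip():
            # Something other than spaces appears left of the '{' column.
            raise ValueError(f"line {number} is indented less than the opening '{{'")
        else:
            body.append(line[col:])
    return "\n".join(body)
```

For example, `block_doc(["  {-|", "  foo", "      bar", "  -}"])` returns `"foo\n    bar"`, so `bar` becomes an indented code block.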
When a 2-line comment takes 4 lines, I feel that's a bit of a waste of space, but I'm not sure how best to address this.
For line comments, I think I'd rather have a convention where, in a sequence of line comments, the indentation is determined by removing the leading -- or -- | and a single space character.
So
```
-- | bli
-- bla
-- blub
-- blarg
```

would be interpreted as the Markdown text

```
bli
bla
blub
blarg
```
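Here is a sketch of that line-comment convention, again in Python for illustration only. `line_comments_to_markdown` is a hypothetical helper. It assumes the run of consecutive comment lines has already been collected, and it ignores any whitespace before `--`.

```python
def line_comments_to_markdown(lines: list[str]) -> str:
    """Turn a run of line comments into Markdown (hypothetical helper).

    Each line loses its leading '-- |' or '--' plus at most one following
    space; any further indentation is kept. Whitespace before '--' is ignored.
    """
    out = []
    for line in lines:
        s = line.lstrip()
        if s.startswith("-- |"):
            s = s[len("-- |"):]
        elif s.startswith("--"):
            s = s[len("--"):]
        else:
            raise ValueError(f"not a line comment: {line!r}")
        out.append(s[1:] if s.startswith(" ") else s)  # drop a single space
    return "\n".join(out)
```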
It looks like we have a lot of ideas on the table. I'll write a summary of all the discussed ideas here, with their (estimated) difficulty of implementation (easy, medium, hard) and their flexibility for the end user (I'll try not to be subjective).
Besides, it looks like indentation is something we definitely need to handle. Although indented code blocks are not a widely used (or known) Markdown feature, others like nested lists are really useful, and they rely heavily on indentation.
I apologize for any typos, missed or misunderstood ideas, and the length of this comment. Please let me know what you think the best option is (for me, (2 = 4) > 3 > 1).
This means:
Difficulty: Easy
Flexibility for the end user: obviously not very flexible, since an attribute description may be several lines long.
This option uses the first line of documentation in a (single-line or block) comment as the base of indentation. This doesn't mean that the first line of the comment (right after `{-|` or `-- |`) will be the base; we might use the first line of _actual_ documentation.
So, something like
```
{-| foo
bar
baz
-}
```

and

```
{-|
foo
bar
baz
-}
```

and (although I don't like it too much)

```
{-|
foo
bar
baz
-}
```

will be equivalent to:

```
foo
bar
baz
```
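Here is a minimal Python sketch of this option, for illustration only. `dedent_block_doc` is a hypothetical helper. It assumes the comment is passed as its raw source lines, that `-}` sits alone on the last line, and that indentation uses spaces only. Text right after `{-|` keeps its source column. Following the suggestion earlier in the thread, a line indented less than the base is treated as if it were at the base.

```python
def _indent(line: str) -> int:
    """Number of leading spaces."""
    return len(line) - len(line.lstrip(" "))


def dedent_block_doc(lines: list[str]) -> str:
    """Dedent a '{-| ... -}' comment relative to its first line of actual documentation."""
    first, *rest = lines
    start = first.index("{-|") + len("{-|")
    # Blank out the delimiter so text after '{-|' keeps its source column.
    doc = [" " * start + first[start:]] + rest
    if doc and doc[-1].strip() == "-}":
        doc = doc[:-1]  # drop the closing delimiter line
    nonblank = [line for line in doc if line.strip()]
    if not nonblank:
        return ""
    base = _indent(nonblank[0])  # first line of actual documentation
    out = []
    for line in doc:
        if not line.strip():
            out.append("")
        else:
            # Strip `base` columns, or all leading spaces if the line is indented less.
            out.append(line[min(_indent(line), base):])
    return "\n".join(out).strip("\n")
```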
For line comments it is kind of tricky, though. Only the first line should be required to have the `|` after the `--`, and we should parse as many consecutive comment lines as we can. Note that if there is another token between comment lines, the comments after that token are not part of the documentation.
So,
```
-- | foo
-- bar
-- baz
```

and (not sure about this, actually, but if I allow that behaviour in block comments, then it makes sense to me to have it in both types of comments)

```
-- |
-- foo
-- bar
-- baz
```

will be equivalent to the same text as above:

```
foo
bar
baz
```
One thing to note here is the indentation, in the source code, of the set of `--` lines.
```
-- |
-- foo
-- bar
-- baz
```

If we align the `--`, then it is equivalent to the penultimate snippet. But if not, then it will produce:

```
foo
bar
baz
```
I'd prefer to go for the former, i.e. aligning all the `--` comments.
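Here is a Python sketch of the aligned variant, for illustration only. `collect_line_doc` is a hypothetical helper that works on raw source lines. Collection must start at a `-- |` line and stops at the first line that is not a `--` comment. Whitespace before each `--` is discarded, and the first line of actual documentation sets the base indentation.

```python
def _indent(line: str) -> int:
    """Number of leading spaces."""
    return len(line) - len(line.lstrip(" "))


def collect_line_doc(lines: list[str]) -> str:
    """Collect a '-- |' documentation comment from the start of `lines`."""
    if not lines or not lines[0].lstrip().startswith("-- |"):
        raise ValueError("documentation must start with '-- |'")
    body = []
    for i, line in enumerate(lines):
        s = line.lstrip()  # remove whitespace before '--'
        if not s.startswith("--"):
            break  # another token: the documentation ends here
        body.append(s[len("-- |"):] if i == 0 else s[len("--"):])
    nonblank = [b for b in body if b.strip()]
    if not nonblank:
        return ""
    base = _indent(nonblank[0])  # first line of actual documentation
    out = ["" if not b.strip() else b[min(_indent(b), base):] for b in body]
    return "\n".join(out).strip("\n")
```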
Difficulty: […] `{-|` and analyze the text to find the source location of each line. […] `--`: the parser will need to collect as many `--` lines as it can (starting with `-- |`) and remove all whitespace before each line. If we don't do any aligning, then I'd say this is going to be more complicated than the other case, but still _medium_: we might again capture all of the lines and add the extra prefix space to the lines that need it (i.e. the second and following ones).

User experience: […] `--` alignment, which might tell us that we should. `dhall format` could do that alignment as well.

3. `|` as base of indentation

This idea is similar to Javadoc or rustdoc. In block comments, each line starts with `| ` (note the space), which will be the base of indentation. So block comments like the following
```
{-| foo
| bar
| baz
-}
```

and

```
{-|
| foo
| bar
| baz
-}
```

and

```
{-|
| foo
| bar
| baz
-}
```

will produce:

```
foo
bar
baz
```
For line comments, this is similar to the second idea, using the `|` for alignment.
Difficulty: Easy. We just remove all of the text on each line until we find the |. The tool may show a warning when a line with no | is found.
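Here is a minimal Python sketch of that stripping, for illustration only. `strip_to_pipe` is a hypothetical helper. It treats the `|` of `{-|` as the first line's marker and drops a closing `-}` (or `|-}`) line. Instead of printing warnings, it returns the offending line numbers so the caller can decide how to report them.

```python
def strip_to_pipe(lines: list[str]) -> tuple[str, list[str]]:
    """Strip each line up to its leading '|' plus one space (hypothetical helper).

    Lines without a leading '|' are kept (left-stripped) but reported, so a
    tool could warn about them. Line numbers count from the '{-|' line.
    """
    if lines and lines[-1].strip() in ("-}", "|-}"):
        lines = lines[:-1]  # drop the closing delimiter line
    out: list[str] = []
    problems: list[str] = []
    for number, line in enumerate(lines, start=1):
        s = line.lstrip()
        if number == 1 and s.startswith("{-|"):
            s = s[len("{-"):]  # keep just the '|' of '{-|'
        if not s.startswith("|"):
            problems.append(f"line {number}: missing '|'")
            out.append(s)
            continue
        s = s[1:]
        out.append(s[1:] if s.startswith(" ") else s)  # drop the space after '|'
    return "\n".join(out).strip("\n"), problems
```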
User experience: Not so good, since you have to remember to type those `|`s. IntelliJ adds them automatically when it detects you're writing documentation. We might update dhall-lsp-server (not sure if it is responsible for that) or the Dhall language support for VS Code.
4. Indentation relative to the `{` char, and no consecutive `--` comments allowed

This is the idea described in the comment above. It is similar to the second idea, but indentation is determined by the `{` char, and no consecutive `--` comments are allowed.
Difficulty: Similar to the 2nd idea (medium), and probably easier since there are fewer rules.
User experience: I'd say it's fine for users. It makes users write more consistent documentation that is easier to read and maintain. The only downside is that multiple `--` lines in a row are not allowed, but you have `{-` instead.
I personally think we should go for the simplest approach, not only because it is easier to implement, but also because it is easier for users to reason about. We can always support something more flexible and complicated later if users request it, but once you support something more complicated you can't easily take it back.
I'm not really convinced that the various approaches are really all that difficult. With a few tests, I think it should be quite manageable.
@Gabriel439 has a good point though. Once we have some users we'll most likely get more interest in this issue, so we'll have a better basis to decide on more complicated approaches.