Roxygen2: Issue with bullet lists in `@param`

Created on 28 May 2020 · 12 Comments · Source: r-lib/roxygen2

Surfaced in r-lib/vctrs#1100.

This block:

#' Foo
#' @name foo
#'
#' @param foo
#'   * A
#'   * B
#'   * C
NULL

produces a nested bullet list:

\arguments{
\item{foo}{\itemize{
\item A
\itemize{
\item B
\item C
}
}}
}

The first workaround is to remove the indenting:

#' Bar
#' @name bar
#'
#' @param bar
#' * A
#' * B
#' * C
NULL

\arguments{
\item{bar}{\itemize{
\item A
\item B
\item C
}}
}

Or to add a header:

#' Baz
#' @name baz
#'
#' @param baz
#'   Header.
#'   * A
#'   * B
#'   * C
NULL

\arguments{
\item{baz}{Header.
\itemize{
\item A
\item B
\item C
}}
}
bug markdown

All 12 comments

roxygen2::roc_proc_text(roxygen2::rd_roclet(), "
  #' Foo
  #' @name foo
  #'
  #' @md
  #' @param foo
  #'   * A
  #'   * B
  #'   * C
  NULL
")
#> $foo.Rd
#> % Generated by roxygen2: do not edit by hand
#> % Please edit documentation in ./<text>
#> \name{foo}
#> \alias{foo}
#> \title{Foo}
#> \arguments{
#> \item{foo}{\itemize{
#> \item A
#> \itemize{
#> \item B
#> \item C
#> }
#> }}
#> }
#> \description{
#> Foo
#> }

Created on 2020-05-28 by the reprex package (v0.3.0)

We might be able to use the same algorithm as glue::trim() here

Since the white space is significant for markdown, maybe we should just not trim it at all? In some cases like this, at least. The specific problem here is that the markdown parser gets this text:

Browse[2]> text
[1] "* A\n  * B\n  * C"

and cutting off the initial white space here really matters.
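To illustrate the parser input shown above, here is a minimal sketch comparing the trimmed text with the untrimmed one. It assumes the commonmark package is installed; `count_lists()` is a hypothetical helper written for this illustration, not part of roxygen2 or commonmark.

```r
# Hypothetical helper: count <list> elements in the CommonMark XML for `md`.
count_lists <- function(md) {
  xml <- commonmark::markdown_xml(md)
  lengths(regmatches(xml, gregexpr("<list ", xml, fixed = TRUE)))
}

# Indentation cut off on the first line only, as in the debugger output above.
trimmed <- "* A\n  * B\n  * C"
# Indentation kept on every line, as written in the roxygen block.
untrimmed <- "  * A\n  * B\n  * C"

count_lists(trimmed)    # more than one list: B and C end up nested under A
count_lists(untrimmed)  # a single flat list
```

The only difference between the two inputs is the two leading spaces on the first line, which is enough to change how the later items are attached.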

Since whitespace is significant, I would expect the roxy indenting to be cut off so it doesn't interfere with the markdown parser?

The whitespace conveys information in markdown, so if you cut it off then you lose that information.

We could cut off _some_ of it, like glue::trim(). This would make sense I think, but it also makes the rules more difficult, and also potentially constrains what you can write. E.g. indenting by four spaces marks a code block, but if we glue::trim() then that's only true if you don't indent from the first line of a roxygen tag.
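A rough sketch of that trade-off, in base R. `dedent_common()` is a hypothetical helper that strips the indentation shared by all non-blank lines; it is a simplification in the spirit of glue::trim(), not glue's actual algorithm.

```r
# Hypothetical helper: remove the leading spaces common to all non-blank lines.
dedent_common <- function(text) {
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]
  non_blank <- grepl("\\S", lines)
  # Number of leading spaces on each line (0 when there are none).
  indent <- attr(regexpr("^ *", lines), "match.length")
  shared <- if (any(non_blank)) min(indent[non_blank]) else 0L
  lines[non_blank] <- substring(lines[non_blank], shared + 1L)
  paste(lines, collapse = "\n")
}

# The indented bullet list becomes a flat, unindented list.
bullets <- dedent_common("  * A\n  * B\n  * C")

# A block indented by four spaces from its first line loses all of its
# indentation, so markdown no longer sees an indented code block.
code <- dedent_common("    myfun()\n    myfun2()")
```

Under this assumed rule the bullet list case is fixed, but the four-space indentation only survives when some earlier line in the same tag is less indented.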

I think it is just simpler not to touch the white space at all. Then the rules are simple: we just cut off the #' comments (with white space _before_ them), and the rest is treated as a markdown file. It is also simple to move text between .md and roxygen: you just add or remove the #' prefix. If we touch the white space then this does not hold any more.
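A minimal base R sketch of that rule. Both helpers are hypothetical, and treating one optional space after #' as part of the prefix is an assumption made for this sketch rather than a statement about roxygen2 internals.

```r
# Hypothetical helpers: drop or add the roxygen comment prefix, and nothing else.
# Assumption: a single optional space after #' belongs to the prefix.
strip_roxy_prefix <- function(lines) sub("^\\s*#' ?", "", lines)
add_roxy_prefix <- function(lines) paste0("#' ", lines)

roxy <- c("#'   * A", "#'   * B", "#'   * C")

# Everything after the prefix is kept as markdown, indentation included.
md <- strip_roxy_prefix(roxy)

# Moving the text back into a roxygen block only needs the prefix again.
round_trip <- add_roxy_prefix(md)
```

Because no other whitespace is touched, the markdown and roxygen forms differ only by the prefix.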

Just to be clear, I meant that there are two kinds of whitespaces in a roxy markdown: (a) the indenting inside a roxy key, and (b) the whitespace for the markdown content. I would expect (a) to be removed so it doesn't interfere with (b). I understand that it might cause complications though.

Then the rules are simple: we just cut off the #' comments (with white space before them), and the rest is treated as a markdown file.

These rules are simple but I think they produce unexpected behaviour in this case. Unindented roxy blocks are very hard to read, so we should encourage indenting.

two kinds of whitespaces in a roxy markdown

How do you tell the difference? E.g. what kind of whitespace is this?

#' @section Mysection:
#'     myfun()
#'     myfun2()

but I think they produce unexpected behaviour in this case

To be clear, my suggestion would solve your original problem.

E.g. what kind of whitespace is this?

I think using first indent as in glue to infer the indent space is reasonable and predictable. It would work both with @param and with @description sections. I don't see it causing problems, though of course I haven't thought much about these things.

To be clear, my suggestion would solve your original problem.

Sorry I missed that. How does it solve it?

I think using first indent as in glue to infer the indent space is reasonable and predictable.

Yes, but I think it is also not necessary. The best is not to have two kinds of whitespace at all, just one kind. Much simpler. E.g. if you cut the whitespace, then you cannot easily move code between @includeRmd and inline comments, because you might need to adjust the white space.

I don't see it causing problems, though of course I haven't thought much about these things.

The problem is that the formatting of this part of the example is context dependent:

#'     myfun()
#'     myfun2()

E.g. if you put this in an .md and include it, you get a code block. If we trim the white space then this is not the case for inline comments, because all the white space is removed.
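The context dependence can be checked with a short sketch, assuming the commonmark package is installed. `is_code_block()` is a hypothetical helper for this illustration only.

```r
# Hypothetical helper: does the CommonMark XML for `md` contain a code block?
is_code_block <- function(md) {
  grepl("<code_block", commonmark::markdown_xml(md), fixed = TRUE)
}

# As it would appear in an included .md file: four-space indentation kept.
indented <- "    myfun()\n    myfun2()\n"
# As the parser would see it if all leading white space were trimmed.
trimmed <- "myfun()\nmyfun2()\n"

is_code_block(indented)  # parsed as an indented code block
is_code_block(trimmed)   # parsed as an ordinary paragraph
```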

We also do not know how white space will be significant in future versions of markdown, so just throwing it away does not seem like a good solution.

Sorry I missed that. How does it solve it?

If we don't cut the whitespace on the first line, then the markdown parser will get this, and generate the correct output, i.e. a single list:

> cat(commonmark::markdown_xml("  * foo\n  * bar\n"))
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
  <list type="bullet" tight="true">
    <item>
      <paragraph>
        <text xml:space="preserve">foo</text>
      </paragraph>
    </item>
    <item>
      <paragraph>
        <text xml:space="preserve">bar</text>
      </paragraph>
    </item>
  </list>
</document>

E.g. if you cut the whitespace, then you cannot easily move code between @includeRmd and inline comments, because you might need to adjust the white space.

This might be a feature. I think it is important to adjust the whitespace when copying from a .md to a roxy block because unindented roxy documentation is poor practice. Note that this only concerns a subset of keywords though, e.g. param and return, so other keywords that do not require indenting might skip the trimming. Edit: Or perhaps this is about whether the text starts on the same line as the keyword?

E.g. if you put this in an .md and include it, you get a code block.

Is this the only expected problem? Fenced code blocks are much more common than indented blocks.

unindented roxy documentation is poor practice

You can still indent it, in a way that works with markdown. E.g. your original post is perfectly fine. :)

Or perhaps this is about whether the text starts on the same line as the keyword?

Yeah, this is exactly the problem with trimming. :) We would end up with a complicated set of rules that nobody understands. It would take a long time to explain them and make sure they are in sync with the docs, etc. Whereas not cutting is just that, not cutting. One line to explain.

Is this the only expected problem? Fenced code blocks are much more common than indented blocks.

Indented blocks are canonical pandoc markdown, and the visual editor creates them as well. So they will be the standard in R, very likely.

I don't know if this is the only problem, or that it will be the only problem as markdown and commonmark evolve. But that's the point. If we don't process the text, then we don't need to know. Let the markdown parser interpret the markdown as it should.

I think you may be overstating the complexity of these potential rules. People and IDEs will keep indenting their parameter lists, and this whitespace-sensitive syntax might be interpreted by the markdown parser in surprising ways. From this point of view, trimming the indent whitespace reduces the complexity of the syntax rather than increasing it. I agree that finding a rule sufficiently simple and predictable to detect when it makes sense to trim might be difficult.

In any case we have the same goals: keeping things simple and having indentation that makes sense from the perspectives of the roxygen and markdown syntaxes. I have found the markdown integration to work well in most cases, so if your suggestion solves the bullet list issue as well, I would be happy with it.
