Node: Inconsistent use of plaintext code fencing in docs

Created on 20 Apr 2020  路  15Comments  路  Source: nodejs/node

Hey party people!

I'm working on a project that parses all the markdown files in the doc directory of this repo, and I've come across a number of instances of inconsistent naming of the plaintext info string (i.e. the language) in GitHub Flavored Markdown code fencing:

language | instances
:------- | :--------
text | 44
txt | 21
fundamental | 1
markdown | 1
raw | 1

I'm currently using this hack to sanitize all the incoming markdown before converting it to HTML:

markdown = markdown
  .replace(/```txt/gm, '```plaintext')
  .replace(/```text/gm, '```plaintext')
  .replace(/```fundamental/gm, '```plaintext')
  .replace(/```raw/gm, '```plaintext')
  .replace(/```markdown<.*$/gm, '```plaintext')
  .replace(/```plaintext<.*$/gm, '```plaintext')

Here's what the current documentation style guide says about code blocks:

  • For code blocks:

    • Use language aware fences. ("```js")

    • Code need not be complete. Treat code blocks as an illustration or aid to

      your point, not as complete running programs. If a complete running program

      is necessary, include it as an asset in assets/code-examples and link to

      it.

There's no mention of what the canonical language should be for plaintext, but I'd suggest standardizing on one of the following:

  • text is the most commonly used plaintext info string in this project
  • plaintext is what the popular highlight.js syntax highlighting library expects

I don't have a strong opinion about what the info string should be as long as it's used consistently across the docs. I'd be happy to open a pull request to clean these up, but wanted to open an issue first for discussion.

Things that might need to be done to resolve this:

  • [ ] Choose an info string for plain text
  • [ ] Update docs to all use the same info string
  • [ ] Update the style guide
  • [ ] Add a linting rule?
  • [ ] What else?

cc @nodejs/documentation 馃憢

doc tools

All 15 comments

If someone feels strong about this and want to open a PR to add a linting rule to enforce one form (preferably text given the data we have), they are welcome.

I think there are 2 instances of markdown I could find:

https://github.com/nodejs/node/blob/46ec9ab7d81f7dac59b2bf9afb67896564b4f586/doc/guides/contributing/issues.md#L42

https://github.com/nodejs/node/blob/46ec9ab7d81f7dac59b2bf9afb67896564b4f586/doc/guides/contributing/pull-requests.md#L254

IMHO both should stay as is, it is valid markdown syntax and therefore should be syntax highlighted as markdown (supported by highlight.js BTW).

Regarding the use of raw and fundamental, I don't know if it brings any more value than just using text. For reference:

https://github.com/nodejs/node/blob/46ec9ab7d81f7dac59b2bf9afb67896564b4f586/doc/api/report.md#L506-L512
https://github.com/nodejs/node/blob/46ec9ab7d81f7dac59b2bf9afb67896564b4f586/doc/api/fs.md#L3470-L3474

There's a discussion going about code fences in GitHub Flavored Markdown (GFM) going at https://github.com/nodejs/node/issues/32916. I also have a PR open about adding more GFM over at https://github.com/nodejs/node/pull/32944.

/to @Trott

Progress on this issue would be valuable to inform us on what lint rules (if any) should be added to the Markdown files located in the .github directory of this repo.

Specs:
https://github.github.com/gfm/
https://spec.commonmark.org/0.29/

This issue seems to need label:doc and label:tools. Can someone please add them?

I think markdown should stay, but the others should be consolidated into a single label unless someone is able to articulate a display and/or semantic difference. text seems good to me, as it's the most common one.

I think markdown should stay [...]

Agreed. Markdown is a distinct language that benefits from syntax highlighting as opposed to Plaintext, which does not. Here are all the languages supported by highlight.js:

https://github.com/highlightjs/highlight.js/blob/master/SUPPORTED_LANGUAGES.md

Therefore, markdown should not be replaced with plaintext, which is happening in OP's hack.

[...] but the others should be consolidated into a single label unless someone is able to articulate a display and/or semantic difference.

There is a semantic difference and I do not believe they should be consolidated into a single label (info string). Allow me to demonstrate.

Regarding the use of raw and fundamental, I don't know if it brings any more value than just using text. For reference:

Rather than raw, that should be console. Look at the differences in syntax highlighting.


Demo

Using raw as the info string:

```raw
$ node

process.report.writeReport();
Writing Node.js report to file: report.20181126.091102.8480.0.001.json
Node.js report completed

```

Using console as the info string:

```console
$ node

process.report.writeReport();
Writing Node.js report to file: report.20181126.091102.8480.0.001.json
Node.js report completed

```

Therefore, console would be preferable as the syntax highlighting is significant and report.md's use of raw is erroneous. I can provide screenshots to help clarify my point if necessary (let me know).

Thanks @DerekNonGeneric. I've incorporated your feedback in https://github.com/nodejs/node/pull/33028

I had commented in the PR w/ my pre-deep-dive thoughts. With a better understanding, it turns out they're more relevant to this thread and _we had been overlooking several pertinent details_.

  1. We鈥檙e not currently targeting highlight.js (unsure why it was mentioned) since that鈥檚 not what the API docs are using. The Node.js API docs are presently using SHJS for syntax highlighting. I don鈥檛 know what GitHub uses, but ensuring consistency with the rendered Markdown pages that are viewable on the GitHub website would be preferable.
  2. Unless we simply want to achieve consistency, there does not appear to be justification for using text over nothing at all (I still need to check whether SHJS does anything with this). Perhaps it should be an empty (blank) info string instead.
  3. My suggestion about using console was for its benefit to rendered Markdown pages that are viewable on the GitHub website. I鈥檇 personally like to know what GitHub uses for the syntax highlighting for these pages to determine if it would be feasible to replace SHJS with it.
  4. Something we had yet to consider is that the Markdown files in the docs/api directory are not GFM, but CommonMark (at least that appears to be what the intention is).

I agree that there is an issue here and that the PR will fix it, although I normally see this type of thing handled in a single commit that also prevents it from re-occurring. These Markdown lint rules aren't defined in this repo though (they're in remark-preset-lint-node), so I imagine this will play out differently than what I'm accustomed to. Looking forward to seeing this come to fruition!

Unless we simply want to achieve consistency, there does not appear to be justification for using text over nothing at all.

My 2 cents on this: I think we should enforce using text, it seems to be an easy way to spot when a contributor forgot to add the language (maybe because a line return slip through, maybe the collaborator is not familiar with this feature of Markdown, etc.), and it makes it explicit when the syntax highlighting just don't make sense.

Maybe we can start small and do this incrementally? I imagine it would be uncontroversial as a first step to standardize on text or txt rather than having both.

Maybe we can start small and do this incrementally? I imagine it would be uncontroversial as a first step to standardize on text or txt rather than having both.

(This would also allow us to implement lint restrictions incrementally.)

Believe it or not …

  • the remark-rehype module converts the info string in the code fences (e.g.,```xxx) to an HTML <pre> tag w/ a nestested <code> tag that has a language- class (e.g., language-xxx in this case)

https://github.com/nodejs/node/blob/94e5b5c77dade0d8f7358c66144b75c369679cab/tools/doc/generate.js#L90

https://github.com/nodejs/node/blob/94e5b5c77dade0d8f7358c66144b75c369679cab/doc/api_assets/sh_main.js#L513

  • all code fences containing an info string get parsed as JavaScript regardless of language specified

https://github.com/nodejs/node/blob/94e5b5c77dade0d8f7358c66144b75c369679cab/doc/api_assets/sh_javascript.min.js#L1

all code fences containing an info string get parsed as JavaScript regardless of language specified

Correction: there is a way to prevent code blocks from being highlighted. 馃槍

https://github.com/nodejs/node/blob/94e5b5c77dade0d8f7358c66144b75c369679cab/doc/api_assets/sh_main.js#L522-L525

Assuming …

  1. info string of none
  2. gets converted to pre tag w/ class of language-none by remark-rehype [it DOES]
  3. gets converted to sh_none by SHJS on the frontend [it DOES NOT]

(Previous comment deleted. Wrong issue. Obviously. Sorry.)

Since the Markdown lint rules are in remark-preset-lint-node, we are going to need to normalize the info strings prior to version bumping node-lint-md-cli-rollup to include the latest ruleset.

These version bumps have been happening independently and would not include changes that make the documents comply w/ the newest version of the ruleset. Therefore, the PR to resolve this issue (https://github.com/nodejs/node/pull/33028) should be good to go at this point. We will just need to wait for https://github.com/nodejs/node/pull/33442 to land until code blocks are actually preventable from being highlighted.

Was this page helpful?
0 / 5 - 0 ratings