Gatsby: [gatsby-transformer-remark] Section titles with code in ``s creates escaped HTML <code> tags

Created on 24 Apr 2019  路  16Comments  路  Source: gatsbyjs/gatsby

Description

When using the automatic TOC generation from gatsby-transformer-remark with markdown section titles that contain code in backticks, the generated TOC heading contains escaped HTML <code> tags that show up in the document. Instead of actually emitting the <code> tag in the HTML string, it's escaped and the HTML is rendered as text.

Steps to reproduce

Create a markdown remark with a heading like this:

## Generating TOCs with `gatsby-transformer-remark`

This will generate the following HTML string for the TOC:

<ul>\n<li><a href=\"/demo/#generating-tocs-with-code-classlanguage-textgatsby-transformer-remarkcode\">Generating TOCs with &#x3C;code class=\"language-text\">gatsby-transformer-remark&#x3C;/code></a></li>\n</ul>

Expected result

The <code> tag shouldn't be escaped and should be emitted directly

Actual result

The <code> tag is escaped, causing it to be rendered as text rather than as HTML.

Environment


  System:
    OS: macOS 10.14.3
    CPU: (12) x64 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
    Shell: 5.3 - /bin/zsh
  Binaries:
    Node: 10.15.0 - ~/.nvm/versions/node/v10.15.0/bin/node
    Yarn: 1.15.2 - /usr/local/bin/yarn
    npm: 6.4.1 - ~/.nvm/versions/node/v10.15.0/bin/npm
  Languages:
    Python: 2.7.10 - /usr/bin/python
  Browsers:
    Chrome: 73.0.3683.103
    Safari: 12.0.3
  npmPackages:
    gatsby: ^2.1.17 => 2.1.17 
    gatsby-cli: ^2.4.8 => 2.4.11 
    gatsby-image: ^2.0.29 => 2.0.29 
    gatsby-plugin-feed: ^2.0.14 => 2.0.14 
    gatsby-plugin-google-analytics: ^2.0.8 => 2.0.14 
    gatsby-plugin-manifest: ^2.0.5 => 2.0.19 
    gatsby-plugin-offline: ^2.0.11 => 2.0.24 
    gatsby-plugin-react-helmet: ^3.0.0 => 3.0.6 
    gatsby-plugin-sass: ^2.0.11 => 2.0.11 
    gatsby-plugin-sharp: ^2.0.15 => 2.0.22 
    gatsby-plugin-typescript: ^2.0.1 => 2.0.9 
    gatsby-plugin-typography: ^2.2.1 => 2.2.7 
    gatsby-remark-images: ^3.0.1 => 3.0.5 
    gatsby-remark-prismjs: ^3.1.4 => 3.2.4 
    gatsby-source-filesystem: ^2.0.4 => 2.0.22 
    gatsby-source-spotify: ^1.0.1 => 1.0.1 
    gatsby-transformer-csv: ^2.0.4 => 2.0.7 
    gatsby-transformer-json: ^2.1.5 => 2.1.8 
    gatsby-transformer-remark: ^2.1.15 => 2.2.6 
    gatsby-transformer-sharp: ^2.1.4 => 2.1.14 
help wanted confirmed

Most helpful comment

I just wanted to thank @d4rekanguok, @wooorm, and @barrymcgee for coming up with the solution and getting it implemented in the third party libraries involved. I'm sorry it took me so long to do my part, but I'm really happy we were able to get the fix landed :)

Special thanks to @d4rekanguok for mentoring me on this one; I appreciate you helping me make a real non-trivial contribution to an open source project as prestigious as this!

All 16 comments

This looks like the same thing as #5764, but I'm not using the gatsby-remark-autolink-headers plugin. I am using gatsby-remark-prismjs though. I tried putting it at the end of the plugins array, but the same thing happened.

Here's my gatsby-config.js file: https://github.com/Ameobea/homepage/blob/master/gatsby-config.js

gatsby-remark-prismjs is indeed the culprit here... it looks like table of content is always generated after all plugins have applied to the markdown ast though -- perhaps there is a way to tell gatsby-remark-prismjs to ignore inline code in headings?

I made a dirty edit of gatsby-remark-prismjs that ignore inline code in headings, you can give it a test:

yarn add d4rekanguok/gatsby-remark-prismjs-no-heading

and replace gatsby-remark-prismjs in your config with gatsby-remark-prismjs-no-heading.

All it does is replace unist-util-visit with unist-util-visit-parents, so we can get some information about the direct parent & do nothing if it's inside a heading block.

pinging @pieh, what do you think about this approach? It could be a bit more generalized i.e user can pass in an option to ignore inline code in certain elements etc.

@d4rekanguok yep that fixed it for me! Thanks so much. Looking forward to seeing the fix upstream 馃憤

@d4rekanguok I'm not sure I like the idea - as it would remove <code> completely from heading text (not only from ToC links / heading ids)? (i.e. in https://www.gatsbyjs.org/docs/migrating-from-v1-to-v2/#what-well-cover)

I'm not sure if there's better alternative tho

@pieh We'd still keep the <code> in headings, since toHAST turn inline markdown code to <code> anyway (remark-prismjs turns them into html node - which cause the issue in the first place); but it's a bit of annoying because the styling that comes with remark-prismjs is gone:

image

But it looks like all it does is attaching a class name to inline code -- could we just add a node.data.hProperties to the markdown node in remark-prismjs, so toHAST can attach the class name? This way, toc won't get raw html <code>

That works, I managed to retain the class name generated by remark-prismjs for inline code inside headings while toc is untouched

image

If we could rewrite highlight-code.js so that it retains the markdown format, I think we could actually keep code highlighting inside header as well! No actually that won't work, I'm stumped at making inline highlight work for now

I'm specifically targeting heading though, so it feels a bit hacky...

if (parent.type === `heading`) {
  if (!node.data) node.data = {};
  node.data.hProperties = { className: [className] }
}
else {
  node.type = `html`;
  node.value = `<code class="${className}">${highlightCode(languageName, node.value)}</code>`;
}

I think we potentially try to approach this from wrong direction - right now we are trying to workaround that < become &#x3C; and therefore code and class and classname end up in toc - should we rather try to fix that (if not possible in our code, send PR to upstream packages)?

oh, do you mean to strip out html code during mdast to toc transformation? That would certainly be a lot cleaner

I don't have much knowledge of gatsby, or the gatsby ecosystem, so I'm looking at this issue with a 10,000 foot view

In regards to syntax highlighting, remark-highlight.js makes the following suggestion

This package integrates with remark-html. It may be better to work with rehype, which is specifically made for HTML, and to use rehype-highlight instead of this package.

I'm wondering if this issue stems from a similar circumstance?

Looking at gatsby-remark-prismjs, it looks like it operates on the markdownAst. If it used the htmlAst instead, you could do use rehype-prism or rehype-highlight to add syntax highlighting

@BarryThePenguin Thank you for the quick response, and great suggestions!

rehype-prism looks really interesting. However in the current Gatsby remark plugin system, it's not possible for user to apply plugins to the transformed htmlAst.

I was exploring the option of turning gatsby-remark-prismjs to retain markdownAst structure, so that no html node is passed into mdast-util-toc, but even if this works for remark-prismjs, other plugins from the community may still add html node to the markdownAst and face the same issue.

Because of this, I was hoping to be able to resolve html nodes with mdast-util-toc

Hiya!

This issue has gone quiet. Spooky quiet. 馃懟

We get a lot of issues, so we currently close issues after 30 days of inactivity. It鈥檚 been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contributefor more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 馃挭馃挏

Right now, I'm using @d4rekanguok's workaround library for my personal site. From what I've seen here, it doesn't look like there has been any generally accepted solution, but several options have been proposed.

If someone more familiar with the Gatsby ecosystem wants to propose their preferred way forward, I'd be willing to do some work on this and try to get it resolved.

@Ameobea Hey Casey! The fix upstream (thanks to the awesome work of wooorm & Barry) has been merged but I haven't got the chance to continue to work on this.

In short, the problem was that gatsby-remark-prismjs injects html into the headings, but mdast-util-toc ignore the embed code in headings. In the lasted version, phrasing contents (emphasis, html code, etc. ) will be supported.

Here's what need to be done:

  • [ ] upgrade gatsby-transformer-remark to use the latest version of mdast-util-toc
  • [ ] use allowDangerousHTML on toHAST & hastToHTML in this line
-  toc = hastToHTML(toHAST(tocAst.map))
+  toc = hastToHTML(toHAST(tocAst.map, { allowDangerousHTML: true }), { allowDangerousHTML: true })
  • [ ] the generated links is not so pretty right now, i.e # Hello `world` will turn into something along the line of <a href="#hellocodeclasslang-textworld"><h1>Hello <code>world</code></h1></a>, I was thinking to do some work to make it nicer
  • [ ] add test to cover these new cases

Would be great if you want to give it a try!

I just wanted to thank @d4rekanguok, @wooorm, and @barrymcgee for coming up with the solution and getting it implemented in the third party libraries involved. I'm sorry it took me so long to do my part, but I'm really happy we were able to get the fix landed :)

Special thanks to @d4rekanguok for mentoring me on this one; I appreciate you helping me make a real non-trivial contribution to an open source project as prestigious as this!

I think you meant to give credit here to @BarryThePenguin, not me :)

Was this page helpful?
0 / 5 - 0 ratings

Related issues

hobochild picture hobochild  路  3Comments

KyleAMathews picture KyleAMathews  路  3Comments

dustinhorton picture dustinhorton  路  3Comments

kalinchernev picture kalinchernev  路  3Comments

signalwerk picture signalwerk  路  3Comments