gatsby-transformer-remark not transforming excerpts

Created on 9 Mar 2018 · 28 comments · Source: gatsbyjs/gatsby

I am using gatsby-transformer-remark with the following configuration:

{
  resolve: `gatsby-transformer-remark`,
  options: {
    excerpt_separator: `<!--more-->`,
    plugins: [
      {
        resolve: `gatsby-remark-images`,
        options: {
          maxWidth: 590,
        },
      },
      {
        resolve: `gatsby-remark-responsive-iframe`,
        options: {
          wrapperStyle: `margin-bottom: 1.0725rem`,
        },
      },
      `gatsby-remark-prismjs`,
      `gatsby-remark-copy-linked-files`,
    ],
  },
},

Following this post, https://using-remark.gatsbyjs.org/excerpts/, I am querying excerpts from my articles like this:

export const pageQuery = graphql`
  query IndexQuery {
    site {
      siteMetadata {
        title
      }
    }
    allMarkdownRemark(sort: { fields: [frontmatter___date], order: DESC }) {
      edges {
        node {
          excerpt
          fields {
            slug
          }
          frontmatter {
            date(formatString: "DD MMMM, YYYY")
            title
            tags
            featuredImage {
              childImageSharp {
                sizes(maxWidth: 200) {
                  ...GatsbyImageSharpSizes_tracedSVG
                }
              }
            }
          }
        }
      }
    }
  }
`

My Markdown article uses the separator defined in the config above:

---
title: Break the Java Generics Naming Convention?
date: "2016-04-30T22:12:03.284Z"
tags: ['Java']
path: '/break-java-generics-naming-convention'
featuredImage: './break-java-generics-maning-convention.jpg'
---

Choosing **descriptive**, *intent-revealing* names is one of the key principles of writing clean code, which is easy to read and understand and usually requires little to no comments. According to the official convention, generic type parameter names are supposed to be just one capital letter. Should you break this convention to make your names more descriptive?
<!--more-->

However, the excerpt I get is plain text that has not been transformed from Markdown.

Is this intended behavior? How can I get an excerpt that is already formatted? Thank you.

"gatsby": "^1.9.223",
"gatsby-transformer-remark": "^1.7.34",

Node: v9.5.0
NPM: 5.6.0

vojtechruz

Most helpful comment

Yeah!

So, we added HTML support to the excerpt functionality relatively recently. Requesting the HTML format transforms the excerpt from plain text to HTML, if there's HTML to be transformed. For example, in the issue above, the excerpt would translate italics, bold, etc. into their corresponding HTML tags.

It can be used like so:

{
  # grab your markdownRemark node however you prefer
  markdownRemark(id: "123412341234") {
    excerpt(format: HTML)
  }
}

This still works with the excerpt_separator functionality.
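A note for anyone consuming the HTML excerpt: the field now holds an HTML string, so it has to be inserted as markup rather than as text. In a React page component that usually means dangerouslySetInnerHTML={{ __html: node.excerpt }}. The sketch below avoids any framework so it runs in plain Node. renderExcerptList, escapeAttribute and the sample data are hypothetical, and the input is assumed to have the shape of an allMarkdownRemark query that requests excerpt(format: HTML) and fields { slug }.

```javascript
// Hypothetical helper: escape a value for use inside a double-quoted attribute.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

// Hypothetical helper: builds an HTML list from an allMarkdownRemark result
// whose nodes were queried with excerpt(format: HTML).
function renderExcerptList(data) {
  const items = data.allMarkdownRemark.edges.map(({ node }) => {
    const href = escapeAttribute(node.fields.slug);
    // node.excerpt is already HTML, so it is inserted without escaping.
    return `<li>${node.excerpt}<a href="${href}">Read more</a></li>`;
  });
  return `<ul>${items.join("")}</ul>`;
}

// Made-up input shaped like the query result.
const sample = {
  allMarkdownRemark: {
    edges: [
      {
        node: {
          fields: { slug: "/break-java-generics-naming-convention/" },
          excerpt: "<p>Choosing <strong>descriptive</strong> names</p>",
        },
      },
    ],
  },
};

console.log(renderExcerptList(sample));
```

Only the slug is escaped. The excerpt is treated as trusted output of the Markdown pipeline.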

Here's an example repo demonstrating this functionality

Here's the Markdown file showing the separator

Going to close this as answered, but please feel free to reply if we can help further. Thanks all!

DSchau on 4 Mar 2019

All 28 comments

This only happens when using excerpt_separator.

Otherwise, the excerpt contains plain text with the Markdown formatting stripped.

vojtechruz on 12 Mar 2018

Losing formatting such as bold or italics is not so bad, but what's not nice is that escape backslashes are preserved, e.g.:
How to make the most of the IntelliJ IDEA\'s view...
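Until the excerpt is processed properly, the stray backslashes can be removed as a stopgap. CommonMark allows any ASCII punctuation character to be backslash-escaped, so the sketch below simply drops a backslash that precedes such a character. unescapeMarkdown is a hypothetical helper. It ignores code spans, where backslashes are literal, so it is not a full Markdown implementation.

```javascript
// Hypothetical stopgap: remove Markdown backslash escapes from raw excerpt text.
// Matches a backslash followed by one ASCII punctuation character and keeps
// only the character. Code spans are NOT special-cased.
function unescapeMarkdown(text) {
  return text.replace(/\\([!"#$%&'()*+,\-.\/:;<=>?@[\\\]^_`{|}~])/g, "$1");
}

console.log(unescapeMarkdown("How to make the most of the IntelliJ IDEA\\'s view..."));
```

Backslashes before letters, such as in a Windows path, are left untouched because they are not Markdown escapes.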

vojtechruz on 12 Mar 2018

It looks like the gatsby-transformer-remark's excerpt handling is here. If you or anyone else is interested in doing some investigation, a PR would be great.

m-allanson on 16 Mar 2018

I've been digging around and I don't see anything obvious.

It took me a while, but I figured out that excerpt_separator is actually defined and used by gray-matter.

I can't figure out where the transformation happens and what is passed to gray-matter.

Can someone point me in the right direction? This is really annoying.
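The sketch below shows why a separator-based excerpt comes back as raw Markdown. It is a simplified imitation of what a front-matter parser does with a separator, not gray-matter's source code. splitExcerpt and its return shape are hypothetical. The point is that the text before the separator is sliced out verbatim and no Markdown processing happens at that stage.

```javascript
// Simplified imitation (NOT gray-matter's real implementation) of deriving an
// excerpt from a file with front matter and an excerpt separator.
function splitExcerpt(fileText, separator) {
  // Assumes front matter is delimited by "---" lines at the top of the file.
  const match = /^---\r?\n[\s\S]*?\r?\n---\r?\n/.exec(fileText);
  const body = match ? fileText.slice(match[0].length) : fileText;
  const index = body.indexOf(separator);
  return {
    content: body,
    // Sliced verbatim, so **bold** and *italic* markers survive as-is.
    excerpt: index === -1 ? "" : body.slice(0, index),
  };
}

const file = [
  "---",
  "title: Break the Java Generics Naming Convention?",
  "---",
  "",
  "Choosing **descriptive**, *intent-revealing* names...",
  "<!--more-->",
  "Rest of the post.",
].join("\n");

console.log(JSON.stringify(splitExcerpt(file, "<!--more-->").excerpt));
```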

docwhat on 18 Mar 2018

Although not directly related to this issue, the markdown plugin doesn't parse the title or any part of the frontmatter either.

omeid on 22 Apr 2018

The docs here say

you can specify an excerpt_separator, as well as any of the other options mentioned here, in the gatsby-config.js file.

However, neither excerpt_separator nor excerpt (docs), which should allow you to pass a function for generating the excerpt, results in anything other than the default 140-character excerpt.

"gatsby": "^1.9.131",
"gatsby-transformer-remark": "^1.7.15"

Undistraction on 29 Apr 2018

The issue is in extend-node-type.js, line 128. At that point it passes node.internal.content through the remark processor to get an AST, which it then processes further and eventually transforms into Html, HtmlAst, or other forms (e.g., in getHtml() on line 276).

The problem is that it doesn't do anything with node.excerpt. So that just passes through the whole process. And, at line 339, it then resolves the promise with the unprocessed excerpt.

To fix this, we'd have to either add the excerpt to the cache (as its own entry, or by storing a pair of ASTs), or just assume that the excerpt will only really be parsed once and do it directly. I assume we still want to keep the async processing and all the plugins, so it seems to make sense to me to change the cached value to be a pair of ASTs, with two separate promise chains to generate them.

Extracting a function that applies all the plugins, then does the remark processing, then applies all the plugins again, would make it easier to re-use the entire functionality to do both parts - especially as this is all async.
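Here is a rough sketch of the plan described above: run the same async pipeline over both the full content and the excerpt, and cache the pair as one entry. processMarkdown stands in for "apply plugins, run remark, apply plugins again". It is an assumed callback, not a gatsby-transformer-remark function. A Map also stands in for Gatsby's real cache, which has an async API.

```javascript
// Hypothetical sketch: one cached promise holding both processed results.
function getContentAndExcerpt(node, cache, processMarkdown) {
  const cacheKey = `content-and-excerpt-${node.id}`;
  if (!cache.has(cacheKey)) {
    // Two independent promise chains, combined into a single cached pair.
    const pair = Promise.all([
      processMarkdown(node.content),
      processMarkdown(node.excerpt),
    ]).then(([content, excerpt]) => ({ content, excerpt }));
    cache.set(cacheKey, pair);
  }
  return cache.get(cacheKey);
}

// Stand-in processor that only turns **bold** into <strong>.
const fakeProcess = async (markdown) =>
  markdown.replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>");

const cache = new Map();
const node = { id: "1", content: "**Full** post", excerpt: "**Short** intro" };

getContentAndExcerpt(node, cache, fakeProcess).then(({ content, excerpt }) => {
  console.log(content);
  console.log(excerpt);
});
```

Caching the promise, rather than the resolved value, means concurrent resolvers share one run of the pipeline.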

I don't know this code at all, so would prefer to leave changing it to someone who knows better. Hope this helps unblock the bug - I really want it fixed!

arlobelshee on 29 May 2018

I just submitted PR #5586 with a possible fix for this. It implements the plan I described above. However, I am not a committer, so I am not sure whether it works.

arlobelshee on 29 May 2018

I could use some help from someone who knows the code better. It seems like my change has somehow altered the type of node.excerpt (even though I didn't change its declared type from string). I have no idea where and how the schema expectations are set. I also don't know whether it is a good thing for the excerpt to now have HTML in it, or to create a new node.excerptFormatted which does.

Please help!

arlobelshee on 30 May 2018

This is still a problem. Any chance of this PR getting merged, or can anyone share a workaround?

strubell on 8 Jul 2018

@strubell, here's a workaround I've been using:

node.html.split("<!--more-->")[0]
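The same workaround can be written as a small function. It relies on the separator comment surviving into node.html, which is what the one-liner above assumes. excerptFromHtml is a hypothetical name. When the separator is missing, it returns the whole HTML, just as split(...)[0] does.

```javascript
// Hypothetical wrapper around the node.html.split("<!--more-->")[0] workaround.
function excerptFromHtml(html, separator = "<!--more-->") {
  const index = html.indexOf(separator);
  // No separator: return everything, matching split(separator)[0].
  return index === -1 ? html : html.slice(0, index);
}

const html =
  "<p>Choosing <strong>descriptive</strong> names.</p>\n<!--more-->\n<p>Rest of the post.</p>";
console.log(excerptFromHtml(html));
```

Because this cuts raw HTML, a separator placed inside an element (for example inside a list) would leave unclosed tags in the excerpt.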

:see_no_evil:

MrSaints on 8 Jul 2018

That works for simple things. But if your post contains things like anchor links, the URL needs to be the slug plus the link from the Markdown. I guess one could massage the links in the freshly split HTML, but that seems nasty.
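One way to do that massaging, sketched under assumptions: in-page links such as href="#details" resolve against the index page once the excerpt is shown there, so the helper below prefixes them with the post's slug. prefixAnchorLinks is hypothetical. It assumes links are emitted as double-quoted href attributes. It is a regex over that markup, not a general HTML parser.

```javascript
// Hypothetical helper: rewrite fragment-only links so they point at the post.
function prefixAnchorLinks(excerptHtml, slug) {
  return excerptHtml.replace(
    /href="#([^"]*)"/g,
    (match, fragment) => `href="${slug}#${fragment}"`
  );
}

console.log(prefixAnchorLinks('<p>See <a href="#details">details</a>.</p>', "/my-post/"));
```

Links that already carry a path or a full URL do not start with "#", so they are left unchanged.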

karl-run on 31 Aug 2018

Hey, any updates on this? I just stumbled upon this issue as well.

resir014 on 6 Oct 2018

Old issues will be closed after 30 days of inactivity. This issue has been quiet for 20 days and is being marked as stale. Reply here or add the label "not stale" to keep this issue open!

gatsbot[bot] on 20 Jan 2019

not stale

vojtechruz on 21 Jan 2019

+1 Not stale.

viktorbengtsson on 23 Jan 2019

Hello, any updates on this?

lednhatkhanh on 3 Mar 2019

@DSchau any suggestions for if we just want the text of the excerpt as per the default excerpt behaviour, but with a defined separator for some posts and relying on the prune value for the rest?
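The behaviour asked about here can be sketched in user land. Use the text before the separator when a post has one. Otherwise, prune the whole post to roughly 140 characters on a word boundary (140 is the default mentioned earlier in this thread). plainExcerpt is hypothetical and is not Gatsby's implementation. toPlainText is an assumed caller-supplied Markdown-to-text converter.

```javascript
// Hypothetical helper: separator when present, otherwise prune to pruneLength.
function plainExcerpt(markdownBody, toPlainText, { separator = "<!--more-->", pruneLength = 140 } = {}) {
  const index = markdownBody.indexOf(separator);
  if (index !== -1) {
    // Separator found: convert only the part before it, with no pruning.
    return toPlainText(markdownBody.slice(0, index)).trim();
  }
  const text = toPlainText(markdownBody).trim();
  if (text.length <= pruneLength) {
    return text;
  }
  // Cut at the last space before the limit so words are not split in half.
  const cut = text.slice(0, pruneLength);
  const lastSpace = cut.lastIndexOf(" ");
  const pruned = lastSpace > 0 ? cut.slice(0, lastSpace) : cut;
  return `${pruned.trimEnd()}…`;
}

// Trivial stand-in converter that only removes asterisks.
const stripStars = (markdown) => markdown.replace(/\*/g, "");
console.log(plainExcerpt("**Intro** text\n<!--more-->\nRest of post", stripStars));
console.log(plainExcerpt("word ".repeat(50), stripStars, { pruneLength: 20 }));
```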

StormPooper on 5 Mar 2019

This still works with the excerpt_separator functionality.

@DSchau this still doesn't work with format: PLAIN and a custom separator.

I cloned your example and changed the format from HTML to PLAIN in BlogIndex:

          excerpt(format: PLAIN)
          fields {
            slug
          }

Textual excerpts don't show properly.

Am I doing something wrong?

markvital on 2 Apr 2019

@markvital what would you expect to happen there?

That _is_ plain text. Would you expect Markdown-specific stuff to be stripped?

DSchau on 2 Apr 2019

Oh I see.
@DSchau Yes, I expected all tags to be removed, so I can insert clean text into meta tags, as described in the docs.
This is how it worked for me, until I set "excerpt_separator": ` in gatsby-config.js.

markvital on 2 Apr 2019

@DSchau awesome, I just stumbled on this problem and I'm glad I don't have to work around it! One problem though: if I set my excerpts to (format: HTML) and use a custom excerpt_separator, when the file doesn't contain the separator it creates the excerpt from the last paragraph of the text. It used the first 140 characters before I changed the format.

corvidism on 12 May 2019

One problem though: if I set my excerpts to (format: HTML) and use a custom excerpt_separator, when the file doesn't contain the separator it creates the excerpt from the last paragraph of the text. It used the first 140 characters before I changed the format.

That’s issue #12386

docwhat on 12 May 2019

I don't think the original issue is solved by excerpt(format: HTML), since HTML tags are not suitable for SEO metadata. We probably need a fix for PLAIN's behaviour.

As @markvital mentioned, if I don't specify excerpt_separator, the Markdown annotations such as italic/bold/link are processed into HTML, and the excerpt contains only the visible text (like the text content of a tag). However, if I specify excerpt_separator, the original Markdown content before the separator is returned as the excerpt without any Markdown or HTML processing (the raw Markdown text).

Both behaviours should be consistent, preferably the former, I think.
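Until PLAIN is made consistent, the effect of the former behaviour can be approximated on a separator-based excerpt. The helper below is a naive regex-based sketch, not remark's algorithm. A real fix would run the excerpt through remark and collect its text nodes. stripInlineMarkdown is hypothetical, and it mangles intraword underscores (snake_case identifiers).

```javascript
// Naive sketch (not remark): strip common inline Markdown to plain text.
// Order matters: images and links first (keeping their text), then emphasis.
function stripInlineMarkdown(markdown) {
  return markdown
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1") // ![alt](src) -> alt
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1") // [text](url) -> text
    .replace(/(\*\*|__)(.*?)\1/g, "$2") // **strong** or __strong__
    .replace(/(\*|_)(.*?)\1/g, "$2") // *em* or _em_
    .replace(/`([^`]*)`/g, "$1") // `code`
    .replace(/\s+/g, " ") // collapse newlines and runs of spaces
    .trim();
}

const raw = "\n    Where oh [*where*](nick.com) **_is_** ![that pony](pony.png)?\n\n    ";
console.log(stripInlineMarkdown(raw));
```

On the input from the failing unit test below, this yields the text that the test expects.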

riywo on 3 Jun 2019

Since this issue has been closed, do we need another issue?

FYI: This is an example unit test to see how it works differently with excerpt_separator https://github.com/riywo/gatsby/commit/4178bbd1545c83177a944142ee827e00e0f86422

 FAIL  src/__tests__/extend-node.js (6.786s)
  ● Excerpt is generated correctly from schema › excerpt does have missing words and extra spaces with excerpt_separator

    expect(received).toMatch(expected)

    Expected value to match:
      "Where oh where is that pony?"
    Received:
      "
    Where oh [*where*](nick.com) **_is_** ![that pony](pony.png)?

    "

      419 |       `,
      420 |     node => {
    > 421 |       expect(node.excerpt).toMatch(`Where oh where is that pony?`)
          |                            ^
      422 |     },
      423 |     { pluginOptions: { excerpt_separator: `<!-- end -->` } }
      424 |   )

      at toMatch (src/__tests__/extend-node.js:421:28)
      at test (src/__tests__/extend-node.js:91:11)

riywo on 4 Jun 2019