gatsby-transformer-remark not transforming excerpts

Created on 9 Mar 2018  Â·  28Comments  Â·  Source: gatsbyjs/gatsby

I am using gatsby-transformer-remark with the following configuraton:

{
      resolve: `gatsby-transformer-remark`,
      options: {
        excerpt_separator: `<!--more-->`,
        plugins: [
          {
            resolve: `gatsby-remark-images`,
            options: {
              maxWidth: 590,
            },
          },
          {
            resolve: `gatsby-remark-responsive-iframe`,
            options: {
              wrapperStyle: `margin-bottom: 1.0725rem`,
            },
          },
          'gatsby-remark-prismjs',
          'gatsby-remark-copy-linked-files',
        ],
      },
    },

According to this post https://using-remark.gatsbyjs.org/excerpts/ I am querying excerpts from my articles like this:

export const pageQuery = graphql`
  query IndexQuery {
    site {
      siteMetadata {
        title
      }
    }
    allMarkdownRemark(sort: { fields: [frontmatter___date], order: DESC }) {
      edges {
        node {
          excerpt
          fields {
            slug
          }
          frontmatter {
            date(formatString: "DD MMMM, YYYY")
            title
            tags
            featuredImage {
                childImageSharp{
                    sizes(maxWidth: 200) {
                        ...GatsbyImageSharpSizes_tracedSVG
                    }
                }
            }
          }
        }
      }
    }
  }
`

My article in markdown uses a separater defined in config above:

title: Break the Java Generics Naming Convention?
date: "2016-04-30T22:12:03.284Z"
tags: ['Java']
path: '/break-java-generics-naming-convention'
featuredImage: './break-java-generics-maning-convention.jpg'
---

Choosing **descriptive**, *intent-revealing* names is one of the key principles of writing clean code, which is easy to read and understand and usually requires little to no comments. According to the official convention, generic type parameter names are supposed to be just one capital letter. Should you break this convention to make your names more descriptive?
<!--more-->

However, the excerpt I am getting is plain text and not transformed from markdown:

clipboard02


Is this intended behavior? How can I get excerpt which is already formatted? Thank you.


"gatsby": "^1.9.223",
"gatsby-transformer-remark": "^1.7.34",

Node: v9.5.0
NPM: 5.6.0

Most helpful comment

Yeah!

So - we added an HTML addition to the excerpt functionality relatively recently. Using this HTML excerpt functionality will transform the excerpt from plain text to HTML, if there's HTML to be transformed. For example, in the above example, the excerpt would translate italics/bold/etc. into their corresponding HTML tags.

It can be used like so:

{
  # grab your markdownRemark node however you prefer
  markdownRemark(id: "123412341234") {
    excerpt(format: HTML)
  }
}

This still works with the excerpt_seperator functionality.

Here's an example repo demonstrating this functionality

Here's the Markdown file showing the separator

Going to close this as answered, but please feel free to reply if we can help further. Thanks all!

All 28 comments

This only happens when using:
excerpt_separator

Otherwise, excerpt contains plain text with stripped markdown formatting.

It is not so bad that you lose formatting such as bold or italics, but what's not nice is that escape slashes are preserved:
Eg:
How to make the most of the IntelliJ IDEA\'s view...

It looks like the gatsby-transformer-remark's excerpt handling is here. If you or anyone else is interested in doing some investigation, a PR would be great.

I've been digging around and I don't see anything obvious.

It took me a while, but I figured out that the excerpt_separator is actually defined and used by grey-matter.

I can't figure out where the transformation happens and what is passed to grey-matter.

Can someone point me in the right direction? This is really annoying.

Although not directly related to this issue, the markdown plugin doesn't parse the title or any part of the frontmatter either.

The docs here say

you can specify an excerpt_separator, as well as any of the other options mentioned here, in the gatsby-config.js file.

However neither excerpt_separator or excerpt, docs, which should allow you to pass a function for generating the excerpt result it anything other than the default 140 char excerpt.

"gatsby": "^1.9.131",
"gatsby-transformer-remark": "^1.7.15"

The issue is at extend-node-type.js, line 128. It passes node.internal.content through the remark processor at that point, getting an AST, which it then does further stuff with and eventually transforms to Html, HtmlAst, or other forms (eg, in getHtml() on line 276).

The problem is that it doesn't do anything with node.excerpt. So that just passes through the whole process. And, at line 339, it then resolves the promise with the unprocessed excerpt.

To fix this, we'd have to either add the excerpt to the cache (as its own entry, or by storing a pair of ASTs), or just assume that the excerpt will only really be parsed once and do it directly. I assume we still want to keep the async processing and all the plugins, so it seems to make sense to me to change the cached value to be a pair of ASTs, with two separate promise chains to generate them.

Extracting a function that applies all the plugins, then does the remark processing, then applies all the plugins again, would make it easier to re-use the entire functionality to do both parts - especially as this is all async.

I don't know this code at all, so would prefer to leave changing it to someone who knows better. Hope this helps unblock the bug - I really want it fixed!

I just submitted PR #5586 with a possible fix for this. It implements my above-mentioned plan. However, I am not a committer, so am not sure if it works.

I could use some help from someone who knows the code better. It seems like my change has somehow altered the type of node.excerpt (even though I didn't change its declared type from string). I have no idea where and how the schema expectations are set. I also don't know whether it is a good thing for the excerpt to now have HTML in it, or to create a new node.excerptFormatted which does.

Please help!

This is still a problem. Any chance of this PR getting merged, or can anyone share a workaround?

@strubell, here's a workaround I've been using:

node.html.split("<!--more-->")[0]

:see_no_evil:

That works for simple things. But if you have things like anchor links in your post the URL needs to be the slug + the link of the markdown. I guess one could massage the links in the freshly split HTML but that seems nasty.

Hey, any updates on this? I just stumbled upon this issue as well.

Old issues will be closed after 30 days of inactivity. This issue has been quiet for 20 days and is being marked as stale. Reply here or add the label "not stale" to keep this issue open!

not stale

+1 Not stale.

Hello, any updates on this.

Yeah!

So - we added an HTML addition to the excerpt functionality relatively recently. Using this HTML excerpt functionality will transform the excerpt from plain text to HTML, if there's HTML to be transformed. For example, in the above example, the excerpt would translate italics/bold/etc. into their corresponding HTML tags.

It can be used like so:

{
  # grab your markdownRemark node however you prefer
  markdownRemark(id: "123412341234") {
    excerpt(format: HTML)
  }
}

This still works with the excerpt_seperator functionality.

Here's an example repo demonstrating this functionality

Here's the Markdown file showing the separator

Going to close this as answered, but please feel free to reply if we can help further. Thanks all!

@DSchau any suggestions for if we just want the text of the excerpt as per the default excerpt behaviour, but with a defined separator for some posts and relying on the prune value for the rest?

This still works with the excerpt_seperator functionality.

@DSchau this still doesn't work with format:PLAIN with custom separator.

I cloned your example and changed format from HTML to plain in BlogIndex:

          excerpt(format: PLAIN)
          fields {
            slug
          }

Textual excerpts don't show properly:
Screen Shot 2019-04-02 at 6 50 53 PM

Am I doing something wrong?

@markvital what would you expect to happen there?

That _is_ plain text. Would you expect Markdown specific stuff to be stripped?

Oh I see.
@DSchau Yes, I expected all tags to be removed, so I can insert clean text into meta tags, as described in docs.
This is how it worked for me, until I set "excerpt_separator": `<!--more--> in gatsby-config.js.

@DSchau awesome, I just stumbled on this problem and I'm glad I don't have to work around it! One problem though: if I set my excerpts to (format: HTML) and use a custom excerpt_separator, when the file doesn't contain the separator it creates the excerpt from the last paragraph of the text. It used the first 140 characters before I changed the format.

One problem though: if I set my excerpts to (format: HTML) and use a custom excerpt_separator, when the file doesn't contain the separator it creates the excerpt from the last paragraph of the text. It used the first 140 characters before I changed the format.

That’s issue #12386

Sent with GitHawk

I don't think the original issue is solved by excerpt(format: HTML) since HTML tags are not suitable for SEO metadata. We probably need fix for PLAIN's behaviour.

Like @markvital mentioned, if I don't specify excerpt_separator, the markdown annotations like italic/bold/link are processed to HTML and excerpt only contains visible text strings (like innerHTML of a tag). However, if I specify excerpt_separator, the original markdown content before the separator is returned as excerpt without any markdown and HTML processing (the raw markdown text).

Both behaviours should be consistent, preferably the former, I think.

Since this issue has been closed, do we need another issue?

FYI: This is an example unit test to see how it works differently with excerpt_separator https://github.com/riywo/gatsby/commit/4178bbd1545c83177a944142ee827e00e0f86422

 FAIL  src/__tests__/extend-node.js (6.786s)
  ● Excerpt is generated correctly from schema › excerpt does have missing words and extra spaces with excerpt_separator

    expect(received).toMatch(expected)

    Expected value to match:
      "Where oh where is that pony?"
    Received:
      "
    Where oh [*where*](nick.com) **_is_** ![that pony](pony.png)?

    "

      419 |       `,
      420 |     node => {
    > 421 |       expect(node.excerpt).toMatch(`Where oh where is that pony?`)
          |                            ^
      422 |     },
      423 |     { pluginOptions: { excerpt_separator: `<!-- end -->` } }
      424 |   )

      at toMatch (src/__tests__/extend-node.js:421:28)
      at test (src/__tests__/extend-node.js:91:11)

Since this issue has been closed, do we need another issue? ...

@riywo I think so. Either a new format TEXT or fix PLAIN and add MARKDOWN

Sent with GitHawk

FYI: I have created a PR to fix this behavior #14723

Was this page helpful?
0 / 5 - 0 ratings

Related issues

hobochild picture hobochild  Â·  3Comments

signalwerk picture signalwerk  Â·  3Comments

dustinhorton picture dustinhorton  Â·  3Comments

andykais picture andykais  Â·  3Comments

ghost picture ghost  Â·  3Comments