Gutenberg: Strip Blocks from Auto-Generated Excerpts

Created on 11 Jun 2018 · 10 Comments · Source: WordPress/gutenberg

Describe the bug
When the function get_the_excerpt() is called, and no manual excerpt exists, wp_trim_excerpt() generates an excerpt from the post content.

The issue is that when the beginning of the post contains a Gutenberg block, that block is rendered. Because wp_strip_all_tags() is then called on the generated output, its HTML is removed but its textual content survives in the excerpt.

Excerpts are supposed to be text only, and Gutenberg should not break this convention.

To Reproduce

  1. Create a post, and insert a Gutenberg block that outputs a mixture of HTML tags and text at the top.
  2. Save the post.
  3. View the generated excerpt in any theme using get_the_excerpt().

Expected behavior
Gutenberg blocks should not be rendered in excerpts.

The way I see it, a block and a shortcode are very similar, so they should be treated the same: being stripped.

Related: #5572

Labels: Backwards Compatibility, [Feature] Blocks


All 10 comments

> The way I see it, a block and a shortcode are very similar, so they should be treated the same: being stripped.

I agree with this assessment.

We might have to make an exception for certain blocks for this to work properly. Paragraphs, lists, blockquotes, etc. would all be stripped if you blindly stripped all blocks.

@fklein-lu Can you include more detail of your expected behavior?

@chrisvanpatten Do you understand this issue well enough to include full reproduction steps and a description of the expected behavior?

@danielbachhuber I'm not sure what you mean; I didn't report the issue. Want me to take it over? Happy to help, just not sure what you need :)

@chrisvanpatten Sure, if you'd like. I'll close otherwise. I need more detail about how to reproduce this issue and what the expected results are. I started looking into it 12 days ago but got stuck when I realized I didn't fully understand the nature of the request.

@danielbachhuber I think it's just fundamentally a design choice (not in the visual sense, but in the broader "how should Gutenberg work" sense).

If you don't pass an explicit excerpt, wp_trim_excerpt() (which is ultimately used by get_the_excerpt() and the_excerpt()) generates an excerpt.

As part of this process, a few things happen.

  1. The raw content is pulled via get_the_content()
  2. The content passes through strip_shortcodes()
  3. The the_content filter is run (critically, this causes Gutenberg blocks to be rendered via the do_blocks function)
  4. The "rendered" content passes through wp_trim_words()…

    • …which itself runs wp_strip_all_tags(), stripping all HTML tags but preserving text contained within them

  5. The resulting excerpt is filtered again (via a wp_trim_excerpt filter) and returned.
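The steps above can be modelled in a few lines of plain PHP to show why a shortcode disappears from the excerpt while text rendered by a block survives. This is a simplified sketch, not WordPress's actual wp_trim_excerpt(): every demo_* function is a hypothetical stand-in (a naive shortcode regex, a fake "gallery" block renderer, and strip_tags() plus a word count in place of wp_trim_words(), which in WordPress also appends a "more" string when it trims).

```php
<?php
// Simplified model of the excerpt pipeline. NOT WordPress code: each demo_*
// helper is a hypothetical stand-in for the real function named in its comment.

// Stand-in for strip_shortcodes(): removes anything in [square brackets] (very naive).
function demo_strip_shortcodes(string $text): string {
    return preg_replace('/\[[^\]]+\]/', '', $text);
}

// Stand-in for the the_content filter / do_blocks: a self-closing
// "gallery" block comment is rendered into HTML that contains a caption.
function demo_render_blocks(string $text): string {
    return preg_replace(
        '/<!-- wp:gallery \/-->/',
        '<figure><figcaption>Gallery caption</figcaption></figure>',
        $text
    );
}

// Stand-in for wp_trim_words(): strip all tags, collapse whitespace, keep N words.
function demo_trim_words(string $text, int $num_words): string {
    $plain = trim(preg_replace('/\s+/', ' ', strip_tags($text)));
    $words = $plain === '' ? [] : explode(' ', $plain);
    return implode(' ', array_slice($words, 0, $num_words));
}

function demo_excerpt_pipeline(string $content, int $num_words = 55): string {
    $text = demo_strip_shortcodes($content);   // step 2: shortcodes removed
    $text = demo_render_blocks($text);         // step 3: blocks rendered
    return demo_trim_words($text, $num_words); // step 4: tags stripped, text kept
}

$content = "<!-- wp:gallery /-->\n<p>Hello world [gallery ids=\"1,2\"]</p>";
echo demo_excerpt_pipeline($content), "\n"; // prints "Gallery caption Hello world"
```

The shortcode leaves no trace, but the block's caption becomes the first words of the excerpt, which is exactly the behavior described in the report.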

This means that blocks, or at least the text from all blocks, are included in an auto-generated excerpt. Image gallery captions, text from a cover image block, any text from server-rendered blocks (like post names/authors/etc in the Latest Posts block), possibly text from certain embeds, etc. are all potentially included in this output. These are pieces of content that would probably be included via shortcode in a pre-Gutenberg world, and thus would _not_ have been part of the excerpt (because of the strip_shortcodes() call).

Ultimately I think the question is: should there be a strip_blocks() happening alongside (or eventually in place of) that strip_shortcodes() call, to remove all that "text-you-probably-don't-want-in-an-excerpt" content?

I think that's a reasonable request, but it also adds a level of complexity: stripping _all_ blocks means the text you would probably want included in an excerpt (e.g. paragraphs, list items, blockquotes, etc.) would be stripped as well, since now _everything_ is a block (vs. before, where you had "shortcodes" and "everything else").

So perhaps strip_blocks() would need to work more selectively, sort of like a blend between strip_shortcodes() and wp_kses(), e.g.…

strip_blocks( $content, [
    'core/paragraph',
    'core/list',
    'core/blockquote',
] );

…where the second parameter is a list of blocks that will be preserved. In the case of wp_trim_excerpt() that second parameter could be filtered in case block authors have "textual" blocks (we have a custom "section header" block on a project, for instance) that they want included in generated excerpts.
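A minimal sketch of the proposed strip_blocks() with the allow-list semantics just described. strip_blocks() does not exist in WordPress or Gutenberg; the name and signature come from the proposal above, and scanning block comment delimiters with a regex is an assumption made to keep the example self-contained (real code would more likely reuse Gutenberg's own block parser).

```php
<?php
// Hypothetical strip_blocks( $content, $allowed ): top-level blocks named in
// $allowed are kept (with their inner blocks), every other block is dropped
// together with its inner blocks, and HTML outside any block is kept.
// Block names without a namespace are treated as core/ blocks.
function strip_blocks(string $content, array $allowed): string {
    // Opening, closing (<!-- /wp:... -->) and self-closing (... /-->) delimiters.
    // The attribute group may not contain "-->", so it cannot run past its own comment.
    $pattern = '/<!--\s+(\/)?wp:([a-z][a-z0-9_-]*(?:\/[a-z][a-z0-9_-]*)?)\s+'
             . '(?:\{(?:(?!-->).)*?\}\s+)?(\/)?-->/s';
    preg_match_all($pattern, $content, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $output = '';
    $cursor = 0;  // offset up to which $content has been handled
    $depth  = 0;  // current block nesting depth
    $start  = 0;  // offset where the current top-level block starts
    $name   = ''; // name of the current top-level block

    foreach ($matches as $m) {
        [$token, $offset] = $m[0];
        $is_closer = ($m[1][0] ?? '') === '/';
        $is_void   = isset($m[3]) && $m[3][0] === '/';
        $block     = strpos($m[2][0], '/') === false ? 'core/' . $m[2][0] : $m[2][0];
        $end       = $offset + strlen($token);

        if ($depth === 0 && !$is_closer) {
            // Keep freeform HTML that sits between top-level blocks.
            $output .= substr($content, $cursor, $offset - $cursor);
            $cursor  = $offset;
            $start   = $offset;
            $name    = $block;
        }

        if ($is_closer) {
            $depth--;
        } elseif (!$is_void) {
            $depth++;
        }

        if ($depth === 0 && ($is_closer || $is_void)) {
            // A top-level block just ended: keep it only if it is allowed.
            if (in_array($name, $allowed, true)) {
                $output .= substr($content, $start, $end - $start);
            }
            $cursor = $end;
        }
    }

    // Anything after the last complete top-level block is kept as-is.
    return $output . substr($content, $cursor);
}
```

Called as strip_blocks( $content, [ 'core/paragraph' ] ), this keeps paragraph blocks and removes, say, a Latest Posts or gallery block before the content is rendered. Note the design choice: dropping a disallowed block also drops everything nested inside it, even allowed paragraphs. A filtered allow-list for wp_trim_excerpt() would sit on top of a function like this.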

Anyhow, this got much longer than I expected so I'll stop here. Would love to hear thoughts from the team :)

Thanks @chrisvanpatten, this is exactly the clarification I was looking for.

How about only stripping dynamic blocks? Keeping blocks that save their markup in post content should maintain the current behavior, no?
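The narrower "only strip dynamic blocks" idea can be sketched the same way. The function name strip_dynamic_blocks_sketch() and the $dynamic list are hypothetical, and the sketch assumes dynamic blocks are stored as self-closing comments; a dynamic block that saves inner content would need a real parser.

```php
<?php
// Sketch of "strip only dynamic blocks". Assumptions (ours, not Gutenberg's):
//  - $dynamic lists the names of server-rendered blocks;
//  - those blocks are stored as self-closing comments such as
//    <!-- wp:latest-posts {"postsToShow":3} /-->
// Removing the comment removes everything the block would have rendered, while
// static blocks keep their saved markup (and so their text) exactly as before.
function strip_dynamic_blocks_sketch(string $content, array $dynamic): string {
    // Self-closing delimiters only; the attribute group may not cross "-->".
    $pattern = '/<!--\s+wp:([a-z][a-z0-9_-]*(?:\/[a-z][a-z0-9_-]*)?)\s+'
             . '(?:\{(?:(?!-->).)*?\}\s+)?\/-->/s';

    return preg_replace_callback($pattern, function (array $m) use ($dynamic): string {
        // Block names without a namespace belong to core/.
        $name = strpos($m[1], '/') === false ? 'core/' . $m[1] : $m[1];
        return in_array($name, $dynamic, true) ? '' : $m[0];
    }, $content);
}
```

Because static blocks pass through untouched, the excerpt keeps the same text it had before, which is the point of the suggestion.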

The Gutenberg implementation involves some hacky code: it adds strip_dynamic_blocks to the the_content filter, then removes it again when wp_trim_excerpt runs, via the filter hook that wp_trim_excerpt invokes.
This assumes that wp_trim_excerpt will actually be run.
In one of my plugins I had replaced wp_trim_excerpt with my own code:

remove_filter( "get_the_excerpt", "wp_trim_excerpt" );
add_filter( "get_the_excerpt", "oik_get_the_excerpt" );

Since wp_trim_excerpt was no longer being run, the strip_dynamic_blocks filter was never removed. As a result, my dynamic blocks did not appear when the post was displayed singly with WordPress SEO activated.

My workaround is to not remove the wp_trim_excerpt filter.
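The failure mode above comes from pairing an "add" on one hook with a "remove" that only fires from inside wp_trim_excerpt. The toy model below reproduces it with a hypothetical demo_* hook registry; it is not WordPress's add_filter()/apply_filters() and not Gutenberg's actual code, and it assumes the excerpt is requested before the post body is rendered.

```php
<?php
// Toy model of the add-then-remove hook pattern. Every name is hypothetical.
$demo_filters = [];
const DEMO_POST = 'Static text. DYNAMIC';

function demo_add_filter(string $hook, string $fn, int $priority = 10): void {
    global $demo_filters;
    $demo_filters[$hook][$priority][$fn] = true;
}

function demo_remove_filter(string $hook, string $fn, int $priority = 10): void {
    global $demo_filters;
    unset($demo_filters[$hook][$priority][$fn]);
}

function demo_apply_filters(string $hook, string $value): string {
    global $demo_filters;
    $callbacks = $demo_filters[$hook] ?? [];
    ksort($callbacks); // lower priority numbers run first
    foreach ($callbacks as $fns) {
        foreach (array_keys($fns) as $fn) {
            $value = $fn($value);
        }
    }
    return $value;
}

// Stand-in for strip_dynamic_blocks: removes the "DYNAMIC" marker.
function demo_strip_dynamic(string $text): string {
    return str_replace(' DYNAMIC', '', $text);
}
// Runs before the excerpt is built: switches stripping on.
function demo_add_strip(string $text): string {
    demo_add_filter('the_content', 'demo_strip_dynamic', 6);
    return $text;
}
// Runs from the wp_trim_excerpt-style hook: switches stripping off again.
function demo_remove_strip(string $text): string {
    demo_remove_filter('the_content', 'demo_strip_dynamic', 6);
    return $text;
}
// Stand-in for wp_trim_excerpt: the only place the "off" hook fires.
function demo_trim_excerpt(string $text): string {
    $text = demo_apply_filters('the_content', DEMO_POST);
    return demo_apply_filters('wp_trim_excerpt', $text);
}
// A replacement excerpt builder that never fires the "off" hook.
function demo_custom_excerpt(string $text): string {
    return 'Custom excerpt';
}

// Registers the hooks, requests the excerpt, then renders the full post body.
function demo_render_post(bool $replace_trim_excerpt): string {
    global $demo_filters;
    $demo_filters = [];
    demo_add_filter('get_the_excerpt', 'demo_add_strip', 9);
    demo_add_filter('get_the_excerpt', $replace_trim_excerpt ? 'demo_custom_excerpt' : 'demo_trim_excerpt', 10);
    demo_add_filter('wp_trim_excerpt', 'demo_remove_strip', 0);
    demo_apply_filters('get_the_excerpt', ''); // excerpt requested first
    return demo_apply_filters('the_content', DEMO_POST);
}
```

With the default callback, demo_render_post(false) returns 'Static text. DYNAMIC'; with the replacement, demo_render_post(true) returns 'Static text.', because the stripping filter is never removed. Keeping the original callback registered, as in the workaround above, is what lets the removal fire.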
