Gutenberg: New lines in shortcode content are ignored

Created on 14 Jan 2018 · 6Comments · Source: WordPress/gutenberg

Issue Overview

The new editor does not respect new lines in shortcode content. It's impossible to use the new editor to edit posts which depend on this. For me this is a blocker; it breaks my sites.

I have a number of shortcodes where each line in the content is expected to be processed separately. With Gutenberg v2.0.0 it's impossible to safely edit a post containing these shortcodes without the content being broken. Take for example the [bw_csv] shortcode which converts CSV content into a table.

[bw_csv]One,Two,Three
A,B,C
D,E,F
[/bw_csv]

During shortcode expansion the first line becomes table the table heading.
Subsequent lines are table rows. So the expected output is like this.

One | Two | Three
--- | ---- | ----
A | B | C
D | E | F

When the new lines are removed by the editor the shortcode does not expand correctly. Everything is treated as the table heading.

The basic problem can be demonstrated even without the shortcode being active.
In the current editor you can quite happily switch between Visual and Text and the site is not broken.
Once the content is converted the new lines in the original source are lost.

Steps to Reproduce (for bugs)

Create a new post using Posts > All Posts > Add New > Classic Editor
Enter content as in the shortcode above
View the post.
The content should appear as typed.

Repeat the above steps for another new post

Now Edit the post using Gutenberg
The new line characters do not survive the conversion.
Any attempt to recreate the original content using Gutenberg will fail.
Switching to the Code editor to remove unwanted paragraphs ( #3900 ) doesn't work as that ignores new lines as well. They disappear as soon as you click on Update.

I've also tried using different block types: Shortcode, Custom HTML.

Expected Behavior

The editor should not be removing new lines.
It should not add anything either.
It works in WordPress 4.9.1
In my opinion the Shortcode block should support content created AS IS. It should respect any new line characters in the content.

Current Behavior

I've been unable to find any existing block that respects my new lines.

Custom HTML strips the new lines on save.
The shortcode block also strips new lines.

Possible Solution

Shortcode block should respect the content as typed.

Screenshots / Video

Related Issues and/or PRs

3900

1709 - see https://github.com/WordPress/gutenberg/issues/1709#issuecomment-321574711

Todos

[ ] Tests
[ ] Documentation

Backwards Compatibility [Feature] Shortcodes [Type] Bug

Source

bobbingwide

👍5

Most helpful comment

During WordCamp London contributor day I discussed this issue with @dmsnell. He suggested revisiting the reason that the HTML beautifier was implemented in #633.
So the original question stands... why do we need to invoke beautifyHtml at all?

bobbingwide on 15 Apr 2018

👍2

All 6 comments

During a direct discussion with @aduth regarding finding a solution to respect new lines in the content for my CSS block, Andrew commented

Hmm, it seems like the beautifier can be a bit aggressive. Maybe need some metrics to tell it to not perform (maybe if there’s no tags? if a block opts out?)

I'm going to look into the options.

But first, I'd like to know why the function is needed at all?
For the shortcode block, replacing

    return beautifyHtml( content, {
        indent_inner_html: true,
        wrap_line_length: 0,
    } );

with

 return content;

appeared, at first glance, to resolve my issues nicely.
Why do other blocks need it?

bobbingwide on 7 Apr 2018

👍1

Related: #633

With work toward a custom serializer in #5897, I'm wondering if we ought to just build some basic beautification into the serializer, rather than a step which occurs after block output has been generated. This way we can have more awareness of what should and shouldn't have changes applied.

aduth on 8 Apr 2018

bobbingwide on 15 Apr 2018

👍2

I'll drop by here as well and share a brief thought on beautification, which actually I like. We have a fundamental problem with post_content as soon as we have shortcodes because they immediately break anything that operates on HTML. Something like jsbeautify is probably guaranteed to break in random unexpected ways because it thinks it has HTML when actually it doesn't - the same kinds of problems we have had with WordPress posts for years.

That is, if we want to beautify we probably need to build a custom beautifier and _attempt_ to preserve shortcodes. Frankly, this is probably also a losing venture since we cannot preserve shortcodes in the general sense because they are unspecified. However, we might be able to preserve shortcodes within a block and make a hard cut-off saying "no shortcodes may span block boundaries."

Actually this problem makes me want to cry because basically I can't see any practical way to beautify _and_ not destroy shortcodes. We could build another layer in the parser to detect _some_ shortcodes which wrap content (but we could not get them all) and then wrap those in something before beautifying to try to preserve them, but that also would add considerable computational complexity.

Is there a strong enough justification for beautification to take us head-to-head with this problem?
Should we consider adding a warning whenever we detect a shortcode in the post to say that "oh yeah this might blow up the world and all that's good?"
Do we need to start thinking about a kind of inline-block whereby shortcodes could be contained and protected automatically?

dmsnell on 16 Apr 2018

I have a local branch which contains most of the work toward building beautification into the serializer, in a way which is not particularly opinionated (e.g. if newlines existed in a shortcode, the serializer would output them verbatim). The remaining work is largely edge cases, but the recurring pain point in working through them is an understanding of what we mean with "beautification".

As I understand it, the original issue #633 was targeted at what was at-the-time a large continuous string of text for the entirety of post content. A simplest solution may have been to add newlines between block demarcations. Still not great for more complex markup; traditionally one would expect some indented hierarchy of elements to be most readable. But what are the rules for said indentation?

js-beautify considers some objectives like ideal line length, which may be overkill for what we're trying to accomplish.

In my branch, the working naive implementation I adopted was "place on own line and indent non-inline descendent nodes", producing something like (from my own test cases):

const result = renderElement(
    <section>
        { '\n\tString' }
        <h2>Hello <em>World!</em></h2>
        <span>On previous line</span>
        <p>This is content.</p>
        <div>
            <span>Nesting:</span>
            <p>Can get out of control.</p>
        </div><pre><code>{ 'foo\nbar\t' }</code></pre>
    </section>
);

<section>
    String
    <h2>Hello <em>World!</em></h2><span>On previous line</span>
    <p>This is content.</p>
    <div><span>Nesting:</span>
        <p>Can get out of control.</p>
    </div>
    <pre><code>foo
bar </code></pre>
</section>

aduth on 16 Apr 2018

FWIW, another example of this is the [sourcecode] shortcode from the SyntaxHighlighter plugin, where you end up with many lines of code squished onto a single line.