Currently, segmentHTMLToShortcodeBlock seems to convert shortcodes to blocks in a non-recursive way. Nested shortcodes, i.e. shortcodes residing within other shortcodes, don't seem to be supported.
The following Classic block contents will be converted into multiple blocks as long as the registerBlockType call includes an appropriate transforms: {} object:
[contact-field label='Name' type='name' required='1'/]
[contact-field label='Email' type='email' required='1'/]
[contact-field label='Website' type='url'/]
[contact-field label='Message' type='textarea'/]
This series of contact form fields gets converted into multiple separate contact form field blocks as segmentHTMLToShortcodeBlock iterates through the Classic block. Note that they are not contained within a [contact-form] shortcode.
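For reference, a shortcode transform for a single field might look roughly like the sketch below. It follows the documented type: 'shortcode' transform shape, but the attribute names (fieldType and so on) and the block name in the trailing comment are made up for illustration and are not Jetpack's actual code.

```javascript
// Hypothetical shortcode transform for a contact field block.
// Attribute names here are illustrative assumptions.
const contactFieldTransforms = {
	from: [
		{
			type: 'shortcode',
			tag: 'contact-field',
			attributes: {
				label: {
					type: 'string',
					// Shortcode attributes arrive grouped under `named` and `numeric`.
					shortcode: ( { named: { label = '' } } ) => label,
				},
				fieldType: {
					type: 'string',
					shortcode: ( { named: { type = 'text' } } ) => type,
				},
				required: {
					type: 'boolean',
					shortcode: ( { named: { required } } ) => required === '1',
				},
			},
		},
	],
};

// In a plugin this object would be handed to the block registration, e.g.
// registerBlockType( 'my-plugin/contact-field', { /* ... */ transforms: contactFieldTransforms } );
```

Each callback only ever sees the attributes of one shortcode, which is why a flat list of fields converts fine with this kind of transform.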
[contact-form to="[email protected]" subject="Email from website"][contact-field label="Email" type="email" required="1"][contact-field label="Message" type="textarea"][/contact-form]
Here, the [contact-form] shortcode gets read, but segmentHTMLToShortcodeBlock reads the nested/contained contact-field shortcodes as what it seems to assume to be inner HTML content.
One thing I’ve noticed is that both isMatch and transform: ( { content } ) => { /* some code */ } simply don’t seem to be used with shortcodes.
[contact-form to="[email protected]" subject="Email from website"]
[contact-field label="Email" type="email" required="1"]
[contact-field label="Message" type="textarea"]
[/contact-form]
When those nested shortcodes get converted during HTML editing of a classic block, they get wrapped into a paragraph block. This may be something to consider when working on this ticket.
The best way forward is to modify segmentHTMLToShortcodeBlock so that it reads shortcodes found within shortcodes (currently treated as inner content and not read by transform), either completely recursively or at least one level down from a parent shortcode.
The support of isMatch and transform: ( { content } ) => { /* some code */ } in registerBlockType also needs to be clarified when it comes to the interpretation of a shortcode's inner content, whether it is plain text, HTML or other shortcodes.
There's a challenge here and that is that shortcodes don't have a clear semantic when nested. Many of the shortcode parsers will end up failing to legitimately find them, and worse, the behaviors are simply undefined. It's up to the shortcode itself to determine whether or not to nest. This is one of the reasons we are trying to replace shortcodes with the block, an unambiguous grammar.
What is the end-goal you are trying to accomplish here? The way I think this would have to work if we are to do it reliably is leave the matching up to the transform converting to a shortcode. That is, could we accomplish this by introducing a new filter on segmentHTMLToShortcodeBlock such that something like the contact form could hook in there and _if_ a contact form shortcode is detected, then continue parsing the inside and return the nested shortcodes in whatever structure it wants?
Thanks for providing such clear examples!
@dmsnell What I'm trying to do specifically, is to make the Jetpack contact form convertible into blocks, facilitating the migration from nested shortcodes into nested blocks.
Perhaps the Block API documentation should be more explicit about the limitations of converting shortcodes?
Have you tried the Jetpack 6.8 Beta? From what I've heard they added Gutenberg support there. Perhaps contact forms already work there.
@swissspidy: Perhaps I didn't give enough context, but I am working on the Jetpack 6.8 release.
make the Jetpack contact form convertible into blocks
@aldavigdis this is where I wonder if we could just do things manually for now. shortcodes have unfortunately always suffered from the ambiguity and non-uniformity of nesting behaviors.
when you are trying to get the contact form shortcode, what does it provide you inside of that? can we take what's inside, parse it in a traditional way, and convert the insides into the inner blocks we need?
is this all in JS or is it also in PHP? in PHP we have do_shortcode() which we might be able to use and in JS we might have to write something on our own, though _if_ we know what we should expect as output from the contact form block we _should_ be able to do a reasonable job parsing it. (in contrast, it's hard to do a general solution since each plugin might treat nested shortcodes differently)
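As a rough idea of what "write something on our own" could mean in JS, here is a minimal sketch that pulls the [contact-field] shortcodes out of a contact form's inner content. The function names are hypothetical, and it assumes fields never wrap content (no [/contact-field] closer) and that attribute values never contain a ] character.

```javascript
// Parse attribute pairs such as label="Email" type='email' into an object.
function parseAttributes( text ) {
	const attrs = {};
	const pattern = /([a-z0-9-]+)=(?:"([^"]*)"|'([^']*)'|([0-9]+))/g;
	let match;
	while ( ( match = pattern.exec( text ) ) !== null ) {
		// Exactly one of the three value groups is defined for each pair.
		attrs[ match[ 1 ] ] = [ match[ 2 ], match[ 3 ], match[ 4 ] ].find(
			( value ) => value !== undefined
		);
	}
	return attrs;
}

// Collect the attributes of every [contact-field ...] or [contact-field ... /]
// found in the inner content of a contact form, in document order.
function parseContactFields( innerContent ) {
	const fields = [];
	const pattern = /\[contact-field(?=[\s\/\]])([^\]]*?)\s*\/?\]/g;
	let match;
	while ( ( match = pattern.exec( innerContent ) ) !== null ) {
		fields.push( parseAttributes( match[ 1 ] ) );
	}
	return fields;
}
```

This only works because the expected shape of the form's contents is known in advance, which is exactly the part that does not generalize to other plugins.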
@aldavigdis Thanks for the clarification. I first assumed so, but I didn't see any Jetpack/Automattic stuff when I looked at your profile. That's why I mentioned it.
To get back to the original question:
I don't think Gutenberg should really deal with nested shortcodes itself. They're quite an edge case and have never been fully supported in WordPress core.
Isn't it possible to do the conversion yourself in Jetpack by defining transforms in the custom contact form block instead of relying on the block editor to do this for you?
Our backup plan in case this sort of conversion won't be supported by Gutenberg is just to fall back to a default state for our contact form block. So, the form block would default to a couple of nested form field blocks and the user would then have to manually edit the form.
It would make the most sense to have this sort of transformation supported directly in Gutenberg, as others may depend on it for things such as migrating from other page builders, which I'm pretty sure depend on nested blocks to a certain extent.
We are running out of time ourselves and it's very regrettable that we didn't spot this sooner, as this seems to be a fundamental thing in Gutenberg. However, I am leaving this issue open, for anyone who wants to tackle this in the future.
Do also note that this is not something we can (or should) just handle in the backend as far as I know. We are trying our best to do things the Gutenberg way from day one without any hacks.
Thanks @aldavigdis. I'd like for us to close this issue noting what @swissspidy said - this isn't so much a Gutenberg issue as it is a WordPress issue that's far too late to change.
_If_ (and that's a big IF) the Jetpack shortcodes are reliably regular then we can manually parse them out. I tossed together a basic grammar for doing that which could be directly used to generate a parser with pegjs or it could serve as a specification to build again.
What is _reliably regular_?
Shortcodes _without_ a closing never appear _with_ a closing
If [contact-field] _could_ come with a self-closing / or with a closing tag around input then we can't parse out the following input: there is no "correct" parse as it's inherently ambiguous.
[contact-field label="First Name" type="textarea"]
[contact-field label="Last Name" type="textarea"]
Don't forget your closers!
[/contact-field]
It's not defined which of the first two contact field shortcodes the last closer belongs to.
[contact-field label="High" type="textarea" /]
[contact-field label="Low" type="textarea"]
That's the story I have to tell you.
In this case we should expect to find a closing somewhere for the second contact field. Clearly the first one was self-closing and somewhat implies that there was zero content inside of it. But, since the second one is missing the self-closer it somewhat implies that inner content follows and then a closing. Is the last chunk of text supposed to be inside the shortcode and we have a missing closer or do we have a missing self-closer?
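One possible reading of "reliably regular" can be checked mechanically. The sketch below is my own heuristic, not anything WordPress defines: it flags a tag when some openers are left without a closer while the same tag also shows up self-closed or properly closed elsewhere in the document. The function name is hypothetical, and it assumes attribute values never contain a ] character.

```javascript
// Return the shortcode tags whose usage in `text` is ambiguous under the
// heuristic described above.
function findAmbiguousTags( text ) {
	const counts = {};
	const pattern = /\[(\/?)([a-z0-9-]+)(?=[\s\/\]])([^\]]*)\]/g;
	let match;
	while ( ( match = pattern.exec( text ) ) !== null ) {
		const [ , slash, tag, rest ] = match;
		if ( ! counts[ tag ] ) {
			counts[ tag ] = { selfClosing: 0, openers: 0, closers: 0 };
		}
		if ( slash ) {
			counts[ tag ].closers++; // [/tag]
		} else if ( rest.trimEnd().endsWith( '/' ) ) {
			counts[ tag ].selfClosing++; // [tag ... /]
		} else {
			counts[ tag ].openers++; // [tag ...]
		}
	}
	return Object.keys( counts ).filter( ( tag ) => {
		const { selfClosing, openers, closers } = counts[ tag ];
		// Some openers have no closer, yet other forms of the tag exist too.
		return openers > closers && ( selfClosing > 0 || closers > 0 );
	} );
}
```

Both samples above get flagged for contact-field, while a document that only ever uses one form of the tag comes back clean.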
You can load the PEG explorer at https://dmsnell.github.io/peg-parser-explorer/, paste in the following grammar, then paste in your sample document at the top and play around with different inputs.
Document
  = (Shortcode / $((!Shortcode .)+))*

Shortcode
  = SelfClosingShortcode
  / ClosingShortcode
  / NonClosingShortcode

SelfClosingShortcode
  = "[" n:ShortcodeName __ a:AttributePairs? __ "/]"
  { return { name: n, attrs: a || {} } }

ClosingShortcode
  = o:ShortcodeOpening d:(Shortcode / $((!Shortcode !ShortcodeClosing .)+))* c:ShortcodeClosing
  & { return o[0] === c }
  { return d.length ? { name: o[0], attrs: o[1], children: d } : { name: o[0], attrs: o[1] } }

NonClosingShortcode
  = o:ShortcodeOpening
  { return { name: o[0], attrs: o[1] } }

ShortcodeOpening
  = "[" n:ShortcodeName __ a:AttributePairs? __ "]"
  { return [ n, a || {} ] }

ShortcodeClosing
  = "[/" n:ShortcodeName "]"
  { return n }

ShortcodeName
  = $([a-z0-9-]+)

AttributePairs
  = a:Attribute as:(_+ ia:Attribute { return ia })*
  { return [ a ].concat( as ).reduce( function( out, next ) {
    out[ next[ 0 ] ] = next[ 1 ];
    return out;
  }, {} ) }

Attribute
  = n:AttributeName "=" v:AttributeValue
  { return [ n, v ] }

AttributeName
  = $([a-z0-9-]+)

AttributeValue
  = '"' v:$(!'"' .)+ '"' { return v }
  / "'" v:$(!"'" .)+ "'" { return v }
  / $([0-9]+)

__
  = _*

_
  = [ \t\n\r]
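To show how the output of such a parser could feed into nested blocks, here is a small sketch that walks one parsed node and produces nested [ blockName, attributes, innerBlocks ] tuples, the same general shape inner block templates use. The block names in BLOCK_NAMES and the function name are placeholders I made up, and the input is expected in the { name, attrs, children } shape that the grammar's actions return.

```javascript
// Hypothetical mapping from shortcode tags to block names.
const BLOCK_NAMES = {
	'contact-form': 'my-plugin/contact-form',
	'contact-field': 'my-plugin/contact-field',
};

// Convert one parsed shortcode node into a [ blockName, attributes, innerBlocks ]
// tuple. Plain text children (such as whitespace between fields) and shortcodes
// without a known block are dropped.
function toBlockTemplate( node ) {
	const innerBlocks = ( node.children || [] )
		.filter( ( child ) => typeof child !== 'string' && BLOCK_NAMES[ child.name ] )
		.map( toBlockTemplate );
	return [ BLOCK_NAMES[ node.name ], node.attrs, innerBlocks ];
}
```

Dropping the text children is a deliberate simplification here; a real conversion would have to decide what to do with content that sits between the fields.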
I guess the point I was trying to make in that last comment is that while this parse may provide you with the results you expect from Jetpack shortcodes (if it even does that), it will surely provide the wrong results for someone else, simply because this stuff has been undefined and up to the plugin author forever in WordPress.
Like I said, I agree this is coming up way too late for both parties, and I wish this could have been spotted far earlier, as this is a fundamental thing. I also don't think Gutenberg should have specific code related to Jetpack at all.
Closing the ticket.