Amp-wp: Incorporate AMP validator into post authorship flow, to alert if content within the post editor is not compatible with AMP

Created on 4 Jan 2018  Â·  17Comments  Â·  Source: ampproject/amp-wp

Acceptance Criteria

AC1: On saving a post on /wp-admin/post.php, if there’s post content that’s not compatible with AMP, display an error at the top of the page.
AC2: The error will not display specifically what is incompatible.
AC3: The error will display the following text: “Notice: your post fails AMP validation”.

Most helpful comment

Possible Approach

Hi @westonruter,
Thanks for waiting here. One approach that I've prototyped is front-end validation in Gutenberg (screencast).

Unfortunately, this only applies on editing a block. Not on saving it, as you mentioned.

Here's a very rough prototype that you could paste in the console on a Gutenberg post.php page:

var isValidAMP,
    previousWp = wp,
    el = wp.element.createElement,
    Notice = wp.components.Notice,
    blockType = 'core/html',
    htmlBlock = wp.blocks.unregisterBlockType( blockType ),
    OriginalBlockEdit = htmlBlock.edit;

/**
 * Whether markup is valid AMP.
 *
 * Use the AMP validator from the AMP CDN.
 * And place the passed markup inside the <body> tag of a basic valid AMP page.
 *
 * @param string markup
 * @returns {boolean} $valid Whether the passed markup is valid AMP.
 */
isValidAMP = function( markup ) {
    var ampDocument = `<!doctype html><html ⚡><head><meta charset="utf-8"><link rel="canonical" href="./regular-html-version.html"><meta name="viewport" content="width=device-width,minimum-scale=1"><style amp-boilerplate>body{-webkit-animation:-amp-start 8s steps(1,end) 0s 1 normal both;-moz-animation:-amp-start 8s steps(1,end) 0s 1 normal both;-ms-animation:-amp-start 8s steps(1,end) 0s 1 normal both;animation:-amp-start 8s steps(1,end) 0s 1 normal both}@-webkit-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-moz-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-ms-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-o-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}</style><noscript><style amp-boilerplate>body{-webkit-animation:none;-moz-animation:none;-ms-animation:none;animation:none}</style></noscript><script async src="https://cdn.ampproject.org/v0.js"></script></head><body>${markup}</body></html>`,
        validated = amp.validator.validateString( ampDocument );
    return ( validated.hasOwnProperty( 'status' ) && 'PASS' === validated.status );
};

jQuery.getScript( 'https://cdn.ampproject.org/v0/validator.js', function() {
    /*
     * Workaround, not recommended in actual plugin.
     * Replaces the wp value that the script above overwrote.
     */
    wp = previousWp;
} );

/**
 * Overwrites the 'Custom HTML' block's edit() function.
 *
 * Outputs the original component, in OriginalBlockEdit.
 * If it's not valid AMP, it also outputs a Notice.
 */
htmlBlock.edit = function( props ) {
    var content = props.attributes.hasOwnProperty( 'content' ) ? props.attributes.content : '';
            result = [ el( OriginalBlockEdit, Object.assign( props, { key: 'original' } ) ) ];
    if ( ! isValidAMP( content ) ) {
        result.unshift( el(
            Notice,
            {
                key: 'notice',
                title: 'Notice',
                status: 'warning',
                content: 'This is not valid AMP',
            }
        ) );
    }
    return result;
};
wp.blocks.registerBlockType( blockType, htmlBlock );

  • Overwrites a block's edit() function. In this case, the Custom HTML block. But it could be others, like the paragraph.
  • Validates the content in edit() via a function available by loading this AMP CDN validator.js script.

Issues:

  • Like you mentioned, it'd be better to only run validation on saving the post. Here's the callback for saving a post. Maybe there's a way to use that to trigger messages to appear.
  • Duplicate sanitization. There's already back-end sanitization, as you mentioned.
  • The AMP CDN validator.js script overwrites the wp object with a function.

This is based on the 'Adding a caption' section in this post.

All 17 comments

Request To Work On This

Hi @ThierryA and @westonruter,
Could I please work on this? Though this is in Sprint 3, It looks like we're doing well on Sprint 2. But maybe you have other priorities in mind.

If so, I'll write a simple solution design here so we're on the same page.

_Update: I'm now working on a sub-issue of #839._

Thanks!

@kienstra Please do. I see this highly related to #842, but probably a different mechanism. For content, I suppose it would involve hooking into the whitelist sanitizer to find elements that are dropped due to being illegal? If this could identify the specific paragraph in the content that has invalid content in it, that would be ideal. Naturally this would straightforward to accomplish with Gutenberg, and perhaps Gutenberg should be the focus for this here then.

For #842 my thought was do do something at a higher level, when hooks are run, to run the validator on the entire page output rather than on just the content.

I don't see these two as being mutually exclusive. One is focused on the content authorship—telling users what they can't author—and the other is focused on the site administration—telling users what plugins are doing illegally.

In Development

Hi @westonruter,
Thanks for your details here. If it's alright, I'll post an update soon.

Let's chat about the approach to take here.

Possible Approach

Hi @westonruter,
Thanks for waiting here. One approach that I've prototyped is front-end validation in Gutenberg (screencast).

Unfortunately, this only applies on editing a block. Not on saving it, as you mentioned.

Here's a very rough prototype that you could paste in the console on a Gutenberg post.php page:

var isValidAMP,
    previousWp = wp,
    el = wp.element.createElement,
    Notice = wp.components.Notice,
    blockType = 'core/html',
    htmlBlock = wp.blocks.unregisterBlockType( blockType ),
    OriginalBlockEdit = htmlBlock.edit;

/**
 * Whether markup is valid AMP.
 *
 * Use the AMP validator from the AMP CDN.
 * And place the passed markup inside the <body> tag of a basic valid AMP page.
 *
 * @param string markup
 * @returns {boolean} $valid Whether the passed markup is valid AMP.
 */
isValidAMP = function( markup ) {
    var ampDocument = `<!doctype html><html ⚡><head><meta charset="utf-8"><link rel="canonical" href="./regular-html-version.html"><meta name="viewport" content="width=device-width,minimum-scale=1"><style amp-boilerplate>body{-webkit-animation:-amp-start 8s steps(1,end) 0s 1 normal both;-moz-animation:-amp-start 8s steps(1,end) 0s 1 normal both;-ms-animation:-amp-start 8s steps(1,end) 0s 1 normal both;animation:-amp-start 8s steps(1,end) 0s 1 normal both}@-webkit-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-moz-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-ms-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-o-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}</style><noscript><style amp-boilerplate>body{-webkit-animation:none;-moz-animation:none;-ms-animation:none;animation:none}</style></noscript><script async src="https://cdn.ampproject.org/v0.js"></script></head><body>${markup}</body></html>`,
        validated = amp.validator.validateString( ampDocument );
    return ( validated.hasOwnProperty( 'status' ) && 'PASS' === validated.status );
};

jQuery.getScript( 'https://cdn.ampproject.org/v0/validator.js', function() {
    /*
     * Workaround, not recommended in actual plugin.
     * Replaces the wp value that the script above overwrote.
     */
    wp = previousWp;
} );

/**
 * Overwrites the 'Custom HTML' block's edit() function.
 *
 * Outputs the original component, in OriginalBlockEdit.
 * If it's not valid AMP, it also outputs a Notice.
 */
htmlBlock.edit = function( props ) {
    var content = props.attributes.hasOwnProperty( 'content' ) ? props.attributes.content : '';
            result = [ el( OriginalBlockEdit, Object.assign( props, { key: 'original' } ) ) ];
    if ( ! isValidAMP( content ) ) {
        result.unshift( el(
            Notice,
            {
                key: 'notice',
                title: 'Notice',
                status: 'warning',
                content: 'This is not valid AMP',
            }
        ) );
    }
    return result;
};
wp.blocks.registerBlockType( blockType, htmlBlock );

  • Overwrites a block's edit() function. In this case, the Custom HTML block. But it could be others, like the paragraph.
  • Validates the content in edit() via a function available by loading this AMP CDN validator.js script.

Issues:

  • Like you mentioned, it'd be better to only run validation on saving the post. Here's the callback for saving a post. Maybe there's a way to use that to trigger messages to appear.
  • Duplicate sanitization. There's already back-end sanitization, as you mentioned.
  • The AMP CDN validator.js script overwrites the wp object with a function.

This is based on the 'Adding a caption' section in this post.

@kienstra that looks amazing! Having that client-side JS-based validation is perfect as it is realtime and it is block-based. You should absolutely keep going with this. I don't see this in any way conflicting with the validation of the frontend as a whole. When saving, we can send out a request to get the frontend HTML and then run it through the validator as you're doing at the block level. This will need a bit more

The AMP CDN validator.js script overwrites the wp object with a function.

Why would it do this? That seems like a big flaw in the validator script. Apparently it's not encapsulating the minified function names which is causing wp to be defined be accident. There is also a wo and many other variables that are polluting the global namespace. This should get fixed in the AMP project as a prereq.

@pbakaus @Gregable take a look at this issue with the validator...

Would you mind filing an issue on the AMP github tracker:
https://github.com/ampproject/amphtml/issues/new

AMP Validation In 'Classic' Editor

Here's a screencast of validating a post in the 'classic' editor. On saving the post, the JavaScript file makes an AJAX request to the permalink. Then, it validates that entire HTML document for AMP compatibility, using amp.validator.validateString().

But the back-end sanitizers work really well, and it's hard to enter content that won't be AMP-compatible by the time the page is rendered.

class-editor-validation

Next steps include applying similar front-page validation to Gutenberg.

Moving Into QA

Hi @westonruter,
If it's alright, I'm moving this into QA.

Steps To Test

  1. Create a new post on the test site
  2. Paste the following content in the editor:
<button onclick="alert('this')">This</button>
<button onclick="alert('this')">This</button>

<script></script>
  1. Click "Save Draft"
  2. Expected: a warning with
  3. A message that the post isn't valid
  4. The invalid elements
  5. The invalid attributes

validation-failure-message

  1. Remove the content, and paste:
https://www.youtube.com/watch?v=GGS-tKTXw4Y
Example content
  1. Click "Save Draft"
  2. Expected: there's no warning

Known Issue
The sanitizer seems to also report removal of <p>, when also removing another element.

The sanitizer seems to also report removal of <p>, when also removing another element.

This is due to the whitelist stanitizer removing empty parent elements when an invalid element is removed. I believe it is this logic here:

https://github.com/Automattic/amp-wp/blob/388f79f369997938224aa489fa86cf003a3eb9cd/includes/sanitizers/class-amp-tag-and-attribute-sanitizer.php#L1479-L1485

At the very least this can be changed from remove_invalid_child to just removeChild since it is not invalid. But I'm not 100% sure this routine is needed.

@westonruter @kienstra what is the reason for the preprocessor remove the parent if the parent is valid, is the intention not leave empty HTML Tag?

Removed Parent Element

Hi @ThierryA,
It looks like the intent is for the preprocessor to remove the parent if its only child was the removed element.

In that case, there's probably no reason to report its removal, as we don't know if it's invalid. @westonruter's suggestion works locally:

$parent = $parent->parentNode;
if ( $parent ) {
-         $this->remove_invalid_child( $node );
+         $parent->removeChild( $node );
}

There's also a slightly-relate edge case where this doesn't report the removal of a parent if the child is removed. For example:

<nonexistent><notatag></notatag></nonexistent>

This only reports:

Invalid elements: notatag

Making this change to replace_node_with_children() seems to address this:

        // Replace node with fragment.
        $node->parentNode->replaceChild( $fragment, $node );
+       if ( isset( $this->args['remove_invalid_callback'] ) ) {
+           call_user_func( $this->args['remove_invalid_callback'], $node );
+       }

@kienstra , not seeing any warning here
image

Dev Site

Hi @csossi,
Thanks for testing this. Could you please use the dev site? It has more up-to-date code. It looks to display the errors, as expected.

verified in QA

Request For Regression Testing

Hi @csossi,
Thanks for testing this. Could you please retest, to ensure there wasn't a regression?

Please create a post using the classic editor. Here are the test steps.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

alexhaller picture alexhaller  Â·  5Comments

miina picture miina  Â·  4Comments

GitaStreet picture GitaStreet  Â·  4Comments

miina picture miina  Â·  5Comments

westonruter picture westonruter  Â·  4Comments