Could https://cdn.ampproject.org/v0/validator.js be used to provide programmatic validation?
If all the validation logic is there, it would make things (#937, #1967) a lot easier if we could call an endpoint like
https://cdn.ampproject.org/validate?url=...&callback=...
Automating the validation of lists of URLs would become a lot easier from anywhere that can make an HTTP request, such as Google Apps scripts that parse a spreadsheet of AMP URLs.
CC @pietergreyling, @jmadler, @adewale
@dandv please sketch out the API you'd like to see. What would be the output of this endpoint?
+1 for wanting an API.
I'd like to see it accept either the URL of an AMP page or the AMP HTML in a POST body.
GET https://cdn.ampproject.org/v/?url= .... &callback=jsonpcallback
POST https://cdn.ampproject.org/v/ /*data in post body */
Result - something like
{
"version": 0, /* The amp version detected/validated */
"source": "https://some.amp.site/page.html", /* or "POST" if post data */
"canonical": "http:// ...", /* Canonical url from amp page */
"valid": false, /* or true */
"extensions": [ /* array of amp extension detected */ ], /* Nice to have */
"errors": [
{
"position": "10:2",
"error": "The text (CDATA) inside tag 'author stylesheet' matches 'CSS !important', which is disallowed.",
"help" : "https://github.com/ampproject/amphtml/blob/master/spec/amp-html-format.md#stylesheets"
},
{
"position": "9:0",
"warning": "AMP deprecated <style> boilerplate text detected",
"help": "http:// ... "
}
]
}
Nice to have: jsonp support, optional html output.
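To make the proposal above concrete, here is a minimal client-side sketch of how a caller might use it. It assumes the GET form and the response shape proposed above; the endpoint does not exist, and the helper names (buildValidationRequestUrl, summarizeProposedResult) are made up for illustration.

```javascript
// Hypothetical: the endpoint proposed in this thread. It does not exist today.
const PROPOSED_ENDPOINT = 'https://cdn.ampproject.org/v/';

// Build the GET request URL for a page to validate.
function buildValidationRequestUrl(pageUrl) {
  return PROPOSED_ENDPOINT + '?url=' + encodeURIComponent(pageUrl);
}

// Turn a response in the proposed shape into printable lines.
// Entries carry either an "error" or a "warning" key, as in the proposal.
function summarizeProposedResult(result) {
  const entries = result.errors || [];
  const lines = entries.map((entry) => {
    const isWarning = Object.prototype.hasOwnProperty.call(entry, 'warning');
    const severity = isWarning ? 'WARNING' : 'ERROR';
    const message = isWarning ? entry.warning : entry.error;
    return `${result.source}:${entry.position} ${severity} ${message}`;
  });
  return { valid: result.valid === true, lines };
}
```

A caller would request buildValidationRequestUrl(pageUrl), parse the JSON body, and pass it to summarizeProposedResult.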
I think this might be a good idea, but nothing like that exists at the moment.
For now, it is possible to run the javascript validator in other contexts. I left a comment here:
https://github.com/ampproject/amphtml/issues/999#issuecomment-171787638 with an example.
You can also build it from source. There are instructions here:
https://github.com/ampproject/amphtml/tree/master/validator
Following up, @dandv @jpettitt: what is the use case in mind here? It seems like using either the Node.js library (https://www.npmjs.com/package/amphtml-validator) or using validator.js directly is going to be faster in all cases than shipping an HTML document over the wire to cdn.ampproject.org and waiting for a response.
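For reference, a minimal sketch of the Node.js route mentioned above. It assumes the amphtml-validator package is installed and exposes getInstance() and validateString() as described in its README; formatValidationResult and validateHtml are illustrative names, not part of the package.

```javascript
// Format a validator result (status plus a list of errors) as printable lines.
function formatValidationResult(result) {
  const lines = [result.status];
  for (const error of result.errors) {
    let msg = `line ${error.line}, col ${error.col}: ${error.message}`;
    if (error.specUrl) {
      msg += ` (see ${error.specUrl})`;
    }
    lines.push(`${error.severity} ${msg}`);
  }
  return lines;
}

// Validate an HTML string locally. The require is done lazily so that the
// formatter above can be used without the package installed.
async function validateHtml(html) {
  const amphtmlValidator = require('amphtml-validator');
  const validator = await amphtmlValidator.getInstance();
  return formatValidationResult(validator.validateString(html));
}
```

Calling validateHtml('<html>Hello, world.</html>') should resolve to a list whose first entry is the overall status, followed by one line per reported problem.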
@Gregable: at the time, I was working on a piece of Google Apps script attached to a trix with partner AMP URLs to test. In that environment, you can't import Node modules, so fetching the validation result over HTTP is the only realistic option.
The HTML document doesn't have to be shipped over; the URLs are publicly accessible, so the call would submit the URL for validation and receive a JSON result similar to what @jpettitt described.
CloudFlare has added an endpoint to our beta cache with JSON output. We are happy to tweak the output based on feedback.
curl 'https://cdn.edgeamp.org/q/' -X POST --data-binary @invalid_amp.html -H 'Content-Type: text/html; charset=UTF-8'
curl https://cdn.edgeamp.org/q/cfedgeorigin.com/amp/invalid_amp.html
curl https://cdn.edgeamp.org/q/cfedgeorigin.com/amp/invalid_amp.html 2>/dev/null | python -mjson.tool
{
"errors": [
{
"code": "MANDATORY_TAG_MISSING",
"col": 0,
"error": "The mandatory tag 'noscript enclosure for boilerplate' is missing or incorrect.",
"help": "https://github.com/ampproject/amphtml/blob/master/spec/amp-boilerplate.md",
"line": 12
},
{
"code": "MANDATORY_TAG_MISSING",
"col": 0,
"error": "The mandatory tag 'head > style[amp-boilerplate]' is missing or incorrect.",
"help": "https://github.com/ampproject/amphtml/blob/master/spec/amp-boilerplate.md",
"line": 12
},
{
"code": "MANDATORY_TAG_MISSING",
"col": 0,
"error": "The mandatory tag 'noscript > style[amp-boilerplate]' is missing or incorrect.",
"help": "https://github.com/ampproject/amphtml/blob/master/spec/amp-boilerplate.md",
"line": 12
}
],
"source": "http://cfedgeorigin.com/amp/invalid_amp.html",
"valid": false,
"version": "1471559432224"
}
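As a rough sketch of how a script might consume output like the above in a build step, the helpers below count errors per code and derive a process exit code. The field names are taken from the sample response; summarizeByCode and exitCodeFor are hypothetical names.

```javascript
// Count how many errors were reported for each error code.
function summarizeByCode(report) {
  const counts = {};
  for (const err of report.errors) {
    counts[err.code] = (counts[err.code] || 0) + 1;
  }
  return { source: report.source, valid: report.valid, counts };
}

// Map the report to a conventional exit code: 0 for valid, 1 otherwise.
function exitCodeFor(report) {
  return report.valid === true ? 0 : 1;
}
```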
Looking into this. A few questions for those interested in this thread:
One of the main issues is avoiding abuse (ie: DOS) via Google's fetching. We could solve this by:
1) Requiring that the API request provide the contents, rather than the URL, and we perform validation on the provided string. This wouldn't work for @dandv's case. Would this still be useful?
2) Only serving the cached validation result, which would be similar to just fetching the document via cdn.ampproject.org. This wouldn't support testing changes to a site, only seeing a snapshot in time. Would this still be useful?
- This would imply caching validation results in FAIL cases as well as PASS regardless of whether the document was actually rejected (not cached) as in the FAIL case - correct?
- What would we store? The full validator output as at the time of validation?
This is true. I don't think the cache currently stores FAIL cases; I think it retries on every request.
- Would it make sense to also tag this with the validator spec file revision?
It wouldn't hurt, but I don't get the feeling that most developers are using this revision number for much. It's really used only for development of the validator itself; we are quite careful about backwards-incompatible changes these days.
One of the main issues is avoiding abuse (ie: DOS) via Google's fetching.
Could DOS attacks via the "validate AMP URL" API be avoided via other anti-DOS mechanisms?
I.e. if we employ rate limiting, we can reasonably limit requests to a pretty low rate, given that:
If we implement domain whitelisting, we could have a verification mechanism similar to verifying site ownership in Google Webmaster.
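To illustrate the rate-limiting idea, here is a minimal sketch of a fixed-window limiter keyed by the hostname of the URL being validated. It is an assumption about how such a limit could work, not a description of anything Google runs; the class name, limits, and injectable clock are all illustrative.

```javascript
// Fixed-window rate limiter: at most maxRequests per windowMs for each
// target hostname. The clock is injectable so behaviour can be tested.
class DomainRateLimiter {
  constructor(maxRequests, windowMs, now = Date.now) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now;
    this.windows = new Map(); // hostname -> { start, count }
  }

  // Returns true if a validation fetch of targetUrl may proceed.
  allow(targetUrl) {
    const host = new URL(targetUrl).hostname;
    const t = this.now();
    const current = this.windows.get(host);
    if (!current || t - current.start >= this.windowMs) {
      // No window yet, or the previous one has expired: start a new one.
      this.windows.set(host, { start: t, count: 1 });
      return true;
    }
    if (current.count < this.maxRequests) {
      current.count += 1;
      return true;
    }
    return false;
  }
}
```

Keying on the target hostname rather than the caller is one possible choice: it bounds how much fetch traffic any single origin can receive through the API, which is the abuse case raised above.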
I got here as I was looking for a process that could trigger a validation check when a git push occurs. For now I'll probably implement the validator locally, but having an official URL, even if it just returned pass/fail, would be great!
@brianlayman
This could be of interest:
https://github.com/ampproject/ampbench
Walkthrough article: Debug AMP pages with AMPBench, an open source app from the AMP Project.
Look for this section in the walkthrough article: "Experimental AMPBench JSON APIs"
The code is here:
https://github.com/ampproject/ampbench/blob/master/ampbench_routes.js#L393
AMPBench also supports this kind of thing:
$ curl https://ampbench.appspot.com/raw?url=https://ampbyexample.com
PASS
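For a push-triggered check like the one asked about above, a small script could call the same raw endpoint for each URL. This is a sketch only: it assumes the endpoint still responds as in the curl example, that it accepts a percent-encoded url parameter, and that any body other than PASS means the page did not pass. The function names are illustrative.

```javascript
// Endpoint taken from the curl example in this thread.
const AMPBENCH_RAW = 'https://ampbench.appspot.com/raw';

// Build the request URL for one page (assumes an encoded url parameter works).
function buildRawCheckUrl(pageUrl) {
  return `${AMPBENCH_RAW}?url=${encodeURIComponent(pageUrl)}`;
}

// Only the PASS body is shown in the thread; treat anything else as a failure.
function isPass(body) {
  return body.trim() === 'PASS';
}

// Check a list of URLs and return those that did not pass.
// fetchFn defaults to the global fetch available in recent Node.js versions.
async function findFailingUrls(urls, fetchFn = globalThis.fetch) {
  const failures = [];
  for (const url of urls) {
    const response = await fetchFn(buildRawCheckUrl(url));
    const body = await response.text();
    if (!isPass(body)) {
      failures.push(url);
    }
  }
  return failures;
}
```

A hook could then exit non-zero when findFailingUrls returns a non-empty list.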
@pietergreyling That looks to be perfect! Thank you for the supporting detail too. Time to start coding...
@pietergreyling — this looks great! Are there plans to move the Experimental AMPBench JSON API to a stable status? We're seeing folks ask about this in the context of integrating into their platforms for faster in-context validation.
@ericlindley-g At this point the recommended way of integrating AMPBench into a custom validation workflow using the JSON API is to run an instance of AMPBench on a dedicated hosting platform. The latter can be public or on an in-house server behind a firewall restricting access to internal clients.
Instructions to do this are here:
https://github.com/ampproject/ampbench#getting-the-code-and-running-it
https://github.com/ampproject/ampbench#deploying-ampbench-to-the-cloud
This also has the advantages of allowing the source code of the API to be tuned according to custom needs and the creation of Pull Requests to the AMPBench repository for any improvements based on such changes.
@pietergreyling Sounds good—thanks!
I think we're pretty happy with the options available now, so I'm going to go ahead and close this out. Feel free to reopen if I'm wrong.
I'd still really appreciate this. There is currently no way to know if a page is valid using standard JS from the console log.
You have to use the NPM package / CLI or be a human being and use the developer tool and/or read the console to find out if a page is valid. I think a CURL is okay but markup or a JS object on window would be much more useful.
Could there be a class selector on the HTML (html.amp-valid vs html.amp-invalid) and/or an object accessible via window which holds the validation state when developers append #development=1 (ie window.AMP.validator.status)?
This would be really useful for E2E testing via things like Robot Framework and Ghost Inspector. There's no documented JS object we can reliably count on to store the validation status, and the JS console doesn't have access to previous logs in order to check for AMP Validation Successful.
@Gregable are there any new options for what I've described since this issue was opened? Looking for a reliable way to check validation status using just vanilla JS after appending #development=1
There's a window.amp.validator object but it doesn't actually hold the current state it seems. validateUrlAndLog doesn't return anything because it's async and doesn't return the promise. validateString isn't useful because the document has already been modified by AMP and the AMP validator won't take in HTML that's clearly already had AMP run over it.
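One possible workaround, sketched below, is to re-fetch the page's original source and pass that to validateString instead of the live DOM. This is untested against a real page; it assumes window.amp.validator.validateString returns an object with a status field, and the function names are illustrative.

```javascript
// Validate raw HTML source with a validator object that exposes
// validateString (assumed to return { status: 'PASS' | 'FAIL' | ... }).
function statusFromSource(html, validator) {
  const result = validator.validateString(html);
  return result.status;
}

// Fetch the unmodified source of pageUrl and validate it.
// In a browser console, pass window.amp.validator and window.fetch.
async function getValidationStatus(pageUrl, validator, fetchFn) {
  const response = await fetchFn(pageUrl);
  const html = await response.text();
  return statusFromSource(html, validator);
}
```

An E2E test could then await getValidationStatus(location.href, window.amp.validator, window.fetch.bind(window)) and assert on the returned status.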