Amphtml: Validator: add fragment validation API

Created on 14 Dec 2015  Â·  18Comments  Â·  Source: ampproject/amphtml

This API will eventually replace the current version of sanitizer.js. Its intent is to enable validation for the outputs produced by templates. The new API will look like this:

/**
 * @param {string} html
 * @return {!Promise<string>}
 */
amp.validator.validateFragment = function(html) {}

Some specifics:

  1. The preamble/structure validation checks will be turned off such as overall HTML structure, HEAD, etc.
  2. The <script custom-element> and <script custom-template> checks will be off.
  3. We can assume that the fragment will always have a single root element.
  4. The validator has a promisable protocol for easy execution via web workers.
  5. Ideally we'd produce a separate binary for this API to enable DCE given that the reduced download size is important here.
  6. Unlike the main validator, this validator should simply sanitize content - strip undesired elements/attributes rather than simply throw exceptions.
When Possible Bug caching

Most helpful comment

There's a TODO where SVG tags are blacklisted in sanitizer.js that points to this bug:
https://github.com/ampproject/amphtml/blob/70c23c642e24b673e7d852572327aac0cd9ec3f3/src/sanitizer.js#L55-L57

There are other issues (like #8214) which also point to this issue in an attempt get SVG working.

However, it's not clear to me why SVG was explicitly blocked in the first place, and if it wouldn't be possible to unblacklist SVG as a stopgap before this issue hits its second birthday.

All 18 comments

/cc @cramforce

Do you have a time frame for this? The current template validation logic is fragile from a security perspective it'd be advantageous to replace it soon.

We were not planning to get to it for quite some time. I think that the conclusion was that current sanitization logic is significantly stricter than our AMP validator itself.

Significantly stricter in same cases, and less strict that the AMP validator in others. I still think that AMP should reuse the AMP validator for the template outputs (or statically ensure that the output is valid AMP, but that's much trickier to get right).

What's the latest update here? Thanks.

This is, for now, connected to the a4a validation. The question is whether we can do this kind of validation cheaply on the client side.

@Gregable @powdercloud Did we get to run webworker experiments and such on this?

Yes I am doing some tests with webworker and I think there is also still some room for more optimizations. I've made this issue https://github.com/ampproject/amphtml/issues/7420 to keep track of my progress.

This could also be useful for reducing duplicated validation logic in amp-bind.

Any updates here?

@powdercloud Would like to hear about webworker experimentation.

There's a TODO where SVG tags are blacklisted in sanitizer.js that points to this bug:
https://github.com/ampproject/amphtml/blob/70c23c642e24b673e7d852572327aac0cd9ec3f3/src/sanitizer.js#L55-L57

There are other issues (like #8214) which also point to this issue in an attempt get SVG working.

However, it's not clear to me why SVG was explicitly blocked in the first place, and if it wouldn't be possible to unblacklist SVG as a stopgap before this issue hits its second birthday.

Unfortunately no. SVG has many ways to execute JS and we will have to
filter out all of them.

James Shannon notifications@github.com ezt írta (időpont: 2017. okt. 29.,
V, 1:56):

There's a TODO where SVG tags are blacklisted in sanitizer.js that points
to this bug:

https://github.com/ampproject/amphtml/blob/70c23c642e24b673e7d852572327aac0cd9ec3f3/src/sanitizer.js#L55-L57

There are other issues (like #8214
https://github.com/ampproject/amphtml/issues/8214) which also point to
this issue in an attempt get SVG working.

However, it's not clear to me why SVG was explicitly blocked in the first
place, and if it wouldn't be possible to unblacklist SVG as a stopgap
before this issue hits its second birthday.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/ampproject/amphtml/issues/1156#issuecomment-340227676,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AA2l1QMgBhMz45ASngrtbiOESsmXU50vks5sw789gaJpZM4G1MN6
.

/cc @alabiaga

@choumx Hey there! My team have hit this issue too just today while trying to build a dynamic nav using amp-mustache and it not liking SVGs. Any idea when this might get addressed please? If it's not too complicated I can see if I can rope some of my team into working on a PR.

@sanjsanj I just looked through this issue and I believe we can remove SVG from the blacklist without issue...as I quickly prototyped a local PR with the changes allowing this. The html sanitizer will already remove the necessary markup to disallow scripting the SVG, which I am assuming is not something you are trying to do.

@molnarg what are the many ways of scripting SVG? is it not only via the script tag?

I spoke to @molnarg offline about the security implications of allowing SVG tags and he pointed out DOMPurify as the one of the libraries he knows that handles the sanitizing of SVGs well. I'll look into it this week and scope out the work required to add these rules into the validator. Though there is still the issue of needing to sanitize this data if in the context of mustache templates, as that sanitizing happens during runtime and the validator logic isn't shared yet with the validation used in the extensions, as pointed out in some of the earlier comments..which is part of an effort that I am leading. I will create a separate issue to track the support of SVGs.

Poking at this a bit. Here are the current minified, gzipped sizes of the various JS files at hand:

| File | Bytes |
| -- | -- |
| purify.min.js | 5707 |
| amp-mustache-0.1.js | 15070 |
| validator.js | 73584 |

For SVG support in the short-term, it seems like using DOMPurify may be the most promising in terms of minimizing (1) eng work needed and (2) impact to initial render latency for amp-list (which is already an issue, see #15272). I'll try this out this week and see how far I can get.

Note this doesn't take into account JS parse/execution time. Also, @alabiaga also did some investigation into using the "light" JS Validator and making it leaner (e.g. without extension rules) and found we could reduce size by up to 30% compared to validator.js. Still there's a long way to go and it's possible we may ultimately need to consider a different way to share validator rules for performance-sensitive use cases like amp-list.

/cc @dvoytenko

The AMP Validator has more recently taken a divergent path away from trying to be performant and small enough to be used live in the AMP Runtime. I don't think this is still a good design given the performance goals of AMP. If the use case is still critical, we should consider alternative solutions.

Was this page helpful?
0 / 5 - 0 ratings