Amphtml: Allow CSS larger than 50k if 90% used

Created on 17 Aug 2016 · 77 Comments · Source: ampproject/amphtml

We'd like to see the CSS limit increased from the current 50K to 100K.

Background: we're automatically converting sites, including all their look and feel, to AMP. Many sites have 300-400k of CSS, which we prune down to only the rules that are used on the page. This typically yields between 30 and 80k of CSS.

This means that on the larger files we need to very aggressively optimize the CSS size by rewriting extensively which is a) computationally expensive, b) tends to break things in unexpected ways and c) means we're dropping /*! comments containing rights info which puts us out of license compliance. Increasing the limit to 100K would make all those issues go away for many publishers who are converting existing templates.

Here is an example page that's right up against the current limit https://cdn.relaymedia.com/amp/www.niemanlab.org/2016/08/that-friends-and-family-facebook-algorithm-change-doesnt-seem-to-be-hurting-traffic-to-news-sites/

When Possible Feature Request caching

All 77 comments

/to @dvoytenko @Gregable

@Gregable Do we have some quick stats of CSS size distribution in AMP pages overall?

@dvoytenko and @Gregable it's not just the current size, as we move beyond using AMP for phones and start using it for responsive pages like the one I linked above the CSS size jumps markedly.

Maybe we can have sections that aren't delivered to mobile. The cache can remove and vary on mobile vs desktop?

@dknecht The issue shows up when you're converting existing content. For new designs we can hand build CSS and get it under the limit; however, that doesn't scale to tens of thousands of sites. As automatic AMP converters like ours get better (see link above) we're starting to bring in the whole page rather than just fragments of it.

Apart from the initial cost of processing the CSS, it adds very little to the overall page weight (particularly since AMP doesn't load any graphics in areas hidden by responsive layout). The page I linked above is 22.2k on the wire despite having 46k of CSS and an overall size of 102k uncompressed.

@dvoytenko, I don't have anything at hand. I suspect it wouldn't be all that interesting though since folks have been optimizing against the current rules.

@jpettitt, at a quick glance, that css example looks pretty bloated. I count 8 different rules targeting .simple-rightsidebar which is on exactly 1 tag that has 6 other classes on it.

I'm a fan of keeping it at 50k personally. The fact the spec is causing you to rewrite your CSS is a good sign, it's making you keep it lightweight.

@Gregable sadly that's the nature of converting existing templates. If you look closer, a lot of the rules you cite are inside media queries. AMP forces bloat when converting existing templates: all the rules that start ._RM were created to replace style attributes on elements, and all the rules that start #_RM were created to replace !important tags in the original CSS. This particular example bloats a lot because they were liberal with !important and the style attribute.

I'll send you a link on slack to an example that's over the limit (not a live customer so I can't post it here)

@pdufour rewriting CSS is fine if you have the resources, but it doesn't scale to tens of thousands of sites. Sticking at 50K makes automated conversion with the full look and feel far harder. As is, we can optimize heavily: the example linked had 183584 bytes of CSS reduced to 47612, with 2225 rules cut down to 709.

I'm trying to avoid having to rip out all the media queries and treat the page as phone only. That would work for now, however if we want AMP to be usable as more general fast web page framework we need to allow enough space for a responsive design.

My understanding of the 50k selection as a CSS limit was that it was believed to be enough to fit nearly all cases, but not enough to just copy/paste the existing CSS implementation over, for the reasons that @pdufour described.

Lifting it could remove that performance improvement (negligible though it may be), but it could be replaced by other solutions. Maybe a cache optimization that removes any unused CSS rules based on selectors that don't apply, or a cache optimization that intelligently compresses CSS rules/selectors.

@jmadler we're already removing unused CSS and, if it still doesn't fit, renaming all the classes and IDs to short names. We still see CSS over 50k (keep in mind with some sites we're starting with as much as 800k of CSS). We're also stripping data URLs. We've yet to find one we can't squeeze under 100k. AMP itself creates some of the issues by banning '*', !important and inline styles.
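For readers unfamiliar with the renaming step, here is a minimal Python sketch of shortening class names in selectors. It is an illustration under stated assumptions, not the converter described in this thread: it only handles flat `selectors { body }` rules (no @media nesting, no attribute selectors), the helper names `short_names` and `rename_classes` are made up for this example, and the returned mapping would still have to be applied to the class attributes in the HTML.

```python
import itertools
import re
import string


def short_names():
    """Yield a, b, ..., z, aa, ab, ... as replacement class names."""
    for length in itertools.count(1):
        for combo in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(combo)


def rename_classes(css):
    """Shorten class names in selectors only; declaration bodies are left untouched.

    Assumption: flat rules without nested blocks or attribute selectors.
    Returns the rewritten CSS and the old-name -> new-name mapping.
    """
    mapping = {}
    names = short_names()

    def shorten(match):
        old = match.group(1)
        if old not in mapping:
            mapping[old] = next(names)
        return "." + mapping[old]

    def rewrite_rule(match):
        # Only the selector part is rewritten, so url(a.png) in a body is safe.
        selectors = re.sub(r"\.(-?[A-Za-z_][\w-]*)", shorten, match.group(1))
        return selectors + "{" + match.group(2) + "}"

    return re.sub(r"([^{}]+)\{([^{}]*)\}", rewrite_rule, css), mapping


css = ".simple-rightsidebar .entry{margin:0}.entry{background:url(a.png)}"
out, mapping = rename_classes(css)
print(out)
print(mapping)
```

Restricting the substitution to the selector part is the main design choice here: a naive global replace would also rewrite things like file extensions inside url() values.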

I think this goes to a bigger question: is AMP a mobile-only and article-only standard, in which case we could pre-render all the media queries and strip out anything wider than a phone? Or is it (or will it become) a more generalized acceleration framework? If the latter, particularly if we're going to do highly structured responsive pages like product pages, home pages etc., then 50k becomes a real obstacle.

So where are we with this?

Let me pass it on to @cramforce. He is back next week and can provide a definitive answer.

It would be good to get a good sample of pages that run up to the limit. I expected this to be controversial, but haven't seen it come up anywhere else, yet.

@cramforce I'll send you a customer example over slack (not a live site yet so I can't post it here).

ping.

Here is my recommendation:

  • leave limit unchanged for now
  • not count whitespace
  • no longer count non-data URIs in limit

Not perfect but better than nothing. If we could do that and bump to 75K that would be perfect.

I've not heard any other parties ask for an increase in size. This seems to come up when transcoding pages to AMP, but AMP is not designed to be a transcoding target for non-AMP pages.

I get that it's not designed to be a transcoding target. However, in the real world automatically translated pages mostly look like sh*t (e.g. the WordPress plugin) and the vast majority of sites don't have developers on staff. Those that do have developers have a todo list 50 items long, and only the top one or two items on the list will ever happen. If AMP is to achieve parity in UX with existing sites and spread beyond the minority of publishers who have both the technical staff and the free resources to do decent AMP conversion, auto transcoding is pretty much essential.

Publishers we talk to complain that AMP doesn't monetize well; this will kill AMP. Much of the reason it's not monetizing is that they have abysmal AMP conversions that don't support the full ad map, lack navigation, and lack recirc elements.

I think this comes down to letting the perfect be the enemy of the good. Yes, in a perfect world we'd have all AMP pages designed from scratch, avoiding all the bad practices. This isn't a perfect world, and there is an installed base running to billions of pages on millions of sites. Allowing for a simple path from there to here will speed AMP adoption, improve monetization and therefore help the AMP ecosystem.

I'm having a hard time seeing why what is basically a number pulled out of thin air is so important that it's worth making people spend, cumulatively across all sites, millions of dollars on rewrites.

@jpettitt Does your current output only include selectors that actually match on the pages they apply to?

Yes, we compare every CSS rule to the actual page content and drop all those that don't apply. If that doesn't get it under the limit we rename all the classes and IDs. Finally, if that doesn't do it, we start pre-computing media queries and stripping out content for wider pages; this last step is what we'd like to avoid.
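The first step, dropping rules whose selectors match nothing on the page, can be sketched in Python roughly as follows. This is a simplified illustration, not the converter's actual code: it assumes flat CSS without @media blocks, it only evaluates bare tag, .class and #id selectors, and it conservatively keeps anything more complex. `PageIndex`, `may_match` and `prune_css` are names invented for this example.

```python
import re
from html.parser import HTMLParser


class PageIndex(HTMLParser):
    """Collects the tag names, classes and ids present in a document."""

    def __init__(self):
        super().__init__()
        self.tags, self.classes, self.ids = set(), set(), set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())
            elif name == "id" and value:
                self.ids.add(value)


RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")   # flat "selectors { body }" rules only
SIMPLE = re.compile(r"^([.#]?)([\w-]+)$")    # a bare tag, .class or #id


def may_match(selector, page):
    """True if the selector could match; compound selectors are kept unexamined."""
    m = SIMPLE.match(selector)
    if m is None:
        return True  # assumption: keep rather than risk dropping a live rule
    kind, name = m.groups()
    if kind == ".":
        return name in page.classes
    if kind == "#":
        return name in page.ids
    return name.lower() in page.tags


def prune_css(css, html):
    """Return the CSS with selectors (and whole rules) that cannot match removed."""
    page = PageIndex()
    page.feed(html)
    kept = []
    for selectors, body in RULE.findall(css):
        live = [s.strip() for s in selectors.split(",") if may_match(s.strip(), page)]
        if live:
            kept.append(",".join(live) + "{" + body.strip() + "}")
    return "".join(kept)


html = '<div class="a b" id="main"><p>hi</p></div>'
css = ".a{color:red} .zzz{color:blue} p, h1 { margin:0 } #main{padding:0}"
print(prune_css(css, html))
```

A production version would need a real CSS parser and a full selector engine, which is where the CPU cost discussed later in the thread comes from.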

In most cases we're cutting the original CSS by ~80%. Sometimes as much as 90%.

I sympathize, but we cannot go higher without messing up our performance model.

Maybe we need easily statically separable CSS sections per device type.

@cramforce Much earlier in the thread I proposed "Maybe we can have sections that aren't delivered to mobile. The cache can remove and vary on mobile vs desktop?"

Instead of having the publisher set up Vary correctly, we can just have the cache hide sections not needed for the requesting device.

Yep, unfortunately such mechanisms rely on the cache to be fast.

Does it really mess up the model? On the wire we see ~75% compression with gzipped AMP pages, so an extra 50k of CSS is around 12.5k (worst case, probably less) on the wire. With AMP pages having an overall weight, with the JS, ads, images etc., of 1 to 2 MB, it's in the noise. As is, we're actually slowing the page down as we squish the CSS by using URL shortening on the CSS url() values and moving data URLs, particularly small icon font fragments, to external resources. Not counting non-data URLs will help some.
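The arithmetic in that comment can be checked with a few lines of Python. The figures are the ones quoted above (~75% compression, 50k of extra CSS, a 1 MB page at the low end); `wire_bytes` is just an illustrative helper name.

```python
def wire_bytes(uncompressed_bytes, compression):
    """Estimated transfer size when `compression` is the fraction of bytes saved."""
    return uncompressed_bytes * (1.0 - compression)


extra_css = 50_000                       # going from a 50k to a 100k limit
extra_on_wire = wire_bytes(extra_css, 0.75)
page_weight = 1_000_000                  # low end of the 1 to 2 MB range

print(extra_on_wire)                     # 12500.0
print(extra_on_wire / page_weight)       # 0.0125
```

So under these assumptions the extra CSS is a little over 1% of the transfer size of a 1 MB page, and proportionally less on a 2 MB one.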

@cramforce , I would push back on your suggestion of choosing bytes that don't count towards the limit. I think the simplicity of a byte limit makes it easier for folks to reason about and develop against. If there is a good reason for additional complexity in the rule, then we should go forward, but this doesn't feel like it falls into that category.

OK, I thought about this some more and I think I have a solution that I'd
like and that should make almost everyone happy:

  • leave the existing limit in place unchanged.
  • add an alternative rule (it is enough if one is met) that says: at least
    90% (actual number TBD) of CSS selectors match content in the document. On
    top of that limit total byte count of data URIs.

Of course, this would be a pretty big change on the validator side, so we
need to see how fast this can be implemented, but I think this is a good
mechanism.

  • it captures our primary intent: CSS hygiene.
  • it is easy to automatically enforce by generators (that can avoid any non
    matching selectors).
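As a rough illustration of the "either rule is enough" check proposed above, here is a Python sketch. It is not the AMP validator: the 90% threshold is the placeholder from the proposal (the thread says the actual number is TBD), the data URI cap is left out, and `css_allowed` and its `matches` callback are hypothetical names, with the callback standing in for whatever selector engine a real validator would use.

```python
BYTE_LIMIT = 50_000        # the existing limit
MIN_MATCH_PERCENT = 90     # placeholder; "actual number TBD" in the proposal


def css_allowed(css, selectors, matches):
    """Pass if the CSS fits the byte limit OR enough selectors match the document.

    `selectors` is the list of selectors found in `css`; `matches(selector)`
    reports whether a selector matches something in the page.
    """
    if len(css.encode("utf-8")) <= BYTE_LIMIT:
        return True
    selectors = list(selectors)
    if not selectors:
        return False
    matched = sum(1 for s in selectors if matches(s))
    # Integer comparison avoids floating point edge cases at exactly 90%.
    return matched * 100 >= len(selectors) * MIN_MATCH_PERCENT


big_css = ".a{color:red}" * 5_000                    # 65,000 bytes, over the limit
used = {".a", ".b", ".c", ".d", ".e", ".f", ".g", ".h", ".i"}
selectors = sorted(used) + [".unused"]               # 9 of 10 selectors match
print(css_allowed(big_css, selectors, lambda s: s in used))
```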

What tool can we use to automatically determine that "90% of CSS selectors match content in the document"?

We've searched for such a tool but have come up empty.

There are some candidates but they do not handle @media.

/jg

@media is irrelevant here. It isn't required that things actually match at
a specific screen size, just that they at least potentially match.

I think mod_pagespeed does such pruning, but it may make sense to create a
few special purpose node, PHP, etc modules.

@cramforce , perhaps it's my anti-abuse background, but I can see someone doing something like the following:

<style amp-custom>
  .staticBlobOfCssRules {
    display: none;
  }
  ... Another MB of CSS ...
</style>

...

<div class=staticBlobOfCssRules>
  <!-- Set of elements to make sure that every CSS rule in our site-wide rules matches something -->
  <div class="divclassa divclassb divclassc ..."></div>
  <span class="spanclassa spanclassb spanclassc ..."></span>
</div>

If you develop your site-wide CSS at the same time as this static blob of useless divs, you get to make your CSS as complex as you want and always get around the AMP rules.

@Gregable we're not trying to make this protected against that type of abuse, just make the path of least resistance lead to good results. People can make large docs in the HTML part anyway, so protecting against that doesn't really help.

In particular, the capability you need to generate the contents of staticBlobOfCssRules are roughly equivalent with what you need to figure out which rules to delete.

Doing the "selectors match content doc" check is quite expensive in terms of CPU (we do this). There are a bunch of optimizations, but even with serious pre-processing of the source CSS file and cached results it's ~35% of our page generation CPU time. I like the idea but it's going to be expensive to implement, particularly on a page by page basis.

I am not sure lifting the 50k limit to 100k would make things much better.
I understand that the goal of using <style amp-custom> is to reduce latency, but at the same time it means that there is no efficient way to cache the style sheets across multiple documents. If you load 10 articles from the same publishing entity, you load 10 times 50k worth of CSS when you could really have loaded that just once and cached it aggressively.

So rather than changing the 50k limit I would really be in favor of a mechanism to actually remove the CSS from _inside_ the documents. If we want to avoid delaying the CSS request, maybe we could use an HTTP Link header to point to it so that browsers can start to fetch it _before_ they have actually received and loaded the whole DOM?

Other models are certainly possible. E.g. a 30KB site wide CSS file + allowing 20KB inline or some variation thereof.

The real goal is to motivate CSS hygiene. So far, AMP is achieving this goal. No site can make their CSS "append only", which is in stark contrast to most CSS on the internet that rarely gets a rule removed. Want to render that "Christmas Special 2013"? Cool, CSS is still there :)

So, I'm super happy to explore all kinds of proposals that achieve the same goal.

AMP caches are, of course, free to un-inline the CSS, give it a content-addressed URL and then HTTP2 push it to clients instead of doing inlining.
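A content-addressed URL here just means the path is derived from a hash of the stylesheet, so identical CSS across articles maps to one cacheable resource. A minimal Python sketch follows; SHA-256, the 16-character truncation and the `/css/` prefix are assumptions made for this example, not anything an AMP cache is known to use.

```python
import hashlib


def content_addressed_path(css, prefix="/css/"):
    """Derive a stable path from the stylesheet bytes (hypothetical URL scheme)."""
    digest = hashlib.sha256(css.encode("utf-8")).hexdigest()
    return prefix + digest[:16] + ".css"


site_css = ".entry{margin:0}"
print(content_addressed_path(site_css))
# Two pages with byte-identical CSS get the same path, so the browser can reuse it.
print(content_addressed_path(site_css) == content_addressed_path(".entry{margin:0}"))
```

Because the path changes whenever the CSS changes, such a resource can be served with a very long cache lifetime.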

The hygiene part I get, we have sites we convert that have 5000+ css rules where only 400 are in use on an AMP page. We've seen 800k+ of CSS on a mobile page.

The original motivation for this request was being able to keep the 500 rules that are in use on a complex page so we don't have to get humans involved in rewriting or start cutting functionality. Right now every site we work with fits under 50k thanks to some highly aggressive, and CPU expensive, optimization (the biggest comes in at 48k). AMP itself actually bloats the CSS by not allowing !important or element level styles. We end up applying workarounds in our compression that introduce more rules.

Regarding the on-the-wire performance cost of inline CSS vs external style sheets: even with multiplexing in SPDY or HTTP/2 there is at least one extra round trip to request and fetch a style sheet. Assuming 4G (20ms latency, 4Mbit downstream, cumulatively ~60ms TTFB) you could transmit ~30KB compressed equal to 250K uncompressed of inline CSS in the TTFB of your external CSS file.

[CSS compresses really well - 48,693 bytes of CSS from our worst case site gzips to 10,188 bytes]
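The transfer budget above can be reproduced with a short Python sketch, along with a helper for measuring a gzip ratio on your own CSS. The inputs are the figures quoted in this comment (4 Mbit/s, ~60 ms, 48,693 bytes compressing to 10,188); the function names are illustrative only.

```python
import gzip


def bytes_in_window(bandwidth_bits_per_s, window_ms):
    """Bytes that fit through a link of the given bandwidth in window_ms milliseconds."""
    return bandwidth_bits_per_s * window_ms / 1000 / 8


def gzip_ratio(data):
    """Compressed size divided by original size for a bytes payload."""
    return len(gzip.compress(data)) / len(data)


budget = bytes_in_window(4_000_000, 60)   # 4 Mbit/s for ~60 ms
print(budget)                             # 30000.0

measured_ratio = 10_188 / 48_693          # worst-case file quoted above
print(round(budget / measured_ratio))     # uncompressed CSS that fits in that window
```

At the ratio measured on the worst-case file, 30 KB on the wire corresponds to roughly 143 KB of uncompressed CSS; the 250K figure would need compression closer to 88%. Either way the budget is well above the 50k to 100k range under discussion.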

  1. I don't think going to an external style sheet will actually be a win for most users, since the vast majority of AMP hits are one-offs.
  2. Moving from 50K of CSS to 100K of CSS is going to add 10 to 15KB on the wire if it's all used (20 to 30ms on 4G).
  3. The "don't count the whitespace" idea mentioned upthread only amounts to ~4% on the 48k test file I'm using.

My take: increasing the CSS limit a little, say to 75K, will, with proper optimization, allow almost any site to be converted to AMP. It will do that at minimal performance cost while still forcing the removal of dead rules, which seems to have been the original goal.

I think we are moving forward with https://github.com/ampproject/amphtml/issues/9625

Does this address the primary concerns?

Hello, I'm developing a site that is 100% AMP (much like your website). This includes home, product pages, responsive layout and integration with a CMS.

I also find the 50K limit to be a problem, and this is no site conversion, it's just a big site developed specifically for AMP. Currently we're selectively adding CSS based on the current templates and specific modules within templates.

If you plan on supporting AMP as a framework for all devices, I think the limit is not going to work for all sites, since you can have larger pages and responsive CSS.

The solution proposed by @cramforce about checking rule usage seems the best one to me, since it allows the creation of larger pages for desktop. Raising the limit to 100K could work just as a temporary solution.

When you select the % of rules that need to be in use, keep in mind the use of CMSs and developer friendliness. For example: in the site I'm developing I see that some 'col-md-x' classes are not being used, and also some styles that apply to the article content (like p, b, blockquote, figure, etc). It would not be very practical to remove those.

@gregable if we could also add