When PHP is generated with random content (e.g. https://github.com/ampproject/amp-wp/issues/1239), the post-processor cache gets disabled as of https://github.com/ampproject/amp-wp/pull/1325 to prevent polluting the object cache with garbage cache values. This only matters when a persistent object cache is present, since that is when the post-processor cache is used.
When the style sanitizer comes across a stylesheet, it parses it and stores the cached result in a transient. This is done regardless of whether there is a persistent object cache present, since parsing CSS is expensive. If, however, the CSS has variable content in any way then this can result in a site with out a persistent object cache to have its wp_options table balloon to hundreds of thousands of garbage transients, as seen on this support topic.
I believe such variable stylesheets to be an uncommon occurrence, but when it does happen, we should warn the user and stop storing the parsed CSS in transients. There could also be a parameter to the style sanitizer to disable the transient cache, as well as provide a threshold for what constitutes an unacceptable rate of cache misses.
I suggest solving this like you avoid filling disks with log information: by using rotation.
We should define a limited number of slots to store cached CSS results in and use the content hash to check whether we need to use a new slot or not. So we could for example cache the last 10 versions of the CSS stylesheet.
This would avoid the worst case scenario where you just fill the DB until the server is taken down. We could also track the velocity of how fast we go through these to show a warning to the user if caching does not make sense.
That sounds great.
Note that each individual stylesheet (link, style element, style attribute) is stored in a separate transient. So the rotation should be expanded to include a large enough slots for all the variable number of stylesheets on a site. There could be quite a few. Perhaps default to 1000? Maybe it should be configurable?
I'll work on this in two separate PRs:
With #3300, the main problem should be resolved: we won't be able to completely fill up the database anymore.
To now detect high velocity cycling so that we can show a warning to the user and/or skip caching, I can think of two separate approaches:
@westonruter Thoughts?
Alternatively, we could just say that this problem is exotic anyway, and leave it at the state we are at with #3300, where the worst case is avoided (taking the server down).
To now detect high velocity cycling so that we can show a warning to the user
Couldn't the warning simply be shown when the cache pool index rotates back to 0? https://github.com/ampproject/amp-wp/pull/3300/files#r326742376
Couldn't the warning simply be shown when the cache pool index rotates back to 0? ampproject/amp-wp/pull/3300/files#r326742376
As stated in https://github.com/ampproject/amp-wp/pull/3300#discussion_r326793911 :
"No, this is only expressing the "distance travelled". Velocity is distance over time, we don't know anything about the time here. If we hit the end of the pool several times a day, we have a very high velocity. If we only hit it every 3 months, it might just be a site with lots of edits."
Someone I know has also seen a high cache miss rate with AMP_Image_Dimension_Extractor because they used some image URLs with randomized query args for cache busting. Perhaps the extractor could get smarter there. At the minimum, it would be nice if one could pass a list of query args to AMP_Img_Sanitizer that should be stripped from image URLs before hashing them in \AMP_Image_Dimension_Extractor::determine_which_images_to_fetch.
@swissspidy Yes, that's important to bring up. Can you share examples of the image URL with randomized query args?
Nevertheless, I don't stripping the query args from the hashed cache key is going to work. Consider two URLs:
https://i0.wp.com/example.com/wp-content/uploads/2018/12/hero.jpg?resize=700%2C700&ssl=1https://i0.wp.com/example.com/wp-content/uploads/2018/12/hero.jpg?resize=350%2C350&ssl=1If the query args were stripped, then only the correct dimensions will be stored for one of the images, with the other then getting larger or smaller dimensions depending on which one was seen first.
If we stripped query args, we'd need to be careful about it, such as selectively only certain named removing args (e.g. rand) or if the query string itself looks like a random number (e.g. ?2580138906).
A complication here is that such images are probably not even supposed to have dimensions in the first place. It may be an indication of a tracker. Having a random query string would imply that the image should not even be getting turned into an amp-img but rather an amp-pixel, but it's impossible to know for sure. (That being said, it is more reliable for a tracker image to have the actual random query arg added via JS, so this may not be relevant.) Nevertheless, if it is intended that the image be cache-busted, then perhaps this is an indication that the underlying image has actually changed and with that the underlying dimensions, and so actually the AMP plugin _should_ be obligated to obtain the dimensions.
Maybe we need to treat such dimensionless images which have random-looking query args as being validation errors and go ahead and remove them from the DOM entirely, forcing the user to fix the underlying problem. It's either this, or we'll need to supply fallback dimensions.
Based on our last discussion, the shorter-term solution to the problem appears to be to introduce a new filter which is used to allow a site to disable caching parsed CSS in transients. I've drafted a PR to do this in #4177. This may be better combined with the cache pool work in #3300 somehow.
Something else that comes to mind is that we could detect when there seems to be an extraordinary number of transients for parsed CSS and then automatically disable the transient storage. Consider a daily WP Cron that queries for for SELECT COUNT(*) FROM $wpdb->options WHERE option_name LIKE '_transient_amp-parsed-stylesheet-v24%' (the version should be taken into account, or else transients with old versions should be proactively deleted during the WP Cron event). If that number is over some threshold, transient caching could be automatically turned off. This WP Cron action would store the flag for disabling of the transient caching in an option.
An even more sophisticated approach would be to not just make a determination based on a single reading, but to capture the counts daily on an ongoing basis and then calculate the average rate of increase. That may or may not be helpful vs a single total count.
Extending this to support external object cache users would require incrementing a counter every time a cache MISS happens, and then the counter could be checked during the daily cron to see if it has gone too high, and then reset it back to zero. A downside to that is it means an extra cache-set for every cached stylesheet. See also @schlessera's comment in https://github.com/ampproject/amp-wp/issues/2092#issuecomment-533004679.
My only concern with all of this is we want to avoid the current problems with the post-processor response caching, which gets disabled way to easily (see #4178 for proposal to remove it).
_All of this also applies to the storage of image dimensions._
A Site Health warning when transient caching is disabled (and external object cache is not present). Not having a parsed CSS cache greatly slows down page generation.