Amphtml: I2I: Formalize retention/deletion policy for old releases

Created on 6 Nov 2019 · 13 Comments · Source: ampproject/amphtml

As discussed as a side note during the #25205 design review (#24593), we should formalize and implement a retention/deletion policy for old releases. Among the discussed points was the idea of also deleting versions with known P0 issues after they have been replaced.

_Update_
Summarizing the discussions on this thread and offline, the proposal is as follows:

  • Any AMP version older than 6 months will be considered deprecated
  • We will automatically delete those deprecated versions from the AMP CDN servers
  • For requests that are explicitly RTV-locked to a deprecated version, serve the active LTS release instead (e.g., requests to https://cdn.ampproject.org/rtv/011801010000000/v0.js will be served the same file as https://cdn.ampproject.org/lts/v0.js)
INTENT TO IMPLEMENT When Possible caching infra
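
A minimal sketch of the fallback described in the last bullet of the proposal, in Python for illustration only. The function name `resolve_cdn_path`, the `active_rtvs` collection, and the assumption that an RTV is always 15 digits are hypothetical; the actual CDN serving logic is not described in this thread.

```python
import re
from typing import Collection

# Assumption: an RTV-locked path looks like /rtv/<15 digits>/<file>,
# matching the example URL in the proposal.
RTV_PATH = re.compile(r"^/rtv/(\d{15})/(.+)$")


def resolve_cdn_path(path: str, active_rtvs: Collection[str]) -> str:
    """Return the path to serve for a requested path.

    Requests locked to an RTV that is no longer retained fall back to the
    LTS channel; everything else is served unchanged.
    """
    match = RTV_PATH.match(path)
    if match is None:
        return path  # Not RTV-locked, nothing to rewrite.
    rtv, filename = match.groups()
    if rtv in active_rtvs:
        return path  # Still retained, serve as requested.
    return f"/lts/{filename}"
```

With only `012001010000000` retained, a request for `/rtv/011801010000000/v0.js` would resolve to `/lts/v0.js`, while unversioned paths such as `/v0.js` pass through untouched.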

All 13 comments

We discussed this in a @ampproject/wg-infra meeting, and we think we should do the following:

  1. Every 3 months delete all the RTV files for versions older than 6 months
  2. Delete any RTV files for versions that have a P0 in them (within reason of having an on-duty engineer bisect older versions if the P0 was long standing) once a cherry-pick is in place and has been available for a week

@ampproject/wg-infra is this agreeable? (if you have no additional comments, please +1 the thumbs-up on this comment so I'll know you've read this)
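
One way to read the two rules above, as a Python sketch. The `Release` record and its fields are hypothetical, and "6 months" is approximated as 183 days; the thread does not specify how release dates or P0 status are tracked.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

MAX_AGE = timedelta(days=183)  # Assumption: "6 months" is roughly 183 days.
FIX_SOAK = timedelta(days=7)   # Cherry-pick "has been available for a week".


@dataclass
class Release:
    rtv: str
    released: date
    has_p0: bool = False
    # Date the cherry-picked fix became available, if there is one.
    fix_available: Optional[date] = None


def rtvs_to_delete(releases: List[Release], today: date) -> List[str]:
    """Return the RTVs that either rule marks for deletion as of `today`."""
    doomed = []
    for r in releases:
        # Rule 1: the version is older than the retention limit.
        too_old = today - r.released > MAX_AGE
        # Rule 2: the version has a P0 and its fix has been out for a week.
        p0_fixed = (
            r.has_p0
            and r.fix_available is not None
            and today - r.fix_available >= FIX_SOAK
        )
        if too_old or p0_fixed:
            doomed.append(r.rtv)
    return doomed
```

In this reading, rule 1 would run on the quarterly schedule, while rule 2 could be evaluated whenever a cherry-pick has soaked for a week.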

I have some questions / comments below, but no answers :)

One aspect we haven't discussed is what this will do to websites that use versioned AMP. Today, they aren't considered valid, but the pages still work. When we start deleting older RTVs, websites that use them will stop working if they fetch the versioned runtime from the CDN URL.

> 1. Every 3 months delete all the RTV files for versions older than 6 months

If we delete every 3 months, the age of the oldest available RTV will fluctuate between 6 months and 9 months. What if we schedule an automatic clean up every month, or every week? Is this a) feasible and b) desirable?
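
The arithmetic behind that range, as a small Python sketch (the function name is illustrative): right after a cleanup the oldest surviving RTV is at most the retention age, and it keeps aging until the next cleanup runs.

```python
def oldest_rtv_age_range(retention_months: float, cleanup_interval_months: float):
    """Return (min, max) age in months of the oldest RTV still on the CDN.

    Assumes releases are frequent enough that some RTV is always close to
    the retention limit when a cleanup runs.
    """
    just_after_cleanup = retention_months
    just_before_next = retention_months + cleanup_interval_months
    return just_after_cleanup, just_before_next


print(oldest_rtv_age_range(6, 3))     # (6, 9): quarterly cleanup
print(oldest_rtv_age_range(6, 1))     # (6, 7): monthly cleanup
print(oldest_rtv_age_range(6, 0.25))  # (6, 6.25): roughly weekly cleanup
```

A shorter cleanup interval narrows the range toward the nominal 6-month limit.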

Curious: What exactly is involved in a clean up? Is it simply a changelist that deletes files from an RTV directory? If so, is there a limit on the number of files that can be deleted at a time?

> I have some questions / comments below, but no answers :)
>
> One aspect we haven't discussed is what this will do to websites that use versioned AMP. Today, they aren't considered valid, but the pages still work. When we start deleting older RTVs, websites that use them will stop working if they fetch the versioned runtime from the CDN URL.

@ampproject/wg-caching do you know if we can check whether some websites are version-locked to old RTVs?

> 1. Every 3 months delete all the RTV files for versions older than 6 months
>
> If we delete every 3 months, the age of the oldest available RTV will fluctuate between 6 months and 9 months. What if we schedule an automatic clean up every month, or every week? Is this a) feasible and b) desirable?

I actually like that. This seems to be a pretty safe thing to do automatically, if coupled with a validation check

> Curious: What exactly is involved in a clean up? Is it simply a changelist that deletes files from an RTV directory?

Yes, delete the entire RTV directory and be done with it

鈥f so, is there a limit on the number of files that can be deleted at a time?

Might require some special handling when we get started, but once this becomes part of the automation we wouldn't have a problem

I could see baking this into the release automation process, i.e., "We're adding release X, and clearing out all releases < X - 90 days" (though obviously this would have to be in a separate set of CLs)
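
A rough Python sketch of what such an automated step with a validation check might look like. Everything here is hypothetical: the thread does not describe how RTV directories are stored, and the sketch assumes a plain directory tree plus an RTV layout of a 2-digit channel prefix, a `YYMMDDHHMM` build time, and 3 trailing digits.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path


def rtv_build_time(rtv: str) -> datetime:
    # Assumption: 2-digit channel prefix, then YYMMDDHHMM, then 3 digits.
    return datetime.strptime(rtv[2:12], "%y%m%d%H%M")


def clear_old_releases(root: Path, new_rtv: str, protected: set,
                       window: timedelta = timedelta(days=90),
                       dry_run: bool = True) -> list:
    """Delete RTV directories under `root` cut more than `window` before `new_rtv`.

    Validation check: refuse to run if any protected RTV (for example the
    ones currently served as stable or LTS) would be deleted.
    """
    cutoff = rtv_build_time(new_rtv) - window
    stale = sorted(
        d.name for d in root.iterdir()
        if d.is_dir() and d.name.isdigit() and len(d.name) == 15
        and rtv_build_time(d.name) < cutoff
    )
    blocked = protected.intersection(stale)
    if blocked:
        raise ValueError(f"refusing to delete protected RTVs: {sorted(blocked)}")
    if not dry_run:
        for name in stale:
            shutil.rmtree(root / name)  # Remove the entire RTV directory.
    return stale
```

Defaulting to `dry_run=True` means the step only reports what it would remove until someone opts in to the actual deletion.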

Ping @ampproject/wg-caching ^^^

> @ampproject/wg-caching do you know if we can check whether some websites are version-locked to old RTVs?

Yes, there are some. I ran a quick query over HTTP Archive, which covers only a small sample of the web, and got these URLs. I suspect there are many more in practice, given the size of the corpus.

It's a union of these two queries, which are pretty simplistic since I don't know how to do regexes in BigQuery:

SELECT url FROM `httparchive.pages.2020_01_01_desktop` where payload like '%https://cdn.ampproject.org/rtv/%' LIMIT 1000
SELECT url FROM `httparchive.pages.2020_01_01_mobile` where payload like '%https://cdn.ampproject.org/rtv/%' LIMIT 1000
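
For reference, BigQuery's standard SQL does provide regex functions such as `REGEXP_CONTAINS` and `REGEXP_EXTRACT`. The same narrowing can also be done after export; the Python sketch below pulls the distinct RTVs out of a page payload. The 15-digit RTV pattern is an assumption based on the example URL earlier in this thread, and `find_locked_rtvs` is an illustrative name.

```python
import re

# Assumption: RTVs are 15 digits, as in /rtv/011801010000000/v0.js.
RTV_URL = re.compile(r"https://cdn\.ampproject\.org/rtv/(\d{15})/")


def find_locked_rtvs(payload: str) -> list:
    """Return the distinct RTVs referenced in `payload`, in sorted order."""
    return sorted(set(RTV_URL.findall(payload)))


html = (
    '<script async src="https://cdn.ampproject.org/rtv/011801010000000/v0.js"></script>'
    '<script async src="https://cdn.ampproject.org/rtv/011801010000000/v0/amp-list-0.1.js"></script>'
    '<script async src="https://cdn.ampproject.org/v0.js"></script>'
)
print(find_locked_rtvs(html))  # ['011801010000000']
```

Grouping pages by the extracted RTV would show which old versions are still in use and roughly how widely.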

Straw proposal compromise: redirect "deleted" RTVs to the newest LTS.

Ideally, AMP components are interface-stable along their version number, so this should break no pages. Also, ideally, AMP pages are constantly being tested against new RTVs as they're rewritten on the cache. However:

  1. pobody's nerfect, and maybe components break subtle undocumented interfaces (like CSS)
  2. maybe some of these pages are using optimized AMP but not the cache (e.g. by removing the <html amp> attribute)
  3. maybe some of them have locked to an old RTV specifically _because_ of a compatibility issue

Hopefully that's enough "maybe"s to make the risk low? Might be worth further analysis, especially since I believe (2) is likely to increase over time.

Apologies for missing the earlier discussion, what's the motivation for "We will automatically delete those deprecated versions from the AMP CDN servers"?

@kristoferbaxter One reason: if we have a security/privacy issue with a release and either roll back or roll forward to a new release, we should not continue serving the affected release.

Here are the design review notes.

Security/Privacy makes sense for the AMP Cache to restrict usage, but does this also apply to origins?

Removing versus informing?

Why should a security/privacy issue only be considered for the AMP Cache? Some could also potentially affect origin and end users.

Another example would be an extension bug that makes the entire page unusable in a particular release.

Sounds like the design review should cover these points! Likely these make sense, but automatic retirement is a significant jump from security/privacy.

This was fixed ~2 months ago :)
