Amphtml: files.txt in each AMP release is incomplete

Created on 12 Dec 2019  路  30Comments  路  Source: ampproject/amphtml

What's the issue?

Each AMP release contains a listing of all files that are included in the release, but it is incomplete. Some files exist in the CDN release folder that are not generated by the build process: gulp dist.

How do we reproduce the issue?

  1. Build AMP release 1910161528000 locally: git checkout 1910161528000 and gulp dist
  2. Verify all files generated by the build process are included in https://cdn.ampproject.org/rtv/011910161528000/files.txt
  3. Notice https://cdn.ampproject.org/rtv/011910161528000/v0/validator.js exists but v0/validator.js is not one of the files generated by gulp dist (and hence not in files.txt)

Additional context

Having a complete list of files included in each release would be helpful for self-hosting the amp-runtime, see issue #25873 .

Slack chat:

_mattwomple_ thanks erwinm. to your knowledge, is validator.js the only additional file available from the CDN beyond what gulp dist outputs?
_erwinm_ That's accessible through the cdn? I believe there are a few other artifacts. One i can think of for example is viewer integration scripts for the Android google app

Soon Bug infra

All 30 comments

@erwinmombay @ampproject/wg-caching

The validator is not part of the Runtime release and has it's own release cadence, typically once per week, usually Tuesdays.

@honeybadgerdontcare

https://github.com/ampproject/amphtml/blob/a561d0e8be10c8996d9f3db6920f69ffffafd5d8/src/validator-integration.js#L45

Should files.txt be updated or should cdn.ampproject.org be hardcoded?

I'm not familiar enough with files.txt or this specific code to answer that so I'll lean on @ampproject/wg-infra to chime in.

Is there a necessity to hard code ${urls.cdn} to cdn.ampproject.org? Isn't it preferred that it is something that could be set individually per cache?

@honeybadgerdontcare - When you append #development=amp to a URL and view an AMP page, the validator loads from ${urls.cdn}/v0/validator.js. If self-hosted amp runtimes were released today and validator.js is not included the publisher's hosted runtime, then #development=amp would fail because the script is not available. You can actually see this in my testing playground right now: https://amp-android.cmphys.com/wm-optimized/#development=amp . I see two options:

  1. Always point to cdn.ampproject.org for the validator
  2. Include validator.js in self-hosted runtimes

I created files.txt a few months ago mostly to make it easy to download an entire runtime without having to compile it locally: https://github.com/ampproject/amphtml/pull/24331 .

The validator can be built locally following the instructions in the README.md. Is it problematic to use that if someone wants to self host their own validator?

I agree that self-hosting validator.js is probably fine.

While validator.js has its own release schedule, ultimately the "current release" becomes tied to a specific RTV simply by inclusion in the files available under cdn.ampproject.org/rtv/{rtv}. For example, https://cdn.ampproject.org/rtv/011908162035200/v0/validator.js is the validator release that was available at the time 1908162035200 was released. Given this behavior, and that files.txt is a snapshot of all files available available in an AMP release, it seems reasonable to me that files.txt be updated to include v0/validator.js.

Interesting that we're releasing a validator within the RTV directory. I'm not sure that is something we should be doing.

@rsimha our dev tooling uses the validator served at cdn.ampproject.org/v0/validator.js which always has the latest release from wg-caching. Is there a reason to also be releasing it with an RTV? Where does that come from?

@danielrozenberg Does the automated release process copy the latest validator binary at the time of release to each RTV directory? Or is it a redirect done at serving time?

@danielrozenberg Does the automated release process copy the latest validator binary at the time of release to each RTV directory? Or is it a redirect done at serving time?

To answer my own question, it is the former. At the time of generating a new release, the automation copies the current latest validator binary to each RTV directory.

Maybe @erwinmombay can comment on why this was put in place to start with?

I don't recall why we started doing so but i believe it was only precautionary as we were copying everything from the bare unversioned directories to the versioned (rtv) directories.

from what i can see the runtime doesn't ever request an rtv path for the validator

After discussing with wg:caching, we don't see any issue with leaving old versions of validator.js in RTV directories since it's never used. However, we also believe we shouldn't be putting them there. We'll leave it up to wg:infra if they want to change that practice.

If there's no reason for validator.js to be copied to RTV directories, we can certainly remove the step that copies them from the release pipeline, so that future releases do not contain the copy.

@honeybadgerdontcare @erwinmombay To confirm once more, does this sound good to you?

It sounds good to me.

As of about a week ago I removed the part of the release script that copies validator.js into the RTV directory. Marking this as resolved.

As of about a week ago I removed the part of the release script that copies validator.js into the RTV directory. Marking this as resolved.

Could you check the text in the original bug report below the text "Slack chat" to verify this really should be closed?

@mattwomple My eyes passed right over that gray text! Re-opening until I can dig in, thank you

The following files are present in the RTV directory but absent from files.txt:

  • amp_preconnect_polyfill_404_or_other_error_expected._Do_not_worry_about_it
  • sw/service-worker-remover.js
  • sw/amp-sw.js
  • sw/optional-modules.js
  • ping.html
  • v0/experiments.js
  • v0/amp-viewer-integration-a2.js
  • v0/experiments.js.map
  • v0/amp-viewer-integration-gmail-0.1.js
  • robots.txt
  • preconnect.gif
  • experiments.html
  • favicon.ico
  • caches.json

None of these files are created by gulp dist, as far as I can tell (which is why the script generating files.txt doesn't pick them up. I'm guessing many get added by the release script. I'm not sure which ones are actually used, aside from:

  • robots.txt for crawling
  • caches.json for listing of AMP caches
  • v0/experiments.js(.map)? and experiments.html

    • Not sure why these wouldn't be a part of dist

@erwinmombay @rsimha @danielrozenberg Can any of you shed light on if/why the remaining files are necessary? I believe they're added by the release script

One way to trace the origins of these files is to look at the commit history of the release scripts. Might be worth doing an initial pass, after which we can reach out to whoever added them and ask if they're still in use.

The three service worker files in /sw are pulled from the amp-sw npm package in code added by @erwinmombay . Is this something that could be done as a part of gulp dist?

The amp-viewer-integration-a2.js script appears to have been added by @erwinmombay as a "temporary fix for" a bug with "AMP loaded via AGSA not requesting ads" (AGSA == Android Google Search Application). This was back in August of 2018; it looks like the underlying issue may have been addressed, but will need to verify before this can be removed safely.

The amp-viewer-integration-gmail.js script appears to have been added by @Marcial1234 during the release automation project. A code comment suggests this happens externally because the GMail AMP viewer can only be built within a google3 workspace. Maybe someone @ampproject/wg-viewers can confirm if that's still the case?

The caches.json file was added a bit over a year by @honeybadgerdontcare to keep it in sync with a https://github.com/ampproject/amphtml/blob/master/build-system/global-configs/caches.json . Since we end up copying it in the release script, would it make sense to add it to the output of gulp dist instead of doing the copy in the release script?

I haven't yet figured out how experiments.html & friends make it into the RTV directory (I expect @danielrozenberg probably knows)

These static files were made an explicit list by @Marcial1234. The script used to copy all non-JS/CSS/HTML files, which were these, though I'm not sure where they came from before that or if they're still in use.

  • amp_preconnect_polyfill_404_or_other_error_expected._Do_not_worry_about_it
  • favicon.ico
  • ping.html
  • preconnect.gif
  • robots.txt

I haven't yet figured out how experiments.html & friends make it into the RTV directory (I expect @danielrozenberg probably knows)

This is done in two steps (during gulp dist). It's possible that files.txt doesn't include it because it starts off in the build/ directory, but is copied to dist/ when code is released. To make sure the files are included in files.txt, we can either change the intermediate directory from build/ to dist/, or have generateFileListing() in dist.js also look at some of the contents of build/.

https://github.com/ampproject/amphtml/blob/abd20891fc5df50e19c833b6d36b87ecc2ee1649/build-system/tasks/dist.js#L209
https://github.com/ampproject/amphtml/blob/abd20891fc5df50e19c833b6d36b87ecc2ee1649/build-system/tasks/dist.js#L115

Since experiments.html & friends are indeed distributed, dist seems like an appropriate destination for them :thinking:

Since experiments.html & friends are indeed distributed, dist seems like an appropriate destination for them

No objections to moving this if it helps, but you'd have to do some coordination to make sure the release scripts are changed soon after this change is made in open source. I'll let @danielrozenberg / @erwinmombay comment on whether there are good reasons for not doing this.

The amp-viewer-integration-gmail.js script appears to have been added by @Marcial1234 during the release automation project. A code comment suggests this happens externally because the GMail AMP viewer can only be built within a google3 workspace. Maybe someone @ampproject/wg-viewers can confirm if that's still the case?

@zhangsu may know more about this. I am not familiar with this file.

The amp-viewer-integration-gmail.js script appears to have been added by @Marcial1234 during the release automation project. A code comment suggests this happens externally because the GMail AMP viewer can only be built within a google3 workspace. Maybe someone @ampproject/wg-viewers can confirm if that's still the case?

Yes, that's still the case.

Since experiments.html & friends are indeed distributed, dist seems like an appropriate destination for them

No objections to moving this if it helps, but you'd have to do some coordination to make sure the release scripts are changed soon after this change is made in open source. I'll let @danielrozenberg / @erwinmombay comment on whether there are good reasons for not doing this.

No objections either, but do note that basically every time in the past when we made a change that needed both an amphtml and a release process coordination, things broke because there was a cherry-pick that week that was using the pre-change amphtml code (mod the cherry-pick) with the post-change release script.

That is to say, if you are going to make this change, work closely with the release on-duty for the two weeks following to make sure that any cherry-pick release also includes this PR to avoid breakages

With cases where files are built internally and copied into dist/, or when they are static files copied in via the release script, I think the plan will be to append those explicitly copied files to files.txt within the release script.

With regards to experiments.html & pals, could we:

  1. Output to both dist/ and build/ initially within gulp dist
  2. Update all references to build/ to use the new location`
  3. After a two-week release cycle, drop the original build location safely
    @danielrozenberg WDYT? Am I missing anything here as far as you know?

Our release scripts have now been updated to generate the files.txt list at the very end, once all static and other files are pulled into the final serving directories. Marking closed.

Thanks so much for wrapping up this task @rcebulko .

The changes should be visible starting with next week's releases, (current experimental/beta/stable all were built with the previous release script version)

Was this page helpful?
0 / 5 - 0 ratings