Each AMP release contains a listing of all files that are included in the release, but it is incomplete. Some files exist in the CDN release folder that are not generated by the build process: gulp dist
.
1910161528000
locally: git checkout 1910161528000
and gulp dist
v0/validator.js
is not one of the files generated by gulp dist
(and hence not in files.txt
)Having a complete list of files included in each release would be helpful for self-hosting the amp-runtime, see issue #25873 .
Slack chat:
_mattwomple_ thanks erwinm. to your knowledge, is validator.js the only additional file available from the CDN beyond what gulp dist outputs?
_erwinm_ That's accessible through the cdn? I believe there are a few other artifacts. One i can think of for example is viewer integration scripts for the Android google app
@erwinmombay @ampproject/wg-caching
The validator is not part of the Runtime release and has it's own release cadence, typically once per week, usually Tuesdays.
@honeybadgerdontcare
Should files.txt be updated or should cdn.ampproject.org be hardcoded?
I'm not familiar enough with files.txt
or this specific code to answer that so I'll lean on @ampproject/wg-infra to chime in.
Is there a necessity to hard code ${urls.cdn}
to cdn.ampproject.org
? Isn't it preferred that it is something that could be set individually per cache?
@honeybadgerdontcare - When you append #development=amp
to a URL and view an AMP page, the validator loads from ${urls.cdn}/v0/validator.js
. If self-hosted amp runtimes were released today and validator.js
is not included the publisher's hosted runtime, then #development=amp
would fail because the script is not available. You can actually see this in my testing playground right now: https://amp-android.cmphys.com/wm-optimized/#development=amp . I see two options:
I created files.txt
a few months ago mostly to make it easy to download an entire runtime without having to compile it locally: https://github.com/ampproject/amphtml/pull/24331 .
The validator can be built locally following the instructions in the README.md. Is it problematic to use that if someone wants to self host their own validator?
I agree that self-hosting validator.js is probably fine.
While validator.js
has its own release schedule, ultimately the "current release" becomes tied to a specific RTV simply by inclusion in the files available under cdn.ampproject.org/rtv/{rtv}. For example, https://cdn.ampproject.org/rtv/011908162035200/v0/validator.js
is the validator release that was available at the time 1908162035200 was released. Given this behavior, and that files.txt
is a snapshot of all files available available in an AMP release, it seems reasonable to me that files.txt
be updated to include v0/validator.js
.
Interesting that we're releasing a validator within the RTV directory. I'm not sure that is something we should be doing.
@rsimha our dev tooling uses the validator served at cdn.ampproject.org/v0/validator.js which always has the latest release from wg-caching. Is there a reason to also be releasing it with an RTV? Where does that come from?
@danielrozenberg Does the automated release process copy the latest validator binary at the time of release to each RTV directory? Or is it a redirect done at serving time?
@danielrozenberg Does the automated release process copy the latest validator binary at the time of release to each RTV directory? Or is it a redirect done at serving time?
To answer my own question, it is the former. At the time of generating a new release, the automation copies the current latest validator binary to each RTV directory.
Maybe @erwinmombay can comment on why this was put in place to start with?
I don't recall why we started doing so but i believe it was only precautionary as we were copying everything from the bare unversioned directories to the versioned (rtv) directories.
from what i can see the runtime doesn't ever request an rtv path for the validator
After discussing with wg:caching, we don't see any issue with leaving old versions of validator.js in RTV directories since it's never used. However, we also believe we shouldn't be putting them there. We'll leave it up to wg:infra if they want to change that practice.
If there's no reason for validator.js
to be copied to RTV directories, we can certainly remove the step that copies them from the release pipeline, so that future releases do not contain the copy.
@honeybadgerdontcare @erwinmombay To confirm once more, does this sound good to you?
It sounds good to me.
As of about a week ago I removed the part of the release script that copies validator.js
into the RTV directory. Marking this as resolved.
As of about a week ago I removed the part of the release script that copies
validator.js
into the RTV directory. Marking this as resolved.
Could you check the text in the original bug report below the text "Slack chat" to verify this really should be closed?
@mattwomple My eyes passed right over that gray text! Re-opening until I can dig in, thank you
The following files are present in the RTV directory but absent from files.txt
:
amp_preconnect_polyfill_404_or_other_error_expected._Do_not_worry_about_it
sw/service-worker-remover.js
sw/amp-sw.js
sw/optional-modules.js
ping.html
v0/experiments.js
v0/amp-viewer-integration-a2.js
v0/experiments.js.map
v0/amp-viewer-integration-gmail-0.1.js
robots.txt
preconnect.gif
experiments.html
favicon.ico
caches.json
None of these files are created by gulp dist
, as far as I can tell (which is why the script generating files.txt
doesn't pick them up. I'm guessing many get added by the release script. I'm not sure which ones are actually used, aside from:
robots.txt
for crawlingcaches.json
for listing of AMP cachesv0/experiments.js(.map)?
and experiments.html
dist
@erwinmombay @rsimha @danielrozenberg Can any of you shed light on if/why the remaining files are necessary? I believe they're added by the release script
One way to trace the origins of these files is to look at the commit history of the release scripts. Might be worth doing an initial pass, after which we can reach out to whoever added them and ask if they're still in use.
The three service worker files in /sw
are pulled from the amp-sw
npm package in code added by @erwinmombay . Is this something that could be done as a part of gulp dist
?
The amp-viewer-integration-a2.js
script appears to have been added by @erwinmombay as a "temporary fix for" a bug with "AMP loaded via AGSA not requesting ads" (AGSA == Android Google Search Application). This was back in August of 2018; it looks like the underlying issue may have been addressed, but will need to verify before this can be removed safely.
The amp-viewer-integration-gmail.js
script appears to have been added by @Marcial1234 during the release automation project. A code comment suggests this happens externally because the GMail AMP viewer can only be built within a google3 workspace. Maybe someone @ampproject/wg-viewers can confirm if that's still the case?
The caches.json
file was added a bit over a year by @honeybadgerdontcare to keep it in sync with a https://github.com/ampproject/amphtml/blob/master/build-system/global-configs/caches.json . Since we end up copying it in the release script, would it make sense to add it to the output of gulp dist
instead of doing the copy in the release script?
I haven't yet figured out how experiments.html
& friends make it into the RTV directory (I expect @danielrozenberg probably knows)
These static files were made an explicit list by @Marcial1234. The script used to copy all non-JS/CSS/HTML files, which were these, though I'm not sure where they came from before that or if they're still in use.
amp_preconnect_polyfill_404_or_other_error_expected._Do_not_worry_about_it
favicon.ico
ping.html
preconnect.gif
robots.txt
I haven't yet figured out how
experiments.html
& friends make it into the RTV directory (I expect @danielrozenberg probably knows)
This is done in two steps (during gulp dist
). It's possible that files.txt
doesn't include it because it starts off in the build/
directory, but is copied to dist/
when code is released. To make sure the files are included in files.txt
, we can either change the intermediate directory from build/
to dist/
, or have generateFileListing()
in dist.js
also look at some of the contents of build/
.
https://github.com/ampproject/amphtml/blob/abd20891fc5df50e19c833b6d36b87ecc2ee1649/build-system/tasks/dist.js#L209
https://github.com/ampproject/amphtml/blob/abd20891fc5df50e19c833b6d36b87ecc2ee1649/build-system/tasks/dist.js#L115
Since experiments.html
& friends are indeed distributed, dist
seems like an appropriate destination for them :thinking:
Since
experiments.html
& friends are indeed distributed,dist
seems like an appropriate destination for them
No objections to moving this if it helps, but you'd have to do some coordination to make sure the release scripts are changed soon after this change is made in open source. I'll let @danielrozenberg / @erwinmombay comment on whether there are good reasons for not doing this.
The
amp-viewer-integration-gmail.js
script appears to have been added by @Marcial1234 during the release automation project. A code comment suggests this happens externally because the GMail AMP viewer can only be built within a google3 workspace. Maybe someone @ampproject/wg-viewers can confirm if that's still the case?
@zhangsu may know more about this. I am not familiar with this file.
The amp-viewer-integration-gmail.js script appears to have been added by @Marcial1234 during the release automation project. A code comment suggests this happens externally because the GMail AMP viewer can only be built within a google3 workspace. Maybe someone @ampproject/wg-viewers can confirm if that's still the case?
Yes, that's still the case.
Since
experiments.html
& friends are indeed distributed,dist
seems like an appropriate destination for themNo objections to moving this if it helps, but you'd have to do some coordination to make sure the release scripts are changed soon after this change is made in open source. I'll let @danielrozenberg / @erwinmombay comment on whether there are good reasons for not doing this.
No objections either, but do note that basically every time in the past when we made a change that needed both an amphtml and a release process coordination, things broke because there was a cherry-pick that week that was using the pre-change amphtml code (mod the cherry-pick) with the post-change release script.
That is to say, if you are going to make this change, work closely with the release on-duty for the two weeks following to make sure that any cherry-pick release also includes this PR to avoid breakages
With cases where files are built internally and copied into dist/
, or when they are static files copied in via the release script, I think the plan will be to append those explicitly copied files to files.txt
within the release script.
With regards to experiments.html
& pals, could we:
dist/
and build/
initially within gulp dist
build/
to use the new location`build
location safelyOur release scripts have now been updated to generate the files.txt list at the very end, once all static and other files are pulled into the final serving directories. Marking closed.
Thanks so much for wrapping up this task @rcebulko .
The changes should be visible starting with next week's releases, (current experimental/beta/stable all were built with the previous release script version)