Hi,
After this discussion, https://productforums.google.com/forum/#!msg/webmasters/_EFTDi1koJo/EAoRom0fIQAJ, Tomo T suggested we open an issue here.
It's a known issue (#1662) that Cloudflare (and maybe other) are breaking AMP pages by inserting on the fly some javascript that do not comply with AMP specs.
Tomo T mentioned you guys are in touch with cloudflare, so maybe resolution is on its way, but in our wbAMP plugin for Joomla, we did the following:
1 - added no-cache headers
2 - Added a X-amphtml: wbAMP header, so that users can setup rules to bypass those pages in their CDN control panel for instance.
Is there any merit to this?
Rgds
I'm not an expert on cloudflare, but I went in and played around with the settings on a free account to see what cloudflare is doing. I think the feature which breaks AMP validation is called Rocket Loader. It seems, among other changes, to insert into every HTML document head the following snippet:
<script type="text/javascript">
//<![CDATA[
[Cloudflare javascript loader code here]
//]]>
</script>
This loads javascript from cloudflare's site, which isn't part of the AMP runtime and this is invalid for AMP. From my understanding of Rocket Loader's description, valid AMP pages won't be improved by Rocket Loader as they already load javascript asynchronously. As a workaround you can use today, you can disable Rocket Loader for amp pages.
The Rocket Loader feature appears to be under the "Speed" heading in cloudflare's web console, but it would probably be better to configure it just for AMP pages by setting up a specific "Page Rule" that disables Rocket Loader only for the section of your site that is AMP.
Thanks for your reply. We had seen the JS being inserted, but it's good to know this particular feature is involved. I will pass the information.
Nevertheless, the question stands whether it could be useful to add a specific header to amp page so that his process could be integrated by CDN or proxies?
Rgds
I've let cloudflare know as well. Generally CDNs could detect the doc as AMP HTML and avoid the JS insert (and, perhaps, do AMP specific things like image creation and srcset population).
Hi there,
@Gregable Our user disabled Rocker Loader on his site, and that solved the validation issue. However, it appears the Rocket Loader feature is something that can only be enabled/disabled for an entire subdomain, not using Page Rules, ie you can't exclude only some URLs.
So only solutions are a/ drop it entirely, or b/ move AMP pages to a subdomain, which is a bit involevd.
I would assume Cloudflare would want to be able to identify AMP pages and disable their plugin then. As @jmadler said, they can just look at the amp attribute in the html tag?
I still think a header or something that doesn't require looking at the actual page content to recognize an AMP page would be helpful in the real world.
Rgds
Hi, I work for CloudFlare but am not working in this specific area, it's been raised internally and this issue is being tracked and we'll look to get a fix out as soon as possible. If I have news on this, I'll update this bug. Feel free to ping me for updates on this.
For now, a CloudFlare customer will want to disable:
Those can be disabled using page rules, such that you can leave the features enabled for the rest of the content served.
We'd recommend leaving the following enabled for AMP:
@buro9 I did everything via a page rule. The problem is still there. Using Wordpress and CloudFlare. "The attribute 'type' in tag 'script type=application/ld+json' is set to the invalid value 'text/javascript'." is the only error left. Is there any waiting time for the settings to apply?
@eperezf I've raised that as a bug ticket internally. I am curious, does disabling HTML minification resolve that?
Settings are usually applied globally within seconds.
@jmadler I work for CloudFlare and would be interested to chat about ways to collaborate and potentially add AMP specific logic to our edge.
@buro9 I think HTML Minification is not the issue here as the problem is the script type "text/javascript" inserted in the
section. Just like Google Analytics, it should be formatted as "ld+/json" maybe.I just checked again the validator and the error changed to "The tag 'script' is disallowed except in specific forms". Maybe if the script can be loaded form an external file instead of being inserted on the head section it may work.
No, external file will not be allowed to be loaded either.
It can using <script async src="">
. At least it does on my website. The problem is inline javascript from Cloudflare using <script type="text/javascript">
.
On the main document, <script async src>
is only allowed to the AMP runtime or extensions from cdn.ampproject.org
.
Can CloudFlare assume that an AMP Crawler will always have '(compatible; Google-AMPHTML)' in the user agent or at least AMPHTML?
No, you'll need to check for the presence of "AMP" or "âš¡" in the tag, case insensitive.
FWIW, mod_pagespeed has a parallel set of issues. We are working on solving those now, and you can follow our progress here: https://github.com/pagespeed/mod_pagespeed/issues/1263
The fact that we have to sniff content to tell something is an AMP document complicates the solution somewhat, particularly when streaming HTML. But it is all do-able (and in fact has already been done, just not all code-reviewed yet).
That is the root of the issue we have too. For the most part we do not parse or otherwise look at the content of HTML that passes through CloudFlare. Some features do in fact require a parser, but many do not and up to now could safely be added to a HTML document without parsing or otherwise inspecting the HTML document content.
For performance you may imagine that we only parse that which we absolutely must parse, which is a subset of all text/html documents that pass through CloudFlare.
It would be great if there were a header that declared an AMP document, the content-type would be best for this and currently most AMP pages return text/html
which does not identify them as AMP. Something like text/html+amp
would be great but probably has it's own issues (many legacy things that only expect text/html and nothing else).
We will probably also move to sniffing the html tag, but until then the advice for AMP users will be to disable features that are not AMP aware.
@buro9 This is what I suggested in the OP, but indeed I didn't think a separate content type value would be practical. Hence might be a good idea to have a separate header?
Right now we add:
X-amphtml: wbAMP
as wbAMP is our extension to produce AMP-compliant page from Joomla pages, but obviously something more generic would be a better fit.
I also believe that having to actually parse (though in a limited way) the actual content to decide whether to leave it alone or process it is a major overhead, and will prevent adoption.
I would think that the features that CloudFlare offers that might break AMP are a proper subset of the ones that have to parse HTML anyway, but I have to admit I don't understand fully how rocket-loader works.
In any case I agree it would've been a lot more convenient for all involved if amp documents were required by the spec to self-identify with an http response header.
I would think that the features that CloudFlare offers that might break AMP are a proper subset of the ones that have to parse HTML anyway,
Pretty inefficient to have to do the parsing to discover that you won't be able to the use the parsed content because this is an AMP page. Better look at a header, to know that a specific set of features won't apply, and you don't need to parse (or apply whatever processing you want to apply)
I think "parse" is a bit of a strong word here. It's just a very top header of the page that needs to be inspected. I don't think we can require all publishers to include a header for all amp pages - that would seem to be error-prone and duplicative.
Would it help if we added validation checks that the tag is within the first maybe 100 bytes of the document? Other than comments, it's already required to be the 2nd tag on the page, but comments and whitespace make it possible to be an arbitrary length from the top.
Why not just ban comments until after the opening html
tag? Then it's guaranteed to be before the second >
character.
Within the first 100 bytes is definitely going to be of great help. That's
enough to allow us to decide what to enable/disable, whilst allowing us to
send headers as early as possible.
This at least allows this to be a sniffer and not a parser and the
constraint allows it to be tightly defined which is good for security
(mitigates resource exhaustion during a DDoS).
Content-type is still preferable, but a 100 byte limit, whereby the AMP
string or lightning rune must fully complete within the first 100 bytes
would be good.
On Wed, 16 Mar 2016, 21:42 Justin Ridgewell, notifications@github.com
wrote:
Why not just ban comments until after the opening html tag? Then it's
guaranteed to be before the second > character.—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/ampproject/amphtml/issues/2380#issuecomment-197560035
The first 100 bytes might be tight. E.g. from
https://en.wikipedia.org/wiki/Document_type_declaration
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
lang="ar" dir="ltr" xmlns="http://www.w3.org/1999/xhtml">On Wed, Mar 16, 2016 at 5:54 PM, David Kitchen notifications@github.com
wrote:
Within the first 100 bytes is definitely going to be of great help. That's
enough to allow us to decide what to enable/disable, whilst allowing us to
send headers as early as possible.This at least allows this to be a sniffer and not a parser and the
constraint allows it to be tightly defined which is good for security
(mitigates resource exhaustion during a DDoS).Content-type is still preferable, but a 100 byte limit, whereby the AMP
string or lightning rune must fully complete within the first 100 bytes
would be good.On Wed, 16 Mar 2016, 21:42 Justin Ridgewell, notifications@github.com
wrote:Why not just ban comments until after the opening html tag? Then it's
guaranteed to be before the second > character.—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
<
https://github.com/ampproject/amphtml/issues/2380#issuecomment-197560035>—
You are receiving this because you commented.
Reply to this email directly or view it on GitHub
https://github.com/ampproject/amphtml/issues/2380#issuecomment-197567339
I'm new to AMP, but I was under the impression that the DOCTYPE was dictated as HTML5: https://www.ampproject.org/docs/reference/spec.html
If not, and any valid DOCTYPE is acceptable then something more like 256 bytes would be better, though we've just doubled the proposed sniffer requirement and I'm thinking the content-type is still preferable.
The first 100 bytes might be tight.
Not an issue, we mandate <!doctype html>
(HTML5) doctype.
Content-type would be nice, and could be recommended, but plenty of content creators don't have the ability or the know-how to modify headers, therefore I think we will want to support tags.
We do mandate <!doctype html>
for a valid doc currently. Nobody has proposed supporting DTDs yet, though if we want them we should resolve that asap.
The value of 100 was a strawman. I like 256 even more. I could try to get an estimate of what we're seeing in the wild, but given that DTDs are not allowed now, it might not be a good metric to look at.
Not allowing comments would be possible, but is tricky for some validation implementations if they run on top of an HTML parser which doesn't have callbacks for comments. We'd also definitely need to look at this in the wild, there may be some CMS systems that need a comment header for reasons. tl;dr: needs more research.
How about 256? And to be specific, I mean the trailing >
in the <html amp>
tag must be the 256'th byte or lower.
256 bytes is good, I'll run the proposal internally tomorrow. We'd not stop asking for the content-type (especially as there is incompatibility with a user agents expectation of what text/html means and what it's about to receive), but lacking that... this constraint is a good one.
I mean the trailing > in the tag must be the 256'th byte or lower.
Agreed.
We would expect to implement a sniffer that would exit once it encounters either the closing > on the HTML tag, or 256 bytes. Whichever is sooner.
I think if we scan 256 bytes - might as well do 1000 bytes - that doesn't change anything but would remove most of edge cases.
Also, i think it'd be just as easy to wait until the first >
to do the check - it's common for server-side filters to look in the <!doctype>
and <html>
declarations.
Please no.
1000 bytes is excessive. We serve a lot of pages and need to think of both performance and security. 256 is sufficient for performance whilst also mitigating some of the risk of resource exhaustion.
We definitely wouldn't wait for the first > , what if someone serves a 900MB video file as text/html and we're sitting there buffering it all during a sniff operation in anticipation of a character that won't arrive. People do this stuff every day, best not to ask why they do and accept that they do.
If we are going to sniff all text/html (something we're not doing today) and cannot just use a content-type (the right solution), we really want well defined rules for this. First 256 bytes to include the closing HTML tag, and contains either AMP or lightning emoji - this is a good rule and not incompatible with anything in the AMP spec. It just clarifies and defines something currently ambiguous.
There is no need to look further into the document, we just want to know very quickly whether this is an AMP doc or not.
The issue that mod_pagespeed also describe on their linked issue above with http-equiv meta refresh is similar... they would want to detect AMP very early so that they could adjust the HTTP headers if needed. Once they have gone beyond the sniff operation they (like us) want to be streaming that content and not holding it in a buffer.
All of us who provide reverse proxies, transparent proxies, transformation proxies, need to be able to be sure that we can identify AMP extremely early, within the scope of a single tiny buffer.
Let's first agree on an approach and then discuss content-type and scan maximum size.
My concern with the approach of "closing caret of html tag must be within 256 bytes and html tag must contain 'AMP' or 'âš¡', case insensitive" is that it may not be as performant as typical approaches for identifying content.
What about something like "' is invalid.
Does that approach make sense?
I am aligned with Cloudflare's concerns about the tag being in the first
256 bytes. The in-flight mod_pagespeed implementation doesn't have that
limit but probably it should, to reduce the risk of unbounded buffer growth.
I don't think it makes a lot of sense to compress whitespace when computing
that window. Either the proxy has to first the sniff N bytes or it has to
do something more complicated.
I am not a fan of requiring that exact byte sequence "
other cases, HTML attributes are not order sensitive so I don't think theI also think that if someone puts:
in the first 256 bytes, that should _not_ be recognized as an AMP
document. So just scanning for that byte sequence in the first 256 bytes
seems wrong to me.
FWIW here is our open-sourced amp-recognition filter,
currently insensitive to byte-count. Outside this context, MPS sniffs all
content-type:text/html input, expecting the first non-whitespace character
to be a '<'. This implementation ignores all whitespace and comments, but
will give up at the first tag other than
, as well as at the first non-whitespace character.
https://github.com/pagespeed/mod_pagespeed/blob/master/pagespeed/kernel/html/amp_document_filter.cc
We can easily enforce the 256-character limit outside the context of
this filter by only letting this filter run on the first 256 bytes of
text we see, Our full HTML lexer runs in front of this, but I don't
think that'll be a speed issue.
Just wanted to see if folks had any updates to share here?
There is https and http issue in AMP page as While validation AMP page it only accept amp js on https not on http
How to resolve that issue.
The mandatory tag 'amphtml engine v0.js script' is missing or incorrect. (see https://www.ampproject.org/docs/reference/spec.html#required-markup)
/cc @Gregable
@uraayush, I think that this is separate from the cloudflare issue. My understanding is that our general recommendation for these types of questions is to ask over at the amphtml component at stackoverflow. I'll still try to answer it.
I suspect you are including the mandatory script tag as:
<script async src="http://cdn.ampproject.org/v0.js"></script>
, ie: as an http URL
It must instead always be included as an https URL, which is fine to do from an http document, so:
<script async src="https://cdn.ampproject.org/v0.js"></script>
Note: the canonical minimal valid AMP html document has an Apache 2.0 copyright comment at the top:
the html lightning-bolt tag appears at about character 779 in that file (more or less..I'm not sure how Emacs counts utf8 chars :)
I use CloudFlare on a wordpress site and it didn't work until i added a page rule in CloudFlare. I essentially turned off everything under the pattern _www.domain.com//amp/_
It worked for me so hopefully it will help someone else.
I vote for a 256 byte limit. Who owns the next action? Is that validator enforcement/warning? cc @Gregable
Thanks for pinging this. I chatted with @dvoytenko and I think we also agreed to a 256 byte limit, but decided that it wouldn't be something the validator would enforce, but instead we should just document it as a suggestion / possibly make it a validator warning.
Hi all,
Glad to see the discussion has progressed, through I'm not sure where Cloudflare actually stands right now. Anyway, this is just to mention that similar problem appears with NewRelic, which PHP agent (I assume the same with other agents, just haven't dealt with anything else yet) also insert script tags into the page for some of their features.
There is no current fix available, short of disabling the NewRelic agent from the CMS/application generating the AMP page.
cheers
Hi, yet another CloudFlare developer here :) As of the moment, AMP pages (any pages with <html amp>
or <html âš¡>
) should be excluded from our transformations. Please let me know if you still experience issues and will be happy to look into it.
Hi @RReverser
Thanks for updating us. Can you confirm the presence of the AMP tag anywhere on the page is enough, or is there a 256 or more first-characters-in-page limit to be aware of, as mentioned earlier on this issue?
Thanks
@weeblr AMP page, unlike generic HTML, has strict rules in regard to HTML structure, so I went ahead and implemented it without any limit (that is, any valid AMP page should be excluded from the transformations). Note that this code was just recently deployed to production endpoints, but as far as I know, should be already available on all of them.
@RReverser Still facing the issue. Except those script tags added by Cloudflare, HTML is a valid AMP HTML. Is there any other configuration needs to be changed in cloudflare side?
The following code has been added to the bottom of my page.
<script type="text/javascript">window.NREUM||(NREUM={});NREUM.info={"beacon":"bam.nr-data.net","licenseKey":"xxxxxx","applicationID":"xxxx","transactionName":"xxxxxxxxxxxx==","queueTime":0,"applicationTime":193,"ttGuid":"","agentToken":"","atts":"xxxxxxxx=","errorBeacon":"bam.nr-data.net","agent":""}</script>
The following code at the top of my page.
<script type="text/javascript">window.NREUM||(NREUM={}),__nr_require=function(e,t,n){function r(n){if(!t[n]){var o=t[n]={exports:{}};e[n][0].call(o.exports,function(t){var o=e[n][1][t];return r(o||t)},o,o.exports)}return t[n].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<n.length;o++)r(n[o]);return r}({1:[function(e,t,n){function r(e,t){return function(){o(e,[(new Date).getTime()].concat(a(arguments)),null,t)}}var o=e("handle"),i=e(2),a=e(3);"undefined"==typeof window.newrelic&&(newrelic=NREUM);var u=["setPageViewName","addPageAction","setCustomAttribute","finished","addToTrace","inlineHit"],c=["addPageAction"],f="api-";i(u,function(e,t){newrelic[t]=r(f+t,"api")}),i(c,function(e,t){newrelic[t]=r(f+t)}),t.exports=newrelic,newrelic.noticeError=function(e){"string"==typeof e&&(e=new Error(e)),o("err",[e,(new Date).getTime()])}},{}],2:[function(e,t,n){function r(e,t){var n=[],r="",i=0;for(r in e)o.call(e,r)&&(n[i]=t(r,e[r]),i+=1);return n}var o=Object.prototype.hasOwnProperty;t.exports=r},{}],3:[function(e,t,n){function r(e,t,n){t||(t=0),"undefined"==typeof n&&(n=e?e.length:0);for(var r=-1,o=n-t||0,i=Array(0>o?0:o);++r<o;)i[r]=e[t+r];return i}t.exports=r},{}],ee:[function(e,t,n){function r(){}function o(e){function t(e){return e&&e instanceof r?e:e?u(e,a,i):i()}function n(n,r,o){e&&e(n,r,o);for(var i=t(o),a=l(n),u=a.length,c=0;u>c;c++)a[c].apply(i,r);var s=f[g[n]];return s&&s.push([m,n,r,i]),i}function p(e,t){w[e]=l(e).concat(t)}function l(e){return w[e]||[]}function d(e){return s[e]=s[e]||o(n)}function v(e,t){c(e,function(e,n){t=t||"feature",g[n]=t,t in f||(f[t]=[])})}var w={},g={},m={on:p,emit:n,get:d,listeners:l,context:t,buffer:v};return m}function i(){return new r}var a="nr@context",u=e("gos"),c=e(2),f={},s={},p=t.exports=o();p.backlog=f},{}],gos:[function(e,t,n){function r(e,t,n){if(o.call(e,t))return e[t];var r=n();if(Object.defineProperty&&Object.keys)try{return Object.defineProperty(e,t,{value:r,writable:!0,enumerable:!1}),r}catch(i){}return e[t]=r,r}var o=Object.prototype.hasOwnProperty;t.exports=r},{}],handle:[function(e,t,n){function r(e,t,n,r){o.buffer([e],r),o.emit(e,t,n)}var o=e("ee").get("handle");t.exports=r,r.ee=o},{}],id:[function(e,t,n){function r(e){var t=typeof e;return!e||"object"!==t&&"function"!==t?-1:e===window?0:a(e,i,function(){return o++})}var o=1,i="nr@id",a=e("gos");t.exports=r},{}],loader:[function(e,t,n){function r(){if(!w++){var e=v.info=NREUM.info,t=s.getElementsByTagName("script")[0];if(e&&e.licenseKey&&e.applicationID&&t){c(l,function(t,n){e[t]||(e[t]=n)});var n="https"===p.split(":")[0]||e.sslForHttp;v.proto=n?"https://":"http://",u("mark",["onload",a()],null,"api");var r=s.createElement("script");r.src=v.proto+e.agent,t.parentNode.insertBefore(r,t)}}}function o(){"complete"===s.readyState&&i()}function i(){u("mark",["domContent",a()],null,"api")}function a(){return(new Date).getTime()}var u=e("handle"),c=e(2),f=window,s=f.document;NREUM.o={ST:setTimeout,CT:clearTimeout,XHR:f.XMLHttpRequest,REQ:f.Request,EV:f.Event,PR:f.Promise,MO:f.MutationObserver},e(1);var p=""+location,l={beacon:"bam.nr-data.net",errorBeacon:"bam.nr-data.net",agent:"js-agent.newrelic.com/nr-918.min.js"},d=window.XMLHttpRequest&&XMLHttpRequest.prototype&&XMLHttpRequest.prototype.addEventListener&&!/CriOS/.test(navigator.userAgent),v=t.exports={offset:a(),origin:p,features:{},xhrWrappable:d};s.addEventListener?(s.addEventListener("DOMContentLoaded",i,!1),f.addEventListener("load",r,!1)):(s.attachEvent("onreadystatechange",o),f.attachEvent("onload",r)),u("mark",["firstbyte",a()],null,"api");var w=0},{}]},{},["loader"]);</script>
@venkatgig @RReverser As can be seen in the script itself, this is not coming from Cloudflare but instead from NewRelic. Per my previous message, the only fix is to disable the NewRelic agent for the page, which can be done programmatically only, AFAIK
@RReverser @weeblr
Sorry guys, my bad. Weeblr thanks for pointing out.
@weeblr Right, thanks!
@venkatgig Just add:
// Disable newrelic
if (extension_loaded('newrelic')) {
newrelic_disable_autorum();
}
Somewhere in your template PHP
@RReverser We are still having a user with this problem, though the code inserted is not exactly the same as initially, I think:
<script type="text/javascript">
//<![CDATA[
try{if (!window.CloudFlare) {var CloudFlare=[{verbose:0,p:1465027686,byc:0,owlid:"cf",bag2:1,mirage2:0,oracle:0,paths:{cloudflare:"/cdn-cgi/nexp/dok3v=1613a3a185/"},atok:"38dbd94d94476a3bd906cc74fe67ab04",petok:"b2a42bbfa7d27c4bbf6ce1e7bd71cea2be91ac22-1465395126-1800",zone:"bidanku.com",rocket:"0",apps:{"ga_key":{"ua":"UA-6855089-1","ga_bs":"2"}},sha2test:0}];!function(a,b){a=document.createElement("script"),b=document.getElementsByTagName("script")[0],a.async=!0,a.src="//ajax.cloudflare.com/cdn-cgi/nexp/dok3v=e982913d31/cloudflare.min.js",b.parentNode.insertBefore(a,b)}()}}catch(e){};
//]]>
</script>
<script data-pagespeed-no-defer>(function(){function d(b){var a=window;if(a.addEventListener)a.addEventListener("load",b,!1);else if(a.attachEvent)a.attachEvent("onload",b);else{var c=a.onload;a.onload=function(){b.call(this);c&&c.call(this)}}}var p=Date.now||function(){return+new Date};window.pagespeed=window.pagespeed||{};var q=window.pagespeed;function r(){this.a=!0}r.prototype.c=function(b){b=parseInt(b.substring(0,b.indexOf(" ")),10);return!isNaN(b)&&b<=p()};r.prototype.hasExpired=r.prototype.c;r.prototype.b=function(b){return b.substring(b.indexOf(" ",b.indexOf(" ")+1)+1)};r.prototype.getData=r.prototype.b;r.prototype.f=function(b){var a=document.getElementsByTagName("script"),a=a[a.length-1];a.parentNode.replaceChild(b,a)};r.prototype.replaceLastScript=r.prototype.f;
r.prototype.g=function(b){var a=window.localStorage.getItem("pagespeed_lsc_url:"+b),c=document.createElement(a?"style":"link");a&&!this.c(a)?(c.type="text/css",c.appendChild(document.createTextNode(this.b(a)))):(c.rel="stylesheet",c.href=b,this.a=!0);this.f(c)};r.prototype.inlineCss=r.prototype.g;
Can you advise any workaround?
Thanks
First part: Cloudflare insertin GA. Disable the app on Cloudflare & add GA in your template instead.
Second part: Google pagespeed module. You must completely disable that for all AMP pages. (by creating a location in your NGINX for example) To do a test you could of course disable pagespeed alltogether.
Hi @Joostvanderlaan
There must be a confusion in your reading of my past message. All of this JS is inserted by Cloudflare on the fly. From past discussion on this thread, Cloudflare is supposed now to leave AMP pages alone, unmodified, it seems they don't yet.
Rgds
Then still, you could stop Cloudflare from doing that by disabling the App in Cloudflare that injects javascript OR by creating a PageRule that disables the problemetic Cloudflare options. (Most of the Cloudflare options don't even inject JS)
(I'm using AMP + Cloudflare without problems)
@weeblr Actually this second part from your example makes me suspicious too:
<script data-pagespeed-no-defer>(function()...
and
window.pagespeed=window.pagespeed||{};var q=window.pagespeed;function r(){this.a=!0}r.prototype.c=function(b){b=parseInt(b.substring(0,b.indexOf(" ")),10);return!isNaN(b)&&b<=p()};r.prototype.hasExpired=r.prototype.c;r.prototype.b=function(b){return b.substring(b.indexOf(" ",b.indexOf(" ")+1)+1)};r.prototype.getData=r.prototype.b;
This looks like PageSpeed module and not CloudFlare.
If you're certain that it's not, can you please show URL where this is happening so that we could look into it?
@Joostvanderlaan
Then still, you could stop Cloudflare from doing that by disabling the App in Cloudflare that injects javascript"
The question here is not to "solve the problem". It's to understand why this is happening, and whether it's expected (by Cloudflare) behavior.
Again, it was said that Cloudflare would not "touch" any AMP page from now on. Whether you are affected or not may depends on the Cloudflare features you use, right?
@RReverser You are right. I have sent a message to customer, as they seem to have solved the issue now (this is not added to their pages any longer), asking what they did exactly. I cannot share the URL w/o their consent, but again, the JS is not there any more.
I'll post back (if if they reply!) with any info I can get.
Thanks for your replies.
@weeblr Cool, glad it's resolved.
For the record, the next release of PageSpeed will automatically disable
any injection of scripts or other non-amp-compliant changes, whenever
serving an AMP document.
In the meantime, a good workaround is to use
ModPagespeedRewriteLevel OptimizeForBandwidth (apache)
pagespeed RewriteLevel OptimizeForBandwidth; (nginx)
in the /amp subdirectory.
-Josh
On Thu, Jun 9, 2016 at 6:47 AM, Ingvar Stepanyan notifications@github.com
wrote:
@weeblr https://github.com/weeblr Cool, glad it's resolved.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/ampproject/amphtml/issues/2380#issuecomment-224860784,
or mute the thread
https://github.com/notifications/unsubscribe/AB2kPXhAQKNPtrETd8O8NuHVY--9Zc7iks5qJ-8sgaJpZM4Hnltf
.
Thanks for this information. Few users have actually control over their servers. Was wondering if adding a header would also be a valid alternative solution? as in:
Cache-Control: no-transform
Rgds
RE disabling PageSpeed via server controls: usually a user who has
installed mod_pagespeed/ngx_pagespeed will be in control of how to adjust
the settings. I guess you are talking about situations where a hosting
provider has installed mod_pagespeed for a site without giving them any
control? Do you see that a lot?
Anyway, in that situation, adding
response-header cache-control:no-transform will work, as will PageSpeed:off.
On Thu, Jun 9, 2016 at 8:58 AM, Weeblr notifications@github.com wrote:
Thanks for this information. Few users have actually control over their
servers. Was wondering if adding a header would also be a valid alternative
solution? as in:
Cache-Control: no-transformRgds
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/ampproject/amphtml/issues/2380#issuecomment-224886534,
or mute the thread
https://github.com/notifications/unsubscribe/AB2kPWCP5c0oMpyDyz4lWrwf90qD409gks5qKA4TgaJpZM4Hnltf
.
Re: hosting companies, I assume so, and I don't have a large sample enough for a valid opinion.
I will add the PageSpeed:Off header anyway, doesn't cost much and can save some head scratching for some.
Thnks
@weeblr could you make that an option? Since I would still like to use pagespeed for my images :)
Well I am doing the Joomla AMP extension, so I was only talking about the HTML document response. Assets are not going through the CMS, and are simply served by the server, so there's no issue there.
Example;
I want to optimize images only (to save bandwith). PageSpeed can do that without changing the HTML and without injecting any JS. So this will work fine with AMP. Bonus, my images will be smaller and served faster. That's what AMP is about, right? Being fast and small.
Now when you disable PageSpeed on the HTML, the above wouldn't work. So at least make this an option in the extension if you really must add this. At least people who do have a serious webhost (including control of their server) can decide for themselves if they want PageSpeed or not.
Still a bit confused. InPlace optimization means the images URLs are not altered. Optimization happens independently of the HTML, so adding PageSpeed off on the HTML should not prevent Pagespeed to optimize your images
Pagespeed still needs to read the image urls from the HTML. That's why it needs to be enabled for that html (or better: amphtml) page.
ok, you're right in this case. Still, there's other filters like trim url's that could be useful with AMP.
https://developers.google.com/speed/pagespeed/module/filter-trim-urls#operation
Actually, that's probably another example that would not work, as using the "base' tag is not allowed in AMP html spec (not sure why actually, maybe to make link resolution easier for AMP js?).
It looks like each feature in PS would have to be reviewed and checked for validity.
There is a base tag in the example, though I am not sure that needs to be explicitly there. couldn't it trim the urls without the base tag? (it knows the domain from the config and request uri)
@Joostvanderlaan in any case such transformation is unlikely to give any benefits if you're using gzip.
Though I'm not familiar with PS, trimming seems to be the act of putting the base URL in the base tag, and removing all unneeded parts of URLs on the page, to make them relative to the base tag content. Don't see what could be trimmed otherwise.
A relative url does not need a base tag. Trimming means stripping the domain and making things relative. Not also putting that domain in the base tag. As seen in the PS example, the base url is in both the original + the output, nothing changed there.
You could use it for other tricks like relatively linking all assets in example.com/assets/ as seen here http://webdesign.tutsplus.com/articles/quick-tip-set-relative-urls-with-the-base-tag--cms-21399
A relative URL exists for a long time, even without base tags they work.
Anyway, we're drifting. The point is, why would you add headers that disable PS altogether in an extension, while other people might have the need to use both PS and your extension? It just doesn't make any sense to disable pagespeed by default for all AMP pages.
After all, every removed byte, or even better, millisecond of decreased latency, is a performance gain. And that's what AMP is about.
You're right with that, the base tag is not needed.
This discussion however totally convinced me that for a distributed product such as mine, from a support standpoint, making this a user-available setting is a bad idea.
Remember, we are not talking about disabling PS on AMP pages, only on the HTML. Meaning that resources (images mostly in our case) will still be optimized.
And indeed, this is not the right place to have this discussion!
@cramforce Can we re-name or close this ticket? CloudFlare no longer has this issue.
Definitely. Lets move other topics to its own issue.
Hi again
Just a quick note to mention that Incapsula, another significant CDN provider, is doing about the same and inserts some (analytics) javascript in the fly, not skipping AMP pages.
Still some communication work to do it seems...
Cheers
Mind opening up a new bug? Do you have contacts there you could ping?
On Thu, Sep 15, 2016 at 2:22 AM, Weeblr notifications@github.com wrote:
Hi again
Just a quick note to mention that Incapsula, another significant CDN
provider, is doing about the same and inserts some (analytics) javascript
in the fly, not skipping AMP pages.Still some communication work to do it seems...
Cheers
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
https://github.com/ampproject/amphtml/issues/2380#issuecomment-247279203,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAFeT_LKG01-XtLnoiw0B4zZJnQe5l_lks5qqQ5vgaJpZM4Hnltf
.
Did that here #5051
I have no contact at all :)
Rgds
Hello all,
I'm your contact from Incapsula.
We've reviewed this thread and working on a solution
Has there been any update on this yet?
Yes, Incapsula now avoids injections in AMP pages automatically for all sites on all accounts
This was tracked on #5051.
Most helpful comment
For now, a CloudFlare customer will want to disable:
Those can be disabled using page rules, such that you can leave the features enabled for the rest of the content served.
We'd recommend leaving the following enabled for AMP: