I'm on mobile so I can't include much info, but my search results on Google all have ?no-cache=1 appended, and I don't know why. I now have a lot of duplicate content indexed and I hope I won't get blocked or similar.
What is going on?
Thank you
Anything that I can do to fix it quickly before I lose my Google ranking?
I can see from Google web console that this is happening
Only happens if you come from Google on a mobile. Works fine on desktop
I think google somehow indexed most of my pages with ?no-cache and they started to replace my pages without ?no-cache because these rank better. If I'm on mobile and I select "Desktop site" I receive results without ?no-cache in google.
@hackhat There is a way to exclude no-cache in Google Webmaster tools: https://support.google.com/webmasters/answer/6080550?hl=en&authuser=1&ref_topic=6080547
Implementing canonical URLs should help too.
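For anyone wondering what the canonical URL suggestion amounts to in practice, here is a minimal sketch. It assumes you build the canonical link yourself from the current page URL; `canonicalUrl` is a hypothetical helper name, not part of Gatsby or any plugin. It only shows the idea of pointing search engines at the address without the no-cache parameter.

```javascript
// Hypothetical helper (not a Gatsby API): returns the address a
// <link rel="canonical"> tag could point to for a given page URL.
function canonicalUrl(rawUrl) {
  // The WHATWG URL class is available in browsers and in Node.js.
  const url = new URL(rawUrl);
  // Drop only the no-cache parameter; any other query parameters are kept.
  url.searchParams.delete("no-cache");
  return url.toString();
}
```

With this, a page reached as `https://example.com/blog/post/?no-cache=1` would declare `https://example.com/blog/post/` as its canonical address, so both variants are treated as the same page.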
I've seen some mutterings within this repo around no-cache, but was wondering if there was a more official place I could go to find out the main purpose of it. Since this URL param is inherently seen as a duplicate URL/content in Google's eyes and would need to be specifically excluded for every single Gatsby website through webmaster tools, is there something we can do to prevent the use of this in general?
I may not be understanding the core concept, and the parameter itself has caused a bit of a headache for us internally as of late.
cc @KyleAMathews
It's a workaround to get gatsby-plugin-offline working. I agree it's not a great solution. @davidbailey00 could you look into other options for checking if a page has been seen?
@KyleAMathews Appreciate the response; we're not using that plugin currently, but sounds like it's just kinda baked-in there in case it is. Do you happen to know of any workarounds to disable that entirely, or is that "too core" to touch at the moment until an alternative solution is figured out?
It's in core right now
Gotcha. While I'm not as versed in that part of the app in particular, if there is any headway on testing out a different solution, I'll be ready to give things a whirl on my end!
@hackhat we're sorry to hear this! Is there any way you could let us know if you're able to reproduce this? I just tried with gatsbyjs.org and wasn't able to replicate. I also checked out Google Analytics logs, and didn't really see any issues, definitely not ?no-cache=1 on _all_ of our logs!
Could be an outdated gatsby-plugin-offline plugin issue, perhaps?
Hi @hackhat @zslabs, I'm the one responsible for the ?no-cache=1 stuff so just to let you know, I'll be checking this out ASAP.
The purpose of this query is to prevent service workers gobbling up pages on your site which aren't generated by Gatsby - e.g. if I visit Netlify CMS /admin/ with a service worker installed, Gatsby will show a 404 because the SW returns a basic offline shell for all pages, and Gatsby can't find a page called /admin/ which it generated (even though it exists on the server, it wasn't created by Gatsby).
So to work around this, we check the HTTP status code for that page if Gatsby can't find it, and then unless it's a 404, we redirect there appending ?no-cache=1 to prevent the SW handling it again. (If it's a 404 then we just show the Gatsby 404 page as usual.)
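The decision described above can be sketched roughly as follows. This is an illustration only, not Gatsby's actual source: `resolveUnknownPage` and the returned action names are hypothetical, and the status code is assumed to have already been fetched for a path Gatsby could not match to one of its own pages.

```javascript
// Hypothetical sketch of the fallback described above (not Gatsby's real code).
// `pathname` is a path Gatsby has no page for; `status` is the HTTP status
// the server returned for that path.
function resolveUnknownPage(pathname, status) {
  if (status === 404) {
    // The server has nothing there either: show the normal Gatsby 404 page.
    return { action: "show-404" };
  }
  // Something exists on the server (e.g. /admin/), so redirect to it with
  // ?no-cache=1 appended so the service worker does not handle it again.
  const separator = pathname.includes("?") ? "&" : "?";
  return { action: "redirect", to: `${pathname}${separator}no-cache=1` };
}
```

For example, `resolveUnknownPage("/admin/", 200)` would yield a redirect to `/admin/?no-cache=1`, while a 404 status falls through to the regular 404 page.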
However, what's happening here is that Gatsby isn't loading the resources properly for a page, and is therefore treating it as a page not generated by Gatsby, just like Netlify CMS. So if we were to look in the Googlebot console we'd probably see an error saying "Found ?no-cache=1 while attempting to load page directly" - this is logged whenever Gatsby detects the problem just described, and should never be logged on a working app.
Until now I thought we'd ironed out all edge cases and prevented this from happening, but from the looks of things there are still some cases where it might occur - which might be difficult to debug if it's only problematic with Googlebot. Hopefully it'll also be problematic in some desktop browsers so we can actually tell what's going wrong rather than just guessing - last time I checked in Chrome there were no issues, and we're currently working on adding tests to prevent this sort of thing.
👋 @davidbailey00 Thanks for jumping in and for the rundown of how that feature works!
Googlebot certainly doesn't make it easy, does it?! We also ran into some weirdness with its mobile bot not being able to read our pages because we were using the CSS vh unit for certain hero images https://stackoverflow.com/questions/38103636/fetch-as-google-googlebot-desktop-not-rendering-page-correctly 👈 That was a fun one.
You're correct though; desktop seems to be working pretty consistently, but I'm wondering if there's something going on with mobile that we just can't see. We're hosting the site on Netlify and not really doing anything out of the ordinary there, but I've included our gatsby info if that helps shed some light on how things are put together on our end.
  System:
    OS: macOS 10.14
    CPU: x64 Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
    Shell: 2.7.1 - /usr/local/bin/fish
  Binaries:
    Node: 10.12.0 - /usr/local/bin/node
    Yarn: 1.12.1 - ~/.yarn/bin/yarn
    npm: 6.4.1 - /usr/local/bin/npm
  Browsers:
    Chrome: 70.0.3538.77
    Firefox: 62.0.3
    Safari: 12.0
  npmPackages:
    gatsby: 2.0.34 => 2.0.34
    gatsby-image: 2.0.18 => 2.0.18
    gatsby-plugin-canonical-urls: 2.0.7 => 2.0.7
    gatsby-plugin-catch-links: 2.0.6 => 2.0.6
    gatsby-plugin-feed: 2.0.9 => 2.0.9
    gatsby-plugin-google-tagmanager: 2.0.6 => 2.0.6
    gatsby-plugin-manifest: 2.0.7 => 2.0.7
    gatsby-plugin-netlify: 2.0.3 => 2.0.3
    gatsby-plugin-netlify-cms: 3.0.5 => 3.0.5
    gatsby-plugin-node-fields: 1.0.0 => 1.0.0
    gatsby-plugin-react-helmet: 3.0.1 => 3.0.1
    gatsby-plugin-sass: 2.0.2 => 2.0.2
    gatsby-plugin-sharp: 2.0.10 => 2.0.10
    gatsby-plugin-sitemap: 2.0.2 => 2.0.2
    gatsby-plugin-twitter: 2.0.7 => 2.0.7
    gatsby-remark-autolink-headers: 2.0.9 => 2.0.9
    gatsby-remark-copy-linked-files: 2.0.6 => 2.0.6
    gatsby-remark-custom-blocks: 2.0.1 => 2.0.1
    gatsby-remark-embed-video: 1.4.0 => 1.4.0
    gatsby-remark-images: 2.0.5 => 2.0.5
    gatsby-remark-prismjs: 3.0.3 => 3.0.3
    gatsby-remark-relative-images-v2: 0.1.5 => 0.1.5
    gatsby-remark-relative-links: 0.0.1 => 0.0.1
    gatsby-remark-responsive-iframe: 2.0.6 => 2.0.6
    gatsby-source-filesystem: 2.0.6 => 2.0.6
    gatsby-transformer-remark: 2.1.11 => 2.1.11
    gatsby-transformer-sharp: 2.1.7 => 2.1.7
    gatsby-transformer-yaml: 2.1.4 => 2.1.4
  npmGlobalPackages:
    gatsby-cli: 2.4.3
Would you be able to expand on what "resources" could cause a page to fail and trigger that no-cache redirect? I've seen some oddities where (even on desktop) there's some type of decoding issue with the JSON payload that is "the content" of the page (/static/d/244/path---privacy-06-e-82f-UbMjLGBJ3CMBK5a0x1kvn6Bq8c.json), although the page itself shows up fine. When visiting the file, it says it can't be loaded, but then automatically refreshes and shows the correct content, which is pretty weird and extremely inconsistent. And all this while never showing the no-cache param. Not sure if that's useful. We're using the yaml transformer for a lot of our static content coming from Netlify, aside from remark for the blog posts, tutorials, etc.
Thanks again for digging into this!
@davidbailey00 If you add your website to analytics.moz.com campaign it will also be redirected to no-cache, so frustrating. I'm hosting on cloudflare + s3 and again nothing fancy here as well.
@DSchau and I have decided to remove the ?no-cache=1 URL parameter entirely, in favour of better docs which explain to users how to blacklist non-Gatsby pages. Hopefully this will be available by next week, sorry for all the problems this has caused
@davidbailey00 excellent decision, it seemed to be a bit tricky and with many edge cases. Thank you very much for your support.
👋 @davidbailey00 Just wanted to check in to see if there was anything I could help test. Thanks again for looking into this!
@zslabs is there more info on your end about this issue? We actually solved this issue previously as there was an issue in the source code, i.e. arrow functions weren't being transpiled in Chrome 41 (what Googlebot uses). You shouldn't see _many_ of the no-cache hits in your normal flow!
@DSchau Oh, interesting! Did you have the commit for that handy? I've been holding off on upgrading a few deps until this was resolved, so wanted to see where we were at currently.
@zslabs it was not an issue on _our_ end, but rather with a node_modules lib making its way into the browser untranspiled!
To clarify, is this at all related to any Gremlin stuff?
Partly - our SEO guy mentioned he was still seeing these sporadically after that other issue was fixed, so I was hoping a fresh look at the no-cache feature might shed some light on a different (or better) way to handle those.
Yeah - we've seen it very infrequently on our end on gatsbyjs.org, as well. I believe @davidbailey00 is working on a PR that will avoid the no-cache stuff, so we'll just keep you in the loop on that? Should be something to see very soon here!
Sure thing, sounds great! Thanks as always.
Not a problem :) Thanks for using Gatsby! We're excited to get this fixed and all the issues ironed out :)
> I believe @davidbailey00 is working on a PR that will avoid the no-cache stuff, so we'll just keep you in the loop on that? Should be something to see very soon here!
Yeah that's right, I'm working on the new approach right now actually - hopefully it should be ready later today or tomorrow!