Gatsby: Google translate does not work with gatsby sites

Created on 4 Jul 2018  ·  32 Comments  ·  Source: gatsbyjs/gatsby

Description

When opening a gatsby site in google translate, it shows a flash of translated content and then shows a 404 page.

E.g.
https://translate.google.com/translate?sl=auto&tl=fr&js=y&hl=en&u=https%3A%2F%2Freactjs.org

or

https://translate.google.com/translate?sl=auto&tl=fr&js=y&hl=en&u=https%3A%2F%2Fabout.sourcegraph.com

Steps to reproduce

Open any gatsby page in google translate.

Expected result

To show the page with the translated text.

Actual result

A 404 page is shown.

Environment

Production

help wanted not stale bug

Most helpful comment

I had some free time this morning, and I looked into it. The issue is from production-app.js and the router.

Problems

navigate in production-app.js

When the page is opened through the translation service, production-app.js redirects to the actual webpage: https://github.com/gatsbyjs/gatsby/blob/dced9f1a02eaa266e355599816f8ee14f64614f6/packages/gatsby/cache-dir/production-app.js#L111-L124
The translation service serves the webpage from its own host so that it can manipulate the content without running into CORS problems. As a result, the URL structure no longer matches our routing rules, and navigate is called from the code linked above. This is the reason for the flash of content.

It redirects to /translate_c or a similar URL. That URL is not valid on our website, so a 404 error is the result.
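
To make the mismatch concrete, here is a minimal sketch of the kind of path comparison described above. It is a simplified model and not Gatsby's actual source: `needsCanonicalRedirect` and its parameters are hypothetical names, and the base path is assumed to be empty.

```javascript
// Simplified model of a canonical-path check (hypothetical helper, not Gatsby's code).
// pagePath: the path the page was built for.
// browserPathname: what window.location.pathname reports at runtime.
function needsCanonicalRedirect(pagePath, browserPathname, basePath = "") {
  return Boolean(pagePath) && basePath + pagePath !== browserPathname;
}

// Served from its own origin: the paths agree, so no redirect is needed.
const direct = needsCanonicalRedirect("/docs/", "/docs/");

// Served through a translation proxy: the proxy's own path shows up instead.
const proxied = needsCanonicalRedirect("/docs/", "/translate_c");

console.log({ direct, proxied }); // { direct: false, proxied: true }
```

The page itself is unchanged in both cases; only the location reported by the browser differs, which is enough to trigger the redirect logic.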

Hydration

The other problem happens when the components hydrate. If you allow the translation service in your CORS headers, the page will hydrate the data again; however, it loads the wrong data because it is based on the wrong URL, for the reason above.

If the website doesn't send CORS headers, it fails to hydrate the page anyway because the loader cannot load page-data.json, or in fact any resource from the website.

If the website sends CORS headers for the translation service, it hydrates again. However, the router uses window.location, so it fails to load the proper page, as we saw above.
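
The CORS part can be illustrated with a short sketch that uses the standard URL API. All URLs are hypothetical placeholders (including the page-data.json path layout), and `isCrossOrigin` is a helper made up for this example.

```javascript
// Returns true when a resource lives on a different origin than the document
// requesting it, which is the situation in which the browser enforces CORS.
function isCrossOrigin(documentUrl, resourceUrl) {
  return new URL(documentUrl).origin !== new URL(resourceUrl).origin;
}

// Hypothetical URLs for illustration only.
const site = "https://example.com/docs/";
const proxy = "https://translate-proxy.example/translate_c?u=https://example.com/docs/";
const pageData = "https://example.com/page-data/docs/page-data.json";

console.log(isCrossOrigin(site, pageData));  // false
console.log(isCrossOrigin(proxy, pageData)); // true
```

When the document is served by the proxy, every request back to the original site is cross-origin, so it only succeeds if the site explicitly allows the proxy's origin.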

Solution (but dirty)

This is my solution, but it is neither clean nor ideal. I don't want to hard-code specific translation services, so I just added some logic that runs before the navigate call in production-app.js.

// gatsby-node.js

const fs = require("fs")
const path = require("path")

// Patch the generated .cache/production-app.js in place.
exports.onPreBootstrap = ({ store }) => {
  const { program } = store.getState()
  const filePath = path.join(program.directory, ".cache", "production-app.js")

  const code = fs.readFileSync(filePath, {
    encoding: `utf-8`,
  })

  // Swap the original destructuring for a version that stubs browserLoc
  // with the built page path whenever the page runs inside another frame.
  // If the search string is not found, the file is written back unchanged.
  const newCode = code.replace(
    `const { pagePath, location: browserLoc } = window`,
    `const { pagePath } = window
    let { location: browserLoc } = window

    if (window.parent.location !== browserLoc) {
      browserLoc = {
        pathname: pagePath
      }
    }
  `
  )

  fs.writeFileSync(filePath, newCode, `utf-8`)
}

If there is a parent frame, the patch puts a stub into browserLoc so that navigate is not called. If you don't have CORS headers for these services, hydration is not a problem either, because the code cannot fetch page-data.json from your website.

Obviously, there are downsides to this approach because hydration is missing. It will also be a problem if you embed the website in an iframe.
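
The frame check in the patch above can be tried outside a browser with fake window objects. This is only a sketch: `resolveBrowserLoc` is a hypothetical helper, and it tests `win.parent !== win` rather than comparing Location objects as the patch does.

```javascript
// Hypothetical helper mirroring the idea of the patch: when the page runs
// inside another frame, ignore the real location and use the built page path.
function resolveBrowserLoc(win) {
  const { pagePath, location } = win;
  // In a top-level window, window.parent refers to the window itself.
  const isFramed = win.parent !== win;
  return isFramed ? { pathname: pagePath } : location;
}

// Fake window objects so the sketch runs outside a browser.
const topLevel = { pagePath: "/docs/", location: { pathname: "/docs/" } };
topLevel.parent = topLevel;

const framed = {
  pagePath: "/docs/",
  location: { pathname: "/translate_c" },
  parent: { location: { pathname: "/translate" } },
};

console.log(resolveBrowserLoc(topLevel).pathname); // /docs/
console.log(resolveBrowserLoc(framed).pathname);   // /docs/
```

The same branch is taken for any embedding frame, not just a translation service, which is where the iframe downside mentioned above comes from.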

The demo page is here:
https://translate.google.com/translate?sl=ko&tl=en&u=https%3A%2F%2Fxenodochial-swartz-f568e8.netlify.app%2F


I'd love to see some nice solution for this issue.

All 32 comments

Huh. Google translate re-hosts the site (with translated content) in an iframe with a URL like https://translate.google.com/translate?hl=en&sl=en&tl=fr&u=about.sourcegraph.com.

Gatsby doesn't know it's being hosted on a different domain and tries to load the content for the /translate page, which doesn't exist, so it then shows the 404 page instead.

I've marked this as a bug but I'm not sure what the fix would be. Maybe Gatsby shouldn't load the 404 page if the initial SSR render _isn't_ the 404 page?

That sounds like it could be a reasonable fix. Check if a SSRed page is loaded and trust it over the client URL.
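
A rough sketch of that idea, assuming the runtime knows which path the server rendered and which paths exist. `pickInitialPath`, `knownPaths`, and the `/404.html` path are illustrative assumptions, not existing Gatsby APIs.

```javascript
// Hypothetical sketch of "trust the SSR page over the client URL".
// knownPaths stands in for whatever page manifest the runtime has.
function pickInitialPath({ ssrPagePath, clientPathname, knownPaths }) {
  // The client URL matches a real page: use it as usual.
  if (knownPaths.has(clientPathname)) return clientPathname;
  // The URL is unknown to the site; if the server rendered a real page, keep it.
  if (ssrPagePath && ssrPagePath !== "/404.html") return ssrPagePath;
  return "/404.html";
}

const knownPaths = new Set(["/", "/docs/"]);

// Proxied load: the client URL is meaningless, so the SSR path wins.
console.log(
  pickInitialPath({ ssrPagePath: "/docs/", clientPathname: "/translate_c", knownPaths })
); // /docs/

// Genuine 404: the server already rendered the 404 page, so it stays a 404.
console.log(
  pickInitialPath({ ssrPagePath: "/404.html", clientPathname: "/nope", knownPaths })
); // /404.html
```

One caveat with this approach: if the proxy's path happens to match a real page on the site (for example a site that has its own /translate page), the client URL would still win.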

This also seems to be an issue with google's webcache, as the url ends up looking like: http://webcache.googleusercontent.com/search?q=cache:[GOOGLE_CACHE_KEY]:[YOUR_CONTENTS_ENDPOINT][GOOGLE_APPENDED_QUERY_VARIABLES]

Which then causes it to redirect to a 404 page and output to the console that A page wasn't found for "/search".

I would imagine this is going to happen for any service that acts as a sort of proxy to the content created through Gatsby.
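
Parsing the two URL shapes reported above with the standard URL API shows what a client-side router would see as the path. The cache key and target site in the second URL are placeholders.

```javascript
// URL shapes taken from the reports above; KEY and example.com are placeholders.
const proxiedUrls = [
  "https://translate.google.com/translate?hl=en&sl=en&tl=fr&u=about.sourcegraph.com",
  "http://webcache.googleusercontent.com/search?q=cache:KEY:example.com/docs/",
];

// The pathname a client-side router would read from window.location.
const pathnames = proxiedUrls.map((url) => new URL(url).pathname);

console.log(pathnames); // [ '/translate', '/search' ]
```

In both cases the original site's path only appears inside the query string, so a router that looks at the pathname alone never sees it.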

Currently using Gatsby v1.9.273.

@m-allanson

Have you tried accessing your endpoint on google webcache? I tried it a few times with the example you posted (about.sourcegraph.com) and it seems to be working, but trying to access the same endpoint via google translate redirects to the 404 page. Whereas in my case, it redirects to 404 consistently on both.

EDIT: Now that I think about it, what I was seeing was probably the /search endpoint for that site. So please disregard the comments above and assume the behavior is consistently broken for sites without those endpoints.

Dan Abramov came up with a workaround for this: https://github.com/reactjs/reactjs.org/pull/1148

To be clear my workaround is for the crash when using the Translate extension. I haven’t looked into the URL issue but we’d need to solve it too. Ideas?

@gaearon oh hmm yeah — so fixing that would mean Gatsby needs to support alt URL patterns where some other software has taken control of the URL. Seems doable.

Hi all, I came here from Dan's tweet: https://twitter.com/dan_abramov/status/1035575858843578369

Oh, Page Not Found 😧

@KyleAMathews @gaearon

The console says: A page wasn't found for "/translate_c"
Perhaps Google Translate's inner JavaScript tries to access /translate_c.

video

https://www.dropbox.com/s/bfy2t4kc31smc7d/google-react-gatsby.mp4?dl=0

findPage() appears to be the cause.
How should we handle it? 😰

  • loader.js
      const page = findPage(path)

      if (!page) {
        handleResourceLoadError(path, `A page wasn't found for "${path}"`)

@ryota-murakami thanks for looking into this! The logic for finding pages is in https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/cache-dir/find-page.js

@KyleAMathews Sorry for the super slow response 🙇‍♂️
I tried to fix the issue, but currently the 404 page doesn't show (the browser shows a blank white screen instead) because of the following change in https://github.com/johncmunson/gatsby/commit/224f6a883e62c134a600e8f707a507bfa166be14.
https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/cache-dir/loader.js#L315-L317

As far as I can tell from the commit message, that change had a different purpose (preloading the 404 page), but it affects this issue.

I suppose that fix approach might be better if build by gatsby website loaded from