I18n-module: SEO / GoogleBot - Language Redirect's issue

Created on 10 Jun 2020 · 28 Comments · Source: nuxt-community/i18n-module

Version

nuxt-i18n: 6.13.0-beta.0
nuxt: 2.12.2

Nuxt configuration

mode:

[x] universal
[ ] spa

Steps to reproduce

1) Enable language redirect based on browser locale
2) Set Google bot to index your translated site
3) Google bot will be redirected to the original content (creating a redirect issue and thus fetching the original language instead of the /da/ url).

What is Expected?

Should not redirect when Googlebot is visiting, OR perhaps a user-configurable setting that would allow the browser language redirect to only happen on the frontpage.

What is actually happening?

Googlebot is being redirected to the English translation of a page when visiting e.g. the Danish translation.

bug 🐛

Source

simplenotezy

👍4

All 28 comments

I suppose that the only possible solution for this is detecting crawlers and not redirecting then...

rchl on 10 Jun 2020

Yes that would be nice as well. Perhaps some default popular User-Agents to block by default, and then allow the user to define custom agents. Here is a list of popular agents: https://perishablepress.com/list-all-user-agents-top-search-engines/

However, I've heard from an SEO guy that Google also crawls websites using other User-Agents to check if a website is "cheating", e.g. only showing some content to crawlers -- I'm not sure a redirect would matter in this case, but I can't say for sure.

Either way, it would also be useful to only allow the language redirect on the frontpage (e.g. when user types www.domain.com -- and not when clicking a sent link).
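The "default popular User-Agents plus user-defined agents" idea could look roughly like the sketch below. This is only an illustration of the suggestion, not code from nuxt-i18n: the pattern list, `isCrawler` and `shouldRedirectToBrowserLocale` are all hypothetical names, and as noted above Google may also crawl with other User-Agents, so this kind of detection is not reliable.

```javascript
// Hypothetical default list (assumption); a real list would need maintenance.
const DEFAULT_CRAWLER_PATTERNS = [
  /googlebot/i,
  /bingbot/i,
  /duckduckbot/i,
  /baiduspider/i,
  /yandex/i
];

// Returns true when the User-Agent string matches a default or custom pattern.
// Custom patterns should not use the `g` flag, because RegExp#test is stateful with it.
function isCrawler(userAgent, customPatterns = []) {
  if (!userAgent) {
    return false;
  }
  const patterns = DEFAULT_CRAWLER_PATTERNS.concat(customPatterns);
  return patterns.some((pattern) => pattern.test(userAgent));
}

// The language redirect would be skipped for anything detected as a crawler.
function shouldRedirectToBrowserLocale(userAgent, customPatterns = []) {
  return !isCrawler(userAgent, customPatterns);
}
```

On the server side the User-Agent would come from the request headers (for example `req.headers['user-agent']` in Node).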

simplenotezy on 10 Jun 2020

I'm also having trouble with this, in PWA mode in my case. I was thinking about at least pre-rendering some pages to see if that solves the problem, but it really seems to be the auto redirect that is causing the issue.

Here is my nuxt-i18n config:

  i18n: {
    locales: [
      {
        code: 'fr',
        iso: 'fr',
        name: 'Français',
        file: 'fr-FR.js',
        isCatchallLocale: true
      },
      {
        code: 'en',
        iso: 'en',
        name: 'English',
        file: 'en-US.js'
      }
    ],
    lazy: true,
    langDir: 'lang/',
    defaultLocale: 'fr',
    vueI18n: {
      fallbackLocale: 'fr'
    },
    vueI18nLoader: true,
    seo: false
  },
  sitemap: {
    hostname: 'https://freatle.com',
    i18n: 'fr'
  }

SEO mode is set to false, but I override the head:

head() {
  return this.$nuxtI18nSeo()
}

This is quite critical because my main user base is supposed to be French, but Google only sees the English version of the app even though my default language is set to French everywhere, including in the sitemap.

mmorainville on 13 Jun 2020

Same issue here. In the Google Search Console I can see that many of my pages have been excluded as duplicates:

[Screenshot 2020-06-17 at 12 39 24]

I don't know if that is a result of the way the redirect works right now or if some configuration mistake was made.

If it's a bug, I wonder what the best solution would be to fix this, since displaying the correct locale/version of the page to the Google Bot should also work for websites that use static pre-rendered pages, so detection would have to be done in JavaScript.

Either way, it would also be useful to only allow the language redirect on the frontpage (e.g. when user types www.domain.com -- and not when clicking a sent link).

That would be good option to have, but I think that would only work with the prefix strategy, because otherwise you wouldn't be able to tell if the user is visiting your base url or if he specifically wants the default language for your website.

ems1985 on 17 Jun 2020

I actually did a bit more testing using the Live URL Inspection tool of Google's search console.

This is what Google sees when visiting abc.url/de:
[Screenshot 2020-06-17 at 12 49 17]

So apparently Google fetches the page, executes the JavaScript and ends up with the EN version of the page. That means it will think it's a duplicate page, mark it as such and exclude it from search results.

To make sure there was no configuration error, I tested the page from a user's perspective. This is the initial HTML sent from the server (from Chrome dev tools):
[Screenshot 2020-06-17 at 12 49 43]

And that also doesn't change after rendering if the user's browser is set to a DE language:

[Screenshot 2020-06-17 at 13 01 25]

ems1985 on 17 Jun 2020

Well, Googlebot has Accept-language: en set so it will get redirected to the English page, depending on your detectBrowserLanguage setting. So I don't see any option here other than detecting bots and not redirecting.

When testing locally you have to make sure to delete the "i18n" cookie first to trigger the redirect (but then again, it depends on your exact settings).

rchl on 17 Jun 2020

I think this article might be very helpful to solve some of the SEO problems:
https://support.google.com/webmasters/answer/182192?hl=en

Well, googlebot has Accept-language: en

Actually, according to this article above, the crawler sends HTTP requests without setting Accept-Language in the request header. So the language is probably set via JavaScript's navigator.languages?

So I don't see any option here other than detecting bots and not redirecting.

I'm not sure that would be the best way to go. In another article about (mobile version) redirection, Google says:

If your website uses automatic redirection, be sure to treat all Googlebots just like any other user-agent and redirect them appropriately.

and there was another suggestion:

Avoid automatic redirection based on the user’s perceived language. These redirections could prevent users (and search engines) from viewing all the versions of your site.

So I guess the bottom line is, for SEO don't use automatic language redirect (?) Or at least not on ALL pages...

I see two possible solutions:

Solution 1

As suggested above, use the strategy: prefix option and only do the automatic redirect when the user is visiting the base url of your website.

So let's say you have:
https://domain.url/en/ and
https://domain.url/de/

If you visit the URLs above, you would get served either the EN or DE version of the page, regardless of your browser's language settings. Only if you navigate to https://domain.url/ should nuxt try to detect your browser language and then redirect/serve you one of the pages above.

I guess this solution could also work with strategy: prefix_and_default, where automatic redirects would only be done when requesting non-prefixed URLs.

Solution 2
If the locale of the current page does not match the detected user locale, you could prompt the user asking if they want to switch to the other version of your page.

I am however not sure if that would be a good solution as you would end up with the prompt showing up on thumbnail previews of your website.
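As a rough illustration of Solution 2, the check behind such a prompt could be sketched as below. This is an assumption-laden example, not module code: `suggestLocaleSwitch` is a hypothetical helper, and `browserLanguages` is assumed to be a preference-ordered list such as `navigator.languages`.

```javascript
// Returns the locale code to offer in a "switch language?" prompt,
// or null when no prompt should be shown.
function suggestLocaleSwitch(pageLocale, browserLanguages, availableLocales) {
  for (const language of browserLanguages) {
    // Reduce tags like "de-DE" to the bare language code "de".
    const code = language.toLowerCase().split('-')[0];
    if (!availableLocales.includes(code)) {
      continue; // the site has no version in this language, try the next one
    }
    // The first preferred language the site supports decides the outcome.
    return code === pageLocale ? null : code;
  }
  return null;
}
```

Because nothing is redirected, a crawler would always see the locale it requested; only the prompt itself would need to be kept out of previews.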

ems1985 on 17 Jun 2020

👍2

Actually, according to this article above, the crawler sends HTTP requests without setting Accept-Language in the request header. So the language is probably set via JavaScript's navigator.languages?

That seems promising then because it means we might be able to handle that case in a special way.
(navigator.languages should also be unset if Accept-language isn't set, I think. But we might still be redirecting to default language in that case.)

rchl on 17 Jun 2020

So after some more research it seems that what I suggested as Solution 1 above would probably be the best and also the easiest way to go from an SEO standpoint.

According to https://support.google.com/webmasters/answer/189077?hl=en you should:

Consider adding a fallback page for unmatched languages, especially on language/country selectors or auto-redirecting homepages. Use the x-default value: <link rel="alternate" href="http://example.com/" hreflang="x-default" />
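To make the x-default recommendation concrete, here is a minimal sketch that builds the alternate links for a page. It assumes every locale is served under a `/<code>` prefix and the non-prefixed URL is the auto-redirecting fallback; `buildAlternateLinks` is a hypothetical helper, and `baseUrl` is assumed to have no trailing slash.

```javascript
// Builds one <link rel="alternate"> descriptor per locale plus an x-default entry.
function buildAlternateLinks(baseUrl, path, localeCodes) {
  const links = localeCodes.map((code) => ({
    hid: `alternate-hreflang-${code}`, // Nuxt uses `hid` to de-duplicate head tags
    rel: 'alternate',
    hreflang: code,
    href: `${baseUrl}/${code}${path}`
  }));
  // The non-prefixed URL is declared as the fallback for unmatched languages.
  links.push({
    hid: 'alternate-hreflang-x-default',
    rel: 'alternate',
    hreflang: 'x-default',
    href: `${baseUrl}${path}`
  });
  return links;
}
```

In a Nuxt page the result could be returned from `head()` as the `link` array.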

So that means, for ideal SEO you should use the strategy: prefix_and_default option and ONLY perform language detection when the user is visiting a non-prefixed URL, like so:

https://some.url/en/hello -> no browser language detection / no redirect
https://some.url/de/hello -> no browser language detection / no redirect
https://some.url/hello -> browser language detection / redirect

That way you don't need to detect whether the user is a bot or not, you would still automatically redirect users who visit a non-locale specific URL and everything would get indexed in Google correctly.
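The rule in the three example URLs above can be sketched as a simple path check. This is only an illustration of the proposal, with hypothetical helper names, not the module's actual routing logic.

```javascript
// Returns the locale code found in the first path segment, or null if there is none.
function getLocaleFromPath(path, localeCodes) {
  const firstSegment = path.split('/')[1] || '';
  return localeCodes.includes(firstSegment) ? firstSegment : null;
}

// Browser language detection (and the redirect) only applies to non-prefixed URLs.
function shouldDetectBrowserLanguage(path, localeCodes) {
  return getLocaleFromPath(path, localeCodes) === null;
}
```

With locales `['en', 'de']`, `/en/hello` and `/de/hello` are served as requested, while `/hello` and `/` remain candidates for detection.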

The only thing that needs to be done would be to add an option like disableBrowserDetectionOnPrefixedUrls or something to that effect.

There was also recently some SEO/localization related sitemap update, which works well with the prefix_and_default strategy
https://github.com/nuxt-community/sitemap-module/issues/91

ems1985 on 17 Jun 2020

👍3

@ems1985 great progress. I like option 1 except I don't like to prefix all pages. Could we still enable the base path redirect on front-page, and not on other pages?

simplenotezy on 17 Jun 2020

I guess it would be possible as long as the default (non-prefixed) language is English.

But I think it's not quite how Google would recommend it, since you tell Google your non-prefixed base URL is a certain language/locale (in the sitemap.xml and/or via meta tags), when in reality it could redirect and display a different one.

But like I said, if you really want to avoid prefix for default language, it should be possible that way too... (as long as the default language is English)

I will try to come up with a quick fix/workaround tomorrow, and then maybe someone could help come up with a permanent solution for this (I'm not very familiar with the nuxt code yet).

But this affects a lot of projects, so I'm sure there is quite some interest to get this done.

ems1985 on 17 Jun 2020

Sounds great @ems1985, I'll be happy to test it out

simplenotezy on 17 Jun 2020

I referenced my very simple solution above.

It basically adds an onlyRedirectFromRoot option to the detectBrowserLanguage object. By default it is set to false, so with all other options unchanged it should behave as before.

If you set

detectBrowserLanguage: {
  // only redirect when the request is for the root path
  onlyRedirectFromRoot: true
}

the redirection will only be done when you are in the root of your domain. I am using it with the prefix_except_default strategy but it should also work with others.
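The behavior described for this option can be sketched as follows. This is a simplified illustration, not the referenced patch: `shouldRedirect` is a hypothetical function, and `onlyRedirectFromRoot` is the option name proposed in this comment, which may differ from what the module eventually ships.

```javascript
// Decides whether the browser-language redirect may run for a given path.
function shouldRedirect(path, detectBrowserLanguage) {
  if (!detectBrowserLanguage) {
    return false; // detection disabled entirely
  }
  if (detectBrowserLanguage.onlyRedirectFromRoot) {
    return path === '/'; // proposed option: redirect from the root path only
  }
  return true; // previous behavior: redirect from any path
}
```

With the option left at `false`, every path stays eligible for the redirect, which matches the "behaves as before" default.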

I didn't write any tests or do extensive testing. I think someone more familiar with nuxt-i18n should make those changes, since I very likely didn't think of (or even know) all possible configurations.

I also think that this should probably not be treated as a bug but rather as a new feature.

ems1985 on 18 Jun 2020

There is a relevant discussion in #603

rchl on 18 Jun 2020

@rchl true, seems pretty related

simplenotezy on 18 Jun 2020

@rchl @ems1985 I have the same question, is there a new update for this feature?

elliottssu on 4 Jul 2020

👍3

I also encountered this problem, do you plan to fix it?

TheLetslook on 20 Jul 2020

Work in progress in #799

rchl on 20 Jul 2020

❤3 👍2

I wanted to address both this and #455 with the same fix (work in progress at #799) but now I'm having second thoughts about it and I think that the current solution (where I'm only redirecting from the root path) wouldn't be right.

It wouldn't be ideal for this issue as we would still redirect on the root path so crawlers wouldn't be able to access that one if default locale is other than English (I'm assuming crawlers report English, if anything).

And it wouldn't be ideal for #455 as we should redirect from any path as long as it matches the default locale.

So I'm thinking that those two issues should be handled separately. For this one, maybe the solution is to not redirect if the user agent doesn't provide Accept-Language information. I just need to be sure what headers crawlers are actually sending. There is some conflicting information about that on the Internet.

rchl on 21 Jul 2020

@rchl let me try to clarify a little bit.

It wouldn't be ideal for this issue as we would still redirect on the root path so crawlers wouldn't be able to access that one if default locale is other than English (I'm assuming crawlers report English, if anything).

Search engine bots fetch robots.txt first and then take into account whether the website is allowed to be indexed/parsed.

robots.txt also allows (and all search engines suggest) defining a sitemap.xml, where website owners should list all URLs that the website has or that should be parsed.

Nuxt has a great sitemap module with i18n support that lists all URLs with correct canonical URLs (so none of the URLs would be marked as duplicates, and all of them would be indexed!)

See https://github.com/nuxt-community/sitemap-module/issues/91

Robots.txt -> Sitemap.xml -> List of urls.

I think that the current solution (where I'm only redirecting from the root path) wouldn't be right.

That's a correct solution for prefixed strategies - redirecting only from the root path.

For this one, maybe the solution is to not redirect if the user agent doesn't provide Accept-Language information. I just need to be sure what headers are crawlers actually sending.

You shouldn't care about crawlers at all, as they might behave differently and might just be testing different scenarios.

--
Crawler/User:
Visits root path -> redirect to the user agent's language (if none, redirect to default)
Visits /fr page -> language should be changed to French, so the page can be parsed and indexed without a redirect to its Accept-Language locale.

That's it.

Thank you for PR! I'm ready to test it, any ETA for alpha/beta release?

Thanks!

divine on 21 Jul 2020

Search engine bots fetches robots.txt...

I think the part about robots.txt and sitemaps is not relevant to the issue I'm thinking about.

I think that the current solution (where I'm only redirecting from the root path) wouldn't be right.

That's a correct solution for prefixed strategies - redirecting only from the root path.

Also for this case?

prefix_except_default (default locale is es)

{ name: 'index___es', url: '/' },
{ name: 'index___en', url: '/en/' },
{ name: 'index___fr', url: '/fr/' }

In this case, if the crawler would report en locale then it wouldn't be able to access the / path as it would be redirected to /en/.

rchl on 21 Jul 2020

I think the part about robots.txt and sitemaps is not relevant to the issue I'm thinking about.

You've asked how crawlers could detect other links and I've simply tried to answer it. If you still think that isn't related then sorry and accept my apologies.

Also for this case?

prefix_except_default (default locale is es)
{ name: 'index___es', url: '/' },
{ name: 'index___en', url: '/en/' },
{ name: 'index___fr', url: '/fr/' }
In this case, if the crawler would report en locale then it wouldn't be able to access the / path as it would be redirected to /en/.

This issue started as an SEO improvement that everyone needs. The main problem was that a visit to the /fr page was redirected to / or /en based on the crawler's locale, and that is now being fixed by your recent PR.

However, you're asking what to do with the root page?

With prefix_except_default, a user (or crawler*) requesting the root path is asking for the default locale, so the page shouldn't redirect.

My suggestion would be to either add another option or just disable redirection on the root path for that strategy.

There is no possible way to solve this on the prefix_except_default root path alone, similar to what you were asking about for the no-prefix strategy.

* Every search engine says that crawlers should be treated as normal users; doing something different based on user agents or accepted locale wouldn't make it better.

divine on 22 Jul 2020

You've asked how crawlers could detect other links and I've simply tried to answer it. If you still think that isn't related then sorry and accept my apologies.

I've just said that, in that particular case, if I'm gonna redirect from the root path, the crawler will not be able to crawl that one particular page as it will be redirected to /en/ (that is assuming that crawler's Accept-language header has English).

On prefix_except_default root path user (or crawler*) asking for default locale and page shouldn't do redirection.

My suggestion would just add another option or just disable redirection on root path for that strategy.

I'll have to think about that...
That would basically mean that detectBrowserLanguage is completely ineffective for that strategy which again, would be quite a major, breaking change.

And you know what opinion I have about adding more options...

Every search engine always say that crawlers should be treated as normal users, doing something different based on user-agents or accepted locale wouldn't make it better.

I don't want to do that. I'm investigating a behavior change that would apply to both crawlers and users but might fix crawlers. Namely, if there is no Accept-Language header provided then don't redirect to the default locale like happens now. Instead, just load the requested locale. Normal users are not likely to have Accept-Language header missing and crawlers potentially don't send it so it could work.
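The behavior change being investigated here could be sketched like this: when no Accept-Language header is present, keep the requested locale instead of redirecting. This is a minimal illustration under stated assumptions, not the module's implementation; `parseAcceptLanguage` and `resolveLocale` are hypothetical helpers, and falling back to the requested locale when nothing matches is also an assumption of this sketch.

```javascript
// Parses an Accept-Language header into language codes ordered by preference.
function parseAcceptLanguage(header) {
  if (!header) {
    return [];
  }
  return header
    .split(',')
    .map((part) => {
      const [tag, ...params] = part.trim().split(';');
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
      const q = qParam ? parseFloat(qParam.slice(2)) : 1;
      return {
        code: tag.trim().toLowerCase().split('-')[0], // "en-US" -> "en"
        q: Number.isNaN(q) ? 0 : q
      };
    })
    .filter((entry) => entry.code && entry.code !== '*' && entry.q > 0)
    .sort((a, b) => b.q - a.q)
    .map((entry) => entry.code);
}

// Returns the locale to serve for a request.
function resolveLocale(acceptLanguageHeader, requestedLocale, availableLocales) {
  const preferred = parseAcceptLanguage(acceptLanguageHeader);
  if (preferred.length === 0) {
    return requestedLocale; // no header (possibly a crawler): serve what was requested
  }
  const match = preferred.find((code) => availableLocales.includes(code));
  return match || requestedLocale;
}
```

A request for a Danish page without the header would then stay on `da`, while a browser sending `en-US,en;q=0.9` would still resolve to `en`.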

rchl on 22 Jul 2020

would be quite a major, breaking change.

@rchl well, from what I can see there is already a breaking change with module renaming #800

And you know what opinion I have about adding more options...

If adding options isn't the way, then in my opinion this should be included by default (to clarify, I mean this should be the default behavior).

The i18n module is cool, but right now it basically works well only for users who change the language by clicking somewhere, not when the locale should change simply by visiting another URL.

If it is enabled by default, it will help localized sites get ranked in search engines by their locale (this is what I'm trying to achieve with Nuxt, and other frameworks/libraries do this).

Forgot to add, under features this module has "Search Engine Optimization" so it wouldn't break up anything with new module renaming at least 😉

crawlers potentially don't send it

This is a bit risky: they do send it sometimes, and there is no guarantee that they won't change this at any time.

Thanks!

divine on 23 Jul 2020

This is a bit risky, they do send sometimes and there is no guarantees for them to change in any time.

I don't see how it would be risky though. It would be a consistent behavior (for both normal users and crawlers) when Accept-Language is not provided.

rchl on 23 Jul 2020

Hi guys,

I finally found this thread, which could help with this problem: with nuxt-i18n I set two languages, Italian (default) and English.
But in the SERP, when searching for the site, the snippets are in English (sometimes even mixed, with some results in Italian and some in English). Inspecting the code of my pages and the generated meta tags, everything seems OK. Have you solved this? What can I do?

In fact, as stated before, when testing the pages with Search Console they get redirected to the EN version by default.

sintj on 18 Aug 2020

Hi guys,

I finally found this thread, which could help with this problem: with nuxt-i18n I set two languages, Italian (default) and English.
But in the SERP, when searching for the site, the snippets are in English (sometimes even mixed, with some results in Italian and some in English). Inspecting the code of my pages and the generated meta tags, everything seems OK. Have you solved this? What can I do?

In fact, as stated before, when testing the pages with Search Console they get redirected to the EN version by default.

@rchl working on integrating this https://github.com/nuxt-community/i18n-module/pull/799

divine on 18 Aug 2020

👍2

Fix released in v6.15.0

rchl on 10 Sep 2020

🎉2 👍2
