I18n-module: SEO / GoogleBot - Language Redirect's issue

Created on 10 Jun 2020  ·  28 Comments  ·  Source: nuxt-community/i18n-module

Version

nuxt-i18n: 6.13.0-beta.0
nuxt: 2.12.2

Nuxt configuration

mode:

  • [x] universal
  • [ ] spa

Steps to reproduce

1) Enable language redirect based on browser locale
2) Set Google bot to index your translated site
3) Google bot will be redirected to the original content (creating a redirect issue and thus fetching the original language instead of the /da/ url).

What is Expected?

Should not redirect when a google bot is visiting, OR perhaps a user configurable setting that would allow the browser language redirect to only happen on the frontpage.

What is actually happening?

Google Bot is being redirected to the English translation of a page when visiting, e.g., the Danish translation.

bug 🐛

All 28 comments

I suppose that the only possible solution for this is detecting crawlers and not redirecting then...

Yes that would be nice as well. Perhaps some default popular User-Agents to block by default, and then allow the user to define custom agents. Here is a list of popular agents: https://perishablepress.com/list-all-user-agents-top-search-engines/

However, I've heard from an SEO guy that Google also crawls websites using other User-Agents to check if a website is "cheating", e.g. only showing some content to crawlers -- I'm not sure a redirect would matter in this case, but I can't say for sure.

Either way, it would also be useful to only allow the language redirect on the frontpage (e.g. when user types www.domain.com -- and not when clicking a sent link).
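For illustration, the "default list plus user-defined agents" idea could be sketched as below. This is a hypothetical helper, not part of nuxt-i18n: the function name, the pattern list and the `extraPatterns` parameter are all assumptions. As noted above, it cannot catch a crawler that identifies itself with an ordinary browser User-Agent.

```javascript
// Hypothetical sketch: NOT nuxt-i18n code. The pattern list is an assumed,
// deliberately short sample of well-known crawler User-Agent tokens.
const DEFAULT_CRAWLER_PATTERNS = [
  /googlebot/i,
  /bingbot/i,
  /duckduckbot/i,
  /baiduspider/i,
  /yandexbot/i
];

// Returns true when the User-Agent string matches a default pattern or one of
// the caller-supplied extra patterns. A missing User-Agent counts as "not a crawler".
function isCrawler(userAgent, extraPatterns = []) {
  if (!userAgent) return false;
  return [...DEFAULT_CRAWLER_PATTERNS, ...extraPatterns].some((pattern) =>
    pattern.test(userAgent)
  );
}
```

A server middleware could then skip the language redirect whenever `isCrawler(req.headers['user-agent'])` is true.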

I'm also having trouble with this, in PWA mode in my case. I was thinking about at least pre-rendering some pages to see if that solves the problem, but it really does seem to be the auto redirect that is causing the issues.

Here is my nuxt-i18n config:

  i18n: {
    locales: [
      {
        code: 'fr',
        iso: 'fr',
        name: 'Français',
        file: 'fr-FR.js',
        isCatchallLocale: true
      },
      {
        code: 'en',
        iso: 'en',
        name: 'English',
        file: 'en-US.js'
      }
    ],
    lazy: true,
    langDir: 'lang/',
    defaultLocale: 'fr',
    vueI18n: {
      fallbackLocale: 'fr'
    },
    vueI18nLoader: true,
    seo: false
  },
  sitemap: {
    hostname: 'https://freatle.com',
    i18n: 'fr'
  }

SEO mode is set to false, but I override the head:

head() {
  return this.$nuxtI18nSeo()
}

This is quite critical because my main user base is supposed to be French, but Google only sees the English version of the app even though my default language is set to French everywhere, even in the sitemap.

Same issue here. In the Google Search Console I can see that many of my pages have been excluded as duplicates.

[Screenshot 2020-06-17 at 12 39 24]

I don't know if that is a result of the way the redirect works right now or if some configuration mistake was made.

If it's a bug, I wonder what the best fix would be, since displaying the correct locale/version of the page to the Google Bot should also work for websites that use static pre-rendered pages, so detection would have to be done in JavaScript.

Either way, it would also be useful to only allow the language redirect on the frontpage (e.g. when user types www.domain.com -- and not when clicking a sent link).

That would be a good option to have, but I think it would only work with the prefix strategy, because otherwise you wouldn't be able to tell whether the user is visiting your base URL or specifically wants the default language of your website.

I actually did a bit more testing using the Live URL Inspection tool of Google's search console.

This is what Google sees when visiting abc.url/de:
[Screenshot 2020-06-17 at 12 49 17]

So apparently Google fetches the page, executes the JavaScript and ends up with the EN version of the page. That means it will think it's a duplicate page, mark it as such and exclude it from search results.

To make sure there was no configuration error, I tested the page from a user's perspective. This is the initial HTML sent from the server (from Chrome dev tools):
[Screenshot 2020-06-17 at 12 49 43]

And that also doesn't change after rendering if the user's browser is set to a DE language:

[Screenshot 2020-06-17 at 13 01 25]

Well, googlebot has Accept-language: en set so it will get redirected to the English page, depending on your detectBrowserLanguage setting. So I don't see any option here other than detecting bots and not redirecting.

When testing locally you have to make sure to delete the "i18n" cookie first to trigger the redirect (but then again, it depends on your exact settings).

I think this article might be very helpful to solve some of the SEO problems:
https://support.google.com/webmasters/answer/182192?hl=en

Well, googlebot has Accept-language: en

Actually, according to this article above, the crawler sends HTTP requests without setting Accept-Language in the request header. So the language is probably set via JavaScript's navigator.languages?

So I don't see any option here other than detecting bots and not redirecting.

I'm not sure that would be the best way to go. In another article, about redirection to mobile versions, Google says:

If your website uses automatic redirection, be sure to treat all Googlebots just like any other user-agent and redirect them appropriately.

and there was another suggestion:

Avoid automatic redirection based on the user's perceived language. These redirections could prevent users (and search engines) from viewing all the versions of your site.

So I guess the bottom line is, for SEO don't use automatic language redirect (?) Or at least not on ALL pages...

I see two possible solutions:

Solution 1

As suggested above, use the strategy: prefix option and only do the automatic redirect when the user is visiting the base url of your website.

So let's say you have:
https://domain.url/en/ and
https://domain.url/de/

If you visit the URLs above, you would get served either the EN or DE version of the page, regardless of your browser's language settings. Only if you navigate to https://domain.url/ should nuxt try to detect your browser language and then redirect/serve you one of the pages above.

I guess this solution could also work with strategy: prefix_and_default, where automatic redirects would only be done when requesting non-prefixed URLs.

Solution 2
If the locale of the current page does not match the detected user locale, you could prompt the user asking if they want to switch to the other version of your page.

I am however not sure if that would be a good solution as you would end up with the prompt showing up on thumbnail previews of your website.

Actually, according to this article above, the crawler sends HTTP requests without setting Accept-Language in the request header. So the language is probably set via JavaScript's navigator.languages?

That seems promising then because it means we might be able to handle that case in a special way.
(navigator.languages should also be unset if Accept-language isn't set, I think. But we might still be redirecting to default language in that case.)
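To make the "special case" concrete, here is a minimal sketch of reading the Accept-Language header on the server so that a missing header can be told apart from an explicit preference. The function name is hypothetical and this is not how nuxt-i18n parses the header; it only follows the usual `tag;q=weight` header syntax.

```javascript
// Hypothetical sketch: parse an Accept-Language header into an ordered list of
// lower-cased language tags. An absent or empty header yields an empty array,
// which lets the caller treat "no preference sent" as its own case.
function parseAcceptLanguage(header) {
  if (!header) return [];
  return header
    .split(',')
    .map((part) => {
      const [tag, ...params] = part.trim().split(';');
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
      const q = qParam ? parseFloat(qParam.slice(2)) : 1; // default weight is 1
      return { tag: tag.trim().toLowerCase(), q: Number.isNaN(q) ? 0 : q };
    })
    .filter((entry) => entry.tag && entry.tag !== '*' && entry.q > 0)
    .sort((a, b) => b.q - a.q) // highest weight first
    .map((entry) => entry.tag);
}
```

With this shape, `parseAcceptLanguage(req.headers['accept-language']).length === 0` would be the signal that the client sent no language preference at all.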

So after some more research it seems that what I suggested as Solution 1 above would probably be the best and also the easiest way to go from an SEO standpoint.

According to https://support.google.com/webmasters/answer/189077?hl=en you should:

Consider adding a fallback page for unmatched languages, especially on language/country selectors or auto-redirecting homepages. Use the x-default value: <link rel="alternate" href="http://example.com/" hreflang="x-default" />
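As a rough sketch of the x-default recommendation quoted above, the alternate links for one page could be built like this. The helper name and the URL layout (every locale prefixed, the unprefixed URL as the fallback) are assumptions for illustration, not the module's actual SEO output.

```javascript
// Hypothetical sketch: build <link rel="alternate"> descriptors for one page.
// Assumes every locale lives under /<code>/... and the unprefixed URL is the
// auto-redirecting fallback that gets hreflang="x-default".
function alternateLinks(baseUrl, pagePath, localeCodes) {
  const links = localeCodes.map((code) => ({
    rel: 'alternate',
    hreflang: code,
    href: `${baseUrl}/${code}${pagePath}`
  }));
  links.push({
    rel: 'alternate',
    hreflang: 'x-default',
    href: `${baseUrl}${pagePath}`
  });
  return links;
}
```

The returned objects have the same shape as entries of the `link` array in a Nuxt `head()` result, so they could be merged in there.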

So that means, for ideal SEO you should use the strategy: prefix_and_default option and ONLY perform language detection when the user is visiting a non-prefixed URL, like so:

https://some.url/en/hello -> no browser language detection / no redirect
https://some.url/de/hello -> no browser language detection / no redirect
https://some.url/hello -> browser language detection / redirect

That way you don't need to detect whether the user is a bot or not, you would still automatically redirect users who visit a non-locale specific URL and everything would get indexed in Google correctly.
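The rule in the three URLs above can be sketched as a small predicate. Both function names are hypothetical, and the sketch assumes `path` is a pathname without a query string and that locale codes appear as the first path segment.

```javascript
// Hypothetical sketch: does the path start with one of the configured locale codes?
function hasLocalePrefix(path, localeCodes) {
  const firstSegment = path.split('/')[1] || '';
  return localeCodes.includes(firstSegment);
}

// Browser language detection (and any redirect) would only run for
// non-prefixed URLs; prefixed URLs are always served as requested.
function shouldDetectBrowserLanguage(path, localeCodes) {
  return !hasLocalePrefix(path, localeCodes);
}
```

For example, with locales `['en', 'de']`, `/de/hello` would be served as-is while `/hello` would go through detection.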

The only thing that needs to be done would be to add an option like disableBrowserDetectionOnPrefixedUrls or something to that effect.

There was also recently some SEO/localization related sitemap update, which works well with the prefix_and_default strategy
https://github.com/nuxt-community/sitemap-module/issues/91

@ems1985 great progress. I like option 1 except I don't like to prefix all pages. Could we still enable the base path redirect on front-page, and not on other pages?

I guess it would be possible as long as the default (non-prefixed) language is English.

But I think it's not quite what Google would recommend, since you tell Google that your non-prefixed base URL is in a certain language/locale (in the sitemap.xml and/or via meta tags), when in reality it could redirect and display a different one.

But like I said, if you really want to avoid prefix for default language, it should be possible that way too... (as long as the default language is English)

I will try to come up with a quick fix/workaround tomorrow, and then maybe someone could help come up with a permanent solution for this (I'm not very familiar with the nuxt code yet).

But this affects a lot of projects, so I'm sure there is quite some interest to get this done.

Sounds great @ems1985, I'll be happy to test it out.

I referenced my very simple solution above.

It basically adds an onlyRedirectFromRoot option to the detectBrowserLanguage object. By default it is set to false, so with all other options unchanged it should behave as before.

If you set

detectBrowserLanguage: {
  onlyRedirectFromRoot: true
}

the redirection will only be done when you are in the root of your domain. I am using it with the prefix_except_default strategy but it should also work with others.

I didn't write any tests or do extensive testing. I think someone more familiar with nuxt-i18n should make those changes, since I very likely didn't think of (or even know) all possible configurations.

I also think that this should probably not be treated as a bug but rather as a new feature.

There is a relevant discussion in #603

@rchl true, seems pretty related

@rchl @ems1985 I have the same question, is there a new update for this feature?

I also encountered this problem, do you plan to fix it?

Work in progress in #799

I wanted to address both this and #455 with the same fix (work in progress at #799) but now I'm having second thoughts about it and I think that the current solution (where I'm only redirecting from the root path) wouldn't be right.

It wouldn't be ideal for this issue as we would still redirect on the root path so crawlers wouldn't be able to access that one if default locale is other than English (I'm assuming crawlers report English, if anything).

And it wouldn't be ideal for #455 as we should redirect from any path as long as it matches the default locale.

So I'm thinking that those two issues should be handled separately. For this one, maybe the solution is to not redirect if the user agent doesn't provide Accept-Language information. I just need to be sure what headers crawlers are actually sending. There is some conflicting information about that on the Internet.

@rchl let me try to clarify a little bit.

It wouldn't be ideal for this issue as we would still redirect on the root path so crawlers wouldn't be able to access that one if default locale is other than English (I'm assuming crawlers report English, if anything).

Search engine bots fetch robots.txt first and take into account whether the website is allowed to be indexed/parsed.

robots.txt also allows (and all search engines suggest) defining a sitemap.xml, where website owners should list all the URLs that the website has or that should be parsed.

Nuxt has a great sitemap module with i18n support that lists all URLs with the correct canonical URLs (so none of the URLs would be marked as duplicates and all of them would be indexed!)

See https://github.com/nuxt-community/sitemap-module/issues/91

Robots.txt -> Sitemap.xml -> List of urls.

I think that the current solution (where I'm only redirecting from the root path) wouldn't be right.

That's a correct solution for prefixed strategies - redirecting only from the root path.

For this one, maybe the solution is to not redirect if the user agent doesn't provide Accept-Language information. I just need to be sure what headers crawlers are actually sending.

You shouldn't care about crawlers at all, as they might behave differently and might just be testing different scenarios.

--
Crawler/User:
Visits the root path -> redirect to the user agent's language (if none, redirect to the default)
Visits the /fr page -> the language should be changed to French, so the page can be parsed and indexed without a redirect to the Accept-Language locale.

That's it.

--
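The flow described above, for a strategy where every locale is prefixed, might look roughly like this. The function and its parameters are hypothetical; `preferredLanguages` is assumed to be an already-ordered list of language tags from the client (empty if none was sent).

```javascript
// Hypothetical sketch of the "redirect only from root" flow.
// Returns the path to redirect to, or null when the request should be
// served as-is (any non-root path, e.g. /fr, is never redirected).
function redirectFromRoot(path, preferredLanguages, localeCodes, defaultLocale) {
  if (path !== '/') return null;
  const match = preferredLanguages
    .map((tag) => tag.toLowerCase().split('-')[0]) // "da-DK" -> "da"
    .find((code) => localeCodes.includes(code));
  return `/${match || defaultLocale}`;
}
```

Under this rule a crawler requesting /fr always receives the French page, whatever language it reports.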

Thank you for PR! I'm ready to test it, any ETA for alpha/beta release?

Thanks!

Search engine bots fetch robots.txt...

I think the part about robots.txt and sitemaps is not relevant to the issue I'm thinking about.

I think that the current solution (where I'm only redirecting from the root path) wouldn't be right.

That's a correct solution for prefixed strategies - redirecting only from the root path.

Also for this case?

  • prefix_except_default (default locale is es)
{ name: 'index___es', url: '/' },
{ name: 'index___en', url: '/en/' },
{ name: 'index___fr', url: '/fr/' }

In this case, if the crawler reported the en locale, it wouldn't be able to access the / path as it would be redirected to /en/.

I think the part about robots.txt and sitemaps is not relevant to the issue I'm thinking about.

You've asked how crawlers could detect other links and I've simply tried to answer it. If you still think that isn't related then sorry and accept my apologies.

Also for this case?

  • prefix_except_default (default locale is es)
{ name: 'index___es', url: '/' },
{ name: 'index___en', url: '/en/' },
{ name: 'index___fr', url: '/fr/' }

In this case, if the crawler reported the en locale, it wouldn't be able to access the / path as it would be redirected to /en/.

This issue started as an SEO improvement which everyone needs, and the main problem was that a visit to the /fr page was redirected to / or /en based on the crawler's locale. That is now being fixed by your recent PR.

However, you're asking what to do with the root page?

On the prefix_except_default root path the user (or crawler*) is asking for the default locale, so the page shouldn't redirect.

My suggestion would be to just add another option, or to just disable redirection on the root path for that strategy.

There is no possible way to solve this for the root path under prefix_except_default, similar to what you were asking about the no-prefix strategy.

* Every search engine says that crawlers should be treated as normal users; doing something different based on user agents or accepted locale wouldn't make it better.

You've asked how crawlers could detect other links and I've simply tried to answer it. If you still think that isn't related then sorry and accept my apologies.

I've just said that, in that particular case, if I'm gonna redirect from the root path, the crawler will not be able to crawl that one particular page as it will be redirected to /en/ (that is assuming that crawler's Accept-language header has English).

On the prefix_except_default root path the user (or crawler*) is asking for the default locale, so the page shouldn't redirect.

My suggestion would be to just add another option, or to just disable redirection on the root path for that strategy.

I'll have to think about that...
That would basically mean that detectBrowserLanguage is completely ineffective for that strategy which, again, would be quite a major, breaking change.

And you know what opinion I have about adding more options...

* Every search engine says that crawlers should be treated as normal users; doing something different based on user agents or accepted locale wouldn't make it better.

I don't want to do that. I'm investigating a behavior change that would apply to both crawlers and users but might fix things for crawlers. Namely, if there is no Accept-Language header provided, then don't redirect to the default locale as happens now. Instead, just load the requested locale. Normal users are not likely to have the Accept-Language header missing and crawlers potentially don't send it, so it could work.
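A minimal sketch of that behavior change, under stated assumptions: the function name is hypothetical, it is not the module's implementation, and for brevity it only looks at the first language in the header and ignores q-values.

```javascript
// Hypothetical sketch: decide which locale to serve for a request.
// - No Accept-Language header: serve the locale that was requested (no redirect).
// - Header present and its first language is supported: serve that locale.
// - Header present but unsupported: keep the requested locale.
function localeToServe(requestedLocale, acceptLanguageHeader, localeCodes) {
  if (!acceptLanguageHeader || !acceptLanguageHeader.trim()) {
    return requestedLocale;
  }
  const firstTag = acceptLanguageHeader
    .split(',')[0]
    .split(';')[0]
    .trim()
    .toLowerCase();
  const primary = firstTag.split('-')[0]; // "en-US" -> "en"
  return localeCodes.includes(primary) ? primary : requestedLocale;
}
```

The same function runs for users and crawlers alike, so no User-Agent sniffing is involved; a client that sends no header simply gets the URL it asked for.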

would be quite a major, breaking change.

@rchl well, from what I can see there is already a breaking change with module renaming #800

And you know what opinion I have about adding more options...

If adding options isn't the way, then in my opinion this should be included by default (let me clarify: this should be the default behavior in the first place).

The i18n module is cool, but right now it's really only great for users who change the language by clicking somewhere, not for a locale that changes simply by visiting another URL.

If it is enabled by default, it will give localized sites a boost and let them rank in search engines by locale (this is what I'm trying to achieve with nuxt, and other frameworks/libraries do this).

Forgot to add, under features this module has "Search Engine Optimization" so it wouldn't break up anything with new module renaming at least 😉

crawlers potentially don't send it

This is a bit risky: they do send it sometimes, and there is no guarantee that they won't change it at any time.

Thanks!

This is a bit risky: they do send it sometimes, and there is no guarantee that they won't change it at any time.

I don't see how it would be risky though. It would be a consistent behavior (for both normal users and crawlers) when Accept-Language is not provided.

Hi guys,

I finally found this thread that could help with this problem: with nuxt-i18n I set two languages, Italian (default) and English.
But in the SERP, when searching for the site, the snippets are in English (sometimes even mixed, some results in Italian and some in English). Inspecting the code of my pages and the generated meta tags, everything seems OK. Have you solved this? What can I do?

In fact, as stated before, when testing the pages with Search Console they get redirected to the EN version by default.

@rchl working on integrating this https://github.com/nuxt-community/i18n-module/pull/799

Fix released in v6.15.0
