Hugo: Add multilingual multihost support

Created on 30 Oct 2017 · 23 Comments · Source: gohugoio/hugo

The current multilingual support in Hugo is restricted to a single baseURL: the languages are put into subfolders named after the language code (the default language may be kept at the top level), e.g. https://example.com/en.

This works great and is possibly the most common use case.

This does not, however, allow using the baseURL to differentiate the languages, e.g. https://en.example.com, https://jp.example.com and similar variations.

Core Changes

This issue describes a way to define a baseURL per language. The new rule will be:

If a baseURL is set on the language level, then all languages must have one and they must all be different.
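The rule above can be expressed as a small validation step. This is a minimal sketch, not Hugo's code: checkMultihost is a hypothetical helper, and it compares baseURLs as plain strings (so "https://example.com" and "https://example.com/" would count as different).

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// checkMultihost is a hypothetical helper enforcing the rule: if any language
// sets a baseURL, every language must set one, and all must be distinct.
// langBaseURLs maps a language code to its language-level baseURL ("" if unset).
// It reports whether the site runs in multihost mode.
func checkMultihost(langBaseURLs map[string]string) (bool, error) {
	langs := make([]string, 0, len(langBaseURLs))
	for lang := range langBaseURLs {
		langs = append(langs, lang)
	}
	sort.Strings(langs) // deterministic order for error messages

	set := 0
	for _, lang := range langs {
		if langBaseURLs[lang] != "" {
			set++
		}
	}
	if set == 0 {
		return false, nil // regular single-host multilingual site
	}
	if set != len(langs) {
		return false, errors.New("if one language sets baseURL, all languages must set it")
	}
	seen := make(map[string]string) // baseURL -> language that claimed it
	for _, lang := range langs {
		u := langBaseURLs[lang]
		if other, dup := seen[u]; dup {
			return false, fmt.Errorf("languages %q and %q share baseURL %q", other, lang, u)
		}
		seen[u] = lang
	}
	return true, nil
}

func main() {
	ok, err := checkMultihost(map[string]string{
		"no": "https://example.no",
		"en": "https://example.com",
	})
	fmt.Println(ok, err) // true <nil>
}
```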

Example:

[languages]
[languages.no]
baseURL = "https://example.no"
languageName = "Norsk"
weight = 1
title = "På norsk"

[languages.en]
baseURL = "https://example.com"
languageName = "English"
weight = 2
title = "In English"

With the above, the two sites will be generated into public, each with its own root:

public
├── en
└── no

The important part here is that:

All URLs (e.g. .Permalink) will be generated from that root. So the English home page above will have its .Permalink set to https://example.com/.

Hugo Server

The changes above will work for a regular hugo build; the two sites will be ready to be configured as two virtual hosts in Nginx or similar.

But you'd want to test it before you go live, of course. So we need to adapt hugo server to also handle multiple base URLs.

This issue suggests that we start as many HTTP servers as there are languages. With the default port settings, the ports will simply increment from 1313:

http://localhost:1313/
http://localhost:1314/
...

Then you can navigate the sites (e.g. jump from an article to its translations) just as if they were deployed live in your production environment.
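A rough sketch of the "one HTTP server per language" idea, using only the standard library. It assumes the languages are already sorted by weight and that each language's site lives in public/<lang>; assignPorts and newServers are hypothetical names, not hugo server internals. The servers are built but not started, so the caller decides how to run them (e.g. ListenAndServe in one goroutine each).

```go
package main

import (
	"fmt"
	"net/http"
	"path/filepath"
)

// assignPorts (hypothetical) gives each language its own port, counting up
// from basePort (1313 by default, or the value passed with -p), in the
// order given. Assumption: langs is already sorted by language weight.
func assignPorts(basePort int, langs []string) map[string]int {
	ports := make(map[string]int, len(langs))
	for i, lang := range langs {
		ports[lang] = basePort + i
	}
	return ports
}

// newServers (hypothetical) builds one HTTP server per language, each
// serving that language's output directory (e.g. public/en) on its own port.
func newServers(publishDir string, basePort int, langs []string) []*http.Server {
	ports := assignPorts(basePort, langs)
	servers := make([]*http.Server, 0, len(langs))
	for _, lang := range langs {
		servers = append(servers, &http.Server{
			Addr:    fmt.Sprintf("localhost:%d", ports[lang]),
			Handler: http.FileServer(http.Dir(filepath.Join(publishDir, lang))),
		})
	}
	return servers
}

func main() {
	for _, s := range newServers("public", 1313, []string{"no", "en"}) {
		fmt.Println("http://" + s.Addr + "/")
	}
}
```

With a CLI-specified port such as -p 1317, the same counting would yield 1317, 1318 and so on.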

Multiple content and static dirs

Also see #4073 and #3757

With the new deployment topology this new feature creates, you often end up wanting better control of your content and your static files.

We will improve on this in three ways:

  1. staticDir can now be a slice of strings.
  2. Each language can have its own staticDir settings.
  3. Additional staticDir entries can be added by appending an ID from 1 to 10 to the key, e.g. staticDir1. This can be useful if you want to keep the global static dirs settings but have one or more additional directories for a specific language.
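The key naming and value shapes in the list above could be handled roughly like this. It is a sketch under assumptions: staticDirKeyID and toDirs are hypothetical helpers, keys with leading zeros such as staticDir01 are rejected by choice, and []any is assumed because generic TOML decoders commonly produce it for arrays.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// staticDirKeyID (hypothetical) reports whether key is a static dir setting
// and, if so, its ID: 0 for plain "staticDir", 1..10 for "staticDir1".."staticDir10".
func staticDirKeyID(key string) (int, bool) {
	const prefix = "staticDir"
	if !strings.HasPrefix(key, prefix) {
		return 0, false
	}
	suffix := strings.TrimPrefix(key, prefix)
	if suffix == "" {
		return 0, true
	}
	id, err := strconv.Atoi(suffix)
	// Reject out-of-range IDs and non-canonical forms like "01" or "+5".
	if err != nil || id < 1 || id > 10 || strconv.Itoa(id) != suffix {
		return 0, false
	}
	return id, true
}

// toDirs (hypothetical) normalizes a staticDir value, which may be a single
// string or a list of strings.
func toDirs(v any) ([]string, error) {
	switch t := v.(type) {
	case string:
		return []string{t}, nil
	case []string:
		return t, nil
	case []any:
		dirs := make([]string, 0, len(t))
		for _, e := range t {
			s, ok := e.(string)
			if !ok {
				return nil, fmt.Errorf("staticDir entry %v is not a string", e)
			}
			dirs = append(dirs, s)
		}
		return dirs, nil
	default:
		return nil, fmt.Errorf("unsupported staticDir value of type %T", v)
	}
}

func main() {
	for _, k := range []string{"staticDir", "staticDir2", "staticDir11", "staticDirX"} {
		id, ok := staticDirKeyID(k)
		fmt.Println(k, id, ok)
	}
}
```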

All the static directories will form a union filesystem, layered from left to right:

theme static dir, global config static dirs, language static dirs.

Example:

staticDir = ["static1", "static2"]
[languages]
[languages.no]
staticDir = ["staticDir_override", "static_no"]
baseURL = "https://example.no"
languageName = "Norsk"
weight = 1
title = "På norsk"

[languages.en]
staticDir2 = "static_en"
baseURL = "https://example.com"
languageName = "English"
weight = 2
title = "In English"

In the above, with no theme used:

  • the English site will get its static files as a union of "static1", "static2" and "static_en". On file duplicates, the right-most version will win.
  • the Norwegian site will get its static files as a union of "staticDir_override" and "static_no".
Enhancement

All 23 comments

Thank you for organizing this @bep.

An aside: the .co.jp domain is an interesting one, because the authorities in Japan regulate it so that one corporate entity in Japan can get only one .co.jp domain. (There was a big problem here in the early days of the Internet, with people "squatting" on .co.jp domains.) Sometimes, after a Japanese entity gets a .co.jp domain, it puts other languages under, say, .com or .no, whatever is logical for that language.

I can think of these things:

  • the idea that it will start many HTTP servers with a different port per language is great. Hugo server would need to increment up from a CLI-specified port, given multiple baseURLs, as well.
  • Japanese registrars now offer kanji domain names (technically, IDNs), like 日本語.jp. It might be an edge case, but does hugo allow a baseURL that is non-ASCII? Reference: https://unicode.org/faq/idn.html

but does hugo allow a baseURL that is non-ASCII?

Yes, Hugo is all UTF-8, and I'm pretty sure that we do no "normalization" of the baseURL part.

As to hugo server, there is some existing logic that says that we use localhost by default if --baseURL is not set on the CLI. We probably need something like this here as well. You would normally not use a "real domain" when testing, but I use some TypeKit fonts in my sites, so I need a local domain (defined in my hosts file) to get the validation to pass. But let me think about that when I get to it.

thanks @bep. I think I confused things. I meant to say, if the _port_ is set on the CLI, then hugo would need to increment up from that.

 hugo ... -p 1317 ...

... would get you 1317, 1318 etc, if you have multiple baseURLs specified in the config.

Regarding typekit needing a local domain, I did not know that, and just specified "localhost" and "127.0.0.1" in the typekit "kit editor" setup, for my sites that use typekit. It seems to work...

just specified "localhost" and "127.0.0.1" in the typekit "kit editor" setup, for my sites that use typekit. It seems to work...

Yes, that works too, I guess, but having a "secret domain" prevents others from using my subscription. Not a big thing.

And yes; I agree about the incremental port thing.

ah, never thought of that. Uh oh!

Just to complete that thought. I have something like this in /etc/hosts:

127.0.0.1   somename.local

And then I run: hugo server --baseURL=http://somename.local

Which I will make sure works also in a multihost setup (http://somename.local:1313, http://somename.local:1314 etc.).

@RickCogley I'm back from Spain and about to wrap my head around finishing this implementation.

One remark:

We may refine this in the future, but my first take on the static folder will be to duplicate it for the different languages, which I think makes the most sense.

Welcome back @bep. Hope you got to relax & enjoy. :-)

Yeah, I can see duplicating static. Just wondering: if I had stuff that was shared between both sites, could I use a symlink between them? Say, the main language's static/img is linked into the other languages' static/img. Or doesn't hugo deal well with symlinks?

@RickCogley I have a better idea. I will revise my first post to include this.

@RickCogley I have updated the description with a new section about this. I think this will be very valuable, not just for this particular feature.

@bep, the current description lists this config block (with irrelevant elements removed):

[languages]
staticDir = "static_no"
[languages.no]

[languages.en]
staticDir = "static_en"

Did you mean to include staticDir = "static_no" directly under the [languages] table?

Did you mean to include staticDir = "static_no" directly under the [languages] table?

Copy and paste mistake. Thanks for spotting.

@bep that sounds slick. Given the same file in, say, static_en/img and static_ja/img, which one will "win" if the idea is that the "right-most" one wins? Will it be alphabetical, or the last one defined in the config.toml?

Also, are the rules different if, say, they have the same filename but one is newer?

I have updated the description with a new section about this. I think this will be very valuable, not just for this particular feature.

On file duplicates, the right-most version will win.

Given the same file in, say, static_en/img and static_ja/img, which one will "win" if the idea is that the "right-most" one wins?

In my head: In the above case, no files in static_ja/img will be visible in the English site and vice versa. That is the foundation of this. So you can have logo.png in both places and it would just work. I don't think it would make sense to mix those two "bags of resources". This becomes even more clear when we start to talk about content.

The "right-most" is static_ja/img for the Japanese site and static_en/img for the English one, with a note to the above: files from static_ja/img will not be visible to the English site.

Also, are the rules different if, say, they have the same filename but one is newer?

No.

@RickCogley Considering #2699, it may (at least for the content part) make sense to let both "language folders" be visible to both sites, but in the static case the current language will always win on duplicates. I will think about it.

@bep If you set:

 staticDir = ["static1", "static2"]

... do you also have to add the language static dirs like static_no into that?

 staticDir = ["static1", "static2", "static_no"]

Or are those set under the language blocks only?

In a multilingual site, in my experience, you have images that are language-specific, but you also have images that are common to both. So I suppose common ones go into "global static" and language-specific ones go into "language-specific static", correct?

Or are those set under the language blocks only?

Yes.

So I suppose, common ones go into "global static" and language specific go into "language specific" correct?

Yes. The use case for having a "static_no" is typically for assets that are different between the two languages: Logo (text in the particular language), maybe also CSS files per language, whatever.

Ok, it seems clean / good to me.
Someone will probably stick their language statics in there, though! :-)

How about staticLangDir?

How about staticLangDir?

I have slept on this, and I think I have figured out a good way to differentiate between overriding these static dirs and adding to them.

It will behave conceptually a little differently when running in multihost mode vs. regular mode, but I think it should be logical for most people.

If you need more staticDir properties, add an ID as a suffix. An ID is an integer between 1 and 10.

So:

staticDir = ["static1", "static2"]
[languages]
[languages.no]
staticDir = ["static1"]
staticDir2 = "static_no"
baseURL = "https://example.no"
languageName = "Norsk"
weight = 1
title = "På norsk"

[languages.en]
staticDir2 = "static_en"
baseURL = "https://example.com"
languageName = "English"
weight = 2
title = "In English"

The above shows a mix of override and additions.

For no: "static1", "static_no"
For en: "static1", "static2", "static_en"

In both of the above, the right-most directory will win on duplicates.

Oh, smart!

  • Override a global staticDir property by specifying the same property name as the global, under the language.
  • Give a language a unique static directory by specifying a unique property name under the language block.

@bep Is it possible to add a custom Google Analytics ID for each language domain? And if not, when could it be implemented?

@biodranik you can set site params per language
