Here is a brain dump of many things to consider for i18n support in v2.
I'll keep this issue updated over time, but feel free to comment if you have anything to say, particularly if you used v1 i18n support and can provide valuable feedback.
Supersedes this older issue (which still has interesting content): https://github.com/facebook/docusaurus/issues/2651
Links to get inspiration from.
Have an upstream repo (often in English), and one fork per language
A translation strategy first seen on Vue translation: each language creates a git fork.
We can build tooling on top of that, so that a translation change made in the upstream repo can trigger new PRs on forked repos, to automate the process and ensure translations stay in sync.
Pros:
Cons:
Links related to the work of Nat Alison.
Contains some interesting notes on why a SaaS like Crowdin was not a good fit, despite an attempt to use it.
https://reactjs.org/blog/2019/02/23/is-react-translated-yet.html
https://github.com/reactjs/reactjs.org/issues/1605
https://github.com/reactjs/reactjs.org-translation
https://github.com/reactjs/reactjs.org-translation/blob/master/PROGRESS.template.md
https://github.com/facebook/react/issues/8063
https://github.com/reactjs/reactjs.org/issues/82
https://github.com/reactjs/reactjs.org/pull/873
Another translation RFC from Nat Alison, quite close to her work on ReactJS:
https://github.com/gatsbyjs/rfcs/blob/master/text/0010-gatsby-docs-localization.md
I don't think this work is in production.
Also some interesting bits on this thread where she explains her unfortunate situation working at Gatsby.
You have a repo and you just have a folder per language.
Pros:
Cons:
The nuxt doc is a simple repo with language folders.
It works fine, but the author told me it was hard to keep all languages in sync. Looks like a manual process.
Quite similar: the TS website has one language folder per package (<packageName>/copy/<lang>), and the translations are handled in the same GitHub monorepo, but split by package.
https://github.com/microsoft/TypeScript-Website/issues/100
https://github.com/microsoft/TypeScript-Website/pull/181
Note: Orta found a way to solve the per-language permission problem, as he created a bot so that code owners can self merge through a github PR comment despite not having git permissions:
https://github.com/microsoft/TypeScript-Website/issues/130#issuecomment-675557535
https://github.com/orta/code-owner-self-merge
I think it's possible to handle the "sync with upstream" problem inside a mono repo by using git patch.
git diff origin/master HEAD~100 -- ./website/docs
https://stackoverflow.com/questions/9939952/create-a-patch-including-specific-files-in-git
It's a way to emulate the upstream repo -> language forks pattern
Using a SaaS like Crowdin, Transifex or others has benefits, such as advanced translation features: UI, editors supporting various formats (PO, Markdown, ICU key/values), translation memory, automatic payment of platform translators, translation progress tracking, sync with the upstream language, version management...
Pros:
Cons:
Solution suggested by Docusaurus 1, free plan for open-source, used by Docusaurus site v1, Jest, Yarn, Electron...
We should rather try to make it easy to migrate from v1.
Not everybody likes this solution, however.
Some drawbacks mentioned here:
https://github.com/gatsbyjs/rfcs/blob/master/text/0010-gatsby-docs-localization.md#saas-platform-crowdin
Note: some questions I asked Crowdin are here: https://gist.github.com/slorber/30643299196c7efa77084eec10c1c609
???
Unlike presented use-cases, we are a framework, not a site, and we don't serve a single community.
I think we want to be able to support both the developers and non-developers.
We can't expect all Docusaurus translators to be developers, nor git users, yet we know that developers don't necessarily always like the lock-in to a SaaS like Crowdin.
I think the translation system should be file-system based, as it's probably the common abstraction between git-based workflows and saas-based workflows
Basically, if you build your site for the fr language, and if you have i18n/fr/docs/myDoc.md, then it should be used for the french page instead of the file at docs/myDoc.md.
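A minimal sketch of that lookup, under assumptions: the helper name `resolveDocSource`, its signature and the injectable `fileExists` predicate are hypothetical and only illustrate the idea; they are not an existing Docusaurus API.

```typescript
// Hypothetical helper: illustrates the "translated file wins" rule only.
type FileExists = (path: string) => boolean;

function resolveDocSource(
  docPath: string, // e.g. "docs/myDoc.md", relative to the site directory
  locale: string, // e.g. "fr"
  fileExists: FileExists,
  i18nDir: string = "i18n" // assumed default folder for translated content
): string {
  const localized = `${i18nDir}/${locale}/${docPath}`;
  // Prefer the translated file when it exists, otherwise keep the original.
  return fileExists(localized) ? localized : docPath;
}

// Usage with an in-memory "file system" so the sketch stays self-contained.
const siteFiles = ["docs/myDoc.md", "docs/other.md", "i18n/fr/docs/myDoc.md"];
const inSiteFiles: FileExists = (p) => siteFiles.indexOf(p) !== -1;

const frDocSource = resolveDocSource("docs/myDoc.md", "fr", inSiteFiles);
const frOtherSource = resolveDocSource("docs/other.md", "fr", inSiteFiles);
```

Here `frDocSource` points to the file under i18n/fr, while `frOtherSource` stays on the untranslated file. Keeping the existence check injectable means the same rule works whether the translated files come from git or were downloaded from a SaaS.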
I think ./i18n is a good default path for the translated content, but the paths of such a system should be flexible enough that you can adopt the workflow of your choice.
So, the first step is to support the first case, where you just put the translations in a folder of your site. I'm going to experiment with this on the Docusaurus 2 website and see if I can provide a French translation.
It's unlikely we'll be able to provide integrations with all the existing translation SaaS, but a 2nd step would be to write integration scripts with Crowdin, so that v1 users can keep using it.
It's likely we'll try to use FBT, a translation tool from Facebook.
I personally have had a good experience with React-intl as well, and prefer it over many React alternatives.
Supposing en is the "main" language.
Does https://myDomain.com/en/myDoc exist?
What should be the behavior of the site if the URL does not contain a language, like https://myDomain.com/myDoc ? Is it the English language? Or do we add code to redirect to the most suitable language?
Is it ok for SEO to have a homepage that just redirects? Or is the homepage english? Then which page is the canonical one?
Note: v1 redirects docs, but not the homepage: https://docusaurus.io/ & https://docusaurus.io/docs/installation
Interesting comment (point 5): https://github.com/facebook/docusaurus/issues/2651#issuecomment-660792635
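If we do end up redirecting language-less URLs to the most suitable language, the selection logic could look roughly like this. This is a sketch only: `pickLocale` is a hypothetical helper, and the list of preferred languages is assumed to come from something like the browser's `navigator.languages`.

```typescript
// Hypothetical helper, not an existing Docusaurus API.
function pickLocale(
  preferred: string[], // most preferred first, e.g. ["fr-CA", "en"]
  supported: string[], // locales the site was built for
  fallback: string // locale used when nothing matches
): string {
  for (const tag of preferred) {
    const lower = tag.toLowerCase();
    // 1) Exact match, e.g. "pt-BR" -> "pt-BR".
    for (const locale of supported) {
      if (locale.toLowerCase() === lower) return locale;
    }
    // 2) Match on the primary language subtag, e.g. "fr-CA" -> "fr".
    const primary = lower.split("-")[0];
    for (const locale of supported) {
      if (locale.toLowerCase().split("-")[0] === primary) return locale;
    }
  }
  return fallback;
}

const chosenLocale = pickLocale(["fr-CA", "en"], ["en", "fr"], "en");
```

Whether this runs client-side, server-side, or not at all is exactly the open SEO question above; the sketch only covers the matching step.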
Let's not forget to add the proper page meta tags such as:
<html lang="en">
<link rel="alternate" href="https://myDomain.com/fr/myDoc" hrefLang="fr-FR"/>
See also https://github.com/facebook/docusaurus/issues/2471
(I think if we have this header in pages, it's not needed to add it in sitemaps)
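As a rough illustration, the alternate links for one page could be generated from the list of locales. The `LocaleInfo` shape and the `alternateLinkTags` helper are assumptions made for this sketch, as is the choice to serve one locale at the site root with an empty path prefix.

```typescript
// Hypothetical types and helper, for illustration only.
interface LocaleInfo {
  pathPrefix: string; // "" for a locale served at the site root (assumption)
  hrefLang: string; // value for the hrefLang attribute, e.g. "fr-FR"
}

function alternateLinkTags(
  siteUrl: string,
  docPath: string,
  locales: LocaleInfo[]
): string[] {
  return locales.map((l) => {
    const prefix = l.pathPrefix === "" ? "" : `/${l.pathPrefix}`;
    return `<link rel="alternate" href="${siteUrl}${prefix}/${docPath}" hrefLang="${l.hrefLang}"/>`;
  });
}

const myDocAlternates = alternateLinkTags("https://myDomain.com", "myDoc", [
  { pathPrefix: "", hrefLang: "en" },
  { pathPrefix: "fr", hrefLang: "fr-FR" },
]);
```

Every localized version of the page would emit the same full list, including a link to itself.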
There are multiple ways to handle the URLs of translated pages
https://fr.myDomain.com/myDoc
Using a custom subdomain does not seem a very good fit, as it would require one separate deployment per language (or you'd need some custom reverse proxy logic to handle that?).
I don't think this is the workflow we'll encourage, but we could still support this if people really want it. Maybe with an option like docusaurus build --fr, so that it builds a single language site.
Note: this can't be done on simple hosting solutions like Github Pages
https://myDomain.com/fr/myDoc
I think having a path language prefix is a simpler option, and can be easily done with a single deployment.
There's still a choice to be made here:
baseUrl?
Both solutions have cons:
For now I think 2 is a better solution
As we have seen above, it may be a good idea for performance to split the site into multiple smaller SPAs.
But this also means that we'll build the SPAs independently, so what would the dev experience be when you run docusaurus start?
Do we code something completely different in dev so that the routes of all languages are accessible as a single SPA? Do we instead provide a docusaurus start --lang fr to only run the "French SPA"? I think it's an acceptable tradeoff and has some advantages, but it can also be annoying for some users.
Auto-generated ids are a problem for anchor links.
When a translator changes a heading in a translated markdown file, the id changes, and links from other files have to change as well. We should provide an easy way to make the anchors stable across translations.
https://github.com/reactjs/reactjs.org/issues/1605#issuecomment-458816106
https://github.com/reactjs/reactjs.org/issues/1605#issuecomment-458819231
https://github.com/ethereum/ethereum-org-website/issues/272
https://github.com/reactjs/reactjs.org/pull/1636/files
https://github.com/mdx-js/mdx/issues/810
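One possible direction is to let authors pin an explicit id on a heading, so that translating the heading text does not change the anchor. The sketch below assumes a `{#custom-id}` suffix on the heading; that syntax, the `headingId` helper and the naive ASCII-only `slugify` are illustrative assumptions, not a decided design.

```typescript
// Naive slug: lowercase, runs of non-alphanumerics collapsed to "-" (ASCII only).
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Hypothetical rule: an explicit "{#id}" suffix wins over the generated slug.
function headingId(heading: string): string {
  const match = /\s*\{#([A-Za-z0-9_-]+)\}\s*$/.exec(heading);
  return match ? match[1] : slugify(heading);
}

const enHeadingId = headingId("Getting started {#getting-started}");
const frHeadingId = headingId("Pour commencer {#getting-started}");
const frAutoId = headingId("Pour commencer");
```

With the explicit id, the English and French headings resolve to the same anchor; without it, the French heading gets a different generated id and cross-file links would break.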
Support RTL in themes?
TODO
If the user is browsing a French doc and presses "edit", they should land on the correct URL (git or Crowdin), so we should make this configurable.
Related:
https://github.com/facebook/docusaurus/issues/648
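One way to make this configurable would be to accept a function that receives the locale and the doc path. The sketch below is only an assumption about what such an option could look like: the `EditUrl` type, the `makeEditUrl` helper, and the repository and Crowdin project URLs are all placeholders.

```typescript
// Hypothetical option shape, not an existing Docusaurus API.
interface EditUrlParams {
  locale: string;
  docPath: string; // e.g. "docs/myDoc.md"
}
type EditUrl = (params: EditUrlParams) => string;

// Placeholder URLs: "my-org/my-site" and the Crowdin project name are made up.
const makeEditUrl = (defaultLocale: string): EditUrl => ({ locale, docPath }) =>
  locale === defaultLocale
    ? `https://github.com/my-org/my-site/edit/main/${docPath}`
    : `https://crowdin.com/project/my-site/${locale}`;

const siteEditUrl = makeEditUrl("en");
const enEditLink = siteEditUrl({ locale: "en", docPath: "docs/myDoc.md" });
const frEditLink = siteEditUrl({ locale: "fr", docPath: "docs/myDoc.md" });
```

Passing the default locale as a parameter avoids hard-coding English, and a git-only site could return a path under its translation folder instead of a SaaS link.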
We should not assume english will be the default language, like in v1.
https://github.com/facebook/docusaurus/issues/3317
The build time mostly depends on 3 factors:
To decrease build time and make it sustainable, you can remove older versions from the SPA part, and make them available as a standalone, single version deployment.
We'll work on a cli feature to "archive" older versions more easily: https://github.com/facebook/docusaurus/issues/3286
A missing page/translation should be allowed. In that case we'd fall back to the default language and could show a warning.
See 6: https://github.com/facebook/docusaurus/issues/2651#issuecomment-660792635
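A build-time check along these lines could produce the warnings. This is a sketch under assumptions: `reportMissingTranslations` and the `FallbackReport` shape are hypothetical names, and the message format is invented for the example.

```typescript
// Hypothetical report shape, for illustration only.
interface FallbackReport {
  missing: string[]; // docs that will be served in the default language
  warnings: string[]; // human-readable messages for the build output
}

function reportMissingTranslations(
  locale: string,
  defaultLocale: string,
  sourceDocs: string[], // doc paths in the default language
  translatedDocs: string[] // doc paths that exist for `locale`
): FallbackReport {
  const missing = sourceDocs.filter((doc) => translatedDocs.indexOf(doc) === -1);
  const warnings = missing.map(
    (doc) => `[i18n] ${doc} has no "${locale}" translation, falling back to "${defaultLocale}"`
  );
  return { missing, warnings };
}

const frReport = reportMissingTranslations(
  "fr",
  "en",
  ["docs/myDoc.md", "docs/other.md"],
  ["docs/myDoc.md"]
);
```

The build would still succeed; the report only lists which pages are served untranslated.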
We need a cli to init a language folder based on current language/versions
See proposal here: https://github.com/facebook/docusaurus/issues/2651#issuecomment-660792635
We'll have to snapshot each localized folder too
It's possible to colocate assets next to the docs, which makes it possible to use a different image per version. What's the story for i18n? Such a colocated image would likely end up being copied into the language folders too, so it might be duplicated along multiple axes (version/lang). Is that a good thing? At the same time, if an image contains text, that text could be translated differently, so it still makes sense...
Should we allow to create custom slugs per language?
If we do that, to be able to switch from one language to another without losing context (the doc you are currently reading), one version would have to be aware of the slugs of all the other language versions, which might be quite a lot of data. How do we access such data in a performant way?
To me, being able to switch language while preserving context does not look so critical. If the user wants to browse docs in French, they can go through the French home and browse from there, and it's likely Google gives them the docs in the correct language in the first place.
We should try to find a solution though, but this can probably be done later, with some code that would, on language switch request, read some json file emitted by the other language, and then obtain a mapping from document id to slug of the other language.
Note: Yarn 1/classic (Jekyll based?) can switch language and preserve context when doing so, but the slugs are not localized: https://classic.yarnpkg.com/es-ES/docs/usage
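The "mapping from document id to slug" idea could be sketched like this. The `SlugMap` shape, the `switchLocalePath` helper and the example slugs are assumptions; in practice the map would be a JSON file emitted by the other language's build and loaded only when the user requests a language switch.

```typescript
// Hypothetical shape of the JSON file each locale would emit: document id -> slug.
type SlugMap = Record<string, string>;

function switchLocalePath(
  docId: string,
  targetLocale: string,
  targetSlugs: SlugMap
): string {
  const known = Object.prototype.hasOwnProperty.call(targetSlugs, docId);
  // Unknown document: fall back to the target locale's home page.
  return known ? `/${targetLocale}/${targetSlugs[docId]}` : `/${targetLocale}/`;
}

// Made-up example data for the "fr" locale.
const frSlugMap: SlugMap = {
  installation: "docs/installation",
  usage: "docs/utilisation",
};

const frUsagePath = switchLocalePath("usage", "fr", frSlugMap);
const frUnknownPath = switchLocalePath("not-translated", "fr", frSlugMap);
```

Each locale only needs its own map at build time, so no version has to embed the slugs of every other language.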
If you add the ?translate=true querystring, it could enhance the UI so that we add in-place translation features.
It could be possible to integrate with the translation API of a SaaS like crowdin.
This is mostly for key/value translations, as markdown docs will be translated as a whole and there's already the editUrl on the docs plugin.
TODO ...
Ongoing PR: https://github.com/facebook/docusaurus/pull/3325
Worth studying the NextJS i18n routing RFC: https://github.com/vercel/next.js/discussions/17078
Hi there, eavesdropping as I've also been grappling with translation approaches for another project.
Regarding translation management in a single repo scenario, are you aware of git localize? A system that incorporates or mimics this could make for happy devs
thanks @clairefro , didn't know about this one, will take a look :)
Here is some news about i18n support.
You'll find the i18n RFC here: https://github.com/facebook/docusaurus/issues/3317
The i18n core PR has already been merged but it is not officially released yet.
https://github.com/facebook/docusaurus/pull/3325
However, you can test it using the @canary npm dist tag (yarn add @docusaurus/core@canary etc) and by reading some instructions in that PR.
We are in the dogfooding phase to see if the i18n API and system works fine, and if we need some breaking changes.
We dogfood this on 2 sites:
fr subpath with some basic translation: https://v2.docusaurus.io/fr
When tests are ok for these 2 sites, we'll release i18n with proper documentation, hopefully before the end of the year.
The Jest v2 + i18n migration is in progress and can be tracked here: https://github.com/jest-website-migration/jest/issues/2