Docusaurus: Docusaurus v2 doesn't allow for "mypagename.html" links

Created on 30 Apr 2020 · 26 Comments · Source: facebook/docusaurus

We migrated and now have a series of issues being raised because none of our 7-year-old links ending in .html resolve correctly.

This is the main issue tracking this:
https://github.com/facebook/watchman/issues/798

A couple of kind souls have submitted PRs to change links elsewhere:
https://github.com/facebook/watchman/pull/806
https://github.com/facebook/watchman/pull/801

but this is really a docusaurus issue. How can we get this fixed?

bug


All 26 comments

I don't really understand why this should be considered a bug in v2. Obviously, when migrating to v2, you need to create redirects from old URLs to new ones.

It is completely correct behavior in v2 that links ending in the .html extension do not work. Just use _valid_ internal URLs:
https://facebook.github.io/watchman/docs/nodejs.html -> https://facebook.github.io/watchman/docs/nodejs

```diff
- [Read the NodeJS watchman documentation](https://facebook.github.io/watchman/docs/nodejs)
+ [Read the NodeJS watchman documentation](nodejs)
```

Docusaurus v2 certainly can't and shouldn't do this for you. Moreover, we discussed this earlier and decided not to provide our own solution for creating redirects.

You can look at an example of a redirect plugin here.

@lex111 While this might not be a bug, I think https://github.com/facebook/docusaurus/pull/2393 by @rlamana was supposed to add support for such cases. Seems like it isn't working.

Let's reopen for the time being.

But it works fine: https://v2.docusaurus.io/docs/introduction.html -> https://v2.docusaurus.io/docs/introduction/

Maybe that's because our site doesn't have a baseUrl, while https://facebook.github.io/watchman/docs/install.html does.

I think that's the reason. In packages/docusaurus/src/client/normalizeLocation.js, we don't take into account the baseUrl when it can be part of the location.

Maybe it depends on the hosting provider, since in #2393 only the paths with index.html at the end are normalized.

However, this is still not a proper redirect, and it is not good for SEO. Therefore, I would not rely on this approach, and would generate real redirects with a custom plugin.

@lex111 is right, my change was meant to fix paths ending with index.html only. My intention was to keep the behavior that Docusaurus v1 currently has. The truth is that v1 also supports links ending with *.html, and in my humble opinion v2 should maybe support them too. But a custom plugin sounds like a fair option too.

The change would be as simple as updating the regex here: https://github.com/facebook/docusaurus/blob/master/packages/docusaurus/src/client/normalizeLocation.ts#L20
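A minimal sketch of what such a normalization could look like, written as a hypothetical standalone `normalizePathname(pathname, baseUrl)` function rather than the actual `normalizeLocation.ts` code: it strips a trailing `index.html` or `.html` and takes the `baseUrl` prefix into account, which is the gap suspected above. The function name and the `baseUrl` parameter are assumptions for illustration.

```javascript
// Hypothetical helper, not Docusaurus's normalizeLocation implementation.
// "/watchman/docs/nodejs.html" -> "/watchman/docs/nodejs"
// "/watchman/docs/index.html"  -> "/watchman/docs/"
function normalizePathname(pathname, baseUrl = '/') {
  // Only touch paths that live under the site's baseUrl.
  if (!pathname.startsWith(baseUrl)) {
    return pathname;
  }
  const rest = pathname.slice(baseUrl.length);
  const normalized = rest
    // "foo/index.html" -> "foo/" (and a bare "index.html" -> "")
    .replace(/(^|\/)index\.html$/, '$1')
    // "foo.html" -> "foo"
    .replace(/\.html$/, '');
  return baseUrl + normalized;
}

console.log(normalizePathname('/watchman/docs/nodejs.html', '/watchman/'));
// prints /watchman/docs/nodejs
```

Note that this only rewrites the location on the client; as pointed out above, it is not a real HTTP redirect.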

From my perspective as a consumer of a doc build system, my motivation to write custom plugins for a complex system in a foreign language is basically zero. I can switch back to our prior Jekyll-based system to restore my URLs: they work out of the box there, and it simply isn't a matter of writing code for that.

Speaking honestly, I'm surprised that such a basic feature is not considered a bug. Who is the intended audience for this tool and why wouldn't they need to control their URLs?

FWIW, I don't appreciate the immediate "not a bug -> close" response. You guys reached out with the offer to migrate our docs and we're left with widespread broken incoming links as a result. Not a great customer experience.

@wez we never thought of that feature when we first built v2. But since this feature has been added and it's not working correctly, it's a bug. We'll prioritize fixing this. Apologies!

@wez sorry about that, but I closed your issue based on our past experience. Docusaurus v2 is significantly different from Jekyll, so whoever migrated the Watchman website from its previous static site generator should have taken into account that the URLs would be different and that redirects were therefore necessary. I did not follow the migration of your project website to Docusaurus v2, so I can't say anything more about it.

However, you probably shouldn't have migrated to Docusaurus if you were completely comfortable working with Jekyll, because you would inevitably run into various difficulties when migrating to a new tool.

I'm not disclaiming the Docusaurus team's responsibility for this incident, but in our opinion static site generators should not deal with redirects and fixing incorrect URLs (even though we tried to mitigate this for v1 users). Moreover, we are still in alpha, so Docusaurus is not yet stable enough to use day to day without problems...

I hope there are no hard feelings.

Hi all 👋

While Facebook's open source projects have general autonomy in what infrastructure they use for their documentation, I, of course, hope they choose Docusaurus. We felt Watchman was a good starter candidate for community members to use to migrate to Docusaurus 2, given that it was a relatively vanilla site with good content.

So it wasn't @wez who proactively did the migration. It was me recommending it as an option for a community member.

I will take responsibility for not recognizing the explicit .html ending problem that has occurred with Watchman. But we should fix the issue, either in the core (which is probably the right place) or via a plugin that allows folks who run into this problem in any migration to resolve it.

This should hopefully be resolved soon. Apologies @wez for any churn this has caused your site.

Thanks.

Hi all,

Here are a few comments, proposals and questions I have

Valid urls of Watchman

I understand that such url should work:

Does it mean that BOTH urls should work?

I don't know how the Watchman site worked before; is it still online somewhere so I can check?

Hosting on GitHub Pages

Watchman is hosted on GitHub Pages.
As far as I know, it's not possible to do any server-side redirect on this hosting solution.

Also, for GitHub Pages to serve a non-404 response, the file actually has to exist on the file system with the .html extension.

On other platforms like Netlify, it would have been possible to drop in a simple _redirects file and handle this.
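For reference, a Netlify `_redirects` file is plain text with one `from to status` rule per line. The sketch below builds such rules from a hypothetical list of doc slugs; the `netlifyRedirects` name, the `/docs/` prefix, and the choice of a 301 status are assumptions, and the output only helps on hosts that read this file (GitHub Pages does not).

```javascript
// Hypothetical generator for a Netlify-style `_redirects` file.
// Each rule is "<from> <to> <status>", one per line, sending the
// extensionless URL to the canonical .html URL.
function netlifyRedirects(slugs, prefix = '/docs/') {
  return slugs
    .map((slug) => `${prefix}${slug} ${prefix}${slug}.html 301`)
    .join('\n');
}

console.log(netlifyRedirects(['nodejs', 'install']));
// /docs/nodejs /docs/nodejs.html 301
// /docs/install /docs/install.html 301
```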

It might be a good idea to start using a custom domain, which would allow more flexibility to change the underlying hosting solution without too much pain.

Using the .html extension in the document id

If Watchman only needs the /nodejs.html page, and not the /nodejs page, it's possible to use .html as a suffix in document ids.


Using a file like filename.html.md also works


Duplicating the pages

Creating 2 HTML pages for the same document could be a portable solution:

```md
---
id: nodejs
extraPaths:
  - nodejs.htm
  - nodejs.html
title: NodeJS page
---
```

The duplicate pages would have a canonical url to the main page so that SEO can know which page is the main one.
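A minimal sketch of the head tags such a duplicate page might carry, assuming a hypothetical `duplicatePageHead(title, canonicalUrl)` helper called at build time; the canonical link is what tells search engines which URL is the main one.

```javascript
// Hypothetical helper: build the <head> tags for a duplicate page
// (e.g. /docs/nodejs/index.html) pointing search engines at the main URL.
// Escaping is minimal: & and " in attributes, & and < in text.
function duplicatePageHead(title, canonicalUrl) {
  const escapeAttr = (s) => s.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
  const escapeText = (s) => s.replace(/&/g, '&amp;').replace(/</g, '&lt;');
  return [
    `<title>${escapeText(title)}</title>`,
    `<link rel="canonical" href="${escapeAttr(canonicalUrl)}">`,
  ].join('\n');
}

console.log(
  duplicatePageHead('NodeJS page', 'https://facebook.github.io/watchman/docs/nodejs.html'),
);
```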

Is it worth redirecting in this case? or can the browser just stay on the non-canonical page if it serves the correct content?

404 + redirecting

This looks like the solution @lex111 implemented here: https://github.com/facebook/docusaurus/pull/2704
which was not merged for SEO reasons related to serving a 404.

I found this note here: https://github.com/rafrex/spa-github-pages

A quick SEO note - while it's never good to have a 404 response, it appears based on Search Engine Land's testing that Google's crawler will treat the JavaScript window.location redirect in the 404.html file the same as a 301 redirect for its indexing. From my testing I can confirm that Google will index all pages without issue, the only caveat is that the redirect query is what Google indexes as the url. For example, the url example.tld/about will get indexed as example.tld/?p=/about. When the user clicks on the search result, the url will change back to example.tld/about once the site loads.

I'd prefer not to do that either, but it remains an option.
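A simpler variant of the 404 approach than the query-string trick quoted above, assuming a hypothetical script embedded in the site's `404.html`: when the missing path has no extension, the browser is sent to the `.html` version. The `fallbackTarget` name is illustrative. GitHub Pages would still answer the original request with a 404 status, which is the SEO concern raised above.

```javascript
// Hypothetical script for a GitHub Pages 404.html.
// Returns the URL to try instead of the missing path, or null.
function fallbackTarget(pathname, search = '', hash = '') {
  // Strip trailing slashes so "/docs/nodejs/" and "/docs/nodejs" behave alike.
  const trimmed = pathname.replace(/\/+$/, '');
  const lastSegment = trimmed.slice(trimmed.lastIndexOf('/') + 1);
  // Only rewrite non-empty, extensionless last segments. This also avoids a
  // redirect loop when "/docs/foo.html" itself is missing.
  if (lastSegment === '' || lastSegment.includes('.')) {
    return null;
  }
  return `${trimmed}.html${search}${hash}`;
}

// In the browser, 404.html would run something like this.
if (typeof window !== 'undefined') {
  const { pathname, search, hash } = window.location;
  const target = fallbackTarget(pathname, search, hash);
  if (target !== null) {
    window.location.replace(target);
  }
}
```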


What do you think?

Thank you so much @slorber for the wonderful summary of the options here.

I would definitely prefer the option that could apply to the most sites that would run into this problem. So while I like the custom domain idea for Watchman specifically, it wouldn't scale to all sites, some of which may not be able to have a custom domain. Same for migrating over to Netlify: we could probably do that, but not all projects (outside of FB) that may have this issue could.

I think we should think hard about duplicating pages or the 404 + redirect approach. This should only affect a handful of sites, and new sites using v2 shouldn't have this issue.

Thanks @slorber!

I did try simply renaming files before I posted this issue and it didn't change the behavior.
If there have been changes since then I'm happy to work through and try your suggestions.

Regarding whether we need to support both names: we didn't, but now that we've been in the current state for a couple of months(!) we have both names out there and will need to transition back to the one true naming scheme (with .html). I'd like the canonical names to be .html and ideally we'd redirect everything to those, but I'm ok with generating a second set of names for a couple of months and then removing them in favor of the .html names.

Regarding GH pages vs. other hosting solutions: I'm not super motivated to switch away from GH just for this software; that introduces another whole host of issues to deal with and that's not how I want to spend time, especially as we'd then also need to introduce a new domain name to make that work; that's three different sets of names to wrangle.

If you wanted to inspect the jekyll setup in watchman to compare behavior:
https://github.com/facebook/watchman/tree/ecf0934079d869b90eead1a0efef838aca48cf88/website
is a link to the website dir at the commit prior to the first docusaurus change; I believe that the readme there has accurate info on running those docs locally.

Hi @wez @JoelMarcey

I understand that we are looking for a portable solution, and not really willing to leverage hosting platform configuration.

That means that we must write these 2 files to disk if we don't want GitHub Pages to return a 404 status code:

  • docs/nodejs.html
  • docs/nodejs/index.html
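As a tiny illustration of the two outputs listed above, assuming a hypothetical `filesForDoc(docPath)` helper that takes the extensionless doc path and returns the files a static build would write so that both URLs exist as real files:

```javascript
// Hypothetical helper: "/docs/nodejs" -> ["docs/nodejs.html", "docs/nodejs/index.html"]
function filesForDoc(docPath) {
  // Remove leading and trailing slashes: "/docs/nodejs/" -> "docs/nodejs"
  const clean = docPath.replace(/^\/+|\/+$/g, '');
  return [`${clean}.html`, `${clean}/index.html`];
}

console.log(filesForDoc('/docs/nodejs').join(', '));
// prints docs/nodejs.html, docs/nodejs/index.html
```

Which of the two files holds the real content, and which holds a duplicate or a redirect, is the choice discussed next.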

I'm looking at how to make this work and I see 2 solutions:

  • duplicate the page, and use canonical urls to tell google nodejs.html is the main one
  • have nodejs/index.html do a client-side redirect (empty/lightweight page, no content)

Note that it also seems possible to trigger a client-side redirect with an HTML tag. Google seems to treat it as a redirect (although this is not recommended).

```html
<meta http-equiv="refresh" content="0; url=https://facebook.github.io/watchman/docs/nodejs.html">
```
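A sketch of a lightweight redirect page built around that tag, assuming a hypothetical `redirectPageHtml(targetUrl)` generator; it is not necessarily what any Docusaurus plugin emits. It combines the meta refresh (works without JavaScript), a canonical link to the target, and a small script that keeps the query string and hash. The target URL is assumed to be trusted, site-internal input.

```javascript
// Hypothetical generator for a lightweight redirect page, e.g. written to
// docs/nodejs/index.html and pointing at docs/nodejs.html.
function redirectPageHtml(targetUrl) {
  // Minimal attribute escaping for the meta/link tags.
  const attr = targetUrl.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
  return `<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <meta http-equiv="refresh" content="0; url=${attr}">
  <link rel="canonical" href="${attr}">
  <script>
    window.location.replace(${JSON.stringify(targetUrl)} + window.location.search + window.location.hash);
  </script>
</head>
</html>`;
}

console.log(redirectPageHtml('https://facebook.github.io/watchman/docs/nodejs.html'));
```

Because the page contains no React or site JS, the browser can follow the redirect as soon as the HTML arrives.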

It should be possible to provide more advanced configuration, like:

```md
---
id: nodejs
path: nodejs.html
redirectPaths:
  - nodejs
duplicatePaths:
  - nodejs
title: NodeJS page
---
```

@wez, if we succeed to make nodejs.html the main page, is it ok to have a simple client-side redirect from "/nodejs" to "/nodejs.html"?


I did try simply renaming files before I posted this issue and it didn't change the behavior.
If there have been changes since then I'm happy to work through and try your suggestions.

According to this comment: https://github.com/facebook/watchman/issues/798#issuecomment-619300064

It's not totally clear to me what you have tried, but it looks like you tried this: nodejs/index.html.md
What I'm suggesting is nodejs.html.md as the filename. If you specify the document id in the front matter, you need to use id: nodejs.html. In the Watchman docs, I can see that the nodejs doc has a front matter id (in that case the filename actually has no effect on the pathname).

I'm able to get this working locally (should also work for GH pages).
I opened an example PR for watchman website here:
https://github.com/facebook/watchman/pull/812


If we validate that the workaround works:

  • Is id: nodejs.html the API we want to recommend for this use case? (It's a bit weird to me; shouldn't we be able to customize the path of each doc completely?)
  • We should still implement something to redirect from /nodejs to /nodejs.html, because extensionless links are already deployed out there.

We should decide whether we want a document pathname customization feature: if we migrate the Watchman docs to the id: nodejs.html workaround and then ship a clean solution for this use case a week later, we'd have to migrate the Watchman site again, from the workaround to the clean solution.

That means that we must write to disk these 2 files if we don't want a 404 status code from github pages:

@slorber This plugin by @lex111 might be helpful - https://github.com/single-spa/single-spa.js.org/blob/master/website/src/plugins/docusaurus-plugin-redirects/src/index.js

@wez, if we succeed to make nodejs.html the main page, is it ok to have a simple client-side redirect from "/nodejs" to "/nodejs.html"?

@slorber Yeah, that's fine with me! Thanks for looking at this!

Hi @wez, sorry for the delay.

Just wanted to let you know that the fixes you need are already merged on master and will be released soon in 2.0.0-alpha.57.


What you will need to do on Watchman:

Add a slug to each doc

Include the extension you want in the slug; it will be used for the main/canonical/SEO URL:

```md
---
id: nodejs
slug: nodejs.html
title: NodeJS
---
```

Use the client redirects plugin (see the plugin doc)

Your configuration should look like:

```js
module.exports = {
  plugins: [
    [
      '@docusaurus/plugin-client-redirects',
      {
        toExtensions: ['html'],
      },
    ],
  ],
};
```

If a /docs/nodejs.html page exists, then going to /docs/nodejs will redirect to /docs/nodejs.html.
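To make the `toExtensions` behavior described in this comment concrete, here is a hypothetical model of the mapping (not the plugin's source code): every existing page ending in one of the listed extensions gets a redirect from its extensionless path. The `redirectsToExtensions` name and the rule of skipping `from` paths that already exist as pages are assumptions of this sketch.

```javascript
// Hypothetical model of the `toExtensions` option: given the site's existing
// paths, compute which extensionless "from" paths should redirect to them.
function redirectsToExtensions(existingPaths, toExtensions) {
  const existing = new Set(existingPaths);
  const redirects = [];
  for (const path of existingPaths) {
    for (const ext of toExtensions) {
      const suffix = `.${ext}`;
      if (!path.endsWith(suffix)) continue;
      const from = path.slice(0, -suffix.length);
      // This model skips a "from" path that already exists as a page.
      if (!existing.has(from)) {
        redirects.push({ from, to: path });
      }
    }
  }
  return redirects;
}

for (const { from, to } of redirectsToExtensions(
  ['/docs/nodejs.html', '/docs/install.html', '/blog'],
  ['html'],
)) {
  console.log(`${from} -> ${to}`);
}
// /docs/nodejs -> /docs/nodejs.html
// /docs/install -> /docs/install.html
```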

Hi @wez

The features you need have been released in https://github.com/facebook/docusaurus/releases/tag/v2.0.0-alpha.58

Tell me if you need something else

I've been looking at updating this morning; I have added the slug lines to my front matter and the plugin config to the config, and have .html links working, but navigating to the non-html version of the links just shows a "Page Not Found" page when I run locally under yarn run start.
Is there some other configuration needed?
I'm hesitant to push this to gh_pages if the redirects don't function locally!

Example of the local changes to clarify; I've applied a similar change to all of the docs/*.md files:

```diff
diff --git a/fbcode/watchman/website/docs/install.md b/fbcode/watchman/website/docs/install.md
--- a/fbcode/watchman/website/docs/install.md
+++ b/fbcode/watchman/website/docs/install.md
@@ -1,5 +1,6 @@
 ---
 id: install
+slug: install.html
 title: Installation
 ---
```

and the config:

```diff
diff --git a/fbcode/watchman/website/docusaurus.config.js b/fbcode/watchman/website/docusaurus.config.js
--- a/fbcode/watchman/website/docusaurus.config.js
+++ b/fbcode/watchman/website/docusaurus.config.js
@@ -24,12 +24,12 @@
       },
       links: [
         {
-          to: 'docs/install',
+          to: 'docs/install.html',
           activeBasePath: 'docs',
           label: 'Docs',
           position: 'left',
         },
-        {to: 'docs/support', label: 'Support', position: 'left'},
+        {to: 'docs/support.html', label: 'Support', position: 'left'},
         {
           href: 'https://github.com/facebook/watchman',
           label: 'GitHub',
@@ -63,4 +63,12 @@
       },
     ],
   ],
+  plugins: [
+    [
+      '@docusaurus/plugin-client-redirects',
+      {
+        toExtensions: ['html'],
+      },
+    ],
+  ],
 };
```

Hi

The plugin only applies to the production build; it's not expected to work with yarn start.

We don't want those redirects to be SPA-based and part of the app, because the browser would then need to load React and the infrastructure code before being able to redirect.

(Email reply to Wez Furlong's comment of Fri, 10 Jul 2020, 18:06: https://github.com/facebook/docusaurus/issues/2697#issuecomment-656755002)


Does it work for you @wez?

I got frustrated by not being able to see this working locally, and frustrated that I'd spent more time on this than I think I've spent in total on "doc infra" in the past several years, so I opted to revert back to jekyll where it was simple to enable redirections for the "wrong" style URLs to the desired .html URLs. It doesn't look as good as docusaurus but doing so closes a half-dozen issues with various broken links.

My plan is to let that stay on jekyll until the search engines have re-indexed the new canonical locations and then revisit docusaurus again later, perhaps when it is out of alpha. Now that we have a way to control the URLs that later pass should be easier to manage.

I think it would be good if you could clarify in the docs that you cannot test the client redirect plugin when running locally; it was not clear to me at all that that was the expectation and I burned an hour trying to figure out what might have been wrong in my configuration, and poring through commits to see if I could glean anything that might help.

I'd love to have a way to test redirects locally when we revisit docusaurus; it would help build confidence before we push to production!

Sorry that your experience wasn't as great as it should have been, and for the time lost giving it a try 😞

I didn't document that the plugin only works in the production build in the initial alpha 58 release; sorry about that. It is now documented on the master branch here, but I didn't backport that to the alpha 58 docs.

Next time you give it a try, please reach out on Discord; I'll be there to help.

I'd love to have a way to test redirects locally when we revisit docusaurus; it would help build confidence before we push to production!

It is possible to test locally, but still involves the production build.

You can run the Docusaurus build command (normally via yarn build), and then serve the build folder locally with any HTTP server (I'd recommend serve, a very simple one; no need for Apache or the like):

```sh
# creates the /build folder (production build including the redirects)
yarn build

# host it locally:
yarn add serve
yarn serve build

# then open http://localhost:5000
```
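To check the generated redirect files without clicking through the site, a rough Node.js sketch could scan the build output. It assumes the output lives in `./build` and that redirect pages use a `<meta http-equiv="refresh">` tag with `http-equiv` written before `content`; both are assumptions, so adjust them to what your build actually emits. The file name `check-redirects.js` is hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Extract the target URL from a meta refresh tag, or null if there is none.
// Assumes http-equiv appears before content in the tag.
function refreshTarget(html) {
  const match = html.match(
    /<meta\s+http-equiv=["']refresh["']\s+content=["']\d+;\s*url=([^"']+)["']/i,
  );
  return match ? match[1] : null;
}

// Walk a build folder and report every .html file that contains a redirect.
function listRedirects(buildDir) {
  const results = [];
  const walk = (dir) => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.name.endsWith('.html')) {
        const target = refreshTarget(fs.readFileSync(full, 'utf8'));
        if (target !== null) {
          results.push({ file: path.relative(buildDir, full), target });
        }
      }
    }
  };
  walk(buildDir);
  return results;
}

// Usage (hypothetical): run `node check-redirects.js` after `yarn build`.
if (fs.existsSync('build')) {
  for (const { file, target } of listRedirects('build')) {
    console.log(`${file} -> ${target}`);
  }
}
```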


It is not so simple to make this work with yarn start easily, because the redirect files are lightweight, and not part of the Docusaurus client side routing system (SPA based on React / ReactRouter). We should be able to redirect to the correct page asap, without needing to wait for React and Docusaurus JS infra to download.

It may be possible to generate those lightweight redirect files before spawning the webpack dev server, but would probably decrease the startup speed of the project in dev mode.

We'll also make a docusaurus serve command and recommend a way to test a production build locally => https://github.com/facebook/docusaurus/issues/3062

FYI we now have the serve command shipped
https://twitter.com/docusaurus/status/1286715187983048704
