Docusaurus: [v2] XML Sitemaps are incomplete

Created on 14 Apr 2020 · 10 Comments · Source: facebook/docusaurus

🐛 Bug Report

The XML sitemaps currently output loc, changefreq and priority for every URL entry. I would propose dropping the changefreq and priority fields, as none of the search engines use them, and instead _adding_ the lastmod field, set to the last modification date of the file.

Have you read the Contributing Guidelines on issues?

Yes.

To Reproduce


  1. Open any Docusaurus v2 sitemap :)

Actual Behavior

The current output is:

<url>
        <loc>https://developer.yoast.com/features/canonical-urls/api</loc>
        <changefreq>weekly</changefreq>
        <priority>0.5</priority>
</url>

Expected behavior

I propose changing it to:

<url>
        <loc>https://developer.yoast.com/features/canonical-urls/api</loc>
        <lastmod>2020-04-14T11:22:05+00:00</lastmod>
</url>
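To make the proposed entry concrete, here is a minimal sketch of rendering one <url> element with only <loc> and an optional <lastmod>. The names SitemapEntry, renderUrlEntry and formatLastmod are hypothetical, not Docusaurus APIs. It assumes the date is emitted in UTC in the same W3C Datetime shape as the example above, and that <lastmod> is simply left out when no date is known.

```typescript
// Hypothetical shape of one sitemap entry (not a Docusaurus type).
interface SitemapEntry {
  loc: string; // absolute URL of the page
  lastmod?: Date; // omitted from the output when unknown
}

// Escape the characters that cannot appear raw in XML element text.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Format a Date as W3C Datetime in UTC, e.g. 2020-04-14T11:22:05+00:00.
function formatLastmod(date: Date): string {
  // toISOString() gives 2020-04-14T11:22:05.000Z; drop the milliseconds
  // and write the UTC offset as +00:00 to match the proposed output.
  return date.toISOString().replace(/\.\d{3}Z$/, "+00:00");
}

// Render <url> with <loc> and, only if known, <lastmod>.
// No <changefreq> or <priority>, as proposed in this issue.
function renderUrlEntry(entry: SitemapEntry): string {
  const lines = ["<url>", `  <loc>${escapeXml(entry.loc)}</loc>`];
  if (entry.lastmod !== undefined) {
    lines.push(`  <lastmod>${formatLastmod(entry.lastmod)}</lastmod>`);
  }
  lines.push("</url>");
  return lines.join("\n");
}

console.log(
  renderUrlEntry({
    loc: "https://developer.yoast.com/features/canonical-urls/api",
    lastmod: new Date("2020-04-14T11:22:05Z"),
  }),
);
```

Making lastmod optional keeps the door open for pages whose modification date cannot be determined.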

Your Environment

  • Docusaurus version used: v2
Labels: bug, intermediate, help wanted, v2

All 10 comments

I think this would be a good addition, but I do know web crawlers that use the priority field.

@RDIL Such as? Honestly, I've been doing SEO for well over a decade and haven't seen it used in the last 5 years.

Fair enough.

Great idea! Thanks for the suggestion!

Hello! I want to help solve this issue.
As far as I can see, there are several implementation options here:

  1. Should I leave the old tags and add new ones or replace them?
  2. Which date should be specified in the "lastmod" tag: the date of the last build of the project or the date of the last page change? If the latter, is there an easy way to get it?

Most likely the last build time, since even tiny changes end up changing the chunk hashes, so it's constantly being modified.

@RDIL FYI Webpack 5 might help to make the JS chunks more "stable" (see my recent comment in https://github.com/facebook/docusaurus/issues/3383); we may try to migrate after i18n is ready.

Not sure what we should do for this date. Also not sure how the sitemaps plugin could access the "last modification date" of the page, as this plugin is decoupled from the others.

Is it mandatory to add it to the sitemaps? It would likely be easier to handle this by adding a meta tag directly to the page; otherwise, we'd have to find a way to provide such metadata per path to the sitemap plugin.

I'm asking because, for my work on i18n, I'll also have to think about how to set up useful headers for localization (hreflang), and I thought about adding them to the page directly instead of the sitemaps.

@jdevalk as it seems you know more about SEO than the rest of us, can you give us some insights?

Last modified is somewhat of a must for XML sitemaps indeed.

For hreflang, I think I'd go for adding it to the page instead of the XML sitemaps, as that makes debugging a lot easier and might even make it accessible to other features within Docusaurus, like a language switcher.
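As an illustration of the on-page approach, the sketch below builds the <link rel="alternate" hreflang="..."> tags a page's <head> could carry. It assumes a hypothetical URL scheme in which the default locale is served at the site root and every other locale is prefixed (for example /fr/docs/intro); hreflangLinks and localizedUrl are illustrative names, not Docusaurus APIs.

```typescript
// Build the URL of `pathname` in a given locale.
// Assumption (hypothetical scheme): default locale at the root,
// other locales under /<locale>.
function localizedUrl(siteUrl: string, locale: string, defaultLocale: string, pathname: string): string {
  const prefix = locale === defaultLocale ? "" : `/${locale}`;
  return `${siteUrl}${prefix}${pathname}`;
}

// One alternate link per locale (including the current one),
// plus x-default for visitors whose language matches none of them.
function hreflangLinks(siteUrl: string, pathname: string, locales: string[], defaultLocale: string): string[] {
  const links = locales.map(
    (locale) =>
      `<link rel="alternate" hreflang="${locale}" href="${localizedUrl(siteUrl, locale, defaultLocale, pathname)}" />`,
  );
  links.push(
    `<link rel="alternate" hreflang="x-default" href="${localizedUrl(siteUrl, defaultLocale, defaultLocale, pathname)}" />`,
  );
  return links;
}

for (const link of hreflangLinks("https://example.com", "/docs/intro", ["en", "fr"], "en")) {
  console.log(link);
}
```

Because the tags are produced per page, the same locale/URL mapping could also feed a language switcher, which is the debugging and reuse advantage mentioned above.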

Thanks, will do that.

About lastModified, some plugins already read git history to get the last modified date. We could also allow hardcoding it through front matter.
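Reading the date from git history could look roughly like the sketch below, which asks git for the committer date of the latest commit touching a file. getGitLastModified is a hypothetical helper, not the code those plugins actually use; it assumes git is on the PATH and returns undefined whenever no date can be obtained.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical helper: last commit date of `filePath`, or undefined.
function getGitLastModified(filePath: string): Date | undefined {
  try {
    // -1: only the most recent commit touching this path.
    // %cI: committer date in strict ISO 8601 format.
    const output = execFileSync("git", ["log", "-1", "--format=%cI", "--", filePath], {
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"], // silence git's stderr
    }).trim();
    // Empty output: the path has no commits (e.g. untracked file).
    if (output === "") return undefined;
    const date = new Date(output);
    return Number.isNaN(date.getTime()) ? undefined : date;
  } catch {
    // git not installed, not a repository, path outside the repo, ...
    return undefined;
  }
}

console.log(getGitLastModified("README.md"));
```

One caveat worth checking: in a shallow clone (common on CI), every file's history is cut off at the fetched commit, so they would all appear to have the same date.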

I think we should:

  • call addRoute APIs with lastModified: lastModifiedFrontmatter || lastModifiedGit || lastModifiedFS || undefined
  • use that data when generating the sitemaps. If not available, add the date of the build?
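The two steps above can be sketched as follows. The route shape and the lastModified field are hypothetical (in this thread they are only a proposal, not an existing addRoute option). The resolution order mirrors the frontmatter, then git, then file system chain, and the build-time fallback is an explicit, optional argument so both options being discussed can be compared.

```typescript
// Possible sources for a page's last modification date (hypothetical).
interface LastModifiedSources {
  frontmatter?: Date; // hardcoded by the author; wins if present
  git?: Date; // last commit touching the source file
  fs?: Date; // file system modification time
}

// Hypothetical route record carrying the resolved date.
interface RouteInfo {
  path: string;
  lastModified?: Date;
}

interface SitemapItem {
  loc: string;
  lastmod?: string;
}

// Same order as: lastModifiedFrontmatter || lastModifiedGit || lastModifiedFS || undefined
function resolveLastModified(sources: LastModifiedSources): Date | undefined {
  return sources.frontmatter ?? sources.git ?? sources.fs;
}

// Turn routes into sitemap items. Passing `buildTime` falls back to the
// build date; omitting it leaves <lastmod> out for routes without a date.
function toSitemapItems(siteUrl: string, routes: RouteInfo[], buildTime?: Date): SitemapItem[] {
  return routes.map((route) => {
    const date = route.lastModified ?? buildTime;
    return date === undefined
      ? { loc: siteUrl + route.path }
      : { loc: siteUrl + route.path, lastmod: date.toISOString() };
  });
}

const routes: RouteInfo[] = [
  { path: "/docs/intro", lastModified: resolveLastModified({ git: new Date("2020-04-14T11:22:05Z") }) },
  { path: "/generated-page", lastModified: resolveLastModified({}) },
];
console.log(toSitemapItems("https://example.com", routes));
```

toISOString() output (for example 2020-04-14T11:22:05.000Z) is also a valid W3C Datetime value, so no extra formatting is strictly needed here.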

If this info can't be obtained (pages might not be generated from FS files), is it better to not add the lastmod entry, or to fall back to build time (which is likely to be a recent value if the site is built often)?

We agree that this date should be updated when the content changes, but not when the code (i.e. the layout rendering the content, etc.) changes?

> If this info can't be obtained (pages might not be generated from FS files), is it better to not add the lastmod entry, or to fall back to build time (which is likely to be a recent value if the site is built often)?

I would not add it then. Having it change all the time when it's actually _not_ changing is also not beneficial.

> We agree that this date should be updated when the content changes, but not when the code (i.e. the layout rendering the content, etc.) changes?

Agreed.
