We recently launched a new site on Gatsby and I've been keeping track of Google's re-indexing of our website. I noticed that a lot of our pages are showing up in Google Search Console as "Excluded" with the reason "Submitted URL not selected as canonical".
Investigating this further, it seems that the sitemap generated by gatsby-plugin-sitemap adds a trailing / to the end of URLs, whereas the URLs Google's crawler finds on its own do not have one. As a result, Google treats these as duplicate pages (one with a trailing / and one without), prefers the URL without the / as the canonical, and excludes most, if not all, of the URLs submitted via the sitemap.
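To check which submitted URLs are affected, here is a small diagnostic sketch. It assumes you have two lists of absolute URLs, one taken from the sitemap and one from URLs Google crawled (for example, exported from Search Console). The function names and the `/pricing` and `/blog` paths are hypothetical, for illustration only.

```javascript
// Remove trailing slashes from a URL's path, leaving the site root ("/") intact.
function stripTrailingSlash(url) {
  const u = new URL(url);
  if (u.pathname.length > 1 && u.pathname.endsWith("/")) {
    u.pathname = u.pathname.replace(/\/+$/, "");
  }
  return u.toString();
}

// Return sitemap URLs that were not crawled exactly as submitted, but whose
// slash-less form was crawled, i.e. likely "duplicates" that differ only by "/".
function findSlashMismatches(sitemapUrls, crawledUrls) {
  // Normalise crawled URLs through URL so "https://x.com" and "https://x.com/" compare equal.
  const crawled = new Set(crawledUrls.map((url) => new URL(url).toString()));
  return sitemapUrls.filter((url) => {
    const normalized = new URL(url).toString();
    return !crawled.has(normalized) && crawled.has(stripTrailingSlash(url));
  });
}

// Hypothetical example data.
const sitemapExample = ["https://dovetailapp.com/pricing/", "https://dovetailapp.com/blog"];
const crawledExample = ["https://dovetailapp.com/pricing", "https://dovetailapp.com/blog"];
console.log(findSlashMismatches(sitemapExample, crawledExample));
// [ 'https://dovetailapp.com/pricing/' ]
```

Any URL this returns is one where the sitemap and the crawler disagree only on the trailing slash, which matches the exclusions described above.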
More information on Google Search Console's excluded URLs.
To reproduce, install gatsby-plugin-sitemap and deploy your website. The URLs in the generated sitemap will end with a trailing /, whereas the URLs Google indexes will not.
It would be nice if gatsby-plugin-sitemap did not add a trailing / to the end of URLs by default. This would ensure the sitemap URLs match what Google crawls independently, resulting in few or no URLs excluded for the "Submitted URL not selected as canonical" reason.
Currently, Google Search Console lists lots of URLs as "Excluded" with the reason "Submitted URL not selected as canonical".
Only posting the relevant portions for brevity. There are more files, but they're not relevant.
gatsby-config.js:
```javascript
module.exports = {
  siteMetadata: {
    siteName: "Dovetail",
    siteUrl: "https://dovetailapp.com"
  },
  plugins: [
    "gatsby-plugin-sitemap",
  ]
};
```
package.json:
```json
{
  "dependencies": {
    "gatsby": "^1.9.244",
    "gatsby-link": "^1.6.40",
    "gatsby-plugin-canonical-urls": "^1.0.18",
    "gatsby-plugin-google-analytics": "^1.0.29",
    "gatsby-plugin-manifest": "^1.0.20",
    "gatsby-plugin-nprogress": "^1.0.14",
    "gatsby-plugin-react-helmet": "^2.0.10",
    "gatsby-plugin-react-next": "^1.0.11",
    "gatsby-plugin-sentry": "^0.0.4",
    "gatsby-plugin-sharp": "^1.6.42",
    "gatsby-plugin-sitemap": "^1.2.22",
    "gatsby-plugin-typescript": "^1.4.19",
    "gatsby-remark-images": "^1.5.61",
    "gatsby-source-filesystem": "^1.5.29",
    "gatsby-transformer-remark": "^1.7.39"
  }
}
```
Thanks for the detailed report @humphreybc! Having a quick look at your sitemap it seems there's a mix of paths with and without a trailing slash, which makes me wonder if gatsby-plugin-sitemap is the right place to look.
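A quick way to confirm whether a sitemap mixes both conventions is to bucket its `<loc>` entries by trailing slash. This is a rough sketch: it pulls `<loc>` values out with a regular expression instead of a real XML parser and does not decode XML entities, which should be fine for a quick look at a generated sitemap but is an assumption. `summarizeTrailingSlashes` is a hypothetical name.

```javascript
// Bucket sitemap <loc> URLs into root, with-trailing-slash and without-trailing-slash.
// Regex-based extraction: no XML parsing and no entity decoding.
function summarizeTrailingSlashes(sitemapXml) {
  const locs = Array.from(
    sitemapXml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g),
    (match) => match[1]
  );
  const summary = { root: [], withSlash: [], withoutSlash: [] };
  for (const loc of locs) {
    const { pathname } = new URL(loc);
    if (pathname === "/") summary.root.push(loc);
    else if (pathname.endsWith("/")) summary.withSlash.push(loc);
    else summary.withoutSlash.push(loc);
  }
  return summary;
}
```

For example, pass it the contents of the deployed sitemap.xml (read with `fs.readFileSync(path, "utf8")`); if both `withSlash` and `withoutSlash` are non-empty, the sitemap has the mix of conventions described here.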
Could you try adding gatsby-plugin-remove-trailing-slashes to your project and see if that helps?
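For reference, this is roughly what that change could look like in the gatsby-config.js posted above. It assumes the plugin has been installed (for example with `npm install gatsby-plugin-remove-trailing-slashes`) and needs no options; the other plugins are omitted for brevity.

```javascript
// gatsby-config.js with gatsby-plugin-remove-trailing-slashes added
// alongside the existing sitemap plugin (other plugins omitted).
module.exports = {
  siteMetadata: {
    siteName: "Dovetail",
    siteUrl: "https://dovetailapp.com",
  },
  plugins: [
    "gatsby-plugin-remove-trailing-slashes",
    "gatsby-plugin-sitemap",
  ],
};
```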
@humphreybc I was working through this exact issue today. I found that gatsby-plugin-sitemap builds each URL from the path you pass to `createPage`.
In my case I wanted to ensure all my pages had a trailing slash so I went back and changed path: `/${post.slug}` to path: `/${post.slug}/` inside gatsby-node.js. I imagine you can do this in reverse to ensure none of your paths and therefore none of your sitemap URLs have a trailing slash. :smile:
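A sketch of doing that consistently in gatsby-node.js: route every slug through one helper so all page paths (and therefore all sitemap URLs) follow the same trailing-slash policy. `toPagePath`, `createPostPages`, the `slug` field and the `context` contents are hypothetical; adapt them to whatever your GraphQL query returns.

```javascript
// Build a page path from a slug with a single trailing-slash policy.
function toPagePath(slug, trailingSlash) {
  // Trim any leading/trailing slashes already present in the slug.
  const trimmed = String(slug).replace(/^\/+|\/+$/g, "");
  if (trimmed === "") return "/";
  return trailingSlash ? `/${trimmed}/` : `/${trimmed}`;
}

// Hypothetical helper: call it from createPages with the posts from your
// query, Gatsby's createPage action, and the resolved template path.
function createPostPages(posts, createPage, component, trailingSlash = false) {
  for (const post of posts) {
    createPage({
      path: toPagePath(post.slug, trailingSlash),
      component,
      context: { slug: post.slug },
    });
  }
}
```

Passing `trailingSlash = true` gives the `/${post.slug}/` behaviour described here; the default `false` gives the reverse.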
Adding gatsby-plugin-remove-trailing-slashes seems to have fixed the issue. I guess it just modifies the "path" in Gatsby, which the sitemap plugin uses. Nice find, @m-allanson and @lightstrike.
Seems like this is resolved! Closing the issue.