Hello
I have problem with 301 redirects adding trailing slashes (/path => /path/) ONLY when I build+serve my site (gatsby build && gatsby serve && open http://localhost:8000/hi => I get redirected to http://localhost:8000/hi/). Doesn't happen using gatsby develop.
I have :
exports.createPages = ({ actions: { createPage } }) =>
createPage({
path: '/hi',
component: HiComponent,
});
So... minimum code possible, and still have this issue. Tried gatsby-plugin-remove-trailing-slashes but didn't do anything, as it is only for pages created from the pages folder.
No 301, no redirect
Redirection
System:
OS: macOS 10.14.6
CPU: (4) x64 Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz
Shell: 5.3 - /bin/zsh
Binaries:
Node: 12.12.0 - /var/folders/mt/71kkzssd3fq24s3x3y6wpc4h0000gn/T/yarn--1573821730999-0.42525738248781186/node
Yarn: 1.19.1 - /var/folders/mt/71kkzssd3fq24s3x3y6wpc4h0000gn/T/yarn--1573821730999-0.42525738248781186/yarn
npm: 6.11.3 - /usr/local/bin/npm
Languages:
Python: 2.7.16 - /usr/local/bin/python
Browsers:
Chrome: 78.0.3904.97
Firefox: 65.0
Safari: 13.0.3
Thanks in advance!
@adrienharnay thank you for the detailed report! I have confirmed and reproduced the issue. Is this something that you might be interested in contributing a fix? I'd be happy to help guide you through the process if you're interested!
Sure, I would be happy to!
Do you have any idea where that could come from? As someone who doesn't know much about how Gatsby works internally, I would guess that the servers from gatsby develop and gatsby serve behave in a different manner, since the bug only happens using gatsby serve.
I will set up a repository with the minimum code possible to reproduce the issue so that we can test the fix on it when it's done.
@adrienharnay gatsby develop is running a webpack dev server process, while gatsby serve is just an express app running against the build artifacts from gatsby build
You can see all of the command code under packages/gatsby/src/commands. I think the problem might be in how we use express.static. Looking at the source for that they do the redirect if the requested path is a directory. Which is what our build artifact produces.
So I'm curious to see what you come up with. Please ask any questions along the way. We have a document here on how to get your dev environment setup too!
https://www.gatsbyjs.org/contributing/setting-up-your-local-dev-environment/
Thanks for the pointers, will report back as soon as I have some news :)
I've created the repo: https://github.com/adrienharnay/gatsby-simplest-example. Now starting to work on the issue
I have developed a fix on the repo I linked earlier. The procedure is detailed in the README. Basically, the fix is to render public/hello.html instead of public/hello/index.html if the path given to createPage does not end with a slash.
I tried several other fixes by tweaking express.static and express.Router, without luck. I don't know if the new architecture could break anything else, but that's the only fix I could find.
Please let me know if that's OK, then I'll file a PR to gatsby :)
Edit: I have created the PR so that you can see the changes/merge it directly if everything is OK!
@adrienharnay Thank you so much for taking your time to do this! I played around with express.static for about 30m and wasn't sure if we had any solutions, but it's cool to see what you found!
I'm looking at your PR and trying to see if there are any negative side-effects to your change.
I'm also curious how you found contributing so far? Was it easy to get set up locally? Was it easy to test and feel confident in your fix? I'm just trying to gauge how well the community can feel empowered to contribute. :)
Thank you! It seems e2e tests are broken, but running yarn test locally works, so I'm not sure what I'm missing.
Other than that, contributing has been really easy, thanks to your fast answers and good pointers! I felt overwhelmed by the process to test Gatsby locally, so I went with patch-package instead to patch my node_modules since the changes were fairly simple, and it took me no time to implement the fix! There's no wonder why Gatsby sees so many contributions :)
@adrienharnay That is valuable feedback. Thank you!
This issue affects gatsby develop command as well. It only surfaces when you reload the page.
So i've spent more time researching this. I'm more convinced now that Gatsby is actually doing the right thing. For reference read this article: https://searchfacts.com/url-trailing-slash/
In addition, did you know that we have plugins for adding and removing slashes? I wonder if those would solve your needs instead of changing the way the codebase works. gatsby-plugin-remove-trailing-slash
This issue also pops up in gatsby serve when you do a hard refresh Ctrl+Shift+R. It changes the URL from mysite.com/about to mysite.com/about/.
@blainekasten OP has already tried using gatsby-plugin-remove-trailing-slashes without success. It's mentioned in the issue. Any other suggestions for workarounds?
Ideally, the remove-trailing-slashes plugin would enforce this.
So i've spent more time researching this. I'm more convinced now that Gatsby is actually doing the right thing. For reference read this article: https://searchfacts.com/url-trailing-slash/
As long as the trailing slash is always removed, the worry of Google seeing URLs differently won't be a problem.
@varunagrawal Hi there! Are you still experiencing this problem? If so, can you please attach a minimal reproduction so we can take a look? Thanks!
I think i'm running into this same issue, however even if the trailing slash is consistently added/removed this still causes issues with Google SEO as the redirect that adds the slash shows as a 301 and as a result google doesn't index that page. Is there a way to disable the addition of a slash? e.g. if i go to https://gatsby-blog-no-slash.wardpeet.dev/new-beginnings i can see that the redirect to add the slash is happening and then the slash is later being removed, this would still cause issues with google indexing.
We were running into the same issue and the only solution that worked was to ask the marketing team to include a trailing slash and direct all traffic to the "canonical" URL.
Alternatively, you could write an S3 deployment script that would essentially duplicate all index.html files outside of their parent folder and rename them after the parent folder (without an extension).
Keep in mind, this can't be done or tested locally on your machine because that would create conflicting file names on your filesystem but S3 actually supports such a configuration where a folder and a file can share the same name while residing in the same parent directory. This way you could support both URLs and then add a canonical URL link tag that would point at the preferred destination.
Example:
https://gist.github.com/cmawhorter/b706e45ba88e43bc379c
@varunagrawal please keep in mind gatsby serve is not meant for production, but instead just to _simulate_ production locally. Every hosting platform is different and can usually be customized on how files are resolved in reference to their urls.
I'm going to close this because I'm not convinced there is an absolutely correct approach to this. We have the plugins to add and remove trailing slashes. Feel free to keep discussing in the closed issue though!
We experience this issue on first load and reload on both gatsby develop, gatsby serve and in production. gatsby-plugin-remove-trailing-slashes only changes the links, it doesn't fix the issue that urls without trailing / will be redirected to a url with a trailing /. Because of this are gatsby-plugin-remove-trailing-slashes actually harming our website links, because all links will result in a url that will redirect on reload.
We managed to resolve this issue as described in my previous comment. It requires two scripts:
index.html files and move them outside of their parent folder, then rename them to reflect the name of the parent folder + collision prevention suffix needs to be appended since filesystems typically don't allow a folder and a file with the same name to coexist:Before:
- grandparent
- parent
- index.html
After:
- grandparent
- parent
- index.html
- parent.collision.html (this is a clone of the index.html above)
Scripts are available here and one of them is loosely based on the gist I shared above:
https://gist.github.com/mareksuscak/e0a94987a24038a3b81f045602f6274b
@mareksuscak That was really helpful. It took me a few days to work this out as I thought my Cloudfront was doing something weird.
Some information that I believe you may find useful: If you use your command find ./public -mindepth 2 -name "index.html" | sed 's/^\(.*\)\/index\.html$/\1/' | grep -v 404 | xargs -L1 -t -I {} cp "{}/index.html" "{}.html" in an AWS Codepipeline and Codebuild, you don't actually have to do the renaming step because it stores the temp. artifact build files inside S3.
How did you configure your S3 index document?
"However, if you exclude the trailing slash from the preceding URL, Amazon S3 first looks for an object photos in the bucket. If the photos object is not found, it searches for an index document, photos/index.html. If that document is found, Amazon S3 returns a 302 Found message and points to the photos/ key. For subsequent requests to photos/, Amazon S3 returns photos/index.html. If the index document is not found, Amazon S3 returns an error."
from here
302 Responses are a nightmare for SEO. I know it can be fixed with a lambda@edge but this all seems a lot of effort to remove a trailing slash.
Most helpful comment
We experience this issue on first load and reload on both
gatsby develop,gatsby serveand in production. gatsby-plugin-remove-trailing-slashes only changes the links, it doesn't fix the issue that urls without trailing / will be redirected to a url with a trailing /. Because of this are gatsby-plugin-remove-trailing-slashes actually harming our website links, because all links will result in a url that will redirect on reload.