The currently distributed robots.txt file restricts multiple directories, including /Portals, /Resources, and /DesktopModules. These directories often contain content that bots need in order to render the site properly, such as JS and CSS, as well as user-uploaded content such as PDFs.
By default, this file should ONLY restrict paths that do not contain any user-supplied content or other content that bots need.
Do we have a method for overriding this only if it's the default? Or would this just affect new installations?
Right now, I wanted to create this issue to start the discussion. With the current functionality, any DNN changes to this file will override existing files.
I personally think we need to open this up, as blocking /portals for example stops PDF content from being indexed by specific bots.
Funny point: Googlebot is ALLOWED to access all of this content. My proposed change would be to match the Googlebot config for all bots.
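To make the effect concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules below are a hypothetical stand-in for the distributed file, not its actual contents: only the three directory names and the fact that Googlebot is allowed come from this thread. `example.com`, `SomeBot`, and the PDF path are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: NOT the actual DNN robots.txt, just the shape described above.
RULES = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /Portals/
Disallow: /Resources/
Disallow: /DesktopModules/
"""


def can_crawl(agent: str, url: str, rules: str = RULES) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Made-up URL of a user-uploaded PDF in a portal folder.
PDF_URL = "https://example.com/Portals/0/brochure.pdf"

if __name__ == "__main__":
    print("Googlebot:", can_crawl("Googlebot", PDF_URL))
    print("SomeBot:  ", can_crawl("SomeBot", PDF_URL))
```

Under these assumed rules, the same PDF is reachable for Googlebot but off limits to every other bot, which is the mismatch described above.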
I suggest providing an improved version for new installs.
IMO we shouldn't change the file upon upgrades.
I agree with @sleupold: we should not change that file on upgrades, since people might have customized it to their needs. However, it would be nice to still provide it somehow. What about having robots.txt in the install package only and robots.9.3.0.txt in both packages?
We could check if it had been changed in Upgrade.cs, but that's only if we decide it's okay to change the default out from under folks.
@bdukes there might have been different versions distributed with previous versions of DNN.
We should consider including upgrade instructions and hints with any new DNN release.
I will not allow a DNN upgrade to change an existing robots.txt. We have made specialized robots.txt for various sites we have developed.
Documentation and advice for users is welcome, also an option to install the default robots.txt, but do not overwrite an existing version.
An advanced option would be to automatically overwrite an existing robots.txt during the upgrade if that existing version is the default from the DNN version you are upgrading from.
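A rough sketch of how that "advanced option" check could work, written in Python purely for illustration (the real check would presumably live in Upgrade.cs, as mentioned above). It assumes the text of every previously shipped default is available for comparison, since several versions may have been distributed. The function names and the normalization rules are my own assumptions.

```python
import hashlib


def fingerprint(text: str) -> str:
    """SHA-256 of the file with line endings and trailing whitespace normalized."""
    normalized = "\n".join(line.rstrip() for line in text.splitlines()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def should_replace(existing: str, known_defaults: list[str]) -> bool:
    """True only if the existing file matches one of the previously shipped defaults."""
    shipped = {fingerprint(default) for default in known_defaults}
    return fingerprint(existing) in shipped


if __name__ == "__main__":
    # Hypothetical old default; the real file contents are not reproduced here.
    old_default = "User-agent: *\nDisallow: /Portals/\n"
    customized = old_default + "Disallow: /private/\n"
    print(should_replace(old_default.replace("\n", "\r\n"), [old_default]))
    print(should_replace(customized, [old_default]))
```

Normalizing line endings before hashing avoids treating a file as "customized" just because it was saved with CRLF instead of LF. Any real edit changes the fingerprint, so a modified file is never replaced.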
DNN, by default, does NOT modify the Robots.txt file with upgrades. If you look at the Upgrade Package it does not have a robots.txt file, just like it doesn't have a web.config file.
My recommended change is to standardize the robots.txt file that would be included in the default installation. Additionally, add to the release notes information about the change allowing users to edit the robots.txt file if they desire for existing/upgraded installations.
@EPTamminga, my suggestion that we could change the default upon upgrade would be implementing your "advanced option." I definitely wouldn't consider replacing a modified robots.txt.
Seems pretty easy to do... maybe a how-to tutorial would work to get it out prior to the next release.
A copy posted with the changes from the default is all that is needed; it could potentially be posted in the list of files for download. I like the idea of changing the default if it has not been modified manually, but that seems like a lot of work. Maybe just a heads-up in a release, along with a few discussions and a blog post, would cover the topic pretty well.
This could possibly still be added to the current release downloads, with an explanation of what to do.
What if we add a check into the security analyzer, check to see if those directories are excluded and then just warn the user and link to a blog post discussing the required changes?
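A check like that could be fairly small. The sketch below, in Python for illustration only, scans robots.txt text for `Disallow` rules in the `User-agent: *` group that match the three directories named in this issue. The function name is hypothetical, the parsing is deliberately simplified, and paths are compared case-insensitively as a simplifying assumption.

```python
def blocked_content_dirs(robots_text, watched=("/Portals", "/Resources", "/DesktopModules")):
    """Return the watched directories that are disallowed for all bots (User-agent: *)."""
    found = []
    agents, in_rules = [], False
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rule lines starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and "*" in agents:
                for directory in watched:
                    same = value.rstrip("/").lower() == directory.lower()
                    if same and directory not in found:
                        found.append(directory)
    return found


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /Portals/\nDisallow: /admin/\n"
    for directory in blocked_content_dirs(sample):
        print(f"Warning: {directory} is blocked for all bots")
```

An analyzer page could show one warning per returned directory and link to the explanatory blog post.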
@ohine I like this idea, but it isn't really a "security" issue. And I wouldn't want to clutter the Security Analyser with something that isn't security.
How about an SEO admin page with a robots.txt analyzer? It would be nice to have something to assist with managing the robots.txt file in the SEO area, even if it's just a link to the file. It could help discover ways to improve what gets listed and to block what needs to be restricted on a DNN portal.
It would also be nice to be able to identify and block specific bots from crawling your site if you'd like to.
My idea of old was to serve robots.txt through an HTTP handler. I stopped researching it, but maybe it's still an interesting idea.
Yeah, ultimately it would be nice to be able to have a different file per site/portal, which would require that being a virtual URL. But adding to the SEO PersonaBar menu would be a good first step.
We are still discussing this, so we're pushing it to DNN 10.
Just following up on this thread: I have a working proof of concept for serving the robots file from a given portal folder through an HTTP handler (rewrite rules were also required to stop the DNN URL rewriter from touching it). Having it in the portal files will let a site admin modify the robots file through the file manager.
If the robots file does not exist at a portal level it defaults to the standard robots file at the root of the application.
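The fallback rule can be sketched in a few lines. This is not the proof of concept itself (which is an HTTP handler inside DNN), just an illustration in Python of the lookup order described above. The `Portals/<portal id>/robots.txt` layout and the function name are assumptions.

```python
from pathlib import Path


def resolve_robots_file(app_root: Path, portal_id: int) -> Path:
    """Pick the portal-level robots.txt if present, else the one at the application root."""
    # Assumed layout: <app root>/Portals/<portal id>/robots.txt
    portal_file = app_root / "Portals" / str(portal_id) / "robots.txt"
    return portal_file if portal_file.is_file() else app_root / "robots.txt"
```

A handler would call this per request with the portal resolved from the host name, then stream the chosen file back as `text/plain`.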
Is this something that you'd like to see contributed to the Platform? I can create an issue for it, tidy it up and get it submitted for DNN v10. Thoughts?
Sounds great, @mikesmeltzer!
If possible, I would prefer to combine the site-level robots.txt with the global one: there are a couple of install folders we don't want indexed, as well as some site-specific URLs.
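One possible way to combine the two files is to merge their rules per user agent, global rules first, skipping duplicates. The sketch below is in Python and only illustrative: the function names are hypothetical, the example paths and `BadBot` are made up, and directives other than `Allow`/`Disallow` (such as `Sitemap`) are dropped for brevity.

```python
def parse_groups(text):
    """Map each user agent to its ordered Allow/Disallow lines (simplified parsing)."""
    groups, agents, in_rules = {}, [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() == "user-agent":
            if in_rules:  # a User-agent line after rule lines starts a new group
                agents, in_rules = [], False
            agents.append(value)
            groups.setdefault(value, [])
        elif field.lower() in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                groups[agent].append(f"{field.capitalize()}: {value}")
    return groups


def merge_robots(global_text, site_text):
    """Combine global and site-level rules, keeping global rules first."""
    merged = parse_groups(global_text)
    for agent, rules in parse_groups(site_text).items():
        target = merged.setdefault(agent, [])
        for rule in rules:
            if rule not in target:
                target.append(rule)
    blocks = ["\n".join([f"User-agent: {agent}", *rules]) for agent, rules in merged.items()]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    global_rules = "User-agent: *\nDisallow: /Install/\n"
    site_rules = "User-agent: *\nDisallow: /members-only/\n\nUser-agent: BadBot\nDisallow: /\n"
    print(merge_robots(global_rules, site_rules))
```

Because site rules can add whole new groups, the same merge would also let a site admin block a specific bot without touching the global file.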
@sleupold thanks for the quick response! I will create another issue to handle the site level robots.txt work (linking to this issue) and the functionality can be discussed there as it's different than the original issue brought up here.
@mikesmeltzer This is great; maybe as part of your change we could improve the default rules.
@mitchelsellers I think this issue should stay separate from the one I've created. My intention isn't to create a new default robots.txt file at a portal level but to allow admins to upload their own custom robots.txt file through the file manager, falling back to what is currently in place at a global level if they don't.
That being said, I don't mind updating the robots.txt file while I'm in there if everyone is on the same page with (my interpretation of the above comments):
Sound good?
IMO Mitch is completely right. At least a new default DNN install should not block access to /Portals/ (I really wonder what the idea behind this was when it was added).
The rest of the discussion is nice, but IMO the technological solution is of less importance than just fixing this one issue ASAP.
We have detected this issue has not had any activity during the last 90 days. That could mean this issue is no longer relevant and/or nobody has found the necessary time to address the issue. We are trying to keep the list of open issues limited to those issues that are relevant to the majority and to close the ones that have become 'stale' (inactive). If no further activity is detected within the next 14 days, the issue will be closed automatically.
If new comments are posted and/or a solution (pull request) is submitted for review that references this issue, the issue will not be closed. Closed issues can be reopened at any time in the future. Please remember those participating in this open source project are volunteers trying to help others and creating a better DNN Platform for all. Thank you for your continued involvement and contributions!
This still sounds like an issue; can anybody validate?
Just chiming in here as I am now moving this issue through the new Issue Triage project. Where do things stand? What is the actionable item here? Has this already been split out into separate issues or is there more to do here? Thanks.
This is a change that is needed. I'm working on a PR for it and will try to submit it this afternoon.
Fantastic @mitchelsellers - moving through the process and assigning to you.
I was going to take care of this in January when I do a few other things. If you're able to do this, Mitch, that is great.
Thanks,
Mike
Mike Smeltzer
DNN MVP / IT Consultant / Solution Developer
On Tue, Dec 31, 2019 at 1:24 PM -0400, "Mitchel Sellers" notifications@github.com wrote:
This is a change that is needed, I'm working a PR on it, will try to submit this afternoon
I've resolved this in PR #3522.
Closing per #3522