If I decide not to include a page in "the" XML sitemap used for search engines, it will not appear in the frontend module sitemap either. There should be a possibility to choose the visibility in both sitemaps separately, because XML sitemap and sitemap frontend module are different matters IMHO.
In a project we decided to set the Robots-Tag for the privacy statement page to "noindex, nofollow", but at the same time we chose to include the page in the sitemap, because we want to show it on the sitemap page, which uses the sitemap frontend module. Now Google sends me mails complaining about the page being included in the XML sitemap while being set to noindex in the Robots-Tag. Therefore, I need to exclude the page from the XML sitemap while keeping it included in the frontend module.
If I decide not to include a page in "the" XML sitemap
How exactly did you do this?
Actually, I didn't. If I did, I would do it in the page properties by setting "In der Sitemap anzeigen" (Show in sitemap) to "Nie anzeigen" (Never show) or "Standard" (Default). But then the page would no longer show up in the frontend module either.
My settings at the moment are:
Im Menü verstecken (Hide in menu): Activated/Checked
In der Sitemap anzeigen (Show in sitemap): Immer anzeigen (Always show)
The privacy statement page is not to be included in the main navigation ("Navigationsmenü"), only in an "Individuelle Navigation" (custom navigation) module which is shown in the footer. It should also be included in the sitemap frontend module. So I think my settings are pretty much forced. If I set the page to "Standard" or "Nie anzeigen", it is not shown in the frontend module.
How exactly do I reproduce this?
I'm not sure whether the "Google Search Console Team" still sees this as a problem. I haven't received another mail about it since, nor did I get one in the years before. So my case may be purely academic at the moment.
Still, I think that the sitemap module and the XML sitemap are separate things and pages should have separate settings for both. Anyway, here goes ...
Now, the text elements page is not visible anymore in the navigation, but is still shown on the sitemap page and, unfortunately, also contained in the XML-Sitemap while its head section shows
<meta name="robots" content="noindex,nofollow">
And this was the reason for the email I received from Google in May 2019. Ideally, everything would stay as it is now, except that the text elements page would not appear in the XML sitemap. I couldn't find a combination of settings that achieves this (page not shown in the navigation module or the XML sitemap, but shown in the sitemap frontend module).
As discussed in Mumble on September 26th, the XML sitemap should not contain pages that have the noindex,nofollow attribute but it should ignore the "Show in sitemap" setting, which is meant for the HTML sitemap module only.
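The rule described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the actual fix: the function names and the setting values `'always'`, `'never'` and `'default'` are hypothetical, and treating a non-hidden page with the default setting as visible in the HTML sitemap is an assumption as well.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: strip whitespace and lowercase the robots value,
// so that "noindex, nofollow" and "noindex,nofollow" compare equal.
function normalizeRobots(string $robots): string
{
    return strtolower((string) preg_replace('/\s+/', '', $robots));
}

// XML sitemap: depends only on the robots value, not on "Show in sitemap".
function isInXmlSitemap(string $robots): bool
{
    return normalizeRobots($robots) !== 'noindex,nofollow';
}

// HTML sitemap module: depends only on the "Show in sitemap" setting
// (hypothetical values) and, for the default case, on "Hide in menu".
function isInHtmlSitemap(string $sitemapSetting, bool $hiddenInMenu): bool
{
    switch ($sitemapSetting) {
        case 'always':
            return true;
        case 'never':
            return false;
        default:
            // Assumption: with the default setting, hidden pages are omitted.
            return !$hiddenInMenu;
    }
}

// The privacy statement page from the original report:
// hidden in the menu, "always" in the sitemap, robots "noindex,nofollow".
var_dump(isInHtmlSitemap('always', true));     // listed by the frontend module
var_dump(isInXmlSitemap('noindex,nofollow'));  // left out of the XML sitemap
```

With the two decisions separated like this, the page from the original report stays in the frontend module while dropping out of the XML sitemap.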
Fixed in a67301f0384dfa9914713da591144ce2b6cb6c2c.
We are currently checking for noindex,nofollow. Shouldn't it just check for noindex instead?
if (strpos($objParent->robots, 'noindex') === 0)
Otherwise the page will still be in the sitemap.xml if you are using noindex,follow - and then Google will complain that you have pages in your sitemap.xml that should not be indexed.
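To make the difference between the two checks concrete, here is a small comparison. It is a sketch only: the helper names are hypothetical, and the exact-match comparison is an assumption about what "checking for noindex,nofollow" looks like, not a copy of the Contao code.

```php
<?php
declare(strict_types=1);

// Assumed form of the current behaviour: only the exact value excludes a page.
function excludedByExactMatch(string $robots): bool
{
    return $robots === 'noindex,nofollow';
}

// The check proposed above: any value that starts with "noindex" excludes a page.
function excludedByNoindexPrefix(string $robots): bool
{
    return strpos($robots, 'noindex') === 0;
}

// Print, for each robots value, whether each check would exclude the page.
foreach (['index,follow', 'noindex,follow', 'noindex,nofollow'] as $robots) {
    printf(
        "%-17s exact: %-4s prefix: %s\n",
        $robots,
        excludedByExactMatch($robots) ? 'yes' : 'no',
        excludedByNoindexPrefix($robots) ? 'yes' : 'no'
    );
}
```

The two checks only disagree for `noindex,follow`, which is exactly the case under discussion.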
Otherwise the page will still be in the sitemap.xml if you are using noindex,follow - and then Google will complain that you have pages in your sitemap.xml that should not be indexed.
I had that, too.
I don't think so. After all, we want Google to follow the links on the site, so it has to be in the sitemap, hasn't it? @ausi /cc
Well, the Google Search Console lists it as an error. Also, I don't know of any use case where you would actually want to set noindex,nofollow.
Privacy statement pages are often good candidates for "noindex,nofollow" IMHO. To include them in the Google Index is - at least - not necessary. And there are rarely any links on such pages to other internal pages. To guide the Google-Bot to follow the external links doesn't make much sense either. Anyway, "nofollow" is irrelevant IMHO when we decide if it should be in the XML-Sitemap. The setting "index" or "noindex" is obviously the only relevant one here.
Privacy statement pages are often good candidates for "noindex,nofollow" IMHO. To include them in the Google Index is - at least - not necessary. And there are rarely any links on such pages to other internal pages.
Not sure I agree with that. Even your privacy statement page is part of your regular web site (usually) and thus most links on that page are still relevant for indexing. As long as there is _any_ link on a page, which in turn does _not_ contain noindex, using nofollow would be wrong.
To guide the Google-Bot to follow the external links doesn't make much sense either.
Outbound links should be qualified with the rel attribute on the link itself anyway.
Anyway, "nofollow" is irrelevant IMHO when we decide if it should be in the XML-Sitemap. The setting "index" or "noindex" is obviously the only relevant one here.
Agreed.
So the "Nie anzeigen" (Never show) setting is applied differently to the module and to the sitemap.xml?
That may all be technically correct, but once again nobody is going to understand it.
Then change the label "In der Sitemap zeigen" (Show in sitemap) to "Im Sitemap Modul zeigen" (Show in sitemap module) or similar, to make it less ambiguous.
If you set a page to noindex,follow it has to be in the sitemap.xml IMO, otherwise the search engine might not find the page and thus cannot follow the links on the page.
If Google has a problem with that we could change it I think, but does the Google Search Console show this as an error which makes the whole sitemap invalid? Or is it just a notice that the submitted URL cannot be added to the index because of the noindex meta tag?
The latter.
From my understanding it is technically correct then as it is implemented now.
From sitemaps.org:
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling.
And crawling of a noindex,follow page is desired, only indexing is not.
I’d recommend setting the robots meta tag to noindex,nofollow if you want to get rid of the Google Search Console error.
I’d recommend setting the robots meta tag to noindex,nofollow if you want to get rid of the _Google Search Console_ error.
But that would be incorrect as well. You only use nofollow if you are sure that none of the links on that page can be indexed, which is usually not the case. As I said, I don't know of a real-world use case (within the site structure of a Contao installation) where you would ever want to set noindex,nofollow.
I agree. But I think we have to live with either the “correct” warning in the Google Search Console or the “incorrect” meta robots tag.
Or we introduce a separate setting for the Sitemap ;)
If you set a page to noindex,follow it has to be in the sitemap.xml IMO, otherwise the search engine might not find the page and thus cannot follow the links on the page.
I do not agree because the links the crawler should follow are listed in the sitemap.xml anyway if you want them to be indexed.
noindex should equal not in sitemap.xml
I do not agree because the links the crawler should follow are listed in the sitemap.xml anyway if you want them to be indexed.
If all the links to follow are in the sitemap.xml anyway you can safely set the meta robots tag to noindex,nofollow.
We have re-discussed this in Mumble on October 24th and we want to keep URLs to noindex,follow pages in the XML sitemap.
Then change the label "In der Sitemap zeigen" (Show in sitemap) to "Im Sitemap Modul zeigen" (Show in sitemap module) or similar, to make it less ambiguous.
We want to distinguish between "XML sitemap" and "HTML sitemap" in the future, so we need to adjust the following:
Backend::findSearchablePages() to $blnIsXmlSitemap
Unfortunately I (yet again) couldn't be in the Mumble.
This means that I am forced to set a page to noindex,nofollow, which is simply wrong ;). Any page within the Contao site structure is very likely to have a link to a page that does _not_ have noindex. And again: what is a real-world use case for noindex,nofollow?
No, Google is wrong. If you have noindex,follow, the page is perfectly placed in the sitemap.xml. Otherwise a search engine could never find these pages.
So imho that's clearly something Google has to fix. Maybe you can submit an issue?
You can keep using noindex,follow. Google just shows you that the page will not be indexed.
See #879.
A long time ago I read somewhere that nofollow is only intended for the case when you present links to "bad" sites. So that they are not crawled. Or in a guest book, which allows visitors to show external links. I would really have problems setting a site to nofollow that I don't want Google to index if the other links on that site are all "good" pages.
I would really have problems setting a site to nofollow that I don't want Google to index
You should set it to noindex,follow in that case.
A long time ago I read somewhere that nofollow is only intended for the case when you present links to "bad" sites.
Can you link to the article where you read that?
You should set it to noindex,follow in that case.
Did I get something wrong? I thought I had to use the combination noindex, nofollow so that a page does not appear in the sitemap.xml.
Can you link to the article where you read that?
Oops, that was years ago, I no longer have the source. But I found a post about it.
https://moz.com/blog/nofollow-sponsored-ugc
https://webmasters.googleblog.com/2019/09/evolving-nofollow-new-ways-to-identify.html
If I interpret that correctly, nofollow is not an instruction about the indexing of a page at all. This means that noindex is the only value responsible for that, and if it is set, Contao should leave the page out of the sitemap.xml.
The word noindex already expresses this. So why use nofollow at the same time?
The more I think about it, the more I feel like not using this Metatag Robots at all.
My sitemap.xml submitted to the search engine declares the pages that should be indexed. And if I have links that I would like search engines not to follow, I have to make sure that they get the attribute rel="nofollow".
I thought I had to use the combination noindex, nofollow so that a page does not appear in the sitemap.xml.
If Google (or any search engine) should not index your page, you should use noindex. If you want the page to be excluded from your sitemap.xml, you have to use noindex, nofollow. With noindex,follow the page will still show up in the sitemap.xml because it has to in order for the crawler to be able to “follow” the links on that page.
nofollow is only intended for the case when you present links to "bad" sites.
I was not able to find something that would suggest this on the linked articles.
I was not able to find something that would suggest this on the linked articles.
That was just what I had kept in the back of my mind for years. On the first linked page I find, for example:
rel=nofollow - Catch-all for all non-trusted links
For this reason I have never set a page to nofollow. @fritzmg doesn't know of a use case for it either.
But I seem to have been mistaken about the sitemap.xml. I thought it was responsible for indexing. After carefully re-reading all of your posts, however, it looks like the sitemap.xml is only a list of the pages that should be crawled. The actual instruction not to index a page is in the robots meta tag of the page itself. So I can set a form thank-you page to noindex,follow and it will then appear in the sitemap.xml, but will not be indexed.
So I can set a form thank-you page to noindex,follow and it will then appear in the sitemap.xml, but will not be indexed.
It won't be indexed by Google, but it will be indexed by Contao, if not disabled.
What does that mean? Indexed for the Contao search engine? If so, does that mean I additionally have to exclude the page from search?
Current status:
At the moment, you can no longer exclude a page from the XML sitemap without using the "nofollow" attribute.
For me this means that I now always have to have all pages in the XML sitemap, since I definitely don't want to give any of my pages the "nofollow" attribute.
If I have links on my pages that search engines shouldn't follow, e.g. to third-party domains, I use rel="nofollow" on those links rather than the "nofollow" attribute for the page.
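As an illustration of qualifying individual links instead of the whole page, a link renderer might look like the sketch below. The function `renderLink()` and the host `www.example.com` are hypothetical and not part of Contao; treating every link to a different host as one that should not be followed is an assumption made for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: render an anchor and add rel="nofollow" only when
// the link points to a host other than the site's own host.
function renderLink(string $url, string $label, string $ownHost): string
{
    // parse_url() returns null for the host of a relative URL such as "/imprint.html".
    $host = parse_url($url, PHP_URL_HOST);
    $isExternal = is_string($host) && strcasecmp($host, $ownHost) !== 0;

    return sprintf(
        '<a href="%s"%s>%s</a>',
        htmlspecialchars($url, ENT_QUOTES),
        $isExternal ? ' rel="nofollow"' : '',
        htmlspecialchars($label, ENT_QUOTES)
    );
}

// Internal link: no rel attribute. Third-party link: rel="nofollow".
echo renderLink('/imprint.html', 'Imprint', 'www.example.com'), "\n";
echo renderLink('https://example.org/', 'Example', 'www.example.com'), "\n";
```

The page itself can then keep a robots value with follow, while only the selected links carry the hint.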