Sphinx: Consider removing old Sphinx documentation versions from Google search

Created on 11 Oct 2020 · 4 Comments · Source: sphinx-doc/sphinx

A frustration that I often have with Sphinx is that Google search results consistently bring up (often very) old documentation pages for Sphinx.

For example, a recent search for "sphinx glob toctree" brought up 3 different Sphinx versions as the top results:

[Screenshot of the search results]

The first result was from Sphinx 1.5.

I think this kind of thing confuses newcomers because they'll find themselves on an out-of-date page without realizing that there's a newer version. It is also frustrating for current developers because they have to click through multiple versions of a page before getting to the latest documented version.

I believe that Sphinx is hosted on ReadTheDocs, which has support for hiding older versions of documentation (so they're still reachable but won't show up in search results). I think that Sphinx should consider hiding all documentation versions except for the last few releases, or hiding all but stable/ and latest/ from Google searches.

I looked up how to do this in the RTD docs, and found this page on hidden versions and this page on robots.txt, but I'm not quite sure if those are the right approach. Maybe @ericholscher or @stsewd have thoughts?

docs enhancement

All 4 comments

Yeah, using hidden versions or a custom robots.txt is the solution. If you hide a version, it won't be listed here

[Screenshot: Overview, Sphinx 4.0.0+ documentation]

But it will still be listed at https://readthedocs.org/projects/sphinx/versions/ (maybe we should link to this page when there are hidden versions).

If you don't want that behavior, using a custom robots.txt should be fine.
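For anyone going the custom robots.txt route, here is a minimal sketch of generating such a file from a list of versions. The `build_robots_txt` helper and its parameters are hypothetical names made up for illustration, and how the resulting file gets served on ReadTheDocs is not covered here.

```python
from typing import Iterable, Optional


def build_robots_txt(
    hidden_versions: Iterable[str],
    language: str = "en",
    sitemap_url: Optional[str] = None,
) -> str:
    """Return robots.txt text that asks all crawlers to skip the given versions.

    Hypothetical helper: assumes documentation URLs look like
    /<language>/<version>/.
    """
    lines = ["User-agent: *"]
    for version in hidden_versions:
        # One Disallow rule per version directory we want crawlers to skip.
        lines.append(f"Disallow: /{language}/{version}/ # Hidden version")
    if sitemap_url:
        # A blank line separates the rule group from the sitemap hint.
        lines.extend(["", f"Sitemap: {sitemap_url}"])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(
        ["1.8", "1.7", "1.6"],
        sitemap_url="https://www.sphinx-doc.org/sitemap.xml",
    ))
```

Keeping the version list as data makes it easy to add a release to the hidden set later without editing the file by hand.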

+1: I feel old documents are not useful for users. What do you think? @shimizukawa @stephenfin

Agreed. I think it's fine to show only the master and latest release versions, too.

For now, I've hidden versions 1.2 through 1.8, which I'm sure nobody will object to.

Hiding versions on ReadTheDocs also adds Disallow rules for them to robots.txt:
https://www.sphinx-doc.org/robots.txt

User-agent: *
Disallow: /en/1.8/ # Hidden version
Disallow: /en/1.7/ # Hidden version
Disallow: /en/1.6/ # Hidden version
Disallow: /en/1.5/ # Hidden version
Disallow: /en/1.4/ # Hidden version
Disallow: /en/1.3/ # Hidden version
Disallow: /en/1.2/ # Hidden version

Sitemap: https://www.sphinx-doc.org/sitemap.xml

However, hidden versions can still be accessed directly, so existing links to old versions from other sites will not break.
(e.g. https://www.sphinx-doc.org/en/1.8/ )
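To sanity-check rules like these, something along the following lines could be used. It is only a sketch: it feeds a copy of the robots.txt text quoted in this thread to Python's standard `urllib.robotparser` and asks whether a crawler that honors the file would fetch a given URL. The `is_crawlable` helper is a made-up name. Note that this says nothing about direct access, which robots.txt does not restrict.

```python
from urllib.robotparser import RobotFileParser

# Copy of the rules quoted in this thread (no blank lines inside the group).
ROBOTS_TXT = """\
User-agent: *
Disallow: /en/1.8/ # Hidden version
Disallow: /en/1.7/ # Hidden version
Disallow: /en/1.6/ # Hidden version
Disallow: /en/1.5/ # Hidden version
Disallow: /en/1.4/ # Hidden version
Disallow: /en/1.3/ # Hidden version
Disallow: /en/1.2/ # Hidden version
Sitemap: https://www.sphinx-doc.org/sitemap.xml
"""


def is_crawlable(url: str, robots_txt: str = ROBOTS_TXT, user_agent: str = "*") -> bool:
    """Return True if a crawler honoring robots_txt may fetch url.

    Hypothetical helper; parses the text locally instead of fetching it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.sphinx-doc.org/en/1.8/",
        "https://www.sphinx-doc.org/en/master/",
    ):
        print(url, is_crawlable(url))
```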

If there are no other objections, I will also hide 2.0 for en and update the versions for the other languages.

Another (possibly better) option is <link rel="canonical">. It doesn't stop search engines from indexing older versions of the documentation, and thus doesn't prevent users from finding them if they need to. However, it would lead the non-canonical versions to "feed into" the canonical ones, and should lead search engine bots to favor crawling the canonical page.
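As a rough illustration of the canonical idea, the sketch below maps a page in an old version to the same page in a chosen canonical version and renders the tag. The `canonical_url` and `canonical_link_tag` helpers are hypothetical, the /<language>/<version>/<page> layout is assumed from the URLs in this thread, and the page path in the example is made up.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(page_url: str, canonical_version: str = "master") -> str:
    """Rewrite /<language>/<version>/<page> to use canonical_version.

    Hypothetical helper; assumes the version is the second path segment.
    """
    parts = urlsplit(page_url)
    segments = parts.path.split("/")  # "/en/1.5/x.html" -> ["", "en", "1.5", "x.html"]
    if len(segments) < 3 or not segments[2]:
        raise ValueError(f"no version segment in {page_url!r}")
    segments[2] = canonical_version
    # Drop query and fragment: a canonical URL should name the page itself.
    return urlunsplit((parts.scheme, parts.netloc, "/".join(segments), "", ""))


def canonical_link_tag(page_url: str, canonical_version: str = "master") -> str:
    """Render the <link rel="canonical"> element for an old-version page."""
    href = escape(canonical_url(page_url, canonical_version), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    # Example page path is illustrative only.
    print(canonical_link_tag("https://www.sphinx-doc.org/en/1.5/example/page.html"))
```

In practice the tag would probably not be computed by hand: Sphinx has an `html_baseurl` option that is used to emit a canonical link for each page, so setting it to the preferred version's base URL may be enough.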
