Readthedocs.org: Custom 404 for subdomains

Created on 12 Mar 2013  ·  22 Comments  ·  Source: readthedocs/readthedocs.org

I just moved www.tornadoweb.org to readthedocs using a CNAME. This broke some links (which is not ideal but I can live with it), but the resulting 404 page is not helpful (e.g. www.tornadoweb.org/documentation/). The 404 page for a recognized subdomain/cname should include a link to that hostname's root. (of course, custom redirects for my domain would be awesome too)

Accepted Improvement

All 22 comments

Yea, I've thought a little about how to improve the 404 pages. I think also
having logic that looks for similarly named files and auto-redirects before
404ing would be pretty awesome as well. 404s for transferring projects give
me a big sad :(

Can you explain exactly what you want in the page? Basically just more
information about what the domain is, and other places on the domain they
might want to look? This seems sane to me.

Cheers,
Eric

Eric Holscher
Maker of the internet residing in Portland, Or
http://ericholscher.com

It would be nice if the page could be branded as "Tornado" instead of "Read the Docs", and have the most prominent link on the page go to www.tornadoweb.org/ (the link at the top goes to the right place, but it looks like you're on the wrong site now). If I could upload my own html file that would be ideal (and then assuming I can run javascript I could do my own redirects from /documentation/ to /en/branch2.4/).

It would be pretty simple to handle smarter 404 logic, by adding it to this function: https://github.com/rtfd/readthedocs.org/blob/master/readthedocs/core/views.py#L437 -- It would be nice to try and figure out a proper page to redirect to automatically, or at least give some possible pages they might want in the response as well.
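A minimal sketch of the "suggest similar pages" idea, assuming the set of known page paths for a project is available as a list. The function name `suggest_pages` and the sample paths are hypothetical and not part of the Read the Docs code base; the matching uses the standard library's `difflib`.

```python
import difflib


def suggest_pages(requested_path, known_paths, limit=3):
    """Return up to `limit` known paths that look similar to the requested one."""
    # cutoff=0.6 is difflib's default similarity threshold; tune as needed.
    return difflib.get_close_matches(requested_path, known_paths, n=limit, cutoff=0.6)


# Hypothetical list of pages that exist for a project.
known = [
    "/en/latest/index.html",
    "/en/latest/guide/install.html",
    "/en/latest/api.html",
]

# A request with a typo in the file name; the best match is listed first.
suggestions = suggest_pages("/en/latest/guide/instal.html", known)
```

A 404 handler could redirect when there is exactly one strong match, and otherwise list the suggestions on the 404 page.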

It would be pretty nice if it were possible to just redirect to the project's custom 404.html if it exists. That way I can just create a 404.rst and add something to it that at least keeps the user on a page that looks like the rest of the site.

Thinking about this now, it should be pretty simple to do. We can look for a 404.html in the root directory of your docs. I've also thought it might be neat to create a customized 404 from RTD that's themed in your docs theme, which would fix a lot of the issues around breaking style during 404.

Looked into this. The 404.html at the root has relative URLs for the media files, so we can't use it for a generic 404 page. We need to either post-process the 404.html to make the media links absolute, or proxy the page somehow in a way that doesn't break things.
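A rough sketch of the post-processing option, assuming the page only needs its double-quoted `href` and `src` attributes rewritten. The function name `absolutize` and the `/en/latest/` prefix are illustrative assumptions; a real implementation would more likely use an HTML parser and would also need to handle `../` segments.

```python
import re

# Matches href="..." and src="..." attribute values (double-quoted only).
_ATTR = re.compile(r'\b(?P<attr>href|src)="(?P<url>[^"]*)"')


def absolutize(html, prefix):
    """Prepend `prefix` to relative href/src values in `html`."""

    def fix(match):
        url = match.group("url")
        # Leave empty values, absolute paths, anchors, and scheme URLs untouched.
        if not url or url.startswith(("/", "#", "mailto:")) or "://" in url:
            return match.group(0)
        return '{}="{}{}"'.format(match.group("attr"), prefix, url)

    return _ATTR.sub(fix, html)


sample = '<link rel="stylesheet" href="_static/css/theme.css" />'
fixed = absolutize(sample, "/en/latest/")
```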

I think the best option would be to build a Sphinx extension that inserts a 404.rst if it doesn't exist, and then rewrites the linked media files on output. I'd be happy to integrate this into RTD if someone writes it :)

I wrote an initial implementation of this that needs some cleanup to deploy:

def html_collect_pages(app):
    # Register an extra '404' page rendered with the theme's page.html template.
    return [('404', {'body': '<h1>Page not found</h1>\n\nThanks for trying.'}, 'page.html')]


def finalize_media(app, pagename, templatename, context, doctree):
    """Point media files at our media server."""

    def pathto(otheruri, resource=False, baseuri='/'):
        """Hack pathto to return absolute URLs."""
        if resource and '://' in otheruri:
            # allow non-local resources given by scheme
            return otheruri
        elif not resource:
            otheruri = app.builder.get_target_uri(otheruri)
        if otheruri and otheruri[0] != '/':
            otheruri = '/' + otheruri
        uri = otheruri or '#'
        return uri

    # Only the 404 page needs absolute URLs; other pages keep the default pathto.
    if pagename == '404':
        context['pathto'] = pathto


def setup(app):
    app.connect('html-collect-pages', html_collect_pages)
    app.connect('html-page-context', finalize_media)

It will also need updates to our nginx 404 settings, but getting the 404 pages working at any URL is the first step.

We can look for a 404.html in the root directory of your docs.

Adding some context here, this is what GitHub does: https://help.github.com/articles/creating-a-custom-404-page-for-your-github-pages-site/

We need to consider that we host docs per version. I guess we could have this:

  • /en/v1/no-existing -> /en/v1/404.html
  • /en/no-existing -> /en/default-version/404.html

And we could fall back to Read the Docs' own 404 page if users don't have one.

Some context about our current setup:

  1. NGINX first tries to serve a static HTML file,
  2. then it falls back to Django, which checks for redirects,
  3. if there are no redirects, Django returns a rendered version of 404.html

NOTE: some pieces of this configuration are not public.


Considerations

  1. custom 404 pages will allow branding the docs completely (probably more important on the corporate site)
  2. all URLs have to be hardcoded (if you 404 on /foo/bar/baz.html, and we load a 404 handler generated with relative paths in /404.html, the links for, say, js/something.js will point to /foo/bar/js/something.js -- so it's not straightforward how to use Sphinx for this)
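The relative-path problem in the second consideration can be reproduced with the standard library's `urljoin`, which resolves a link the same way a browser would. The host name `docs.example.org` is a placeholder.

```python
from urllib.parse import urljoin

# The URL the visitor requested, which ends up showing the 404 page.
page = "https://docs.example.org/foo/bar/baz.html"

# A relative link in the 404 page resolves against the requested URL...
relative = urljoin(page, "js/something.js")

# ...while a link with a leading slash resolves against the site root.
absolute = urljoin(page, "/js/something.js")
```

Here `relative` points below `/foo/bar/`, where the asset does not exist, while `absolute` resolves to the same place regardless of which URL triggered the 404.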

Idea of a potential solution

From the Read the Docs server/config side, at step 3 of our current setup, we could check whether the project already has a 404.html page under resolve_path(project, version_slug=version.slug, language=language, filename='404.html') and serve it directly.

NOTE: I'm using version_slug to locate the 404.html file, which may not make sense; the default_version should probably be used instead (or at least as a fallback).
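A minimal sketch of that lookup with the default-version fallback, assuming the built docs live on disk as `<docs_root>/<language>/<version>/404.html`. The function name `find_custom_404` and this directory layout are assumptions made for illustration; it does not use the real `resolve_path` helper.

```python
import tempfile
from pathlib import Path


def find_custom_404(docs_root, language, version_slug, default_version):
    """Return the first existing 404.html, preferring the requested version."""
    for slug in (version_slug, default_version):
        candidate = Path(docs_root) / language / slug / "404.html"
        if candidate.is_file():
            return candidate
    return None  # the caller falls back to the generic Read the Docs 404 page


with tempfile.TemporaryDirectory() as root:
    page = Path(root) / "en" / "latest" / "404.html"
    page.parent.mkdir(parents=True)
    page.write_text("<h1>Not found</h1>")

    # v1 has no 404.html of its own, so the default version's page is used.
    found = find_custom_404(root, "en", "v1", default_version="latest")
    found_is_default = found == page

    # No French docs were built, so there is nothing custom to serve.
    missing = find_custom_404(root, "fr", "v1", default_version="latest")
```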

From the user side, a plain HTML file (with hardcoded URLs) has to be provided at /404.html. This file could be generated from a .rst if we write a Sphinx extension that converts all the relative URLs to absolute ones based on some configs like domain, language and version.

If we could write this extension I think it would be good UX from the user perspective.

NOTE: this idea will only work with Sphinx, though.

From the user side, a plain HTML file (with hardcoded URLs) has to be provided at /404.html. This file could be generated from a .rst if we write a Sphinx extension that converts all the relative URLs to absolute ones based on some configs like domain, language and version.

I worked on this and I created the Sphinx extension (sphinx-notfound-page) for this based on @ericholscher's solution from this issue.

You can see a live example under Read the Docs: https://test-builds.readthedocs.io/en/custom-404-page/

This example does not do everything we need yet, because the RTD source code also needs some updates to serve this 404.html page for every not-found page.

If you check the source code of the 404 page served by the example, you will see that all the links are absolute. For example:

  <link rel="stylesheet" href="/en/latest/_static/css/theme.css" type="text/css" />
  <link rel="stylesheet" href="/en/latest/_static/pygments.css" type="text/css" />
  <link rel="index" title="Index" href="/en/latest/genindex.html" />
  <link rel="search" title="Search" href="/en/latest/search.html" /> 

So, it seems the only missing piece here is to modify the RTD source code to find this file and serve it if it exists.

The PR was merged and deployed. I'd like to hear back from users that wanted to have a custom 404 if you were able to configure it properly: dropping a 404.html on the root of your documentation's output with absolute URLs for resources should be enough.

Works great!

@humitos We created a custom 404.rst in the root and let Read the Docs render the pages. All paths that start with /en/latest (e.g. /en/latest/pagethatdoesnotexist) show the 404.html perfectly in the theme, but when the /en/latest part is missing, the 404.html is shown with broken CSS and no theme. Any hints on how to fix that? :)

@Solosneros yes, you have to use absolute links to make the resources load properly. This is the most important part.

You have 2 options to achieve this:

  1. create the HTML by hand hard-coding all the URL resources (and adding that 404.html page as a static file using the config html_extra_path)
  2. use the extension https://github.com/rtfd/sphinx-notfound-page that automatically generates the page for you with the proper URLs
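For option 1, a conf.py fragment along these lines should be enough, assuming the hand-written 404.html sits next to conf.py (the file location is an assumption; adjust the path to your layout).

```python
# conf.py (fragment)

# Paths are relative to the directory containing conf.py. Listed files are
# copied as-is to the root of the HTML output, so this yields <output>/404.html.
html_extra_path = ["404.html"]
```

Sphinx does not process files listed here, so every stylesheet, script, and link in that 404.html must already use absolute URLs.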

@humitos thanks for the fast reply. I tried using your extension and the build keeps failing. Is the extension automatically included in ReadtheDocs.org?

Could not import extension sphinx-notfound-page (exception: No module named 'sphinx-notfound-page')

You probably need to add it to a requirements.txt file and let rtd know where that file is located.

@Solosneros no, you have to install it as any other dependency (https://docs.readthedocs.io/en/latest/guides/specifying-dependencies.html) as @kdheepak mentioned.
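For reference, a conf.py fragment for option 2 might look like the sketch below. It assumes the package is installed under its PyPI name `sphinx-notfound-page` (for example via a requirements.txt line), and that the importable extension module is `notfound.extension`, which differs from the package name. Listing the package name itself in `extensions` would be one way to get an import error like the one above. The `/en/latest/` prefix is an assumed example value; check the extension's documentation for the options your version supports.

```python
# conf.py (fragment)

extensions = [
    # The importable module name, not the PyPI package name.
    "notfound.extension",
]

# Prefix prepended to the URLs generated in the 404 page.
# "/en/latest/" is an assumed example; match it to your language and version.
notfound_urls_prefix = "/en/latest/"
```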

@humitos thanks, it works now :)

@humitos I was just looking for custom 404 page support.
Thank you for this extension :+1:

Separate thanks to @bdarnell for his foresight requesting this feature :smiley:
