Almanac.httparchive.org: Investigate 404 errors

Created on 11 Nov 2019 · 15 Comments · Source: HTTPArchive/almanac.httparchive.org

In the production server logs I'm seeing lots of ambiguous error messages like this:

werkzeug.exceptions.NotFound: 404 Not Found: The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.
at match (/env/lib/python3.7/site-packages/werkzeug/routing.py:1799)
at match_request (/env/lib/python3.7/site-packages/flask/ctx.py:336)
at raise_routing_exception (/env/lib/python3.7/site-packages/flask/app.py:1774)
at dispatch_request (/env/lib/python3.7/site-packages/flask/app.py:1791)
at full_dispatch_request (/env/lib/python3.7/site-packages/flask/app.py:1813)

At times the server spikes to 200 404s per minute, which is suspiciously high.

Sometimes this happens when a site doesn't have a favicon or something innocuous, but I can't imagine why we'd be having this many 404s unless there's a broken link somewhere.

Two things:

  • [ ] Improve error logging so we know what the broken link is and where it's coming from (cc @mikegeyser)
  • [ ] Rerun the SEO-style audit of the website so that we can more easily/proactively find broken links (#286 cc @AymenLoukil @catalinred @rachellcostello)
Labels: ASAP, bug, development
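For the first item, here is a minimal sketch of what richer 404 logging could look like, assuming a Flask app. It logs the missing path together with the Referer and User-Agent headers, which is usually enough to tell a broken internal link apart from a bot or a browser guessing at icon paths. The logger name almanac.404 and the plain-text response are placeholders, not the project's actual code.

```python
import logging

from flask import Flask, request

app = Flask(__name__)
logger = logging.getLogger("almanac.404")  # hypothetical logger name


@app.errorhandler(404)
def page_not_found(e):
    # request.referrer is the Referer header: the page that linked here, if any.
    # request.user_agent.string is the raw User-Agent header.
    logger.warning(
        "404: %s (referrer=%s, user_agent=%s)",
        request.path,
        request.referrer,
        request.user_agent.string,
    )
    return "Page not found", 404
```

Logging the referrer is what makes each entry actionable: a 404 whose referrer is an almanac.httparchive.org page points at a broken link on our own site.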

All 15 comments

This error may be related:

AttributeError: 'NoneType' object has no attribute 'get'
at render_template (/srv/main.py:17)
at page_not_found (/srv/main.py:136)
at handle_http_exception (/env/lib/python3.7/site-packages/flask/app.py:1644)
at handle_user_exception (/env/lib/python3.7/site-packages/flask/app.py:1713)
at full_dispatch_request (/env/lib/python3.7/site-packages/flask/app.py:1815)
at wsgi_app (/env/lib/python3.7/site-packages/flask/app.py:2292)

But it's similarly ambiguous.

I added the favicon with https://github.com/HTTPArchive/almanac.httparchive.org/pull/438 btw.

I also see this with <link rel="apple-touch-icon">, so it could be that. We should probably add one anyway, even if it isn't the cause this time.

The PWA chapter links to ./mobile instead of ./mobile-web, but I doubt that's the cause.

I ran a crawl, and here are the links generating errors:

https://almanac.httparchive.org/en/2019/%5D(https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS) from https://almanac.httparchive.org/en/2019/resource-hints

https://almanac.httparchive.org/static/images/2019/05_Third_Parties/fig7.png from https://almanac.httparchive.org/en/2019/third-parties

https://almanac.httparchive.org/static/images/2019/08_Security/fig1.png from https://almanac.httparchive.org/en/2019/security

https://www.ssllabs.com/ssl-pulse/) from https://almanac.httparchive.org/en/2019/security

https://almanac.httparchive.org/static/images/2019/08_Security/fig8.png from https://almanac.httparchive.org/en/2019/security

https://almanac.httparchive.org/static/images/2019/08_Security/fig3.png from https://almanac.httparchive.org/en/2019/security

https://almanac.httparchive.org/static/images/2019/08_Security/fig2.png from https://almanac.httparchive.org/en/2019/security

https://fonts.gstatic.com/ from https://almanac.httparchive.org/en/2019/fonts

https://rainy-periwinkle.glitch.me/permalink/bc8f154a95dfe06a6d0fdb099b6c8df61727b2289141a0ef16dc17b2b57d3068.html from https://almanac.httparchive.org/en/2019/markup
https://rainy-periwinkle.glitch.me/permalink/3214f840b6ae3ef1074291f60fa1be4b9d9df401fe0190bfaff4bb078c8614a5.html from https://almanac.httparchive.org/en/2019/markup
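Two of the links above (the %5D( one on resource-hints and the trailing ) on security) look like Markdown [text](url) syntax that leaked into the rendered HTML. A rough, standard-library-only sketch of a check that could catch these in rendered chapter HTML follows; the function names are hypothetical and the heuristic is an assumption, not the crawler that produced this list.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class HrefCollector(HTMLParser):
    """Collects href/src attribute values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def suspicious_links(html):
    """Return links that look like leftover Markdown link syntax."""
    parser = HrefCollector()
    parser.feed(html)
    flagged = []
    for link in parser.links:
        decoded = unquote(link)  # turns %5D back into ]
        # "](" inside a URL, or a stray ")" at the end, usually means a
        # Markdown [text](url) link was mangled during conversion.
        if "](" in decoded or decoded.endswith(")"):
            flagged.append(link)
    return flagged
```

The trailing-parenthesis rule will also flag legitimate URLs, such as Wikipedia pages with parentheses in their titles, so matches are candidates for review rather than confirmed breakages.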

These links should be changed to HTTPS:

http://speedcurve.com/ from https://almanac.httparchive.org/en/2019/contributors
http://paulcalvano.com/ from https://almanac.httparchive.org/en/2019/contributors
http://www.filamentgroup.com/ from https://almanac.httparchive.org/en/2019/fonts
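Rewriting the scheme is mechanical, so here is a small sketch using only urllib.parse (the function name is hypothetical). It only proposes the https:// form; whether each site actually serves the same page over HTTPS still needs checking by hand.

```python
from urllib.parse import urlsplit, urlunsplit


def https_candidates(links):
    """Map each plain-http:// link to the same URL with an https:// scheme."""
    upgrades = {}
    for link in links:
        parts = urlsplit(link)
        if parts.scheme == "http":
            # Keep host, path, query and fragment; swap only the scheme.
            upgrades[link] = urlunsplit(parts._replace(scheme="https"))
    return upgrades
```

Relative links and links that are already HTTPS are skipped, so the function can be fed every href from a page.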

Fixed all the ones I could as part of https://github.com/HTTPArchive/almanac.httparchive.org/pull/455

Remaining are:

  • Third Party missing figure
  • Security images (https://github.com/HTTPArchive/almanac.httparchive.org/issues/237#issuecomment-552231431).
  • Markup links

Images should be fixed now.

We'll need @bkardell's help to resolve the Glitch URLs.

How's it looking now, @rviscomi? Any reduction in errors? Any more detail on which pages are missing?

Still seeing the errors, and it doesn't look like the logging changes helped with debugging. The error is raised at a lower level than our logging.

[Screenshot]

OK I got it.

We don't have a working 404 page, except for the routes we have defined (e.g. /static/XXX or /lang/year/XXX).

This reproduces the error: http://127.0.0.1:8080/en/, for example, as does http://127.0.0.1:8080/anythingrandom, because we have no routes matching those patterns.

It shows a generic error page instead of our 404 page and returns a 500 to the browser, even though it started life as a 404:

ERROR:root:An error occurred during a request due to page not found: /en/
Traceback (most recent call last):
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1791, in dispatch_request
    self.raise_routing_exception(req)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1774, in raise_routing_exception
    raise request.routing_exception
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/ctx.py", line 336, in match_request
    self.url_adapter.match(return_rule=True)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/werkzeug/routing.py", line 1799, in match
    raise NotFound()
werkzeug.exceptions.NotFound: 404 Not Found: The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.
INFO:werkzeug:127.0.0.1 - - [11/Nov/2019 20:04:47] "GET /en/ HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1791, in dispatch_request
    self.raise_routing_exception(req)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1774, in raise_routing_exception
    raise request.routing_exception
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/ctx.py", line 336, in match_request
    self.url_adapter.match(return_rule=True)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/werkzeug/routing.py", line 1799, in match
    raise NotFound()
werkzeug.exceptions.NotFound: 404 Not Found: The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 2309, in __call__
    return self.wsgi_app(environ, start_response)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 2295, in wsgi_app
    response = self.handle_exception(e)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1741, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1713, in handle_user_exception
    return self.handle_http_exception(e)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1644, in handle_http_exception
    return handler(e)
  File "/Users/barry/almanac.httparchive.org/src/main.py", line 145, in page_not_found
    return render_template('error/404.html', error=e), 404
  File "/Users/barry/almanac.httparchive.org/src/main.py", line 18, in render_template
    year = request.view_args.get('year', DEFAULT_YEAR)
AttributeError: 'NoneType' object has no attribute 'get'
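The last frames show where the 500 actually comes from: page_not_found calls the render_template wrapper in main.py, which does request.view_args.get('year', DEFAULT_YEAR). Flask only fills in request.view_args when a URL rule matched; for an unmatched path it is None, so the 404 handler itself raises AttributeError. Besides a catch-all route, the wrapper could tolerate that case. The following is a self-contained sketch of that idea; render_page, the inline templates, and DEFAULT_YEAR = "2019" are stand-ins, not the project's code.

```python
from flask import Flask, render_template_string, request

DEFAULT_YEAR = "2019"  # assumption: stands in for the constant in main.py
app = Flask(__name__)


def render_page(template_source, **kwargs):
    # request.view_args is None when URL matching failed (an unknown path
    # sent straight to the 404 handler), so fall back to an empty dict.
    view_args = request.view_args or {}
    year = view_args.get("year", DEFAULT_YEAR)
    return render_template_string(template_source, year=year, **kwargs)


@app.route("/<lang>/<year>/")
def home(lang, year):
    return render_page("Home {{ year }}")


@app.errorhandler(404)
def page_not_found(e):
    # With the fallback above this renders instead of raising AttributeError.
    return render_page("Not found ({{ year }})"), 404
```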

Adding a default route like this fixes it:

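from flask import abort

# Catch-all: any URL no other rule matches is routed here, so the 404 is
# raised from inside a view and request.view_args is populated.
@app.route('/', defaults={'path': ''})
@app.route('/<path:path>')
def catch_all(path):
    abort(404, 'barry was here')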

I know this fixes it because it returns our correct 404 page with that exact error message on it ("barry was here"), so the request is reaching this route.

Other posts suggest this is how it should work, and I've tested that the other routes still work (home page, chapters, methodology, etc.), as do static pages, sitemap.xml, etc.
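A stripped-down, runnable version of the catch-all approach makes it easy to check that existing routes still take precedence. Only the two catch-all rules come from the comment above; the home route is a simplified stand-in for the real one. Werkzeug prefers rules with static segments and gives the path converter a low priority, so /static/... and /<lang>/<year>/ should still be matched before /<path:path>.

```python
from flask import Flask, abort

app = Flask(__name__)  # also registers the built-in /static/<path:filename> rule


@app.route("/<lang>/<year>/")
def home(lang, year):
    # Simplified stand-in for the real home page view.
    return f"home {lang} {year}"


# Catch-all from the comment above: anything no other rule matches ends up here.
@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def catch_all(path):
    abort(404)
```

Binding the URL map and calling match() shows which endpoint a path resolves to without sending a request.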

Will submit a PR, though I suppose I should change the 404 error message 😀

I'm also going to add a route to handle the /en/ case and redirect to the default year:

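from flask import redirect, url_for

# /en/ (language but no year): redirect to that language's home page for
# DEFAULT_YEAR. validate and DEFAULT_YEAR come from the existing app code.
@app.route('/<lang>/')
@validate
def lang_only(lang):
    return redirect(url_for('home', lang=lang, year=DEFAULT_YEAR))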
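A self-contained version of the /en/ redirect can be exercised with Flask's test client. In this sketch the project's @validate decorator is left out, DEFAULT_YEAR is hard-coded as an assumption, and home is a simplified stand-in for the real home view.

```python
from flask import Flask, redirect, url_for

DEFAULT_YEAR = "2019"  # assumption: stands in for the constant in main.py
app = Flask(__name__)


@app.route("/<lang>/<year>/")
def home(lang, year):
    return f"home {lang} {year}"


@app.route("/<lang>/")
def lang_only(lang):
    # Language without a year: send the visitor to the default year's home page.
    return redirect(url_for("home", lang=lang, year=DEFAULT_YEAR))
```

redirect() issues a 302 by default; a permanent 301 would be an option if /en/ should never point at a different year.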

Good find!

Here are some weird findings from the production server logs:

404: /static/images/favicon.ico/static/images/favicon.ico (Firefox)
404: /static/123 (Safari)
404: /static/images/home-hero-bg.pnghttps://almanac.httparchive.org/en/2019/ (bitlybot)
404: /static/images/apple-touch-icon.png/static/images/apple-touch-icon.png (Chrome 77)

404: /static/123 (Safari)

Are you sure this was Safari? I think I tested that one on production from Chrome 😀

BTW, since we already had a route for /static/, all four of those examples fail in the same way with and without my fix.

Some people just ask for weird stuff!


I'm still seeing vague 404 error messages in Stackdriver:

[Screenshot]

However, the actual App Engine server logs are no longer showing any meaningful errors on things like broken images or bad requests, so I'm comfortable closing this issue.
