Shields: Require production traffic to pass through the CDN

Created on 18 Feb 2019 · 11 Comments · Source: badges/shields

Requests that hit the production servers via img.shields.io come through Cloudflare. However, the server IP addresses are well known, and the servers accept direct requests. This circumvents Cloudflare's DoS protections and renders them useless in an attack like the one documented in #2991. (There is some protection in place on the server, but it is trivial to bypass.)

It's nice to be able to send test requests directly to the server, and browse the UI on the server for testing, though perhaps we can authenticate those using a header instead. Perhaps we could require HTTP Basic auth for requests that don't come from the CDN, and share the credentials with all the maintainers.
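As a rough illustration of the Basic auth idea above, here is a minimal sketch. Everything in it is hypothetical: the function names, the `cameFromCdn` flag, and the credentials object are assumptions made for the example and are not taken from the Shields codebase.

```javascript
// Hypothetical helper, not part of Shields.
// Returns true when the Authorization header carries the expected
// HTTP Basic credentials.
function hasValidBasicAuth(authorizationHeader, expectedUser, expectedPass) {
  if (typeof authorizationHeader !== 'string') return false

  const [scheme, encoded] = authorizationHeader.split(' ')
  if (!scheme || scheme.toLowerCase() !== 'basic' || !encoded) return false

  // Basic credentials are base64("user:password"). The password may itself
  // contain ':', so split on the first colon only.
  const decoded = Buffer.from(encoded, 'base64').toString('utf8')
  const separator = decoded.indexOf(':')
  if (separator === -1) return false

  const user = decoded.slice(0, separator)
  const pass = decoded.slice(separator + 1)
  return user === expectedUser && pass === expectedPass
}

// Hypothetical policy: traffic that came through the CDN passes,
// anything else has to present the shared maintainer credentials.
function shouldServe({ cameFromCdn, authorizationHeader }, credentials) {
  if (cameFromCdn) return true
  return hasValidBasicAuth(
    authorizationHeader,
    credentials.user,
    credentials.pass
  )
}
```

A real deployment would probably compare the secrets in constant time (for example with Node's `crypto.timingSafeEqual`) and would need a trustworthy way to decide `cameFromCdn`, which this sketch leaves open.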

Some useful reference here:

Related: we could consider letting Cloudflare handle our rate limiting instead of using our own code on the servers.

core operations

All 11 comments

Totally agreed with all the above. Additionally, I think it would be beneficial to be able to completely block all requests that don't come from an allowlist of IP ranges (traffic routed via Cloudflare, certain maintainer IPs, etc.) from getting to the servers in the first place. I assume this could be handled via firewall/networking features of our infra provider.
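The comment suggests enforcing this at the firewall level, but the same allowlist logic can be sketched in application code to make the idea concrete. The helper names below are hypothetical, and the ranges are illustrative only: the first two are meant as examples of Cloudflare IPv4 ranges (the authoritative list is the one Cloudflare publishes and should be fetched from there), and the third stands in for a maintainer address using a documentation-reserved IP. IPv6 is not handled.

```javascript
// Convert a dotted-quad IPv4 string to an unsigned 32-bit integer.
// Returns null for anything that is not a well-formed IPv4 address.
function ipv4ToInt(ip) {
  if (typeof ip !== 'string') return null
  const parts = ip.split('.')
  if (parts.length !== 4) return null
  let value = 0
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null
    const octet = Number(part)
    if (octet > 255) return null
    value = value * 256 + octet
  }
  return value
}

// True when `ip` falls inside the CIDR block, e.g. '10.0.0.0/8'.
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split('/')
  const bits = Number(bitsText)
  const ipInt = ipv4ToInt(ip)
  const baseInt = ipv4ToInt(base)
  if (ipInt === null || baseInt === null) return false
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) return false
  // Compare network prefixes with plain arithmetic, which sidesteps the
  // signed 32-bit behaviour of JavaScript's bitwise operators.
  const blockSize = 2 ** (32 - bits)
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize)
}

// True when `ip` matches at least one allowed range.
function isAllowed(ip, allowedRanges) {
  return allowedRanges.some(range => inCidr(ip, range))
}

// Illustrative values only (see the note above).
const ALLOWED_RANGES = [
  '173.245.48.0/20', // example CDN range
  '103.21.244.0/22', // example CDN range
  '203.0.113.7/32', // hypothetical maintainer IP
]
```

For example, `isAllowed('173.245.48.1', ALLOWED_RANGES)` passes while an address outside every listed block is rejected.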

I'm not sure if OVH allows configuring that in the UI or if it would have to go in the firewall config on the VPS. I don't have access to either.

I gave this a shot in #5666.

The Shields production Heroku hostname is pretty easy to guess if not already public, so it still seems like a very good idea to require that the production traffic actually come from Cloudflare.

This prevents us from firing requests to Heroku directly, but that seems fine. We already can't address requests to specific dynos, and I don't see any reason why e.g. the admin endpoints couldn't be accessed through the CDN.

> This prevents us from firing requests to Heroku directly, but that seems fine.

IIRC we're pushing metrics from the badge servers now vs. the Prometheus pull model, correct?

This is not at the top of my head either 🙈 but yea, there's already not a way to fetch them from each dyno separately.

I think this is where we push them:

https://github.com/badges/shields/blob/3a23695f8984ea616cc72a93fcf4779c2a2aeed9/core/server/influx-metrics.js#L47-L52

I flipped this on for a couple seconds in production and was unable to load static badges from https://shields.io/, so clearly this is not working as intended.

Should we set up our staging environment to go through Cloudflare, so that we can test this out in staging?

> Should we set up our staging environment to go through Cloudflare, so that we can test this out in staging?

I think that's fine, as well as the next best step.

Yeah I guess something like that is the next step. This is one of those problems where our deployment/review/branch protection setup makes it hard to debug/diagnose. Tbh, if there's going to be some trial and error involved in debugging this it might even be useful to temporarily put a shields-staging-pr-xxxx.herokuapp.com behind CloudFlare so we can push commits to a dev branch and debug it?

I think, through the Heroku console, we could manually deploy a test branch to the staging app. You have to configure the app with the custom domain so I think it's probably better that we set it up once and then keep using it.

Alternatively if we want to leave staging pristine, we could create a "test" environment instead, that we spin up and down as needed.

Added: The first pass of this is online: https://staging.shields.io/. It's proxying the same as img.shields.io. I haven't added a page rule, so the caching behavior will be different from img.shields.io.

This is now turned on in production (via the REQUIRE_CLOUDFLARE env var) and seems to be working!
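For readers wondering what an environment-variable toggle of this kind might look like, here is a minimal sketch. Only the `REQUIRE_CLOUDFLARE` name comes from the thread; the guard factory, the accepted flag values, the injected `isFromCloudflare` predicate, and the 403 response are all assumptions for illustration and do not describe how Shields actually implements it.

```javascript
// Hypothetical guard factory. `isFromCloudflare` is injected so that the
// origin check itself can be swapped out or tested independently.
function createCloudflareGuard({ env, isFromCloudflare }) {
  // Assumption: only explicit "on" values enable the check, so an unset
  // variable leaves direct requests allowed.
  const flag = String(env.REQUIRE_CLOUDFLARE || '').toLowerCase()
  const enabled = ['1', 'true', 'yes'].includes(flag)

  // Connect/Express-style middleware signature.
  return function guard(req, res, next) {
    if (!enabled || isFromCloudflare(req)) {
      next()
      return
    }
    // Reject direct traffic without doing any further work.
    res.statusCode = 403
    res.end('Forbidden')
  }
}
```

Gating the check behind a flag makes it cheap to switch off again if it misbehaves, which is how the earlier failed attempt in this thread was rolled back within seconds.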
