We currently have Fastly sitting in front of three origins: S3 for the files, our Warehouse cluster, and our internal Bandersnatch mirror.
Generally speaking, the happy path for files is directly from Fastly to S3, and the happy path for /simple/ is from Fastly to our Warehouse cluster.
However, if our Warehouse cluster is down for some reason, then Fastly is configured to hit our internal Bandersnatch mirror, to keep pip install working.
This made a lot of sense when we initially deployed, because pip install foo depended on Warehouse being up, and the mirror meant that pip install would still work even if Warehouse was down.
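To make the current routing concrete, here is a rough Python sketch of the behavior described above. It is not our actual Fastly configuration: the origin names and the `warehouse_healthy` flag are made up for illustration, and it simplifies by treating every non-/simple/ request as a file download.

```python
def pick_origin(path: str, warehouse_healthy: bool) -> str:
    """Return the (hypothetical) origin name used for a request path."""
    if path.startswith("/simple/"):
        # Index pages: Warehouse on the happy path, the mirror as a fallback.
        return "warehouse" if warehouse_healthy else "bandersnatch-mirror"
    # Simplification: treat every other path as a file served from S3.
    return "s3"
```

With Warehouse up, `pick_origin("/simple/foo/", True)` picks Warehouse; with it down, the same request falls through to the mirror, while file requests never touch Warehouse either way.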
However, in the TUF work, we're now looking to move from generating the /simple/ files on demand, to pregenerating them and storing them inside of an object store. This means that hypothetically we could serve the /simple/ index similarly to how we serve the actual files, having Fastly directly contact the object store.
This raises a question: if Fastly contacts the object store directly, is there much benefit to continuing to maintain our own internal mirror? In what cases would we expect to actually fall back to the mirror once the Warehouse cluster is taken out of the "hot path" for pip install? Would any of those cases be better handled by some form of native object store replication that replicates one bucket to another, with Fastly just round-robining between them?
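As a sketch of what "round robin between replicated buckets" could look like, here is a small Python model. This is only an illustration under assumptions, not how Fastly implements it: the bucket names are hypothetical, and the set of healthy buckets is passed in by the caller rather than discovered by health checks.

```python
from itertools import cycle


class BucketRotation:
    """Rotate over replicated buckets, skipping any that are unhealthy."""

    def __init__(self, buckets):
        self._buckets = list(buckets)
        self._cycle = cycle(self._buckets)

    def pick(self, healthy):
        # Try each bucket at most once per request.
        for _ in range(len(self._buckets)):
            bucket = next(self._cycle)
            if bucket in healthy:
                return bucket
        # Every replica is down; the caller has nothing to fall back to.
        return None
```

The last branch is the interesting one for this discussion: it is the case where a separate mirror would still matter, so the question is how likely it is that all replicas are unavailable at once.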
This also reminds me that we probably have to disable the bandersnatch mirror once TUF support lands, until bandersnatch gains support for TUF as well.
@ewdurbin @woodruffw @cooperlees @di
Can someone please open an issue once it's fully known what bandersnatch will need to do for TUF? I'll make it so :)
Yea will do for sure, I'm pretty sure the answer is just "more files to copy", but it will be good to have some clarity on the issue.
Just so it's known, I'm down to help/own the bandersnatch mirror for PyPI if we move forward with it (if it's of an advantage). I propose to ansible it all and commit that to https://github.com/python/pypi-infra.
@ewdurbin and I were also keen to change it to use S3, but @techalchemy never came forth with the S3 plugin he's supposedly written.
But totally down to not need this for PyPI if it makes sense. E.g. does it even make sense to run a second origin on a separate cloud if we have enough credits? I only see DB sync complexities there.