Warehouse: Consider the role of our internal bandersnatch mirror, and if it makes sense to continue to use it

Created on 16 Sep 2020 · 3 Comments · Source: pypa/warehouse

We currently have an infrastructure where we have Fastly sitting in front of three origin servers:

  • Our Warehouse cluster
  • A Bandersnatch Mirror
  • A GCP (I think?) bucket storing files

Generally speaking, the happy path for files is directly from Fastly to S3, and the happy path for /simple/ is from Fastly to our Warehouse cluster.

However, if our Warehouse cluster is down for some reason, then Fastly is configured to hit our internal Bandersnatch mirror, to keep pip install working.
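To make the fallback behaviour concrete, here is a minimal sketch of the idea in Python. It is not our actual Fastly configuration; `OriginDown`, `fetch_with_fallback`, and the two stand-in origin functions are hypothetical names used only for illustration.

```python
from typing import Callable, Sequence


class OriginDown(Exception):
    """Raised by an origin that cannot serve the request (hypothetical)."""


# An origin is modelled as a function from a request path to a response body.
Origin = Callable[[str], bytes]


def fetch_with_fallback(path: str, origins: Sequence[Origin]) -> bytes:
    """Try each origin in order and return the first successful response."""
    last_error = None
    for origin in origins:
        try:
            return origin(path)
        except OriginDown as exc:
            # Remember the failure and move on to the next origin.
            last_error = exc
    raise OriginDown(f"no origin could serve {path}") from last_error


def warehouse(path: str) -> bytes:
    # Stand-in for the Warehouse cluster during an outage.
    raise OriginDown("warehouse is down")


def mirror(path: str) -> bytes:
    # Stand-in for the internal bandersnatch mirror.
    return b"<html>mirror copy of " + path.encode() + b"</html>"


body = fetch_with_fallback("/simple/foo/", [warehouse, mirror])
```

The ordering of the list is the whole policy here: Warehouse is always tried first, and the mirror only sees traffic when Warehouse fails.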

This made a lot of sense when we initially deployed, because pip install foo depended on Warehouse being up. The mirror meant that pip install would still work even when Warehouse was down.

However, as part of the TUF work, we're now looking to move from generating the /simple/ files on demand to pregenerating them and storing them in an object store. This means that, hypothetically, we could serve the /simple/ index the same way we serve the actual files, with Fastly contacting the object store directly.
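As a rough illustration of what "pregenerating" could look like, the sketch below renders a PEP 503 style page for one project and writes it under a key in a store. The key layout (`simple/<name>/index.html`), the `publish` helper, and the use of a plain dict as the object store are all assumptions for illustration, not what the TUF work actually does.

```python
import html
import re
from typing import Dict, Iterable, Tuple


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def render_simple_page(project: str, files: Iterable[Tuple[str, str, str]]) -> str:
    """Render a simple index page; each file is (filename, url, sha256)."""
    links = "\n".join(
        f'    <a href="{html.escape(url)}#sha256={sha256}">{html.escape(filename)}</a><br/>'
        for filename, url, sha256 in files
    )
    title = html.escape(project)
    return (
        "<!DOCTYPE html>\n<html>\n"
        f"  <head><title>Links for {title}</title></head>\n"
        f"  <body>\n{links}\n  </body>\n</html>\n"
    )


def publish(store: Dict[str, str], project: str,
            files: Iterable[Tuple[str, str, str]]) -> str:
    """Write the rendered page into the store (a dict standing in for a bucket)."""
    key = f"simple/{normalize(project)}/index.html"  # hypothetical key layout
    store[key] = render_simple_page(project, files)
    return key


store: Dict[str, str] = {}
key = publish(
    store,
    "Foo_Bar",
    [("foo_bar-1.0.tar.gz", "https://files.example/foo_bar-1.0.tar.gz", "abc123")],
)
```

Once pages live under predictable keys like this, serving /simple/ becomes a static lookup that does not need the Warehouse application at request time.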

This raises a question: if Fastly contacts the object store directly, is there much benefit to continuing to maintain our own internal mirror? In what cases would we actually expect to fall back to the mirror once the Warehouse cluster is out of the "hot path" for pip install? Would any of those cases be better handled by native object store replication, which copies one bucket to another, with Fastly simply round-robining between them?
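For the round-robin idea, here is a minimal sketch, assuming two buckets that are kept in sync by replication. The `ReplicatedBuckets` class and the in-memory dicts standing in for buckets are hypothetical; a real setup would live in the CDN and object store configuration rather than in application code.

```python
from itertools import cycle
from typing import Dict, Tuple


class ReplicatedBuckets:
    """Rotate reads across replicated buckets, skipping any that lack the key."""

    def __init__(self, buckets: Dict[str, Dict[str, bytes]]):
        self._buckets = buckets
        self._order = cycle(list(buckets))  # endless rotation of bucket names
        self._count = len(buckets)

    def get(self, key: str) -> Tuple[str, bytes]:
        # Try each bucket at most once per request, starting at the next in rotation.
        for _ in range(self._count):
            name = next(self._order)
            obj = self._buckets[name].get(key)
            if obj is not None:
                return name, obj
        raise KeyError(key)


buckets = ReplicatedBuckets({
    "primary": {"simple/foo/index.html": b"v1", "simple/bar/index.html": b"v1"},
    # The replica is lagging: it has not received simple/bar/ yet.
    "replica": {"simple/foo/index.html": b"v1"},
})
first = buckets.get("simple/foo/index.html")
second = buckets.get("simple/foo/index.html")
```

Reads alternate between the two buckets when both hold the object, and a bucket that is missing the object (for example, because replication is lagging) is skipped in favour of the other.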

This also reminds me that we probably have to disable the bandersnatch mirror once TUF support lands, until bandersnatch gains support for TUF as well.

@ewdurbin @woodruffw @cooperlees @di

needs discussion

All 3 comments

Can someone please open an issue once it's fully known what bandersnatch will need to do for TUF, and I'll make it so :)

Yea will do for sure, I'm pretty sure the answer is just "more files to copy", but it will be good to have some clarity on the issue.

Just so it's known, I'm down to help with or own the bandersnatch mirror for PyPI if we move forward with it (and if it's an advantage). I propose to manage it all with Ansible and commit that to https://github.com/python/pypi-infra.

@ewdurbin and I were also keen to change it to use S3, but @techalchemy never came forth with the S3 plugin he's supposedly written.

But totally down to not need this for PyPI if it makes sense. E.g. does it even make sense to run a second origin on a separate cloud if we have enough credits? I only see DB sync complexities there.
