Go-ipfs: Repo migration using IPFS

Created on 18 Sep 2017 · 11 Comments · Source: ipfs/go-ipfs

Currently, we download the repo migration tool using HTTPS from the gateways as necessary. Really, we should dogfood our own tech and use IPFS. The wrinkle is that we currently need to do the repo migration in order to start IPFS (because we need a working repo).

However, there's actually a simple solution to this. We can:

  1. Start IPFS.
  2. Notice that the repo is out-of-date.
  3. Create a new "transition" repo and config (possibly in /tmp, use the same ports for firewall reasons but a new temporary identity as this won't really be the same node).
  4. Fetch the migration tool with IPFS_PATH=tmp_ipfs_repo ipfs get /... (without starting the daemon).
  5. Run the migration tool.
  6. Continue booting the main IPFS daemon.
  7. (optionally) Open the migration repo and copy data out of it and into the main datastore.
  8. Delete the temporary datastore.
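
For concreteness, here is a rough shell sketch of how those steps could fit together. This is not what go-ipfs does today: the migration-tool path, the swarm port, and the fs-repo-migrations flags are placeholders/assumptions, and go-ipfs would do the equivalent internally rather than via a script.

```sh
# Rough sketch of the transition-repo flow above. The migration-tool path,
# ports, and flags are placeholders; go-ipfs would do the equivalent internally.
set -e

TMP_REPO="$(mktemp -d /tmp/ipfs-transition.XXXXXX)"
MIGRATION_PATH="<ipfs path of the migration tool>"   # elided above as "/..."

# Steps 2-3: build the throwaway transition repo. A fresh init gives it a new
# temporary identity; reusing the usual swarm port keeps firewall rules valid.
IPFS_PATH="$TMP_REPO" ipfs init
IPFS_PATH="$TMP_REPO" ipfs config --json Addresses.Swarm '["/ip4/0.0.0.0/tcp/4001"]'

# Step 4: fetch the migration tool over IPFS, without a long-running daemon.
IPFS_PATH="$TMP_REPO" ipfs get "$MIGRATION_PATH" -o "$TMP_REPO/fs-repo-migrations"
chmod +x "$TMP_REPO/fs-repo-migrations"

# Step 5: run the migration against the real repo (default IPFS_PATH, ~/.ipfs).
# The -y flag (skip prompts) is assumed; see the fs-repo-migrations docs.
"$TMP_REPO/fs-repo-migrations" -y

# Steps 6-8: the main daemon can now start on the migrated repo. Optionally copy
# blocks out of the transition repo first, then delete it.
rm -rf "$TMP_REPO"
```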
help wanted  kind/enhancement  status/deferred

Most helpful comment

This can be an option, but we should still use the gateway as a backup option. We could prompt the user to ask whether they want to retrieve the file over the IPFS network and, when they say no, provide instructions for getting the upgrade via the gateway (likely via a command-line flag).

All 11 comments

Could also create an in-memory repo and, with the (not yet implemented) ipfs get --filestore feature, stream it straight to disk.

This can be an option, but we should still use the gateway as a backup option. We could prompt the user to ask whether they want to retrieve the file over the IPFS network and, when they say no, provide instructions for getting the upgrade via the gateway (likely via a command-line flag).
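
To illustrate the prompt-plus-fallback idea (purely a sketch: the prompt text, gateway URL, and paths are made up, and TMP_REPO is the transition repo from the sketch further up):

```sh
# Illustrative only: ask the user, try IPFS first, otherwise point at the gateway.
TMP_REPO="<transition repo from the sketch above>"
MIGRATION_PATH="<ipfs path of the migration tool>"
OUT=/tmp/fs-repo-migrations

read -r -p "Fetch the repo migration tool over the IPFS network? [Y/n] " answer
if [ "$answer" != "n" ] && IPFS_PATH="$TMP_REPO" ipfs get "$MIGRATION_PATH" -o "$OUT"; then
  echo "Fetched the migration tool over IPFS."
else
  # The "no" (or failure) path: print instructions for the gateway fallback.
  echo "To fetch over HTTPS instead, run something like:"
  echo "  curl -fL https://ipfs.io/ipfs/<migration-tool-cid> -o $OUT"
fi
```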

but we should still use the gateway as a backup option

What's the rationale for keeping this option? Providing instructions for manually downloading and running the migration tool in case of failure is reasonable, but otherwise IPFS should work (and if it doesn't, we need to make it work).

IPFS should work (and if it doesn't, we need to make it work).

I always prefer the more conservative approach. Yes, it should work, except when it doesn't. And in my experience, in general, things don't work a lot of the time for me. I suppose the fully manual approach can be reasonable if it is a very simple process and does the exact same thing the automatic approach does.

I agree that it would be nice to use IPFS as the first choice where possible.
However, since the functionality to download over HTTP already exists, I feel like it should be kept as a backup solution and used conditionally: prioritize IPFS itself first and fall back to HTTP when appropriate.
Maybe we could count peers, providers, or some other metric and fall back if we encounter IPFS problems. In addition, users could set an env var (IPFS_USE_GATEWAY, IPFS_USE_HTTP, or something to that effect) to always prioritize HTTP.
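
A sketch of what that conditional policy might look like; IPFS_USE_HTTP is the hypothetical env var suggested above, and a plain fall-back-on-failure check stands in for a real peer/provider-count heuristic:

```sh
# Hypothetical policy: honour an opt-out env var, otherwise prefer IPFS and
# fall back to HTTP when the IPFS fetch fails.
TMP_REPO="<transition repo>"
OUT=/tmp/fs-repo-migrations

fetch_via_http() {
  curl -fL "https://ipfs.io/ipfs/<migration-tool-cid>" -o "$OUT"
}

fetch_via_ipfs() {
  IPFS_PATH="$TMP_REPO" ipfs get "<ipfs path of the migration tool>" -o "$OUT"
}

if [ -n "$IPFS_USE_HTTP" ]; then
  fetch_via_http                      # user explicitly prefers the gateway
else
  fetch_via_ipfs || fetch_via_http    # prefer IPFS, fall back when it fails
fi
```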

Since manual steps are documented here: https://github.com/ipfs/fs-repo-migrations/blob/master/run.md
we could probably just link to the repo if a message to the user is needed.
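
For reference, the manual path in run.md boils down to roughly the following; the version, platform, and URL shape are illustrative, and the linked document is authoritative:

```sh
# Manual fallback, roughly as documented in run.md (placeholders throughout).
wget "https://dist.ipfs.io/fs-repo-migrations/<version>/fs-repo-migrations_<version>_linux-amd64.tar.gz"
tar -xzf "fs-repo-migrations_<version>_linux-amd64.tar.gz"
# Stop any running ipfs daemon, then run the migration (the tarball typically
# unpacks to an fs-repo-migrations/ directory).
./fs-repo-migrations/fs-repo-migrations
```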

What's the rationale for keeping this option?

Nodes on my networks don't really work unless configured for relay due to weird CG-NAT. I have to configure them properly. I might set up a notebook with remote access for someone to test it out if I don't find time to fix it in go-ipfs itself.

Fair enough (although we should eventually try to make this work).

Nodes on my networks don't really work unless configured for relay due to weird CG-NAT.

Can they not even establish outbound connections? That's all that's needed in this case (connect to one of our nodes, fetch the migration over bitswap).

Can they not even establish outbound connections?

Not always if reuseport is enabled.

Not always if reuseport is enabled.

Ah. I forgot about that issue... (https://github.com/libp2p/go-tcp-transport/issues/18). We really should find a nice way to fix that.

Say you are upgrading Node A. Before the upgrade starts, a Docker container (Node B) is spun up and the repo contents from Node A are copied over to it, after which the migration on Node A is performed. Once the migration is complete, the repo contents are moved back over to Node A, checks are done to ensure everything was moved back successfully, Node B is destroyed, and Node A is brought back online with the migrated repo.

Would a solution like this work? It would mean we avoid hitting the public gateway, thereby consuming zero internet bandwidth and allowing all migrations and traffic to happen over the local network, which not only should be faster than hitting the public gateway but will also save users a little money.

In theory this should allow Node A to continue serving requests, only having to be brought down to migrate the data, with Node B serving requests while Node A is down, which should allow for zero-downtime migrations.

So, the issue there is that copying repos is slow and requires 2x the disk space. IMO, if you need high availability, you should be using ipfs-cluster. In that case, as long as you have a replication factor > 2, you can simply bring down and migrate each node one-by-one.
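
A one-by-one rollout could be as simple as the following sketch (host names and the systemd unit name are hypothetical, and it assumes the cluster replicates each node's content elsewhere):

```sh
# Rolling migration across cluster peers, one node at a time.
for host in node1 node2 node3; do
  ssh "$host" 'systemctl stop ipfs && fs-repo-migrations -y && systemctl start ipfs'
done
```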
