Currently lnd will spin up the WalletUnlocker gRPC service at startup, allowing the user to unlock the wallet. First when the wallet is unlocked lnd can start up properly, shut down the WalletUnlocker and spin up the regular Lightning gRPC service, on the same host and port.
It was done this way originally because there was an assumption of the rpcServer operating that the wallet was unlocked. This way we could ensure the wallet was in an unlocked state before starting the regular rpcServer, not having to change it.
However, this can lead to a few problems that certain users run into:
wallet_state that can be accessed through a new API endpoint that will indicate whether the wallet is nonexisting, locked, unlocked.server)wallet_state. A possible avenue here is to add gRPC interceptor (similar to what we do for macaroons) that checks the current wallet_state, and rejects all non-WalletUnlocker calle if the wallet is no unlocked.cc @treygriffith Would this cover the issues you were seeing?
This is something I've thought about a little. It seems like a lot of people run into wallet unlocker issues, and more generally there is poor visibility into which rpc servers are currently available (the main rpc when we are locked, and whether or not a certain subserver is active).
A solution to address this could be to add a status rpc service to lnd which is always up, and reports on wallet unlocked status + the rpc servers that are available. It would probably only need one Status endpoint, but it would be important for it to be separate from the current wallet unlocked/ lightning switchover situation.
A rough example of what that would look like:
LockStatus: locked
RpcStatus:
LightningRPC: false
RoutingRPC: false
InvoicesRPC: false
LockStatus: unlocked
RpcStatus:
LightningRPC: true
RoutingRPC: false
InvoicesRPC: true
Rather than _merge_ the services, we could always just keep the WalletUnlocker active, but move it to a different port (or start it out using a distinct port all together). There's also an add-on that we can enable which exposes _service reflection_, which allows callers to query for which services are currently active.
Server reflection for Go specifically: https://github.com/grpc/grpc-go/blob/master/Documentation/server-reflection-tutorial.md#enable-server-reflection
merging the services would probably make this issue easier to solve: https://github.com/lightningnetwork/lnd/issues/3954
Merging the services might not work, as old clients expect two separate services. Will have to look into. Another way would be to just have both services running on the same grpcServer at all times.
@halseth yes this would address what we're seeing.
While it wouldn't solve all of the issues, having a wallet_state RPC that can always be accessed as long as LND is running would resolve most of the issues (even if running as a separate service, or as a duplicate API call on both services).
For reference, the issue with running multiple services on the same port is documented here: https://github.com/grpc/grpc-node/issues/993
The gRPC library is written with the assumption that different server instances on the same port are identical, and that any connection it can establish on that port is the server it's looking for.
There's yet another possible chicken-egg problem: unlocking/initializing the wallet doesn't need macaroon and the macaroon is not created until the wallet is initialized. I don't know enough about GRPC to tell for sure if it's possible for some calls to require the macaroon and not for others. However, skipping the encryption of macaroons and creating them always would solve this issue.
Skipping encryption is not a problem because admin macaroon has to be unencrypted on disk anyway as described in #899
Making the wallet password optional would solve tons of these problems for people who implement automatic wallet unlocking. I spent two days on it and I still get spurious fails of tests.
Investigated what needs to be done in order to have all services (main RPC, walletunlocker, sub-servers) running simultaneously. We will need to register all services with the grpc server before calling Serve, which poses a problem with our current setup: since we require the wallet to be unlocked before creating several of the rpc subcomponents, the RPCServer and sub-servers are not ready to be registered with the grpc server when we want to start serving the wallet unlocker service.
A potential way to work around this is to adapt a pattern of first creating "dummy services", which we later populate. As an example we would create an empty RPC server that would be registered with the grpc server, and only after the wallet was unlocked we would populate all its dependencies such that it could start responding to RPC call. Access to the calls could be gated behind a "RPC status" state machine, which would be useful anyway to implement the required WalletStatus RPC call.
The main thing that needs to happen to allow the above pattern is to allow creating the RPCServer and Subservers without the dependencies being ready. For the RPCServer that is pretty easy, for the sub-servers it would need a bit more work because of the way they are all behind an interface.
@halseth that sounds like a really good idea! Those "dummy services" could always return "wallet locked" error, which could then be detected by the clients. (This is valuable even in the presence of WalletStatus method because it avoids race conditions.)
Did you look at that macaroon issue I mentioned too? Would the solution you described solve it as well?
Did you look at that macaroon issue I mentioned too? Would the solution you described solve it as well?
I think those problems are a bit independent, but to my knowledge there are no problem only requiring macaroons for the non-walletunlocker RPC calls.
I'm considering giving this a try since it's starting to piss me off quite a lot. Is there any reason I shouldn't? I have nearly zero experience with Go but I'm decent when it comes to systems programming in general.
I'm considering giving this a try since it's starting to piss me off quite a lot. Is there any reason I shouldn't?
@Kixunil It depends on what you want to achieve really, and what your problem is. I have already coded up a POC for the unification, that I hope to finish and merge for v0.13: https://github.com/halseth/lnd/tree/walletunlocker-unify-oct20
If you want I can help you rebase it and see if it achieves what you need.
My understanding is that it's a requirement for #4470 because the socket is currently being closed and re-bound when unlocking the wallet. So my goal is to make sure all the listener sockets stay the same during the execution of LND. Then doing #4470 should be pretty easy. I've already seen a library for that.
Going to check if I can rebase it myself.
Hmm, the change is quite big and confusing at some places. If you have the time to rebase it that'd be great since you know the internals.
@Kixunil Yeah, it is very drafty at this point. Just something I coded up to see if it could work. But I've rebased it now, cannot guarantee it is flawless, but integration tests works so it should be a starting point :)
Thanks a lot! I will take a look at it soon.