Output from `ipfs version --all`:

```
go-ipfs version: 0.4.18-
Repo version: 7
System version: amd64/linux
Golang version: go1.11.1
```
Bug
When running any ipfs command other than ipfs daemon, the existence of the file $IPFS_PATH/api determines whether the HTTP API is used to complete the command (e.g. ipfs id does this). This happens in fsrepo.go.
The file $IPFS_PATH/api is created when the ipfs daemon command is started. The problem is that when the daemon process is shut down via a signal, it does not remove $IPFS_PATH/api.
As a result, after an ipfs daemon process has been started and terminated, other ipfs commands that could be fulfilled with or without the HTTP API (e.g. ipfs id) will always fail with the message
Error: api not running
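For context, $IPFS_PATH/api is a one-line text file holding the multiaddr of the daemon's HTTP API endpoint; its mere presence is what routes commands over HTTP. A quick way to inspect the stale state (the address shown is just the usual default):

```bash
# The leftover api file survives the daemon and keeps routing commands
# to an endpoint that no longer answers.
cat "$IPFS_PATH/api"    # e.g. /ip4/127.0.0.1/tcp/5001
```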
A minimal example that reproduces the failure is shown below:
```bash
#!/bin/bash
set -x
set -e

IPFS=/path/to/your/ipfs

function run_it() {
    export IPFS_PATH=$(mktemp -d)
    $IPFS init
    $IPFS id        # completes fine
    $IPFS daemon &
    sleep 5
    kill $!
    $IPFS id        # will always fail
}
run_it
```
I don't think this is the intended behaviour(?).
If there isn't already a signal handler that removes $IPFS_PATH/api (via FSRepo.Close) when the daemon is interrupted, could we add one?
Alternatively, if the command fails in this manner but could be fulfilled without the IPFS HTTP API, why not run it without HTTP and remove $IPFS_PATH/api at that point?
Thanks for the bug report, and the script to reproduce. My api file _does_ get cleaned up on macOS, but I see the same behavior you experienced on Linux. That's definitely messed up.
@cboddy
The ipfs daemon may take a little while to shut down after you kill it with a signal, so if you try to use a command while it is shutting down you will get this error. Try adding a sleep after the kill and it should work. A better error message could be useful, though.
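As a sketch of that suggestion (reusing the $IPFS variable from the repro script above, and assuming the clean-shutdown behaviour described here, where the daemon eventually removes $IPFS_PATH/api itself), a script could poll for the cleanup instead of guessing a sleep length:

```bash
# Wait for the daemon's own cleanup rather than a fixed-length sleep.
kill $!
while [ -e "$IPFS_PATH/api" ]; do
    sleep 1
done
$IPFS id    # safe once the file is gone
```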
@kevina thanks for the suggestion, but the file $IPFS_PATH/api is never removed on Linux after the daemon is shut down via a signal, so the failure persists no matter how long we wait.
A simple work-around in Peergos (where we manage an ipfs runtime programmatically) is to remove $IPFS_PATH/api ourselves when shutting down the daemon.
However, it is surprising behaviour that takes time to understand, hence the bug report.
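A minimal shell sketch of that work-around (Peergos does the equivalent programmatically); the trap simply guarantees the api file never outlives the daemon we manage:

```bash
# Clean up the api file ourselves when the wrapper script exits.
ipfs daemon & DAEMON_PID=$!
trap 'kill "$DAEMON_PID" 2>/dev/null; rm -f "$IPFS_PATH/api"' EXIT
```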
@cboddy Thanks so much for reporting this in such a clear and descriptive way. I'm personally assigning myself here because this was the error I encountered when trying IPFS for the very first time and drove me (out of my frustration as a newbie who wants it all and wants it now) pretty close to leaving the project altogether.
```
go-ipfs version: 0.4.19-dev-af73c50
Repo version: 7
System version: amd64/linux
Golang version: go1.11.1
```
This works for me:
```bash
#!/bin/bash
set -x
set -e

IPFS=ipfs

function run_it() {
    export IPFS_PATH=$(mktemp -d)
    $IPFS init
    $IPFS id        # completes fine
    $IPFS daemon &
    sleep 5
    kill $!
    #$IPFS id       # will fail
    sleep 10
    $IPFS id        # ok
}
run_it
```
@cboddy: @kevina is right in my case. My api file is consistently deleted, and ipfs id succeeds some brief time after the signal is sent. @schomatis are you able to reproduce the behavior in this bug?
@cboddy - if you run the daemon (in your script, for instance) with the -D option and look at the last dozen lines of output, do you see logging from the cleanup code like the following?
```
10:57:25.463 DEBUG dht: Error unmarshaling data: context canceled dht_net.go:69
Received interrupt signal, shutting down...
10:57:25.463 DEBUG bitswap: bitswap task worker shutting down... workers.go:88
(Hit ctrl-c again to force-shutdown the daemon.)
10:57:25.464 DEBUG bitswap: bitswap task worker shutting down... workers.go:88
10:57:25.464 DEBUG bitswap: bitswap task worker shutting down... workers.go:85
10:57:25.464 DEBUG bitswap: bitswap task worker shutting down... workers.go:88
10:57:25.464 DEBUG bitswap: bitswap task worker shutting down... workers.go:85
10:57:25.464 INFO core/serve: server at /ip4/127.0.0.1/tcp/8080 terminating... corehttp.go:107
10:57:25.464 DEBUG bitswap: bitswap task worker shutting down... workers.go:85
10:57:25.464 DEBUG bitswap: bitswap task worker shutting down... workers.go:85
10:57:25.464 DEBUG bitswap: bitswap task worker shutting down... workers.go:88
10:57:25.464 WARNI swarm2: swarm listener accept error: process closing swarm_listen.go:77
10:57:25.464 INFO core/serve: server at /ip4/127.0.0.1/tcp/5001 terminating... corehttp.go:107
10:57:25.464 DEBUG bitswap: provideKeys channel closed workers.go:126
10:57:25.464 INFO core/serve: server at /ip4/127.0.0.1/tcp/8080 terminated corehttp.go:127
10:57:25.464 DEBUG core: core is shutting down... core.go:632
10:57:25.464 INFO core/serve: server at /ip4/127.0.0.1/tcp/5001 terminated corehttp.go:127
10:57:25.464 DEBUG blockservi: blockservice is shutting down... blockservice.go:324
10:57:25.464 WARNI swarm2: swarm listener accept error: accept tcp [::]:4001: use of closed network connection swarm_listen.go:77
10:57:25.465 WARNI swarm2: swarm listener accept error: accept tcp 0.0.0.0:4001: use of closed network connection swarm_listen.go:77
10:57:25.507 DEBUG mdns: mdns query complete mdns.go:142
10:57:25.507 DEBUG mdns: mdns service halting mdns.go:148
10:57:25.654 INFO cmd/ipfs: Gracefully shut down daemon daemon.go:351
```
> @schomatis are you able to reproduce the behavior in this bug?
Yes, if we really kill the process it won't clean up the api file (whatever that means). Granted, this can be fixed by running ipfs daemon again, but the user has no way of knowing that, and any other command will fail with that cryptic message. At the very least this is a documentation issue, but we should find a more elegant way of handling kills/crashes like this one.
```bash
#!/bin/bash
set -x
set -e

IPFS=ipfs

function run_it() {
    export IPFS_PATH=$(mktemp -d)
    $IPFS init
    $IPFS id                # completes fine
    $IPFS daemon &
    sleep 10
    kill -SIGKILL $!        # Die fast, don't wait on any cleanup.
    ! $IPFS id              # Allow this command to bypass `-e` check.
    while [ $? -eq 0 ]; do
        sleep 3
        ! $IPFS id
    done                    # This never ends.
}
run_it
```
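For anyone stuck in the endless loop above, the recovery paths mentioned in this thread, sketched with default paths:

```bash
# Two ways out of the stale state:
rm -f "$IPFS_PATH/api"    # delete the leftover file by hand...
ipfs id                   # ...and commands run locally again
# ...or run `ipfs daemon` again, which clears the stale state on startup.
```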
I think I'd say if SIGTERM (or SIGINT) doesn't clean the api file, that's a bug, and if SIGKILL doesn't clean it then we could do better, but it's debatable. What's the right thing? Should we treat api like a pidfile and exclusive-lock it to indicate the presence of the daemon, instead of create/delete?
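A hypothetical sketch of that pidfile-style idea using flock(1) (to be clear, this is not how go-ipfs behaves today): the daemon would hold an exclusive lock on the api file for its whole lifetime, so a file left behind by SIGKILL is harmless because the lock dies with the process.

```bash
# Daemon side, sketched: lock the api file instead of create/delete.
exec 9>"$IPFS_PATH/api"                     # open (and create) the file
flock -n 9 || { echo "daemon already running" >&2; exit 1; }
echo "/ip4/127.0.0.1/tcp/5001" >&9          # advertise the API endpoint
# ... daemon runs; the lock is released on any process death, clean or not.
```

A client would then test the lock (flock -n again) rather than the file's existence to decide whether a daemon is actually up.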
I removed the bug label and changed this to an enhancement. It would be a bug if killing with SIGTERM/SIGINT didn't (eventually) remove the $IPFS_PATH/api file. When killed with SIGKILL there is no way to clean this up, so we should be more intelligent about what to do afterwards. As previously mentioned, if ipfs daemon is run again it will clean up the stale state; we should perhaps do that for any command that requires the repo, or at least return a better error message.
Actually @cboddy, if you can confirm again that your api file isn't removed on SIGTERM, then it's a bug of some sort. We just can't reproduce it ourselves; at least I can't. Did you check the debug logging output to see whether the cleanup operations are taking place?
This is basically a duplicate of https://github.com/ipfs/go-ipfs/issues/2395.
> I'm personally assigning myself here because this was the error I encountered when trying IPFS for the very first time and drove me (out of my frustration as a newbie who wants it all and wants it now) pretty close to leaving the project altogether.
If this issue (which is labeled as a bug in the duplicate) is so painful that some of our dedicated community members almost dropped the project over it, imagine how many people ran into it and didn't have the perseverance/faith to push through that frustration to become long-term contributors. I think that is sufficient reason to prioritize resolving this issue (i.e. clean this file up consistently), or at least to add much better error handling to enable easy debugging and not burn contributor time on this rough edge.
So, the bug here isn't that the API file isn't getting cleaned up on KILL (@kevina's right, there's nothing we can do about that). The bug is that we're not cleaning it up later. This came up here as well:
https://github.com/ipfs-shipyard/ipfs-desktop/pull/722#issuecomment-444310856
The tricky part is that having an API file without running a daemon is actually a feature: This is how users tell IPFS to use a remote IPFS instance.
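To make that feature concrete (the address below is made up for illustration), a user points the CLI at a remote daemon by writing its API multiaddr into the file:

```bash
# Presence of this file is the switch; no local daemon needs to run.
echo "/ip4/10.0.0.5/tcp/5001" > "$IPFS_PATH/api"
ipfs id    # now issued over HTTP against the remote endpoint
```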
However, while fixing this in ipfs-desktop is a bit tricky, it's actually pretty easy to fix here:

1. Try connecting to the remote daemon.
2. If that fails, just run the command locally (ignoring the API file).

Step 2 will fail if the local repo is either locked (a daemon is running) or hasn't been initialized (we're using a remote daemon).
The downside is that this prevents users from switching back and forth between a local and a remote daemon. I'm not sure if that's something we really need to support.
> So, the bug here _isn't_ that the API file isn't getting cleaned up on KILL (@kevina's right, there's nothing we can do about that). The bug is that we're not cleaning it up later.
Yes, the KILL was just an example; the system can also crash and we would be in the same scenario.
> The tricky part is that having an API file without running a daemon is actually a _feature_: This is how users tell IPFS to use a _remote_ IPFS instance.
That may be so, but the first steps any user performs don't acknowledge that, nor the importance of having a daemon running, nor the basic client-server architecture we implement here.
From my own personal interpretation of reading the docs, the idea we're normally selling here (and that's something we may need to revise) is that you can just run ipfs <any-command> without caring about the daemon; and we also have the feature of sharing your personal piece of IPFS land (your repo) with the rest of the world by running the daemon while issuing all the commands you were already using.
> 1. Try connecting to the remote daemon.
> 2. If that fails, just run the command locally (ignoring the API file).
> Step 2 will fail if the local repo is either locked (a daemon is running) or the local repo hasn't been initialized (we're using a remote daemon).
If we all agree on this I'll try to apply this fix.
> From my own personal interpretation of reading the docs
In all cases where we have a valid local repo, yes.
> The downside is that this prevents users from switching back and forth between a local and a remote daemon. I'm not sure if that's something we really need to support.
We can instruct users to create a separate IPFS_PATH with an api file referencing the remote daemon and use that.
We should also prevent ipfs init from working if an api file is present.
@kevina thanks for that; yes that works for me as well.
I'd also suggest pinging the remote endpoint defined in the api file to ensure it is available (as first suggested by @Stebalien), since:

- that should not be a large overhead for each command
It shouldn't really add any overhead. If the api file exists, we have to try the API anyway. Today, if that fails, we abort. Instead, we could just fall back to running the command locally.
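A shell paraphrase of that fallback, just to pin down the proposed control flow (the real change would live in go-ipfs's Go command dispatch; api_reachable, repo_locked and repo_initialized are hypothetical helpers standing in for the actual checks):

```bash
# Proposed dispatch: prefer the API, fall back to running locally.
if [ -e "$IPFS_PATH/api" ] && api_reachable; then
    run_over_http
elif repo_initialized && ! repo_locked; then
    run_locally    # and possibly remove the stale api file here
else
    echo "Error: api not running" >&2
    exit 1
fi
```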
Unfortunately, I don't understand enough about the go-ipfs-cmds http.Client interface to code @Stebalien's proposal. That is, I understand we should try to connect to the daemon to check whether it's running, but I'm not even sure how to use the Send method. I don't think I can just send the original Request in commandShouldRunOnDaemon, because that would result in the same command being executed twice; I suppose we should craft a special "ping" Request, if that even exists, but I'm just speculating here.
Hopefully, https://github.com/ipfs/go-ipfs-cmds/pull/138 will mitigate this issue by providing the user with enough information to get past this (blocking) problem.
@schomatis the only way you're going to better understand how all this works is by reading the code and trying to fix bugs like this.