Output from `ipfs version --all`:

```
go-ipfs version: 0.4.18-
Repo version: 7
System version: amd64/linux
Golang version: go1.11.1
```
Bug
When running any ipfs command other than ipfs daemon, the existence of the file $IPFS_PATH/api determines whether the HTTP API is used to complete the command (e.g. ipfs id does this). This happens in fsrepo.go.
The file $IPFS_PATH/api is created when the ipfs daemon command is started. The problem is that when the daemon process is shut down via a signal, it does not remove $IPFS_PATH/api.
As a result, after an ipfs daemon process has been started and terminated, other ipfs commands that could be fulfilled with or without the HTTP API (e.g. ipfs id) will always fail with the message
Error: api not running
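For context, $IPFS_PATH/api is a one-line text file holding the multiaddr of the daemon's HTTP API endpoint; its mere presence is what routes commands over HTTP. A quick way to inspect the stale state (the address shown is just the usual default):

```bash
# The leftover api file survives the daemon and keeps routing commands
# to an endpoint that no longer answers.
cat "$IPFS_PATH/api"    # e.g. /ip4/127.0.0.1/tcp/5001
```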
A minimal example that reproduces the failure is shown below:
```bash
#!/bin/bash
set -x
set -e

IPFS=/path/to/your/ipfs

function run_it() {
    export IPFS_PATH=$(mktemp -d)
    $IPFS init
    $IPFS id        # completes fine
    $IPFS daemon &
    sleep 5
    kill $!
    $IPFS id        # will always fail
}
run_it
```
I don't think this is the intended behaviour(?).
If there isn't already a signal handler that removes $IPFS_PATH/api (via FSRepo.Close) when the daemon is interrupted, could we add one?
Alternatively, if the command fails in this manner but could be fulfilled without the IPFS HTTP API, why not run it without HTTP and remove $IPFS_PATH/api at that point?
Thanks for the bug report, and the script to reproduce. My api file _does_ get cleaned up on macOS, but I see the same behavior you experienced on Linux. That's definitely messed up.
@cboddy
The ipfs daemon may take a little while to shut down after you kill it with a signal, so if you try to use a command while it is shutting down you will get this error. Try adding a sleep after the kill and it should work. A better error message could be useful, though.
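As a sketch of that suggestion (reusing the $IPFS variable from the repro script above, and assuming the clean-shutdown behaviour described here, where the daemon eventually removes $IPFS_PATH/api itself), a script could poll for the cleanup instead of guessing a sleep length:

```bash
# Wait for the daemon's own cleanup rather than a fixed-length sleep.
kill $!
while [ -e "$IPFS_PATH/api" ]; do
    sleep 1
done
$IPFS id    # safe once the file is gone
```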
@kevina thanks for the suggestion, but the file $IPFS_PATH/api is never removed on Linux after the daemon is shut down via a signal, so the failure persists no matter how long we wait.
A simple work-around in Peergos (where we manage an ipfs runtime programmatically) is to remove $IPFS_PATH/api ourselves when shutting down the daemon.
However, it is surprising behaviour that takes time to understand, hence the bug report.
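A minimal shell sketch of that work-around (Peergos does the equivalent programmatically); the trap simply guarantees the api file never outlives the daemon we manage:

```bash
# Clean up the api file ourselves when the wrapper script exits.
ipfs daemon & DAEMON_PID=$!
trap 'kill "$DAEMON_PID" 2>/dev/null; rm -f "$IPFS_PATH/api"' EXIT
```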
@cboddy Thanks so much for reporting this in such a clear and descriptive way. I'm personally assigning myself here because this was the error I encountered when trying IPFS for the very first time and drove me (out of my frustration as a newbie who wants it all and wants it now) pretty close to leaving the project altogether.
```
go-ipfs version: 0.4.19-dev-af73c50
Repo version: 7
System version: amd64/linux
Golang version: go1.11.1
```
This works for me:
```bash
#!/bin/bash
set -x
set -e

IPFS=ipfs

function run_it() {
    export IPFS_PATH=$(mktemp -d)
    $IPFS init
    $IPFS id        # completes fine
    $IPFS daemon &
    sleep 5
    kill $!
    #$IPFS id       # will fail
    sleep 10
    $IPFS id        # ok
}
run_it
```
@cboddy: @kevina is right in my case. My api file is consistently deleted, and ipfs id succeeds some brief time after the signal is sent. @schomatis are you able to reproduce the behavior in this bug?
@cboddy - if you run the daemon (in your script, for instance) with the -D option and look at the last dozen lines of output, do you see logging from the cleanup code like the following?
```
10:57:25.463 DEBUG dht: Error unmarshaling data: context canceled dht_net.go:69
Received interrupt signal, shutting down...
10:57:25.463 DEBUG bitswap: bitswap task worker shutting down... workers.go:88
(Hit ctrl-c again to force-shutdown the daemon.)
10:57:25.464 DEBUG bitswap: bitswap task worker shutting down... workers.go:88
10:57:25.464 DEBUG bitswap: bitswap task worker shutting down... workers.go:85
10:57:25.464 DEBUG bitswap: bitswap task worker shutting down... workers.go:88
10:57:25.464 DEBUG bitswap: bitswap task worker shutting down... workers.go:85
10:57:25.464 INFO core/serve: server at /ip4/127.0.0.1/tcp/8080 terminating... corehttp.go:107
10:57:25.464 DEBUG bitswap: bitswap task worker shutting down... workers.go:85
10:57:25.464 DEBUG bitswap: bitswap task worker shutting down... workers.go:85
10:57:25.464 DEBUG bitswap: bitswap task worker shutting down... workers.go:88
10:57:25.464 WARNI swarm2: swarm listener accept error: process closing swarm_listen.go:77
10:57:25.464 INFO core/serve: server at /ip4/127.0.0.1/tcp/5001 terminating... corehttp.go:107
10:57:25.464 DEBUG bitswap: provideKeys channel closed workers.go:126
10:57:25.464 INFO core/serve: server at /ip4/127.0.0.1/tcp/8080 terminated corehttp.go:127
10:57:25.464 DEBUG core: core is shutting down... core.go:632
10:57:25.464 INFO core/serve: server at /ip4/127.0.0.1/tcp/5001 terminated corehttp.go:127
10:57:25.464 DEBUG blockservi: blockservice is shutting down... blockservice.go:324
10:57:25.464 WARNI swarm2: swarm listener accept error: accept tcp [::]:4001: use of closed network connection swarm_listen.go:77
10:57:25.465 WARNI swarm2: swarm listener accept error: accept tcp 0.0.0.0:4001: use of closed network connection swarm_listen.go:77
10:57:25.507 DEBUG mdns: mdns query complete mdns.go:142
10:57:25.507 DEBUG mdns: mdns service halting mdns.go:148
10:57:25.654 INFO cmd/ipfs: Gracefully shut down daemon daemon.go:351
```
> @schomatis are you able to reproduce the behavior in this bug?
Yes, if we really kill the process it won't clean up the api file (whatever that means). Granted, this can be fixed by running ipfs daemon again, but the user has no way of knowing that, and any other command will fail with that cryptic message. At the very least this is a documentation issue, but we should find a more elegant way of handling kills/crashes like this one.
```bash
#!/bin/bash
set -x
set -e

IPFS=ipfs

function run_it() {
    export IPFS_PATH=$(mktemp -d)
    $IPFS init
    $IPFS id                # completes fine
    $IPFS daemon &
    sleep 10
    kill -SIGKILL $!        # Die fast, don't wait on any cleanup.
    ! $IPFS id              # Allow this command to bypass `-e` check.
    while [ $? -eq 0 ]; do
        sleep 3
        ! $IPFS id
    done                    # This never ends.
}
run_it
```
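For anyone stuck in the endless loop above, the recovery paths mentioned in this thread, sketched with default paths:

```bash
# Two ways out of the stale state:
rm -f "$IPFS_PATH/api"    # delete the leftover file by hand...
ipfs id                   # ...and commands run locally again
# ...or run `ipfs daemon` again, which clears the stale state on startup.
```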
I think I'd say if SIGTERM (or SIGINT) doesn't clean the api file, that's a bug, and if SIGKILL doesn't clean it then we could do better, but it's debatable. What's the right thing? Should we treat api like a pidfile and exclusive-lock it to indicate the presence of the daemon, instead of create/delete?
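A hypothetical sketch of that pidfile-style idea using flock(1) (to be clear, this is not how go-ipfs behaves today): the daemon would hold an exclusive lock on the api file for its whole lifetime, so a file left behind by SIGKILL is harmless because the lock dies with the process.

```bash
# Daemon side, sketched: lock the api file instead of create/delete.
exec 9>"$IPFS_PATH/api"                     # open (and create) the file
flock -n 9 || { echo "daemon already running" >&2; exit 1; }
echo "/ip4/127.0.0.1/tcp/5001" >&9          # advertise the API endpoint
# ... daemon runs; the lock is released on any process death, clean or not.
```

A client would then test the lock (flock -n again) rather than the file's existence to decide whether a daemon is actually up.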
I removed the bug label and changed this to an enhancement. It would be a bug if killing with SIGTERM/SIGINT didn't (eventually) remove the $IPFS_PATH/api file. When killed with SIGKILL there is no way to clean this up, so we should be more intelligent about what to do afterwards. As previously mentioned, if ipfs daemon is run again it will clean up the stale state; we should perhaps do that for any command that requires the repo, or at least return a better error message.
Actually @cboddy, if you can confirm again that your api file isn't removed on SIGTERM, then it's a bug of some sort. We just can't reproduce it ourselves; at least I can't. Did you check the debug logging output to see whether the cleanup operations are taking place?
This is basically a duplicate of https://github.com/ipfs/go-ipfs/issues/2395.
> I'm personally assigning myself here because this was the error I encountered when trying IPFS for the very first time and drove me (out of my frustration as a newbie who wants it all and wants it now) pretty close to leaving the project altogether.
If this issue (which is labeled as a bug in the duplicate) is so painful that some of our dedicated community members almost dropped the project over it, imagine how many people ran into it and didn't have the perseverance/faith to push through that frustration to become long-term contributors. I think that is sufficient reason to prioritize resolving this issue (i.e. clean this file up consistently), or at least to add much better error handling to enable easy debugging and not burn contributor time on this rough edge.
So, the bug here isn't that the API file isn't getting cleaned up on KILL (@kevina's right, there's nothing we can do about that). The bug is that we're not cleaning it up later. This came up here as well:
https://github.com/ipfs-shipyard/ipfs-desktop/pull/722#issuecomment-444310856
The tricky part is that having an API file without running a daemon is actually a feature: This is how users tell IPFS to use a remote IPFS instance.
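To make that feature concrete (the address below is made up for illustration), a user points the CLI at a remote daemon by writing its API multiaddr into the file:

```bash
# Presence of this file is the switch; no local daemon needs to run.
echo "/ip4/10.0.0.5/tcp/5001" > "$IPFS_PATH/api"
ipfs id    # now issued over HTTP against the remote endpoint
```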
However, while fixing this in ipfs-desktop is a bit tricky, it's actually pretty easy to fix here:

1. Try connecting to the remote daemon.
2. If that fails, just run the command locally (ignoring the API file).

Step 2 will fail if the local repo is either locked (a daemon is running) or hasn't been initialized (we're using a remote daemon).
The downside is that this prevents users from switching back and forth between a local and a remote daemon. I'm not sure if that's something we really need to support.
> So, the bug here _isn't_ that the API file isn't getting cleaned up on KILL (@kevina's right, there's nothing we can do about that). The bug is that we're not cleaning it up later.
Yes, the KILL was just an example; the system can also crash and we would be in the same scenario.
> The tricky part is that having an API file without running a daemon is actually a _feature_: This is how users tell IPFS to use a _remote_ IPFS instance.
That may be so, but the first steps any user performs don't acknowledge that, nor the importance of having a daemon running, nor the basic client-server architecture we implement here.
From my own personal interpretation of reading the docs, the idea we're normally selling here (and that's something we may need to revise) is that you can just run ipfs <any-command> without caring about the daemon; and we also have the feature of sharing your personal piece of IPFS land (your repo) with the rest of the world by running the daemon while issuing all the commands you were already using.
> 1. Try connecting to the remote daemon.
> 2. If that fails, just run the command locally (ignoring the API file).
> Step 2 will fail if the local repo is either locked (a daemon is running) or the local repo hasn't been initialized (we're using a remote daemon).
If we all agree on this I'll try to apply this fix.
> From my own personal interpretation of reading the docs
In all cases where we have a valid local repo, yes.
> The downside is that this prevents users from switching back and forth between a local and a remote daemon. I'm not sure if that's something we really need to support.
We can instruct users to create a separate IPFS_PATH with an api file referencing the remote daemon and use that.
We should also prevent ipfs init from working if an api file is present.
@kevina thanks for that; yes that works for me as well.
I'd also suggest pinging the remote endpoint defined in the api file to ensure it is available (as first suggested by @Stebalien), since:

- that should not be a large overhead for each command
It shouldn't really add any overhead. If the api file exists, we have to try the API anyway. Today, if that fails, we abort. Instead, we could just fall back to running the command locally.
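A shell paraphrase of that fallback, just to pin down the proposed control flow (the real change would live in go-ipfs's Go command dispatch; api_reachable, repo_locked and repo_initialized are hypothetical helpers standing in for the actual checks):

```bash
# Proposed dispatch: prefer the API, fall back to running locally.
if [ -e "$IPFS_PATH/api" ] && api_reachable; then
    run_over_http
elif repo_initialized && ! repo_locked; then
    run_locally    # and possibly remove the stale api file here
else
    echo "Error: api not running" >&2
    exit 1
fi
```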
Unfortunately, I don't understand enough about the go-ipfs-cmds http.Client interface to code @Stebalien's proposal. That is, I understand we should try to connect to the daemon to check whether it's running, but I'm not even sure how to use the Send method. I don't think I can just send the original Request in commandShouldRunOnDaemon, because that would result in the same command being executed twice; I suppose we should craft a special "ping" Request, if that even exists, but I'm just speculating here.
Hopefully, https://github.com/ipfs/go-ipfs-cmds/pull/138 will mitigate this issue by providing the user with enough information to get past this (blocking) problem.
@schomatis the only way you're going to better understand how all this works is by reading the code and trying to fix bugs like this.