go-ipfs version: 0.4.23-6ce9a355f
Repo version: 7
System version: amd64/linux
Golang version: go1.14
I tried to list a directory via the local web gateway and couldn't get a response in the browser. Doing the same via the ipfs command, on the other hand, is very fast:
$ time ipfs ls QmdQogukw7icepHraCEgrqUiYRehqXbGiXkRXYX1i8rUZq > /dev/null
real 0m0,839s
user 0m0,191s
sys 0m0,312s
Using an (already cached) IPNS name is also very fast:
$ time ipfs ls /ipns/pkg.pacman.store/arch/x86_64/default/community > /dev/null
real 0m0,806s
user 0m0,197s
sys 0m0,299s
Meanwhile, this wget request connects but never gets a status code back (within 2 minutes):
$ time wget http://127.0.0.1:8080/ipfs/QmdQogukw7icepHraCEgrqUiYRehqXbGiXkRXYX1i8rUZq
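As a side note for future tests, curl with a hard timeout makes such a request fail fast and report whether any status line ever arrived (assuming the default gateway address 127.0.0.1:8080; it prints 000 if no response came back):
$ curl -m 120 -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8080/ipfs/QmdQogukw7icepHraCEgrqUiYRehqXbGiXkRXYX1i8rUZq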
After a restart of the node this issue disappeared. Any idea how to debug this in the future?
Follow the guide in https://github.com/ipfs/go-ipfs/blob/master/docs/debug-guide.md. The goroutine dump will be especially helpful in figuring out where the request is stuck.
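For reference, that goroutine dump can be pulled from the daemon's pprof endpoint (assuming the API listens on the default 127.0.0.1:5001):
$ curl 'http://127.0.0.1:5001/debug/pprof/goroutine?debug=2' > ipfs.stacks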
@Stebalien
I had a similar occurrence where the web API basically stalled for 2-3 minutes for no apparent reason.
I took a goroutine dump for you while it happened (the same computer/setup but a different IPFS version, this time built from the git repo: 729e0533f7e91c67c4c9c589bd7ab937145345b6).
Could you take the rest of the dumps? One thing I'm noticing is that zap (the logging framework) appears to be blocking. Do you have debug logging enabled?
Yes, debug logging was on.
When it occurs the next time, I'll do all dumps :)
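For completeness, a sketch of how the remaining dumps from the debug guide could be collected (again assuming the default API address 127.0.0.1:5001; the CPU profile endpoint samples for 30 seconds by default):
$ curl 'http://127.0.0.1:5001/debug/pprof/profile' > ipfs.cpuprof
$ curl 'http://127.0.0.1:5001/debug/pprof/heap' > ipfs.heap
$ ipfs version > version.txt
$ ipfs diag sys > sysinfo.txt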
Well, my testing installation (0.5.0-dev-84724a360) is now completely stuck. This time I started with a completely empty repo/block storage and just tried to follow the ipfs-websites cluster with the cluster follower.
This is what the cluster follower logged to the console before I terminated it:
Nothing else was running - just the cluster follower and the WebUI.
The ipfs log hasn't shown any errors (I haven't changed the log level to debug this time), and I cannot get any of the dumps from the API... the curls have been running for 10 minutes now with 0 bytes fetched.
Any idea how to debug this? :)
An overview of the processes:
$ pstree -acnp | grep ipfs | grep -v grep
| | | `-ipfs,324159 diag sys
| | | |-{ipfs},324160
| | | |-{ipfs},324161
| | | |-{ipfs},324162
| | | |-{ipfs},324163
| | | |-{ipfs},324164
| | | |-{ipfs},324165
| | | |-{ipfs},324166
| | | `-{ipfs},324167
|-ipfs,243084 daemon --enable-pubsub-experiment --enable-namesys-pubsub --enable-mplex-experiment --enable-gc
| |-{ipfs},243085
| |-{ipfs},243086
| |-{ipfs},243087
| |-{ipfs},243088
| |-{ipfs},243089
| |-{ipfs},243090
| |-{ipfs},243091
| |-{ipfs},243092
| |-{ipfs},243207
| |-{ipfs},243297
| |-{ipfs},244563
| `-{ipfs},244564
$ ps -eLfa | grep -E "324159|324160|324161|324162|324163|324164|324166|324165|324167|243084|243085|243086|243087|243088|243089|243090|243091|243092|243207|243297|244563|244564|^UID" | grep -v "grep"
UID PID PPID LWP C NLWP STIME TTY TIME CMD
ruben 243084 1 243084 0 13 06:28 ? 00:00:00 /usr/bin/ipfs daemon --enable-pubsub-experiment --enable-namesys-pubsub --enable-mplex-experiment --enable-gc
ruben 243084 1 243085 3 13 06:28 ? 00:24:02 /usr/bin/ipfs daemon --enable-pubsub-experiment --enable-namesys-pubsub --enable-mplex-experiment --enable-gc
ruben 243084 1 243086 0 13 06:28 ? 00:00:02 /usr/bin/ipfs daemon --enable-pubsub-experiment --enable-namesys-pubsub --enable-mplex-experiment --enable-gc
ruben 243084 1 243087 0 13 06:28 ? 00:00:00 /usr/bin/ipfs daemon --enable-pubsub-experiment --enable-namesys-pubsub --enable-mplex-experiment --enable-gc
ruben 243084 1 243088 0 13 06:28 ? 00:00:02 /usr/bin/ipfs daemon --enable-pubsub-experiment --enable-namesys-pubsub --enable-mplex-experiment --enable-gc
ruben 243084 1 243089 0 13 06:28 ? 00:00:00 /usr/bin/ipfs daemon --enable-pubsub-experiment --enable-namesys-pubsub --enable-mplex-experiment --enable-gc
ruben 243084 1 243090 0 13 06:28 ? 00:00:02 /usr/bin/ipfs daemon --enable-pubsub-experiment --enable-namesys-pubsub --enable-mplex-experiment --enable-gc
ruben 243084 1 243091 0 13 06:28 ? 00:00:03 /usr/bin/ipfs daemon --enable-pubsub-experiment --enable-namesys-pubsub --enable-mplex-experiment --enable-gc
ruben 243084 1 243092 0 13 06:28 ? 00:00:00 /usr/bin/ipfs daemon --enable-pubsub-experiment --enable-namesys-pubsub --enable-mplex-experiment --enable-gc
ruben 243084 1 243207 0 13 06:29 ? 00:00:02 /usr/bin/ipfs daemon --enable-pubsub-experiment --enable-namesys-pubsub --enable-mplex-experiment --enable-gc
ruben 243084 1 243297 98 13 06:30 ? 10:26:05 /usr/bin/ipfs daemon --enable-pubsub-experiment --enable-namesys-pubsub --enable-mplex-experiment --enable-gc
ruben 243084 1 244563 0 13 06:35 ? 00:00:00 /usr/bin/ipfs daemon --enable-pubsub-experiment --enable-namesys-pubsub --enable-mplex-experiment --enable-gc
ruben 243084 1 244564 1 13 06:35 ? 00:12:12 /usr/bin/ipfs daemon --enable-pubsub-experiment --enable-namesys-pubsub --enable-mplex-experiment --enable-gc
ruben 324159 324140 324159 0 9 15:50 pts/13 00:00:00 ipfs diag sys
ruben 324159 324140 324160 0 9 15:50 pts/13 00:00:00 ipfs diag sys
ruben 324159 324140 324161 0 9 15:50 pts/13 00:00:00 ipfs diag sys
ruben 324159 324140 324162 0 9 15:50 pts/13 00:00:00 ipfs diag sys
ruben 324159 324140 324163 0 9 15:50 pts/13 00:00:00 ipfs diag sys
ruben 324159 324140 324164 0 9 15:50 pts/13 00:00:00 ipfs diag sys
ruben 324159 324140 324165 0 9 15:50 pts/13 00:00:00 ipfs diag sys
ruben 324159 324140 324166 0 9 15:50 pts/13 00:00:00 ipfs diag sys
ruben 324159 324140 324167 0 9 15:50 pts/13 00:00:00 ipfs diag sys
So just one thread (LWP 243297, at 98% CPU) is busy-looping; how can we get a stack trace of it? :)
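One option when the HTTP API is unresponsive (a sketch, using the PIDs from the ps output above): the Go runtime prints all goroutine stacks to stderr when the process receives SIGQUIT or SIGABRT, which kills the daemon; alternatively, gdb can take a native backtrace of the spinning thread without killing anything:
# Dump all goroutine stacks to the daemon's stderr, then exit:
$ kill -QUIT 243084
# Non-destructive alternative: attach gdb to the 98%-CPU LWP and grab a backtrace:
$ gdb -batch -ex bt -p 243297
Starting the daemon with GOTRACEBACK=crash would additionally include runtime frames in the dump and produce a core file.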
I chatted with ttocslliwDiscor4 on IRC; he assumed it's a bug in go-yamux and that go-libp2p 0.7.2 should have fixed it (my build is g84724a360).
I used SIGABRT to kill it, and this is the stack trace:
So, there _was_ a bug in go-libp2p, but it's unrelated (it potentially causes streams to fail).
The bug here looks like https://github.com/golang/go/issues/38070, which will be fixed in Go 1.14.2.
If you want to work around the issue _now_, you can use https://github.com/libp2p/go-yamux/tree/fix/re-disable-write-coalescing. It doesn't fix the bug in go, but this branch reduces the likelihood that the bug will be triggered.
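A sketch of building go-ipfs against that branch with Go modules (assuming a go-ipfs checkout; `go get module@branch` pins the branch head as a pseudo-version, and `make build` is the repo's usual build target):
$ cd go-ipfs
$ go get github.com/libp2p/go-yamux@fix/re-disable-write-coalescing
$ make build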
Closing in favor of https://github.com/libp2p/go-libp2p/issues/858.
> If you want to work around the issue _now_, you can use https://github.com/libp2p/go-yamux/tree/fix/re-disable-write-coalescing. It doesn't fix the bug in go, but this branch reduces the likelihood that the bug will be triggered.
No, it's not needed. I just want to test this build until it gets released, to make sure I can point out some bugs early. :)