Go-ipfs: Daemon crashes after requesting a specific listing of links

Created on 10 May 2017 · 19 comments · Source: ipfs/go-ipfs

Version information:

go-ipfs version: 0.4.8-
Repo version: 5
System version: amd64/darwin
Golang version: go1.8

Type:

Bug

Severity:

High

Description:

The IPFS daemon crashes when requesting the link listing of the Turkish Wikipedia "wiki" folder.

> ipfs ls /ipfs/QmRNXpMRzsTHdRrKvwmWisgaojGKLPqHxzQfrXdfNkettC
Error: Post http://127.0.0.1:5001/api/v0/ls?D=true&arg=%2Fipfs%2FQmRNXpMRzsTHdRrKvwmWisgaojGKLPqHxzQfrXdfNkettC&encoding=json&stream-channels=true: EOF

Daemon log: https://pastebin.com/Gks7tqWE

kind/bug

Most helpful comment

It was because of 2k concurrent dials, which can't happen anymore due to a fix in go-libp2p-swarm.

All 19 comments

This isn't the complete log, and I can't reproduce it. Could you capture the log straight to a file and then run the ls?

Here is the complete log. I can reproduce this bug reliably, every time.
ipfs-daemon.crash-report.zip

Looks like a goroutine explosion (the error is pthreads being unable to create a thread).
It would not be a problem if we had https://github.com/ipfs/go-ipfs/issues/3762

gx/ipfs/QmQvbWzZPGpoppaAvBtj6QmyBZPw4ivFD7ryyHesxuYYDa/yamux.(*Session).keepalive                                   286
gx/ipfs/QmQvbWzZPGpoppaAvBtj6QmyBZPw4ivFD7ryyHesxuYYDa/yamux.(*Session).send                                        286
gx/ipfs/QmTU8NWsDYNShMA3hjPfEZTg3pD7YgX62sFmZdEgbjtWq2/go-libp2p-swarm.(*Swarm).dialAddrs                           414
gx/ipfs/QmQvbWzZPGpoppaAvBtj6QmyBZPw4ivFD7ryyHesxuYYDa/yamux.(*Stream).Read                                         432
sync.runtime_SemacquireMutex                                                                                        484
net.runtime_pollWait                                                                                                547
gx/ipfs/QmW832cCfBWbTV2vRPzMyQuZAaUuEEWveVsVJm7U7h7HhT/go-libp2p-conn.(*Dialer).Dial                                1794
syscall.Syscall6                                                                                                    2033
gx/ipfs/QmTU8NWsDYNShMA3hjPfEZTg3pD7YgX62sFmZdEgbjtWq2/go-libp2p-swarm.(*activeDial).wait                           3239

But I am not sure why there are 2k active dials going on.
^^ @whyrusleeping
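For reference, counts like the ones above can be pulled from a running daemon through the pprof handlers it exposes on the API port (the same ones the debug guide points at); a minimal sketch, assuming the default API address 127.0.0.1:5001:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// Fetch a goroutine dump from the daemon's pprof endpoint and save it.
// debug=1 gives goroutine counts grouped by stack (like the table above);
// debug=2 gives full stack traces for every goroutine.
func main() {
	resp, err := http.Get("http://127.0.0.1:5001/debug/pprof/goroutine?debug=1")
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	f, err := os.Create("goroutines.txt")
	if err != nil {
		fmt.Fprintln(os.Stderr, "create file:", err)
		os.Exit(1)
	}
	defer f.Close()

	if _, err := io.Copy(f, resp.Body); err != nil {
		fmt.Fprintln(os.Stderr, "write dump:", err)
		os.Exit(1)
	}
	fmt.Println("goroutine dump written to goroutines.txt")
}
```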

@Zaijo how much RAM does your machine have? It's a MacBook, right?

It isn't RAM, it's the thread count. The crash is due to pthreads being unable to spawn a thread for a goroutine.

The interesting thing here is 2k dials in progress.

@Kubuxu right, but I've seen thread death occur once ipfs starts swapping. Things start happening really slowly, and then Go decides to create more threads.

I did the same:

ipfs ls /ipfs/QmRNXpMRzsTHdRrKvwmWisgaojGKLPqHxzQfrXdfNkettC

Output from the daemon: https://gist.github.com/anonymous/9b40eec6552d63ef253fdb531fc73c6d

No results were ever returned, but the daemon didn't crash. I hope that helps.

MacBook Pro, 8 GB RAM, 0.4.9 official OS X build, poor internet connection.

I think this will be resolved in the latest master; there was an issue in dial rate limiting. @mattseh could you try again using a build from the latest master and let us know if things are still broken?

@whyrusleeping It's a MacBook Pro with 8 GB RAM. Meanwhile I upgraded to macOS Sierra.

It was because of 2k concurrent dials, which can't happen anymore due to a fix in go-libp2p-swarm.
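For context, the fix is about capping the number of dials in flight. As an illustration only (this is not the actual go-libp2p-swarm code, and the limit value here is made up), such a cap can be expressed as a buffered-channel semaphore:

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// dialLimiter caps the number of concurrent dial attempts with a
// buffered-channel semaphore. Illustration only; the real limiter in
// go-libp2p-swarm also tracks per-peer and file-descriptor limits.
type dialLimiter struct {
	tokens chan struct{}
}

func newDialLimiter(max int) *dialLimiter {
	return &dialLimiter{tokens: make(chan struct{}, max)}
}

// Dial blocks until a token is free (or ctx is cancelled), then runs fn.
func (l *dialLimiter) Dial(ctx context.Context, fn func()) error {
	select {
	case l.tokens <- struct{}{}:
	case <-ctx.Done():
		return ctx.Err()
	}
	defer func() { <-l.tokens }()
	fn()
	return nil
}

func main() {
	lim := newDialLimiter(8) // hypothetical cap; real limits differ
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = lim.Dial(context.Background(), func() {
				time.Sleep(10 * time.Millisecond) // stand-in for a network dial
			})
		}()
	}
	wg.Wait()
	fmt.Println("all dials completed without exceeding the concurrency cap")
}
```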

I ran this again with 0.4.11-rc2, on both my MacBook Pro with 8 GB RAM and a Linux server with 16 cores and 128 GB RAM.

After 20 minutes, the ls command has failed to return, so it still seems broken to me.

On my MacBook the CPU is still maxed out, and on the server it is using 12 cores (run at low priority via nice, so it uses as much CPU as it can without competing with more important processes).

On both machines, IPFS is using 500-600MB of RAM.

Cheers

Edit: After two hours, the IPFS on the server was still using 10-12 cores with no result, so I killed it.

@mattseh how did you create that object or where is this object from?

EDIT: disregard that, it is the Turkish Wikipedia snapshot.

So the problem probably is that ipfs ls buffers the output and returns everything as one JSON object, and that --resolve-type is true by default.

The Turkish snapshot has about 512k objects in it. The fact that ls resolves types by default doesn't help.


@mattseh can you try running it with --resolve-type=false? The default is true, so unfortunately you will end up downloading most of the snapshot.

(We really need a new, better format for files and directories.)
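For anyone driving this from code instead of the CLI, here is a minimal sketch of the same listing against the daemon's HTTP API with type resolution disabled — assuming the default API address and that the --resolve-type flag maps to a resolve-type query parameter, like the other flags visible in the error above:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"os"
)

// Minimal ls against the daemon's HTTP API with type resolution disabled.
// Assumes the default API address 127.0.0.1:5001.
func main() {
	q := url.Values{}
	q.Set("arg", "/ipfs/QmRNXpMRzsTHdRrKvwmWisgaojGKLPqHxzQfrXdfNkettC")
	q.Set("resolve-type", "false")

	resp, err := http.Post("http://127.0.0.1:5001/api/v0/ls?"+q.Encode(), "", nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	// The response is one JSON document listing the links of each object,
	// which is exactly the buffering behaviour discussed above.
	var out struct {
		Objects []struct {
			Hash  string
			Links []struct {
				Name string
				Hash string
				Size uint64
			}
		}
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		fmt.Fprintln(os.Stderr, "decode failed:", err)
		os.Exit(1)
	}
	for _, o := range out.Objects {
		fmt.Printf("%s: %d links\n", o.Hash, len(o.Links))
	}
}
```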

time ipfs ls /ipfs/QmRNXpMRzsTHdRrKvwmWisgaojGKLPqHxzQfrXdfNkettC --resolve-type=false is still running 5 minutes after being started, using more than 10 cores, on the big server previously mentioned. I will let it run overnight and see if it completes.

If possible, could you take a CPU profile for us?

curl -o profile 'http://127.0.0.1:5001/debug/pprof/profile'

(along with a copy of the ipfs binary you're using; more details here: https://github.com/ipfs/go-ipfs/blob/master/docs/debug-guide.md )

It finally successfully completed:

real 232m17.259s
user 0m4.616s
sys 0m1.088s

Will run again and gather the above requested info.

The IPFS binary is the official 0.4.11-rc2 for 64-bit Linux.

ipfs debug data.zip

I tried to repeat this. I started the ipfs daemon and tried to get http://127.0.0.1:8080/ipfs/QmRNXpMRzsTHdRrKvwmWisgaojGKLPqHxzQfrXdfNkettC/Anasayfa.html

In the log:
```
Daemon is ready
23:13:45.843 ERROR core/serve: invalid ipfs path: cid too short gateway_handler.go:584
23:13:45.844 ERROR core/serve: invalid ipfs path: cid too short gateway_handler.go:584
23:13:45.844 ERROR core/serve: invalid ipfs path: cid too short gateway_handler.go:584
...
```
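For what it's worth, "cid too short" is the error go-cid returns when a path segment is too short to be any valid CID, so these gateway errors likely come from requests whose first path segment isn't a full CID (for example relative links in the page being resolved against the gateway root). A quick sketch reproducing the error with the go-cid package (the strings here are purely illustrative):

```go
package main

import (
	"fmt"

	cid "github.com/ipfs/go-cid"
)

func main() {
	// A full CID parses cleanly.
	good := "QmRNXpMRzsTHdRrKvwmWisgaojGKLPqHxzQfrXdfNkettC"
	if _, err := cid.Decode(good); err != nil {
		fmt.Println("unexpected:", err)
	} else {
		fmt.Println("valid cid:", good)
	}

	// A fragment that is too short to be any CID reproduces the error
	// seen in the gateway log ("cid too short").
	if _, err := cid.Decode("Q"); err != nil {
		fmt.Println("error:", err)
	}
}
```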
