go-ipfs version: 0.5.1-8431e2e
Repo version: 9
System version: amd64/linux
Golang version: go1.13.10
Runs as a Docker container, based on the ipfs/go-ipfs:v0.5.1 image.
The repo is 3 TB and holds many different pins. There is no download in progress, but the repo still grows at 20-30 KBytes/s (BTRFS, mounted with noatime,compress=zlib:3,space_cache). At the same time, average disk IO is 20-30 MegaBytes (!) per second.
The IO is fairly random, which slows down magnetic media tremendously, and on average 2-15 ipfs processes are constantly sitting in an I/O-blocked state:
# vmstat -n 60
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
...
0 2 422624 194356 21900 19943532 0 1 215 13104 7642 17343 3 2 52 43 0
0 15 422880 203296 21896 19935368 1 1 217 13420 7754 18294 1 2 51 45 0
0 12 423136 211772 21768 19929644 1 5 271 13350 7774 18120 3 2 47 48 0
Hey @skliarie,
has the situation changed in the last 3 days?
We need some additional information about your configuration:
Can you provide debug information from the daemon while it is running the unexpected IO workload?
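In case it is useful, the usual way to grab that is from the daemon's pprof endpoints on the API port (this assumes the default API address 127.0.0.1:5001 and that the port is reachable from outside the container; the profile endpoint records a 30-second CPU profile):

curl -o ipfs.stacks 'http://127.0.0.1:5001/debug/pprof/goroutine?debug=2'
curl -o ipfs.heap 'http://127.0.0.1:5001/debug/pprof/heap'
curl -o ipfs.cpuprof 'http://127.0.0.1:5001/debug/pprof/profile'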
I use flatfs.
No cluster, standalone machine
ipfs is started with "ipfs daemon --migrate=true"; my config file is attached.
NumObjects: 27482408
RepoSize: 2712552599390
StorageMax: 4000000000000
RepoPath: /data/ipfs
Version: fs-repo@9
Thanks!
Do you run any operations on your node via the API, or access any data via the HTTP gateway?
Do you have any service running, like a system monitoring agent, that supports IPFS and accesses the API to collect metrics?
Does this issue persist when you restart your daemon? If so, does it start right away or after some delay?
I have no HTTP gateway, only the API.
No system monitoring like Prometheus (if that is what you mean), just basic checks.
Yes, the issue starts right away after a daemon restart.
The repo will grow as the DHT stores data (and as you send out providers for data), but 20-30KB/s is way too fast. How are you calculating disk usage? Are you seeing the disk usage reported by ipfs repo stat go up by 20KB/s?
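If you want a rough number, a loop like this samples RepoSize once a minute and prints the growth rate (just a sketch; it assumes repo stat returns in a reasonable time on a repo this size):

prev=$(ipfs repo stat | awk '/RepoSize/ {print $2}')
while sleep 60; do
  cur=$(ipfs repo stat | awk '/RepoSize/ {print $2}')
  echo "$(date +%T) growing at $(( (cur - prev) / 60 )) bytes/s"
  prev=$cur
done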
In terms of the disk reads, this is probably bitswap. You're connected to a lot of peers that are probably looking for data. You may also want to set Datastore.BloomFilterSize (in the config) to 67108864 (64MiB). This will construct a bloom filter of 64MiB on start and every time you GC (expensive) so you don't have to check your disk to see if you have something every time a peer asks.
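For reference, you can set it with ipfs config and it takes effect on the next daemon restart (64 MiB as suggested above; the second command just prints the value back to verify):

ipfs config --json Datastore.BloomFilterSize 67108864
ipfs config Datastore.BloomFilterSize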
"ipfs repo stat" takes too long time to execute, I do df for that.
I have BloomFilterSize set to 0; I guess that means unlimited, no? If so, that would explain the huge size of the IPFS process:
# ps awux | head
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
user 3223 37.0 13.6 17767376 3360848 pts/0 Sl+ Jun15 1964:04 ipfs daemon --migrate=true
And I do see (using iotop) that it is the ipfs process causing the writes, not BTRFS:
Total DISK READ : 218.76 K/s | Total DISK WRITE : 40.88 M/s
Actual DISK READ: 204.18 K/s | Actual DISK WRITE: 35.32 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
32037 be/4 user 29.17 K/s 40.88 M/s 0.00 % 99.99 % ipfs daemon --migrate=true
The writes do not have a constant speed; it fluctuates between 0 and 40 MB/s. I tried to find a correlation between the writes (vmstat -n 1) and the bitswap operation counters (watch ipfs stats bitswap), to no avail.
One thing I noticed is that the writes come in bursts. See the attached output of vmstat -n 1 over 10 minutes.
It looks like some buffers are flushed once in a while. Keep in mind that the node has a huge number of peers (8755 at the moment) and 3 TB worth of various popular pinsets.
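Roughly the kind of side-by-side sampling I mean, in case it helps (the device name sda and the 10-second interval are just placeholders for my setup):

while sleep 10; do
  date +%T
  awk '$3 == "sda" {print "  sectors written:", $10}' /proc/diskstats
  ipfs stats bitswap | grep -E 'blocks received|data received'
done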
Attached is the imon.sh script I wrote to monitor the ipfs node, along with several hours of the output it produces.
I have BloomFilterSize set to 0; I guess that means unlimited, no? If so, that would explain the huge size of the IPFS process:
0 means 0 (disabled).
IPFS memory usage is usually correlated with the number of peers you have.
Looking at the profile, I see what's going on. The problem is the DHT. Your node is backed up trying to write incoming provider records to disk: https://github.com/libp2p/go-libp2p-kad-dht/issues/675. It could just be that your disk is too slow, but I'm wondering if your node just _happens_ to be close to some really popular content.
For now, I'd try running your daemon with "dhtclient".
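That is just the routing flag on the daemon; you can also persist it in the config so it survives restarts (either form should work):

ipfs daemon --routing=dhtclient
ipfs config Routing.Type dhtclient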
cc @aschmahmann
Upgraded to ipfs 0.6.0 and switched to dhtclient; the args are now "--routing=dhtclient --migrate=true --enable-pubsub-experiment", with "BloomFilterSize": 536870912.
The write speed dropped to an average of 5 MB per minute, fluctuating between 1 and 15 MB/s in a much smoother way than before.
The number of partners dropped from 8k to about 700, no idea why.
The node feels faster though.
It dropped because peers are no longer trying to connect to you to advertise/find content in the DHT. Now, all of your peers are: