Geth version:
Version: 1.8.7-stable
Git Commit: 66432f3821badf24d526f2d9205f36c0543219de
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1
Go Version: go1.10
Operating System: linux
GOPATH=
GOROOT=/usr/local/go
OS & Version: Linux
Expected behaviour: Memory usage of go-ethereum should be stable.
Actual behaviour: Something happens which causes memory usage to increase substantially.
Steps to reproduce: Unknown.
This wasn't hugely useful to us, but hopefully it means something to the folks in here. Happy to provide additional information. This is currently happening roughly once a day.
````
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /root/pprof/pprof.geth.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
File: geth
Build ID: fb815f00e769fd84f7336a86cf929a66ba8ab66f
Type: inuse_space
Time: May 9, 2018 at 9:40pm (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 5332.62MB, 84.39% of 6318.78MB total
Dropped 349 nodes (cum <= 31.59MB)
Showing top 10 nodes out of 145
      flat  flat%   sum%        cum   cum%
 1186.93MB 18.78% 18.78%  1186.93MB 18.78%  runtime.malg
  861.97MB 13.64% 32.43%   861.97MB 13.64%  github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).Get
  708.04MB 11.21% 43.63%   708.04MB 11.21%  math/big.nat.setBytes
  529.06MB  8.37% 52.00%   529.06MB  8.37%  reflect.unsafe_New
  509.52MB  8.06% 60.07%  1251.55MB 19.81%  github.com/ethereum/go-ethereum/rlp.decodeBigInt
  453.14MB  7.17% 67.24%   464.65MB  7.35%  github.com/ethereum/go-ethereum/trie.(*hasher).store
     388MB  6.14% 73.38%      388MB  6.14%  github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/memdb.New
  264.43MB  4.18% 77.56%   264.43MB  4.18%  github.com/ethereum/go-ethereum/vendor/gopkg.in/fatih/set%2ev0.(*Set).Add
  228.51MB  3.62% 81.18%   482.26MB  7.63%  github.com/ethereum/go-ethereum/eth.(*peer).SendTransactions
  203.02MB  3.21% 84.39%   203.02MB  3.21%  github.com/ethereum/go-ethereum/event.(*Feed).Send
````
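For reference, a heap snapshot like the one above can be captured from any node started with --pprof (the host/port below match the "Fetching profile over HTTP" line in the output above; the file name is arbitrary):

````
# Interactive heap profile; this is what produced the "top" output above.
go tool pprof http://localhost:6060/debug/pprof/heap

# Or save the raw profile for later/offline analysis.
curl -s http://localhost:6060/debug/pprof/heap > geth-heap.pb.gz
go tool pprof geth-heap.pb.gz
````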
Last month:

Last 2 months:

For reference, this host was migrated to run:
v1.8.2 on March 5
v1.8.3 on March 27
v1.8.6 on Apr 26
v1.8.6 with PR #16576 on May 1
v1.8.7 on May 4
We run our node with the following command-line arguments: --rpc --rpcapi "debug,personal,eth,web3" --targetgaslimit 1000000 --cache 2048 --gpopercentile 30 --maxpeers 25
On May 1st we added --metrics --pprof --pprofaddr=127.0.0.1 and started tracking various metrics. We thought this was the source of the problem, so earlier today we restarted the node without the additional flags, but unfortunately the memory leak happened again.
This is the memory breakdown over the last 48 hours:

And since upgrading to 1.8.7:

Note: geth is basically the only thing running on this box, and we have verified that the memory increases on the box are all from geth. The sawtooth pattern in the memory graphs corresponds to geth restarts.
We are experiencing a similar issue.
Geth is started in a Docker container on Ubuntu with:
docker run \
--memory=8g \
-v /mnt/ethereum:/root/.ethereum \
-p 8545:8545 \
-p 30303:30303 \
-p 6060:6060 \
-d \
ethereum/client-go:v1.8.7 \
--pprof --pprofaddr=0.0.0.0 --rpc --rpcapi=eth,personal,txpool,debug,admin,net --rpcaddr=0.0.0.0 --cache=1024 --datadir=/root/.ethereum --gpopercentile=50 --syncmode=full --gcmode=archive --rpcvhosts=*
We introduced the 8GB memory limit as a workaround so that Docker restarts the container automatically when geth exhausts it.
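One note, partly an assumption on my side since it isn't visible in the command above: the memory limit by itself only makes the kernel OOM-kill geth inside the container; the automatic restart comes from a Docker restart policy. A minimal sketch of that combination (the --restart flag is my addition, the other flags mirror the run above):

````
# Sketch only: --restart=on-failure is assumed, not taken from the report above.
docker run \
  --memory=8g \
  --restart=on-failure \
  -v /mnt/ethereum:/root/.ethereum \
  -d \
  ethereum/client-go:v1.8.7
````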
Output of memStats (time in UTC+2)

Output of /debug/pprof/heap at 2018-05-10 17:00 UTC+2:

Output of /debug/pprof/heap at 2018-05-11 02:00 UTC+2:

We just downgraded to v1.8.6 with PR #16576. Will keep you posted.
+1, we are also experiencing this issue.
Confirmed that 1.8.6 with PR #16576 also sees this issue. We're downgrading to 1.8.3 which is the last stable version we ran.
@ryanschneider suspects that the memory issues were always there (at least in 1.8.x) but have been exacerbated by the recent uptick in txs.
@vogelito Just wondering, are you using filter queries? (experimental feature)
@mykolasmith - no, we most definitely are not.
This is the node's memory consumption since Friday morning.

1.8.3 seems to be doing better, but if the issue is indeed caused by an uptick in txs, then it's possible things only look better because it's the weekend.
Update on memory usage. 1.8.3 still doing ok:

@karalabe - I saw some memory-management-related improvements in the latest v1.8.8 release.
Do you think there's a chance they address the memory problems we're seeing here?
Unfortunately not.
@vogelito Out of curiosity, could you try reverting https://github.com/ethereum/go-ethereum/pull/16494 locally on top of master and seeing if that fixes it? I don't see how it would; I'm just playing a guessing game as to what we changed in 1.8.4 that might have caused this.
@karalabe - happy to. However, I was planning on waiting a bit longer to see if the problem is indeed gone in 1.8.3.
We ran 1.8.6 without problems for a while until there was an uptick in network transactions.
Another interesting experiment might be to run 1.8.7 or 1.8.8 with --pprof, and when you see the memory spiking, load http://localhost:6060/memsize and generate a report (Scan "node"). This will use some experimental memory stats gathering written by @fjl to try and report how much memory different parts of our code use. That might help pinpoint what part is leaky.
Do note however that 'scanning all objects and measuring their memory use' will probably hit the node hard for 30+ seconds. So, take care with doing this on a production system.
Happy to do whichever one you find more useful, just let me know. (and thanks for the heads up!)
I can confirm that this is still an issue in 1.8.3. We just started seeing the leak happening on 1.8.3 today.

We are restarting the node on 1.8.8 and will run with --pprof
Our node (now running 1.8.8) is not exhibiting the parabolic memory-usage behavior yet. However, memory usage is already quite high (~14GB), and this is the report:

Edit: for some reason this message wasn't posted when I first intended it to be, so please check the timestamp on the image.
The node is now at 18.69GB, and this is what the report shows now:

Last 24 hours running 1.8.8:

Process killed by the OS:
# dmesg | grep oom
[14196605.693635] geth invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
[14196605.693664] [<ffffffff81192e72>] oom_kill_process+0x202/0x3c0
[14196605.693750] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
We're running a few tests ourselves too now. A PR from Gary, https://github.com/ethereum/go-ethereum/pull/16720, could help with the huge number of goroutines by creating fewer of them, and it also fixes some pending-block generation issues that made tx posting slow. I can't give you meaningful numbers since the syncs only just finished, but let's see how the two behave in comparison after a day or so.
After this is in, I also have an update lined up that removes a blockage from block and transaction propagation into the network. That should avoid any goroutine leak caused by slow remote peers. Will set up a benchmark for it on Monday (want to run the above PR until then).
Thanks. I will try to provide you with a memory report of the node later today. If there's anything else I can do to help you track this problem, please let me know.
@karalabe We're going to try this out as well. Will report back results in a few hours!
Current situation and memory report


Update. Node exhausted memory again.
Memory scan

Last month

Last week

Last 2 days

Last 24 hours


Still seeing the memory leak as of https://github.com/ethereum/go-ethereum/pull/16720.
Per @holiman's post in #16673, I'm attaching the output of http://localhost:6060/debug/pprof/goroutine?debug=1
This was with 1.8.8 using over 24GB of memory. The node had 2.3M goroutines blocked on the event feed (99.9934% of all goroutines).
goroutine profile: total 2364583
2364427 @ 0x44272a 0x4427de 0x419c42 0x4198fb 0x6c1fec 0x4706e1
# 0x6c1feb github.com/ethereum/go-ethereum/event.(*Feed).Send+0xcb /home/ubuntu/geth/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/event/feed.go:133
17 @ 0x44272a 0x4427de 0x453344 0x45305d 0x4871b8 0x4880bd 0x886b44 0x8868ce 0xa8f821 0xa8f0a6 0xa9d90b 0x985ce6 0x4706e1
# 0x45305c sync.runtime_SemacquireMutex+0x3c /usr/local/go/src/runtime/sema.go:71
# 0x4871b7 sync.(*Mutex).Lock+0x107 /usr/local/go/src/sync/mutex.go:134
# 0x4880bc sync.(*RWMutex).Lock+0x2c /usr/local/go/src/sync/rwmutex.go:93
# 0x886b43 github.com/ethereum/go-ethereum/core.(*TxPool).addTxs+0x53 /home/ubuntu/geth/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/core/tx_pool.go:801
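To illustrate what all those goroutines parked in Feed.Send mean, here is a minimal sketch (an assumed example, not the actual geth propagation code): event.Feed.Send blocks until every subscribed channel has accepted the value, so a single slow or stuck subscriber leaves every goroutine that calls Send piled up inside Send, which is the shape of the profile above.

````go
// Minimal sketch, not geth itself: demonstrates how a slow subscriber on an
// event.Feed leaves every sender goroutine parked inside Feed.Send.
package main

import (
	"fmt"
	"runtime"
	"time"

	"github.com/ethereum/go-ethereum/event"
)

func main() {
	var feed event.Feed
	ch := make(chan int) // unbuffered, and deliberately never drained below

	sub := feed.Subscribe(ch)
	defer sub.Unsubscribe()

	// Fire sends from separate goroutines, roughly how tx/block propagation
	// hands events off. Each one blocks inside feed.Send because nothing
	// reads from ch.
	for i := 0; i < 1000; i++ {
		go feed.Send(i)
	}

	time.Sleep(time.Second)
	fmt.Println("goroutines (mostly parked in Send):", runtime.NumGoroutine())
}
````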
Master currently contains some optimizations that should help avoid one potential bottleneck from the transaction feed. Trying that might be a good data point for us.
On top of master, I have a PR, https://github.com/ethereum/go-ethereum/pull/16769, which removes another potential congestion point from the transaction feed. Trying this out might also be a valuable data point.
No promise that these will fix the issue, but they should most definitely fix some issues that could lead to this scenario on insanely busy nodes.
Quick question: could you describe your setup in a bit more detail? How many peers do you run with? What is your API usage? What kinds of requests are you serving? Anything else that might help us find congested pathways?
I see that #16769 has been merged into master. I'll try a new build off of master and report back.
Regarding our setup, here are the details:
--rpc --rpcapi "debug,personal,eth,web3" --targetgaslimit 1000000 --cache 2048 --gpopercentile 30 --maxpeers 25 --metrics --pprof --pprofaddr=127.0.0.1
I hope this is useful. We're restarting our node now with a build from master and will report back.
For reference, this is what we're running now
# /home/ubuntu/geth/current/geth version
Geth
Version: 1.8.9-unstable
Git Commit: 415969f534d8122e717e18abafdc8ec7bb913a84
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1
Go Version: go1.10
Operating System: linux
GOPATH=
GOROOT=/usr/local/go
I've been having this issue on both 1.8.1 and 1.8.8, but it only started appearing recently (in the last couple of weeks, I think) and has gotten to the point where I have to restart geth on a daily basis to prevent it from crashing unexpectedly. FWIW, I don't think 1.8.x should be considered 'stable' until this is addressed.
@crypt1d I don't think it's 1.8.x being unstable; rather, the network usage pattern is changing and surfacing a dormant bug.
Things are looking better, but it's perhaps still too early to tell.
Reminder:
Memory graph of last week:

Daily update: Memory still looking ok here.

Memory looking OK for us as well.
Daily update: Memory still looking ok here.
Update: memory seems to have been stable as of https://github.com/ethereum/go-ethereum/pull/16769; however, we experienced several spikes in memory usage prior to 18:00 EST yesterday, after which all peers began to time out and the node never recovered.


May 23 14:46:04 geth[22809]: WARN [05-23|14:46:04] Synchronisation failed, dropping peer peer=4994163696be9f52 err=timeout
May 23 15:51:21 geth[22809]: WARN [05-23|15:51:21] Synchronisation failed, dropping peer peer=90bccbf1fccc80ae err=timeout
May 23 16:49:58 geth[22809]: WARN [05-23|16:49:58] Synchronisation failed, dropping peer peer=6011df96954d3708 err=timeout
Things are still looking OK for us memory-wise, and there are no other known issues at the moment.

Closing this, as the latest code fixes it.
@mykolasmith We may have just found the hang issue you experienced: https://github.com/ethereum/go-ethereum/issues/16840.