The memory usage of the application and of Postgres should stay consistent. That would show that garbage collection is running regularly and that the heap does not keep growing, i.e. that there is no memory leak.
The application consumes all available memory while the network (200 nodes) is syncing and forging blocks. This causes the forging nodes to crash, and the network does not grow.
Run 200 nodes (lisk-core), enable forging on 10 of them and let all other nodes sync, then post transactions of all types. At some point the heap starts to grow, which leads to memory leaks and a node crash.
Version: 1.0.0
@ManuGowda While debugging, make sure that we are looking at a memory leak and not a memory bloat.
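One rough way to tell the two apart, sketched here under the assumption that heapUsed is sampled right after a forced full GC (node started with --expose_gc, then global.gc() called before each sample): with a leak the post-GC baseline keeps climbing, while with bloat the heap peaks under load but falls back to roughly the same baseline. The function name, the 10MB tolerance and the 80% threshold are hypothetical choices for illustration, not part of Lisk or Node.js.

```typescript
type HeapVerdict = "leak-like" | "bloat-like";

// Hypothetical heuristic, not part of Lisk or Node.js.
// samplesMB: heapUsed in MB, each taken right after a forced full GC
// (e.g. global.gc() under --expose_gc), in chronological order.
function classifyHeapGrowth(samplesMB: number[], toleranceMB = 10): HeapVerdict {
  if (samplesMB.length < 2) {
    return "bloat-like"; // not enough data to claim a trend
  }
  // Count how many consecutive intervals went up.
  let rises = 0;
  for (let i = 1; i < samplesMB.length; i++) {
    if (samplesMB[i] > samplesMB[i - 1]) {
      rises++;
    }
  }
  const netGrowth = samplesMB[samplesMB.length - 1] - samplesMB[0];
  // Leak-like: the post-GC baseline grew by more than the tolerance and rose in
  // at least 80% of the intervals, i.e. full GCs never brought it back down.
  const mostlyRising = rises >= 0.8 * (samplesMB.length - 1);
  return netGrowth > toleranceMB && mostlyRising ? "leak-like" : "bloat-like";
}

// Usage with made-up numbers:
console.log(classifyHeapGrowth([120, 150, 185, 220, 260])); // prints: leak-like
console.log(classifyHeapGrowth([120, 118, 123, 119, 121])); // prints: bloat-like
```

This is only a first filter; a heap snapshot comparison is still needed to see which objects are retained.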
@ManuGowda Why the performance label?
@MaciejBaj Here are my detailed observations of the memory leak/bloat during block processing in the local environment.
Setup:
[x] Lisk-core running in a local environment with syncing and broadcasting disabled and only forging enabled.
[x] Running the node application in inspect mode to capture heap snapshots, setting the max_old_space_size parameter to 500MB or 1000MB for different tests, and enabling trace_gc to log garbage collection events and expose_gc to allow triggering a full GC manually. Here is an example: node-inspect --trace_gc --max_old_space_size=1000 --expose_gc app.js
[x] Then ran a stress test with transaction types 0, 1, 2, 3, 4 and 5 using lisk-core-qa to observe the memory leak/bloat.
[x] Here is the Google Drive link containing the heap snapshots captured while heap_used was consistently growing by more than 50MB.
[x] After a detailed investigation of process memory usage and the garbage collection process, here is the conclusion I have arrived at. First, to establish my understanding of memory usage and garbage collection: the memory usage of a Node.js process is reported as the Resident Set Size (RSS), which is the amount of space the process occupies in main memory (a subset of its total allocated memory) and includes the heap, the code segment and the stack. heapTotal and heapUsed refer to V8's memory usage. external refers to the memory usage of C++ objects bound to JavaScript objects managed by V8.
The heap is where objects, strings and closures are stored. Variables are stored on the stack, and the actual JavaScript code resides in the code segment.
[x] Conclusion so far: heapUsed and heapTotal grow while the stress test runs, and when memory spikes the garbage collector runs scavenge to collect garbage in new_space and, later, mark-sweep to collect old_space memory. This makes one thing clear: there is no memory leak during forging or even syncing (thanks to @nazarhussain for running this test and confirming). @jondubois also confirmed in this test that socketcluster is not leaking any memory. The only thing that still needs to be evaluated is the Resident Set Size (RSS), which grows constantly (RSS grew 1.3GB when max_old_space_size was set to 1GB). I am investigating RSS further to determine what caused the node to restart every time on betanet.
[ ] TODO: Investigating the RSS memory growth
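To check whether RSS growth comes from the V8 heap or from memory outside it (native allocations, code, stack), one option is to compare two process.memoryUsage() samples taken before and after a stress-test run. The helper below is a hypothetical sketch that only does the arithmetic; it assumes byte values as process.memoryUsage() reports them, and treats rss minus heapTotal as a rough approximation of "everything outside the JavaScript heap".

```typescript
// Fields used from process.memoryUsage(); values are in bytes.
interface MemSnapshot {
  rss: number;
  heapTotal: number;
  heapUsed: number;
  external: number;
}

interface GrowthReport {
  rssMB: number;
  heapTotalMB: number;
  heapUsedMB: number;
  externalMB: number;
  outsideHeapMB: number; // growth of (rss - heapTotal), i.e. not in V8's JS heap
}

const MB = 1024 * 1024;

// Hypothetical helper: growth between two samples in MB, rounded to 0.1MB.
function memoryGrowth(before: MemSnapshot, after: MemSnapshot): GrowthReport {
  const diff = (a: number, b: number): number => Math.round(((b - a) / MB) * 10) / 10;
  return {
    rssMB: diff(before.rss, after.rss),
    heapTotalMB: diff(before.heapTotal, after.heapTotal),
    heapUsedMB: diff(before.heapUsed, after.heapUsed),
    externalMB: diff(before.external, after.external),
    outsideHeapMB: diff(before.rss - before.heapTotal, after.rss - after.heapTotal),
  };
}

// Made-up numbers: RSS grows by 300MB while the JS heap barely moves.
const demo = memoryGrowth(
  { rss: 600 * MB, heapTotal: 400 * MB, heapUsed: 300 * MB, external: 10 * MB },
  { rss: 900 * MB, heapTotal: 420 * MB, heapUsed: 310 * MB, external: 12 * MB },
);
console.log(demo.outsideHeapMB); // prints: 280
```

In the running node both arguments would come from process.memoryUsage(). If rssMB keeps rising while heapUsed returns to its baseline after GC, most of the growth shows up in outsideHeapMB, which would point toward native code (for example C++ addons) rather than JavaScript objects.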
Some good references on the issue we are tackling:
https://github.com/nodejs/node/issues/12805
https://github.com/nodejs/node/issues/13917
https://groups.google.com/forum/#!topic/nodejs/KM0Yis-LNpg
https://github.com/nodejs/node/issues/11077
I debugged the low-level memory usage with valgrind using the following options:
--leak-check=full --show-leak-kinds=all --trace-children=yes
and got the following summary after syncing blocks from the network for 15 minutes:
==2901== LEAK SUMMARY:
==2901== definitely lost: 728 bytes in 1 blocks
==2901== indirectly lost: 704 bytes in 5 blocks
==2901== possibly lost: 1,640 bytes in 10 blocks
==2901== still reachable: 1,429,553 bytes in 6,664 blocks
==2901== of which reachable via heuristic:
==2901== stdstring : 61 bytes in 1 blocks
==2901== newarray : 49,880 bytes in 46 blocks
==2901== suppressed: 0 bytes in 0 blocks
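When repeating the valgrind run (for example before and after replacing a dependency), it can help to extract the byte counts from the LEAK SUMMARY so runs can be compared. The parser below is a hypothetical sketch; it assumes memcheck's summary lines look like the ones above ("==PID== definitely lost: N bytes in M blocks"), and other valgrind versions may format them differently.

```typescript
interface LeakSummary {
  definitelyLost: number;
  indirectlyLost: number;
  possiblyLost: number;
  stillReachable: number;
}

// Hypothetical parser for memcheck's LEAK SUMMARY lines. Byte counts may contain
// thousands separators (e.g. "1,429,553"); a missing category is reported as 0.
function parseLeakSummary(log: string): LeakSummary {
  const bytesFor = (label: string): number => {
    const re = new RegExp(`${label}:\\s+([\\d,]+) bytes`);
    const match = log.match(re);
    return match ? Number(match[1].replace(/,/g, "")) : 0;
  };
  return {
    definitelyLost: bytesFor("definitely lost"),
    indirectlyLost: bytesFor("indirectly lost"),
    possiblyLost: bytesFor("possibly lost"),
    stillReachable: bytesFor("still reachable"),
  };
}

const sample = `==2901== definitely lost: 728 bytes in 1 blocks
==2901== indirectly lost: 704 bytes in 5 blocks
==2901== possibly lost: 1,640 bytes in 10 blocks
==2901== still reachable: 1,429,553 bytes in 6,664 blocks`;

const summary = parseLeakSummary(sample);
// In this log the single definite-loss record reports 728 direct + 704 indirect bytes.
console.log(summary.definitelyLost + summary.indirectlyLost); // prints: 1432
```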
Details of the definite leak:
==2901== 1,432 (728 direct, 704 indirect) bytes in 1 blocks are definitely lost in loss record 1,146 of 1,228
==2901== at 0x4C2E0EF: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2901== by 0xC421CB7: void createGroup<true>(v8::FunctionCallbackInfo<v8::Value> const&) (in /home/nazar/lisk/node_modules/uws/uws_linux_48.node)
==2901== by 0x98F7D1: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) (in /home/nazar/.nvm/versions/node/v6.12.3/bin/node)
==2901== by 0x9EE7FD: v8::internal::(anonymous namespace)::HandleApiCallHelper(v8::internal::Isolate*, v8::internal::(anonymous namespace)::BuiltinArguments<(v8::internal::BuiltinExtraArguments)3>) (in /home/nazar/.nvm/versions/node/v6.12.3/bin/node)
==2901== by 0x9EF09D: v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) (in /home/nazar/.nvm/versions/node/v6.12.3/bin/nod
So there is definitely a memory leak in the uws library. Within 15 minutes it leaked 1.4KB (1,432 bytes), and as time goes on that leak can grow. I suggest switching away from this dependency.
Closed by https://github.com/LiskHQ/lisk/pull/2018