Please check if this is reproducible:
Install/download the latest nightly or v8-canary build.
Run npm update or npm update -g for a package with many dependencies (see an example in https://github.com/nodejs/node/issues/19405#issuecomment-373922180).
I get ~1.5 GB memory consumption, ~100% CPU, and a never-ending process without any output messages.
Installing/using Node.js v9.8.0 with the same npm fixes the issue.
The issue seems to appear only if something actually needs to be updated: after successfully updating (npm update -g) with Node.js v9.8.0, npm update -g returns immediately with Node.js v10.0.0, but npm update with outdated packages hangs.
A narrowed-down case:
https://github.com/eslint/eslint/releases — latest versions: 4.18.2, 4.19.0.
md test-npm && cd test-npm
npm install [email protected]
npm update
The last command hangs trying to update to 4.19.0.
Can you run it in verbose mode so we have a better idea about when it starts hanging?
j:\temp\test-npm>npm update --verbose
npm info it worked if it ends with ok
npm verb cli [ 'C:\\Program Files\\nodejs\\node.exe',
npm verb cli 'C:\\Program Files\\nodejs\\node_modules\\npm\\bin\\npm-cli.js',
npm verb cli 'update',
npm verb cli '--verbose' ]
npm info using [email protected]
npm info using [email protected]
npm verb npm-session 5aebf4c4722b2950
npm verb update computing outdated modules to update
npm verb request uri https://registry.npmjs.org/eslint
npm verb request no auth needed
npm info attempt registry request try #1 at 16:17:48
npm verb request using bearer token for auth
npm verb request id f1120122477c2bf9
npm http request GET https://registry.npmjs.org/eslint
npm http 200 https://registry.npmjs.org/eslint
npm verb headers { 'content-type': 'application/json; charset=UTF-8',
npm verb headers server: 'UploadServer',
npm verb headers 'cache-control': 'max-age=300',
npm verb headers 'last-modified': 'Sat, 17 Mar 2018 14:17:19 GMT',
npm verb headers etag: '"5aad236f-8c747"',
npm verb headers 'x-npm-region': 'EU-East',
npm verb headers 'content-encoding': 'gzip',
npm verb headers 'content-length': '35168',
npm verb headers 'accept-ranges': 'bytes',
npm verb headers date: 'Sat, 17 Mar 2018 14:17:50 GMT',
npm verb headers via: '1.1 varnish',
npm verb headers age: '21',
npm verb headers connection: 'keep-alive',
npm verb headers 'x-served-by': 'cache-hhn1528-HHN',
npm verb headers 'x-cache': 'HIT',
npm verb headers 'x-cache-hits': '1',
npm verb headers 'x-timer': 'S1521296270.344240,VS0,VE1',
npm verb headers vary: 'Accept-Encoding, Accept' }
Hangs here.
Some debugging data:
v10.0.0-nightly2018011585739b6c5b is OK
v10.0.0-nightly20180116f75bc2c1a5 hangs
So the cause seems to be here: https://github.com/nodejs/node/compare/85739b6c5b...f75bc2c1a5
These frames are the beginning of the infinite spin: https://github.com/nodejs/node/blob/1329844a0808705091891175a6bee58358380af6/deps/npm/lib/update.js#L53 ->
https://github.com/nodejs/node/blob/1329844a0808705091891175a6bee58358380af6/deps/npm/node_modules/npmlog/log.js#L286 -> https://github.com/nodejs/node/blob/1329844a0808705091891175a6bee58358380af6/deps/npm/node_modules/npmlog/log.js#L194
From there, the data in recursive chains of util.format()/util.inspect() and their sub-calls begins to differ.
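A minimal sketch of that call path (an assumed shape, not npm's exact code): npmlog runs its arguments through util.format(), and util.format() calls util.inspect() on every argument that is not a string, so the whole outdated-modules structure ends up being serialized.

'use strict';
const util = require('util');
// outdatedTree is a stand-in for the structure npm builds; the real object is far larger.
const outdatedTree = { eslint: { current: '4.18.2', wanted: '4.19.0', dependencies: {} } };
// Roughly what the npmlog frames linked above boil down to:
const message = util.format('outdated', 'updating', outdatedTree);
console.log(message);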
So maybe https://github.com/nodejs/node/pull/17907 is the cause.
We may need to fix this before the v10 release.
cc @nodejs/util
This seems to be a formerly hidden bug in npm and not a bug in Node.js.
I do understand that this is something we have to find a solution for soon, though. We might just use a high depth limit to circumvent issues like these, but that is more of a hack than anything else and could only be an intermediate step. It would still be a recursive call up to the maximum limit, and anyone who changes the default to unlimited is also going to run into this.
We could set the limit to, e.g., 1000 for now to circumvent the issue. I am just not a fan of doing so because it will prevent anyone else from realizing that there is an issue with their code if they also have unlimited recursion.
@vsemozhetbyt are you fine with opening a PR against npm to fix the issue? You already know how this works, so I guess it should be relatively straightforward?
@BridgeAR Sorry, I've tried to debug to the end but got more and more confused. I am afraid I do not know enough to propose a fix. I've posted an issue in npm, but it has no traction there yet.
@fishrock123, @MylesBorins, can you suggest somebody from the npm team to cc here for looking into this issue? Otherwise, we may release v10 with a broken npm.
@nodejs/npm please take a look at this. We have to find a solution for this problem before the next major Node.js release and this is therefore urgent.
/cc @zkat @iarna
FWIW, after some time node crashes:
Output:
> npm update -g
<--- Last few GCs --->
[3996:0000000000289770] 46265 ms: Mark-sweep 1409.4 (1458.9) -> 1409.4 (1458.9) MB, 1574.8 / 0.0 ms allocation failure GC in old space requested
[3996:0000000000289770] 47881 ms: Mark-sweep 1409.4 (1458.9) -> 1409.3 (1442.9) MB, 1615.7 / 0.0 ms last resort GC in old space requested
[3996:0000000000289770] 49469 ms: Mark-sweep 1409.3 (1442.9) -> 1409.4 (1442.9) MB, 1588.8 / 0.0 ms last resort GC in old space requested
<--- JS stacktrace --->
==== JS stack trace =========================================
0: ExitFrame [pc: 0000007ADC9041AF]
Security context: 00000000020A0439 <JSObject>
1: formatValue(aka formatValue) [000002D42128EFC9] [util.js:~388] [pc=0000007ADCA9F87E](this=0000014EA97022E1 <undefined>,ctx=00000245E14C8C91 <Object map = 00000318E8AC39C1>,value=00000259C14B2DE9 <Object map = 000002CE2A2023B9>,recurseTimes=0000014EA9702201 <null>,ln=0000014EA9702381 <true>)
2: formatProperty(aka formatProperty) [000002D42128F3...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: node::DecodeWrite
2: node_module_register
3: v8::internal::FatalProcessOutOfMemory
4: v8::internal::FatalProcessOutOfMemory
5: v8::internal::Heap::MaxHeapGrowingFactor
6: v8::internal::Factory::NewJSPromise
7: v8::internal::compiler::CodeAssembler::TailCallBytecodeDispatch<v8::internal::compiler::Node * __ptr64,v8::internal::compiler::Node * __ptr64,v8::internal::compiler::Node * __ptr64,v8::internal::compiler::Node * __ptr64>
8: v8::internal::compiler::NodeCache<std::pair<__int64,char>,v8::base::hash<std::pair<__int64,char> >,std::equal_to<std::pair<__int64,char> > >::Resize
9: v8::internal::compiler::NodeCache<std::pair<__int64,char>,v8::base::hash<std::pair<__int64,char> >,std::equal_to<std::pair<__int64,char> > >::Resize
10: v8::internal::compiler::NodeCache<std::pair<__int64,char>,v8::base::hash<std::pair<__int64,char> >,std::equal_to<std::pair<__int64,char> > >::Resize
11: v8::internal::compiler::NodeCache<std::pair<__int64,char>,v8::base::hash<std::pair<__int64,char> >,std::equal_to<std::pair<__int64,char> > >::Resize
12: 0000007ADC9041AF
To make sure: will this be a blocking issue for the release?
@vsemozhetbyt I actually plan on making some further changes to the util.inspect defaults and hope to get them ready soon. Can you please check whether the issue is still present if depth is set to 100 as the default?
Where should I change this? Is this an environment variable, or is it module code that I should recompile?
Ping?
@vsemozhetbyt looks like the default depth for util.inspect is set here:
@AyushG3112 Thank you! I will try to build and check after https://github.com/nodejs/node/issues/19918 is clarified.
I've set depth for util.inspect to 100 here, built, and tested npm update again.
Without this change, node crashes after about a minute of max memory and CPU consumption.
With this change, I get an infinite loop (or at least a several-minute loop) with max memory and CPU consumption.
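For anyone repeating the experiment without rebuilding Node: the depth default can also be overridden at runtime (a sketch; set-depth.js is a made-up file name, and it assumes that mutating util.inspect.defaultOptions before npm-cli.js runs is equivalent to patching the built-in default).

// set-depth.js: preloaded so it runs before npm's code
'use strict';
require('util').inspect.defaultOptions.depth = 100; // or 25

It can be preloaded via NODE_OPTIONS="--require <path>\set-depth.js" before running npm update in the test directory.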
@vsemozhetbyt thanks for checking this. It is bad that this does not solve the problem. Can you retry with 25? I would like to get something like this in as an intermediate fix. 25 should still be enough for pretty much all objects.
Refs: https://github.com/npm/npm/issues/20091#issuecomment-380897087
If this breaks npm then I would expect it to break other modules and test suites in the ecosystem. IMO Node should revert this change until it can land without breaking canary.
APIs like util.inspect should be considered largely untouchable unless they can be changed with an absolute assurance of backwards compatibility.
Refs https://github.com/nodejs/node/pull/17907#issuecomment-380902159.
I believe I have a fix: https://github.com/npm/npmlog/pull/58. If anyone would be so kind as to give this a try; I do not have the time to test the fix right now :/.
The regression bit is from this change. Other methods, like util.format, that defer to util.inspect should probably be able to forward options, like depth, to inspect without juggling the global-ish util.inspect.defaultOptions object.
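A sketch of that idea (formatWithOptions here is a hypothetical helper, not an API that existed in the npm/Node versions discussed in this thread; Node later added util.formatWithOptions along these lines):

'use strict';
const util = require('util');

// Forward inspect options per call instead of mutating the global
// util.inspect.defaultOptions object; %s/%d placeholder handling is omitted.
function formatWithOptions(inspectOptions, ...args) {
  return args
    .map((arg) => (typeof arg === 'string' ? arg : util.inspect(arg, inspectOptions)))
    .join(' ');
}

console.log(formatWithOptions({ depth: 2 }, 'outdated', { a: { b: { c: { d: 1 } } } }));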
@jdalton It is clear that it is that commit. And I agree that util.format is lacking options; I am still wrapping my head around improving the util.inspect options in general.
The point is just that it is weird that changing the default depth breaks any code. That seems to indicate an issue in the calling code as far as I can tell. One thing that breaks is the case where huge, deeply nested objects are passed in that cannot be inspected in full. I am going to try to improve the memory overhead for that case.
The point is just that it is weird that changing the default depth breaks any code.
Makes sense to me. It went from a default depth of something like 2 to Infinity, which will surely cause slowdowns or hangs on larger data sets.
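To put numbers on that (a synthetic object, not npm's data; the tree shape is made up purely for illustration):

'use strict';
const util = require('util');

// Build a wide, nested object; every extra inspected level multiplies the output.
function makeTree(depth, width) {
  if (depth === 0) return { leaf: true };
  const node = {};
  for (let i = 0; i < width; i++) node['child' + i] = makeTree(depth - 1, width);
  return node;
}

const tree = makeTree(7, 4);
console.log(util.inspect(tree, { depth: 2 }).length);    // small: deeper levels collapse to [Object]
console.log(util.inspect(tree, { depth: null }).length); // the full tree, orders of magnitude larger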
@BridgeAR Unfortunately, I have the same issue with depth 25.
@vsemozhetbyt it will probably already cause issues one or two layers down from 2.
10 is OK)
@vsemozhetbyt @BridgeAR .... this may be a silly question but regardless of the depth, do we know what about this particular data structure is causing the loop? Perhaps one approach here is to identify that and work around it rather than attempting to switch up the depth... The reason is straightforward... while this particular data structure from npm might work with a depth of 10, another that replicates the problem at a smaller depth may still fail.
@jasnell I was trying to debug:
https://github.com/nodejs/node/issues/19405#issuecomment-373966063
but I got stuck and confused at some deep iteration. Maybe somebody more competent in npm can dig deeper with a clearer understanding of what is going on.
@jasnell that is absolutely correct. The same problem can happen with a depth of 2. I definitely want to look at the object, but I have not gotten to it yet.
@vsemozhetbyt ... presumably the object being formatted includes a circular reference that's not currently being watched for. That's a good place to start digging.
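A quick check of the circular-reference hypothesis (assuming stock util.inspect, nothing npm-specific): plain cycles are already detected and printed as [Circular] rather than recursed into.

const util = require('util');
const pkg = { name: 'example' };
pkg.self = pkg; // a plain circular reference
console.log(util.inspect(pkg, { depth: null })); // { name: 'example', self: [Circular] }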
Ok, did some further digging. I decided to set util.inspect.defaultOptions.depth = 8 and see what happens. I modified npm's update.js to spit the serialized string out to the console and redirected that into a file, which ended up being 6.6 MB with over 81k lines. The JavaScript object includes the full details of every dependency, down to the full content of the README.md files for each. So, essentially, what is happening here is that npm is taking an excessively massive single JavaScript object, pushing it into util.inspect(), and relying on the default value of 2 never changing. However, pushing an equally massive object with a flatter structure at depth < 2 would yield precisely the same "breakage".
The challenge here is not the depth default. It's the size of the object being pushed in and the fact that the generated string is way too large. Rather than reverting the depth change, we should investigate how to deal with the root problem, which is user code pushing objects into inspect that are just generally too large.
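Roughly the kind of instrumentation described above (a sketch, not the exact patch; dumpForDebug and the toUpdate argument are hypothetical names for whatever structure update.js actually logs):

// Dropped temporarily into deps/npm/lib/update.js for the experiment.
const util = require('util');
const fs = require('fs');

function dumpForDebug(toUpdate) {
  // Cap the depth for the dump, then write the serialized object to a file
  // so its size can be measured (here: ~6.6 MB, over 81k lines).
  fs.writeFileSync('npm-update-dump.txt', util.inspect(toUpdate, { depth: 8 }));
}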
Landed the revert of the change in v10.x-staging
Note: the revert landed in 10.x but this is still an open issue in master.