Many internal modules call methods defined on prototype objects that can be modified (or monkey-patched) in userland. For example, by overriding `Object.prototype.getOwnProperty`, it is possible to affect async hooks.
One way to fix this is to keep copies of these functions at bootstrap, and use these untampered copies inside internal modules.
Optionally, these copies can be exported through a new internal module so that user code can get hold of untampered copies if it wants to.
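The bootstrap-copy idea can be sketched as follows (illustrative only; the `cached` object and its property names are made up for the example):

```javascript
// Capture untampered built-ins once at bootstrap, before any user code runs.
const cached = {
  getOwnPropertyNames: Object.getOwnPropertyNames,
  hasOwnProperty: Object.prototype.hasOwnProperty,
  apply: Reflect.apply,
};

// Later, userland monkey-patches a built-in:
Object.getOwnPropertyNames = () => ['patched'];

// Internal code that uses the cached copy keeps its original behavior:
console.log(cached.getOwnPropertyNames({ a: 1 })); // [ 'a' ]
console.log(Object.getOwnPropertyNames({ a: 1 })); // [ 'patched' ]
```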
Advantages of this approach:
Advantages of current code:
I think it is desirable to have consensus over whether internal modules should be affected by monkey-patching. This does not have to be an either/or decision.
Coming from the JavaScript spec, there are JavaScript built-ins that are intentionally affected by monkey-patching too. For example, by overriding `RegExp.prototype.exec`, the behavior of other RegExp built-ins can be changed. Yet another example is the species constructor. Generally, though, JavaScript built-ins call into internal builtins that cannot be altered.
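For instance, `RegExp.prototype.test` is specified in terms of the `exec` property, so patching `exec` observably changes it:

```javascript
// Per the language spec, RegExp.prototype.test looks up and calls the `exec`
// property if it is callable, so overriding it on an instance changes `test`.
const re = /nothing-matches-this/;
re.exec = () => ['faked match'];

// test() now reports a match even though the pattern cannot match:
console.log(re.test('anything')); // true
```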
I think it is important to clearly specify and document which global state (i.e. the result of monkey-patching) affects the behavior of internal modules, rather than having implicit APIs through monkey-patching. Personally, I think that, unless specified otherwise, monkey-patching should not affect internal modules. It needs to be a conscious design choice.
FWIW this problem would become very apparent if Node.js had a formal specification for its internal modules :)
Ref: https://github.com/nodejs/node/pull/18773
CC: @benjamingr @devsnek
I think one of the most important points that you made in https://github.com/nodejs/node/pull/18773 is:
All APIs that V8 offers in C++ use the original builtin, including `v8::Object::GetOwnPropertyNames`. Let's say, just hypothetically, Node.js core wants to move the implementation of some function from JS to C++; it then must not use `v8::Object::GetOwnPropertyNames`, but instead load and call `global.Object.getOwnPropertyNames`. Otherwise it would observably change behavior.
That is, every time we move code from or to C++, we break any patching userland has applied to the call in question.
Now, another good thing to consider is that there have been cases where overriding built-ins has been tremendously useful when polyfilling - for example https://github.com/drses/weak-map/blob/master/weak-map.js (the WeakMap polyfill).
if users want to patch `fs.readFile` or something that's cool (although I'm personally opposed to it without hooks) but changing `Object.prototype.toString` shouldn't change how `fs.readFile` works
I'm in favor of Node core's internals being resistant to monkey patching. I don't think we need to expose the untampered functions to users though.
@devsnek
if users want to patch `fs.readFile` or something that's cool (although I'm personally opposed to it without hooks) but changing `Object.prototype.toString` shouldn't change how `fs.readFile` works
Probably, but it's more subtle than that. Going back to the WeakMap example case: users changed `getOwnPropertyNames` because a new ("hidden") key was added to an object. That key had to be "secret" and not impact other methods, so if `util.inspect` was "tamper proof" then it would have been impossible to polyfill WeakMap entirely.
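The polyfill technique being referenced works roughly like this (a simplified sketch, not the actual polyfill code; `HIDDEN` is a made-up key name):

```javascript
// A "hidden" key stores per-object state, and Object.getOwnPropertyNames is
// patched to filter it out so the key stays invisible to other code.
const HIDDEN = '___hidden_state___';
const originalGOPN = Object.getOwnPropertyNames;
Object.getOwnPropertyNames = function (obj) {
  return originalGOPN(obj).filter((name) => name !== HIDDEN);
};

const target = {};
Object.defineProperty(target, HIDDEN, { value: 42, enumerable: false });
console.log(Object.getOwnPropertyNames(target)); // [] — the key is hidden
```

If an internal `util.inspect` used a cached original instead of the patched function, the "secret" key would leak into its output, which is exactly the tension described above.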
Note: Another thing we can do is to explicitly specify that changing built-ins is dangerous and doing so is not supported in Node.js.
I'm not sure that's a good idea, as it violates "least surprise" and the current behavior is unspecified anyway. Note that fixing this and ensuring built-ins are used would be a pretty big code change and will create overhead in maintaining the code, since basically built-in instance methods won't be used.
If we do this we need to make sure it doesn't impact performance (or improves it, since technically it's provable which function it is now), and that it's not problematic maintenance wise.
I feel like this is a deep rabbit hole. Anyone could override anything, so now any and all object/prototype methods need to be saved up front, which is crazy because there are so many and they're used throughout the codebase. For example: `string.trim()`, `string.toLowerCase()`, `string.slice()`, `Object.keys()`, `Number.isNaN()`, etc.
Even if it was decided it's a good idea to do this, we'd need to also maintain a giant list of methods we use anywhere in node core.
@mscdex are you concerned about memory use or time spent upon startup? One way to do it is to simply shallow clone these prototype objects.
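The shallow-clone suggestion might look roughly like this (a sketch only, not a concrete proposal; `snapshotPrototype` is a made-up helper name):

```javascript
// Copy every own data property of a prototype into a frozen, null-prototype
// object at bootstrap. Internal code then calls methods off the snapshot.
function snapshotPrototype(proto) {
  const copy = Object.create(null);
  for (const key of Object.getOwnPropertyNames(proto)) {
    const desc = Object.getOwnPropertyDescriptor(proto, key);
    if (desc && 'value' in desc) copy[key] = desc.value;
  }
  return Object.freeze(copy);
}

const StringProto = snapshotPrototype(String.prototype);

// Userland tampering after bootstrap does not affect the snapshot:
String.prototype.trim = () => 'tampered';
console.log(Reflect.apply(StringProto.trim, '  hi  ', [])); // 'hi'
```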
@hashseed I'm not even considering performance implications at this point. Just having to touch almost every line of code in node core, getting new collaborators to understand the change, maintaining these methods/prototypes, etc.
I wonder if there is a way from V8 land to "lock" the way Node core uses these objects - but I can't think of anything out of the box.
Just having to touch almost every line of code in node core
I don't have a problem with that if it's for the greater good. Most of it is rote search-and-replace, perhaps even mechanical replace.
Could make for a good Code-and-Learn exercise :-)
My position on this aligns with @mscdex's. There are some parts that are not completely unintuitive, because people have likely seen `hasOwnProperty.call(obj, 'something')` before, but then there's stuff like `Promise.prototype.then` monkey-patching and needing to have `PromiseThen`, which is honestly going to be a nightmare to explain to new contributors.
Also, it's very likely that `uncurryThis` carries a performance penalty, so switching the entirety of core over to it will lead to a broad performance penalty for the whole of Node.js.
If we're going down this route, my preference is to try and find a path that leads to core being able to lock or have our own versions of these primitives, as mentioned by @benjamingr:
I wonder if there is a way from V8 land to "lock" the way Node core uses these objects - but I can't think of anything out of the box.
I know this is a more complicated route (that might not be feasible with what we have available right now), but at least it would keep the code maintainable and easy to understand for new contributors. Having to write `PromiseThen(promise, fn);` is, to me, not an appealing proposition.
(And if we're comparing this to the C++ layer, having our own versions of Object, Promise, etc. would more closely resemble the C++ side than storing references to all the namespaced & prototype functions.)
Also, to further elaborate, this would still need to go hand in hand with being more defensive about anything passed in by users. For example, if a user is allowed to pass in a settings object, then we shouldn't assume that it has `hasOwnProperty` or that it even has a non-null prototype. This aligns with the recent transition to using `Reflect.apply` on user-provided callbacks.
having our own versions of Object, Promise, etc. would more closely resemble the C++ side than storing references to all the namespaced & prototype functions
This does not work if you take user-defined objects as arguments or leak internal versions via return values.
This does not work if you take user-defined objects as arguments
As I mentioned above, we would still need to be more defensive with user-provided objects, but that's a much smaller change than having to be defensive about the entirety of internal Object, String, etc. usage. User-provided callbacks were one of the main culprits, and then we have some infrequent `hasOwnProperty` access, but not much more than that.
leak internal versions via return values.
Hence me saying this might not be possible currently. That said, if our internal versions could be frozen but without the currently present performance impact, then it would technically be possible, no?
The fact is that it's hard to know where to draw the line. What about Buffer? It's something that's provided by node itself but it's crucial to the inner workings of at least half of core. And if Buffer is included, what else is?
In my mind I kind of split this into two example cases:
a) util.inspect getting property keys
b) repl using string replacement
For (a) I would use unsaved prototype methods to allow users to heck with it, but for (b) there is no benefit to the repl (in fact it can become impossible to use) if someone overrides `String.prototype.replace` for some reason, so we would use a saved prototype method here.
Like @hashseed said, it's ok to continue to have planned monkey-patching, but it shouldn't just randomly happen.
In my mind I kind of split this into two example cases:
I think @hashseed's point is that this is generally underspecified across our whole API and that we should probably consider specifying it. (Feel free to comment if I misrepresented your intention.)
That is, that an effort to go over Node's API in its entirety and specify (at least for internal use) what methods may impact it would be beneficial. I agree with that.
I think we should also discuss how we should do this (split work?) and get some initial data.
b) repl using string replacement
I'm really not sure; I honestly can't think of a case where I'd want to override this, but then I wouldn't have foreseen the WeakMap case for `util.inspect` either.
The issue is that this affects _older_ versions of Node (new versions of Node rarely need polyfills).
One other thing to keep in mind here is the non-trivial burden that going all in with this will create for backporting changes to LTS lines if this mechanism is not also backported.
I don't know how the shallow cloning of prototypes would work, but currently even if you save `hasOwnProperty` early, someone could still override `Function.prototype.call` after the fact and still completely disrupt your `hasOwnProperty.call(...)`.
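This is straightforward to demonstrate; a cached `Reflect.apply` sidesteps `Function.prototype` entirely, which is why `call(hasOwnProperty, ...)`-style helpers are more robust:

```javascript
// Even a cached hasOwnProperty is unsafe if invoked via .call, because the
// .call lookup goes through Function.prototype, which users can replace.
const hasOwnProperty = Object.prototype.hasOwnProperty;
const ReflectApply = Reflect.apply;

const originalCall = Function.prototype.call;
Function.prototype.call = () => 'hijacked'; // hostile user code

const viaDotCall = hasOwnProperty.call({ a: 1 }, 'a');            // 'hijacked'
const viaReflect = ReflectApply(hasOwnProperty, { a: 1 }, ['a']); // true

Function.prototype.call = originalCall; // restore, to keep the demo tidy
console.log(viaDotCall, viaReflect);
```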
@mscdex `call(hasOwnProperty, ...)` :D Although I see your point
One other thing to keep in mind here is the non-trivial burden that going all in with this will create for backporting changes to LTS lines if this mechanism is not also backported.
Yes, this was kind of what I was hinting at earlier with having to touch the entire codebase.
@mscdex `call(hasOwnProperty, ...)` :D Although I see your point
Right, and having to code like that would be pretty absurd IMHO, even if you wrote a wrapper for it.
As a reference, I created https://gist.github.com/devsnek/76aab1dcd169c96b952fbe8c74404475. Around 350 functions are saved. Line 27 of the gist exposes it to users.
We would also need to store the constructors to avoid `global.Array = function() {/* user code */}` and pass them into the internal wrappers.
Another way I can think of is to store copies of builtin methods under symbols. In internal modules you would invoke builtins via symbols.
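A minimal sketch of the symbol idea (with the caveat that `Object.getOwnPropertySymbols` can still discover such keys, so this is obfuscation rather than true locking):

```javascript
// Stash the original method on the prototype under a symbol; internal code
// invokes via the symbol, so tampering with the public name has no effect.
const kStartsWith = Symbol('startsWith');
String.prototype[kStartsWith] = String.prototype.startsWith;

// Userland tampers with the public name:
String.prototype.startsWith = () => false;

// Internal code is unaffected:
console.log('Hi Matteo'[kStartsWith]('Hi')); // true
console.log('Hi Matteo'.startsWith('Hi'));   // false (tampered)
```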
One possible approach that carries along a whole range of its own complexities is to use a separate context (e.g. as in `vm.runInNewContext()`) for internal code vs. user code. While it would be disruptive in a number of cases, it would prevent a number of the issues being discussed here.
One possible approach that carries along a whole range of its own complexities is to use a separate context
Unfortunately that does not work. When you call `arg.getOwnProperty`, you call the one passed with the `arg` object on the prototype, not the one on the object prototype from the current context.
I just checked with @bmeurer that TurboFan handles `Reflect.apply`. With the following micro-benchmark I observe some regression on Node 8, but no difference at all on current Node master.
```js
const ReflectApply = Reflect.apply;

function uncurryThis(func) {
  return (thisArg, ...args) => ReflectApply(func, thisArg, args);
}

var StringStartsWith = uncurryThis(String.prototype.startsWith);
var s = "Hi Matteo";

function test1() {
  var r = 0;
  console.time("monkey-patchable");
  for (var i = 0; i < 1E7; i++) {
    if (s.startsWith(i)) r++;
  }
  console.timeEnd("monkey-patchable");
  return r;
}

function test2() {
  var r = 0;
  console.time("cached original");
  for (var i = 0; i < 1E7; i++) {
    if (StringStartsWith(s, i)) r++;
  }
  console.timeEnd("cached original");
  return r;
}

test1();
test2();
```
Is it really that much more unreadable to rename the function and move the receiver to the first argument? You don't have to use some `.call` pattern here, and there seems to be no performance penalty.
Having to write `PromiseThen(promise, fn);` is, to me, not an appealing proposition.
I'd argue that you should be writing async/await in most cases anyways.
Is anyone still -1 on this? @mcollina, is this still a TSC agenda item?
@devsnek I have not put this as tsc-agenda (nor has anyone). Maybe you are referring to a different issue?
I'm +0 on this change. It's unrealistic to think we could put Node into complete lockdown, as there will be exceptions. It will also complicate our codebase significantly, and it will make backporting harder.
I'm 👍 in exposing untampered versions of the globals. This can be helpful anyway.
I would also recommend using them when there is a need, rather than doing this _everywhere_.
I think what we should do at this point is _research_ about the impact of overriding globals at the moment and then formulate a more concrete opinion that will lead to a resolution of either:
I took a slightly different approach for my implementation. I tried adding builtin generics, much like the primordials concept. It took a while and did complicate things. There was also some perf overhead for all the indirection.
I unwound it a considerable bit by having two contexts instead of one: one for internals and the public exposed one. I then only used the Generic classes, like `GenericArray`, on things that are public-facing. This allowed me to write in the familiar JS style as before while isolating the code from tampering. Along with Generic classes for interacting with user-tamperable data, I also expose an `__external__` to the isolated context so that I can create values like `new ExObject` or `new ExArray` for things exposed to users.
This has allowed me to avoid rewriting hundreds of method calls, and enabled me to include third-party code, like acorn, while having less than a dozen points where I handle the internal/external boundary.
@jdalton that's a good idea; however, as @hashseed mentioned:
Unfortunately that does not work. When you call `arg.getOwnProperty`, you call the one passed with the `arg` object on the prototype, not the one on the object prototype from the current context.
If there is a way around that, it would be optimal.
@benjamingr
Unfortunately that does not work. When you call `arg.hasOwnProperty`, you call the one passed with the `arg` object on the prototype, not the one on the object prototype from the current context.
That's not a problem; that's a value coming from one context into another. In this case I have a `has` helper that is created in the isolated context, so I can do `has(arg, "foo")` and it doesn't matter if `arg` is from the exposed context or the isolated context, or if `arg` has its own custom `hasOwnProperty` method, or if `arg` has a `null` [[Prototype]].
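A `has` helper along these lines can be built from cached originals (a sketch under those assumptions, not @jdalton's actual implementation):

```javascript
// Cache the real built-ins once, then apply them explicitly so the check
// never depends on what the incoming object carries.
const hasOwnProperty = Object.prototype.hasOwnProperty;
const has = (obj, key) => Reflect.apply(hasOwnProperty, obj, [key]);

// Works for null-prototype objects:
const nullProto = Object.create(null);
nullProto.foo = 1;
console.log(has(nullProto, 'foo')); // true

// Ignores a hostile own hasOwnProperty:
console.log(has({ hasOwnProperty: () => true }, 'missing')); // false
```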
One of the other nice things is that primitives handled in the isolated context, even if created from a method in the exposed context, will be boxed in the isolated context too.
I'm 👍 in exposing untampered versions of the globals. This can be helpful anyway.
-0 on this; maybe if they were abstracted through util or something (`util.primordials.ArrayMap`), and they would definitely need to be locked down as much as possible.
I would also recommend using them when there is a need, rather than doing this everywhere.
Definitely. I would like to get a team of people together to audit the codebase and find places where we want this.
generics
Generics are a tempting idea, but as you said yourself they have a perf hit, and their behavior might be less clear than explicit calls of well-known methods from JavaScript's primordials.
Generics are a tempting idea, but as you said yourself they have a perf hit, and their behavior might be less clear than explicit calls of well-known methods from JavaScript's primordials.
The terminology of primordials and generics is interchangeable here. Instead of the coined term "primordials" I went with "generics", because it's easier to type and has historical precedent. Their implementations are essentially the same. So when I say there was performance overhead with generics, you can read that as performance overhead with primordials.
In the context of this conversation, I think primordials are the big global objects that JavaScript (and we) create (Array, Symbol), and generics are the methodology from JS 1.6 where you could use `Array.every` (which doesn't exist anymore) on anything iterable. In this case generics are going to be much slower than the `uncurryThis` we are looking at currently, because V8 appears to have a specific perf case for `uncurryThis`, and generics have to work with many types, calling down to other behaviors.
I'm referring to primordials in the context of the Node helpers written to handle methods from unknown origins. The term used for those helpers was primordials. I pointed to the docs for the source of the name "generics", not suggesting use of the actual `Array.every` (which doesn't exist in V8). I'm saying I implemented generics in the same way as `uncurryThis`. Who knew the name would throw folks off here :yum:
I think I'll pick this up at the next open source event we're doing and get 10 or so community members messing with functions and reporting how Node behaves in different scenarios - as well as searching GitHub and NPM for initial data.
I think that should get us the initial data we need to present to the TSC in order to make a push for a more consistent API in terms of overriding builtins.
This will likely be mid-late March (and sounds like a fun activity people unfamiliar with our code base can help with), if anyone has an earlier time frame feel free to "steal" this.
I'm going to self assign, I don't think this needs TSC attention until we have said data but feel free to disagree or bring other opinions.
@benjamingr is this something you're interested in doing at a community event?
@devsnek still interested in this but have been a little overwhelmed.
There is a proposal by @domenic that would address this.
@hashseed thanks, that's useful. I should have probably gotten around to this by now but I ended up working on promise-use-cases in the last event. I'll do my best to get to this in the next event (June 16th).
(Random note: I _think_ domenic asked not to be pinged in the nodejs repos by the way - so we should probably not ping him without checking)
That proposal isn’t solely necessary, fwiw - node could load a single file at the start of the process that caches all the intrinsics and provides a module to expose them internally. The challenge - with that or with domenic’s proposal - would be enforcing the pattern’s usage via linting and code review.
@ljharb see https://github.com/domenic/get-originals/issues/14 - I'm thinking about automatic conversion here rather than changing our code.
The proposal is something we'd potentially want to use after we've mapped how our internals react to these changes.
I'm not sure how you could reliably automatically convert prototype methods; babel has the same challenge.
@ljharb I'm not sure what you mean? If you have a concern regarding automatic conversion please do speak up in the get-originals repo.
Should this remain open? Or is this a discussion that has run its course and can be closed?
I’d still like to see a policy decision that trends towards robustness, if possible.
Given that there has not been any further discussion on this in 1.5 years, and given that we've started moving to the use of `primordials` and protecting key paths from monkey-patching, I believe this issue can be closed.
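For reference, the style the `primordials`-based internals converged on looks roughly like this (the real `primordials` object is internal to Node core and not available to user code; this sketch mimics its shape with `uncurryThis`):

```javascript
// Build uncurried, cached prototype methods once, then call them
// receiver-first, so later monkey-patching cannot affect internal code.
const ReflectApply = Reflect.apply;
const uncurryThis = (fn) => (thisArg, ...args) => ReflectApply(fn, thisArg, args);

const primordials = {
  ArrayPrototypeFilter: uncurryThis(Array.prototype.filter),
  StringPrototypeStartsWith: uncurryThis(String.prototype.startsWith),
};

const { ArrayPrototypeFilter, StringPrototypeStartsWith } = primordials;
console.log(
  ArrayPrototypeFilter(['_internal', 'public'], (n) => StringPrototypeStartsWith(n, '_'))
); // [ '_internal' ]
```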