Node: setTimeout callback executed before delay elapses

Created on 6 Dec 2016  Â·  34Comments  Â·  Source: nodejs/node

  • Version: 7.2.0
  • Platform: Ubuntu 16.10 64-bit

When the delay is set to 1, for about 5% of cases the logged time between executing setTimeout() and the callback execution is 0.

// main.js
const timeJustBeforeCallingSetTimeout = Date.now();
setTimeout(() => {
  const callbackExecutionTime = Date.now();
  console.log(callbackExecutionTime - timeJustBeforeCallingSetTimeout);
}, 1);
# bash script that runs above code 1000 times and counts cases when 0 is logged
rm -f results.txt
for ((n=0; n<1000; n++))
do
  node main.js >> results.txt
done
cat results.txt | grep 0 | wc -l

When the delay value is set to 2, there are no cases where the logged time is less than 2, which is an expected behavior.

blocked libuv question timers

All 34 comments

Cannot reproduce locally on a FC25 system.

A possible explanation is that Date.now() is backed by a CLOCK_REALTIME time source (gettimeofday) whereas node/libuv uses a CLOCK_MONOTONIC clock. If ntpdate is adjusting the system clock, you might see <= 0 intervals between two Date.now() calls.

What happens when you use process.hrtime() instead?

For code with process.hrtime() used instead of Date.now(), the percentage of cases with measured delay that is less than 1ms grows to 30%. The lowest time observed is ~930000ns.

// main.js
const timeJustBeforeCallingSetTimeout = process.hrtime();
setTimeout(() => {
  const delayBetweenMainCodeAndCallbackExecutionInNanoSec =
    process.hrtime(timeJustBeforeCallingSetTimeout)[1];
  console.log(delayBetweenMainCodeAndCallbackExecutionInNanoSec);
}, 1);
# bash script that runs js code 1000 times
rm -f results.txt
for ((n=0;n<1000;n++))
do
  node main.js >> results.txt
done
cat results.txt | grep '^.\{6\}$' | wc -l

@marzelin Could you try process.binding('timer_wrap').Timer.now() and see if it makes a difference? That is what timers use internally. (I think process.hrtime also uses that but not 100% sure.)

When time difference is measured with process.binding('timer_wrap').Timer.now() there are no cases where the difference is less than 1, which is the expected behavior.

// main.js
const timeJustBeforeCallingSetTimeout = process.binding('timer_wrap').Timer.now();
setTimeout(() => {
  const callbackExecutionTime = process.binding('timer_wrap').Timer.now();
  console.log(callbackExecutionTime - timeJustBeforeCallingSetTimeout);
}, 1);
# bash script that runs js code 1000 times and counts cases where time difference is 0
rm -f results.txt
for ((n=0;n<1000;n++))
do
  node main.js >> results.txt
done
cat results.txt | grep 0 | wc -l

@bnoordhuis any idea?

@marzelin Are you running Ubuntu inside a VM or a container? I know VirtualBox has (had?) clock skew bugs in the past, perhaps other products do too.

@bnoordhuis Ubuntu is loaded directly by UEFI, no VMs are involved. On Windows 10 (same PC and node version) results are similar.

I have been able to reproduce this a few times now (over 100,000s of runs) but so far I've only been able to establish that the kernel seems to return early from the epoll_wait() system call for no discernible reason whatsoever. This is with a stock 4.8.8-300.fc25.x86_64 kernel.

@marzelin Re: Windows: did you test with Date.now(), Timer.now() or process.hrtime()?

@bnoordhuis All of them, but I did only a few tests there. setTimeout() seems to behave in the same way as on Ubuntu.

I'm coming around to the idea that this might be a hardware issue. I've so far only been able to reproduce on one system and I noticed that this system's hardware clock drifts by one or two seconds every few hours.

@marzelin Can you post the output of:

$ cat /sys/devices/system/clocksource/clocksource*/available_clocksource
# And
$ cat /sys/devices/system/clocksource/clocksource*/current_clocksource

Your system probably only has one hardware clock. Try switching to different sources (e.g. from tsc to hpet) and see if the problem goes away.

EDIT: Oh, and please disable anything that syncs the clock; ntpd, ntpdate cron jobs, etc.

@bnoordhuis

$ cat /sys/devices/system/clocksource/clocksource*/available_clocksource
tsc hpet acpi_pm

$ cat /sys/devices/system/clocksource/clocksource*/current_clocksource
tsc

I've got the same results when clocksource is set to hpet or acpi_pm.

I did some other tests and it seems like process.binding('timer_wrap').Timer.now() takes some arbitrary point of time and looks only at number of milliseconds, omitting the fractional part.

So for example if we have point in time:
123 ms and 990,000 ns
after 10,000 ns Timer.now() will tell 1ms has passed, but in reality only 0,01 ms has passed as process.hrtime will show.

Node's timers have always been borken, check it out:
https://gist.github.com/xk/eb2e75c11ef0435a43438a373f8c0113~~

Edit:
Ooops, I was testing this with an old node v0.10.26 which errs all the time but newer nodes do not, at least since v0.12.0, or so it seems... Great! Thanks! :-)

@bnoordhuis any further thoughts?

I'm still investigating this although I probably won't revisit it until January. You can assign it to me if you want.

I suspect that this is because we track loop->time in libuv in milliseconds.
loop->time = uv__hrtime(UV_CLOCK_FAST) / 1000000
So the difference between 2 different loop->time values could potentially show a 1 millisecond difference even when 1 nanosecond has actually elapsed and thus trigger timers too early.
This could be fixed by tracking loop->time in nanoseconds instead of milliseconds.
I tested this change and it seems to fix the issue (no more 0 second delays).
Let me know if this solution makes sense.
https://github.com/jBarz/node-core/commit/84fbba59883d8d1cd09215b33793e3de7d8a4e58

Can you bring this up on the libuv repo? Thanks!

On Thu, Jan 12, 2017 at 10:53 PM, John Barboza notifications@github.com
wrote:

I suspect that this is because we track loop->time in libuv in
milliseconds.
loop->time = uv__hrtime(UV_CLOCK_FAST) / 1000000
So the difference between 2 different loop->time values could potentially
show a 1 millisecond difference even when 1 nanosecond has passed.
This could be fixed by tracking loop->time in nanoseconds instead of
milliseconds.
I tested this change and it seems to fix the issue (no more 0 second
delays).
Let me know if this solution makes sense.
jBarz@84fbba5
https://github.com/jBarz/node-core/commit/84fbba59883d8d1cd09215b33793e3de7d8a4e58

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/nodejs/node/issues/10154#issuecomment-272360892, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AATYGBCyzKHaiXbrsoMEy9vfQ6ne5YjIks5rRwNJgaJpZM4LFyZq
.

--
/Saúl
bettercallsaghul.com

I have submitted a PR on the libuv repo
https://github.com/libuv/libuv/pull/1191

Should this be closed (or at least labeled blocked) unless and until the libuv issue gets resolved?

Labeled blocked as (IIUC) nothing can happen until https://github.com/libuv/libuv/pull/1191 lands first.

Looks like I have the same issue which stability reproduced on the following test:

'use strict';

const assert = require('assert');

const DELAY = 10;

function call() {
    const start = process.hrtime();
    setTimeout(() => {
        const diff = process.hrtime(start);
        const [seconds, nanoseconds] = diff;
        const ms = seconds * 1e3 + nanoseconds * 1e-6;
        console.log(ms);
        assert(ms >= DELAY);
        call();
    }, DELAY);
}

call();
[kostya@arch ~]$ node -v
v8.1.3
[kostya@arch ~]$ node t
11.467041
9.704634
assert.js:60
  throw new errors.AssertionError({
  ^

AssertionError [ERR_ASSERTION]: false == true
    at Timeout.setTimeout [as _onTimeout] (/home/kostya/t.js:14:9)
    at ontimeout (timers.js:488:11)
    at tryOnTimeout (timers.js:323:5)
    at Timer.listOnTimeout (timers.js:283:5)


Next 3 consecutive runs:

[kostya@arch ~]$ node t
12.213059
9.567727999999999
assert.js:60
  throw new errors.AssertionError({
  ^

AssertionError [ERR_ASSERTION]: false == true
    at Timeout.setTimeout [as _onTimeout] (/home/kostya/t.js:14:9)
    at ontimeout (timers.js:488:11)
    at tryOnTimeout (timers.js:323:5)
    at Timer.listOnTimeout (timers.js:283:5)
[kostya@arch ~]$ node t
11.104375
9.571886
assert.js:60
  throw new errors.AssertionError({
  ^

AssertionError [ERR_ASSERTION]: false == true
    at Timeout.setTimeout [as _onTimeout] (/home/kostya/t.js:14:9)
    at ontimeout (timers.js:488:11)
    at tryOnTimeout (timers.js:323:5)
    at Timer.listOnTimeout (timers.js:283:5)
[kostya@arch ~]$ node t
10.487898999999999
10.68757
10.291964
10.234295999999999
10.246357999999999
10.16991
10.187873
10.203797999999999
9.151824
assert.js:60
  throw new errors.AssertionError({
  ^

AssertionError [ERR_ASSERTION]: false == true
    at Timeout.setTimeout [as _onTimeout] (/home/kostya/t.js:14:9)
    at ontimeout (timers.js:488:11)
    at tryOnTimeout (timers.js:323:5)
    at Timer.listOnTimeout (timers.js:283:5)

@Trott Could you revisit this issue? Looks like this issue can be solved on nodejs side https://github.com/libuv/libuv/pull/1191#issuecomment-322819771.

@ikokostya Your repro triggers the assertion for me on Node.js 8.4.0 but not on 9.0.0-pre (current master branch). Any chance you can compile from the master branch or grab the most recent nightly and see if it's fixed for you there?

The same error:

[kostya@arch node-v9.0.0-nightly20170904b24e269a48-linux-x86]$ ./bin/node -v
v9.0.0-nightly20170904b24e269a48
[kostya@arch node-v9.0.0-nightly20170904b24e269a48-linux-x86]$ ./bin/node t
10.539683
10.501904
10.166082
10.186332
10.193605
10.178288
10.197446
10.175659
10.184133
10.203318
10.162325
10.159507
10.157172
9.153143
assert.js:41
  throw new errors.AssertionError({
  ^

AssertionError [ERR_ASSERTION]: false == true
    at Timeout.setTimeout [as _onTimeout] (/home/kostya/Downloads/node-v9.0.0-nightly20170904b24e269a48-linux-x86/t.js:14:9)
    at ontimeout (timers.js:471:11)
    at tryOnTimeout (timers.js:305:5)
    at Timer.listOnTimeout (timers.js:265:5)

Interesting that I can't replicate on macOS.

@Fishrock123 @misterdjules Any thoughts on the fix for https://github.com/nodejs/node/issues/10154#issuecomment-321894774 suggested by @JBarz at https://github.com/libuv/libuv/pull/1191#issuecomment-322819771?

Any news? I still get error for nodejs 9.0.0.

@bnoordhuis Can anything be done here?

@jBarz Per https://github.com/libuv/libuv/pull/1191#issuecomment-322819771, do you want to open a pull request here?

@bnoordhuis @jBarz I'm not sure the proposed fix there solves the issue. We already have something very similar (setting timeRemaining = 1) which works fine for most cases but it only runs when diff < msecs which in the reproduction above from @ikokostya is not actually the case.

I've tried the fix myself and it seems to resolve the issue.

@apapirovski Are you able to recreate the error using @ikokostya test case with the proposed fix?

You're right, the case is not diff < msecs
However, the case fails because (TimerWrap.now() - timer._idleStart) has at most a 1 millisecond rounding error because they both capture time by dividing the high resolution timer by 1000000.
The fix is simply adding 1ms to timeRemaining for all cases.
I can open a PR if this sounds ok.

@jBarz yeah, I see what you mean now. Not a huge fan of adding 1ms to all timers (the magic number is a bit of a code smell) that run through that block but I'm not really seeing another solution. I suppose it depends on whether we consider this important to fix or if infrequent 1ms inaccuracy is acceptable. FWIW it should happen a lot less now than it used to when this issue was made.

Yea, I agree that adding 1ms to all timers is not ideal.
I guess we could consider that test case as invalid because it is using the high resolution process.hrtime to measure accuracy of the low res uv_now.
So nothing to fix here.

Okay, I'll close this out then.

This can be fixed by: https://github.com/nodejs/node/pull/20555

(I've had the test case running for an hour with no failures. Pretty sure it resolves the issue.)

Edit: I've taken out that bit because it made that PR too big but it'll be fixed in a follow-up PR. Either way, I'm keeping this open since it can be solved by Node.js.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

dfahlander picture dfahlander  Â·  3Comments

Icemic picture Icemic  Â·  3Comments

stevenvachon picture stevenvachon  Â·  3Comments

loretoparisi picture loretoparisi  Â·  3Comments

seishun picture seishun  Â·  3Comments