Jest: Jest without --runInBand slow under Travis

Created on 15 Apr 2018 · 33 Comments · Source: facebook/jest

Do you want to request a _feature_ or report a _bug_?

Bug

What is the current behavior?

Tests run fine locally, or with -i under Travis.
However, running without -i on Travis, tests take much longer (timing out).

If the current behavior is a bug, please provide the steps to reproduce and
either a repl.it demo through https://repl.it/languages/jest or a minimal
repository on GitHub that we can yarn install and yarn test.

What is the expected behavior?

Tests shouldn't take > 20 minutes to run under Travis when they take < 10 seconds locally.

Please provide your exact Jest configuration

No configuration. See repository for any details. https://github.com/fengari-lua/fengari/compare/v0.1.1...ebf18e2

Run npx envinfo --preset jest in your project directory and paste the
results here

I don't think this is the output you wanted...

$ npx envinfo --preset jest
npx: installed 1 in 1.326s
(node:11650) UnhandledPromiseRejectionWarning: TypeError: Cannot read property '1' of null
    at e.darwin.process.platform.linux.process.platform.c.run.then.e (/home/daurnimator/.npm/_npx/11650/lib/node_modules/envinfo/dist/cli.js:2:94055)
    at <anonymous>
(node:11650) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:11650) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Help Wanted

Most helpful comment

It would be nice if this were in the Jest documentation as a behavior of CircleCI. I was getting the Error: spawn ENOMEM failures in CircleCI, but not locally, and the fix of running jest --maxWorkers=2 worked perfectly, but it took almost an hour for me to figure that out, and the suggestion to add that flag is buried deep in another (closed) bug about the memory problems.

All 33 comments

@daurnimator I've typically seen this happen when Jest tries to use too many workers (it sees the 32 CPUs on the machine, but only 2 or 4 are given to the VM). Afaik we don't have a way to see the number of CPUs given to the VM.

Try updating the settings to use --maxWorkers with the number of CPUs Travis gives you

Have you tried running with --maxWorkers=2 for example? We run tests in band on Travis and the difference is not that big (maybe it's some temporary shortage of resources that applies to your repository). Probably nothing we can take care of, unfortunately.

Afaik we don't have a way to see the number of CPUs given to the VM

Can you use native APIs (or /proc/cpuinfo) to count the number of available CPU cores? However, that might not be the full story, as the process could probably be ulimited too.

Have you tried running with --maxWorkers=2 for example?

I wonder if Jest should detect common resource-limited environments (like Travis) and automatically reduce the number of workers.

I wonder if Jest should detect common resource-limited environments (like Travis) and automatically reduce the number of workers.

Certainly something I would stamp.

Yeah, that would be awesome. Might wanna reach out to the big CIs (travis, circle, appveyor, jenkins etc) and ask how to best find this information. And potentially get it into a module like https://www.npmjs.com/package/env-ci

Have you tried running with --maxWorkers=2 for example?

That seemed to work. https://travis-ci.org/fengari-lua/fengari/jobs/366930759#L544

Can you use native APIs (or /proc/cpuinfo) to count the number of available CPU cores? However, that might not be the full story, as the process could probably be ulimited too.

No I don't think cpuinfo is correct. However I think the sched_getaffinity syscall would work.
I think nproc uses that syscall, which is part of coreutils, so that might be the easiest way?

I think nproc uses that syscall, which is part of coreutils, so that might be the easiest way?

Yep. --maxWorkers=$(nproc) was successful: https://travis-ci.org/fengari-lua/fengari/jobs/366937403

sysctl -n hw.physicalcpu should work for macOS builds. Unsure what the Windows equivalent is.

I still think we should reach out to CI providers and ask for this information to be stuck in the environment so we as consumers don't have to sniff out OS and then make system calls.

sysctl -n hw.physicalcpu should work for macOS builds. Unsure what the Windows equivalent is.

We don't want physical cpus (that's the issue in the first place); we want available cpus inside of the container/cgroup/limitation in use.
However for OSX, it looks like no such limiting ability exists (see e.g. http://jesperrasmussen.com/2013/03/07/limiting-cpu-cores-on-the-fly-in-os-x/), so we can probably get away with using the sysctl (though sysctl -n hw.logicalcpu might be a better choice).

I think this can be closed now.

I think this can be closed now.

How so?
It wasn't fixed.

I don't think there's anything sane we can do about it and there are known workarounds.

I don't think there's anything sane we can do about it

The default for maxWorkers should be the result of nproc (the number of available processors), rather than the current "number of processors".

Played around with it, and nproc works fine on Travis, but on Circle it reports 32.

My script, which worked fine on travis (mac+linux) and appveyor (windows), but as mentioned _not_ circle:

'use strict';

const cp = require('child_process');
const os = require('os');

function getFromPlatform() {
  switch (process.platform) {
    case 'linux': {
      const result = cp.spawnSync('nproc');
      return Number(result.stdout.toString().trim());
    }
    case 'darwin': {
      const result = cp.spawnSync('sysctl', ['-n', 'hw.logicalcpu']);
      return Number(result.stdout.toString().trim());
    }
    case 'win32': {
      return Number(process.env.NUMBER_OF_PROCESSORS);
    }
  }
}

function processingUnits() {
  try {
    const result = getFromPlatform();

    if (typeof result === 'number' && !Number.isNaN(result) && result > 0) {
      return result;
    }
  } catch (err) {
    // ignore spawn failures and fall through to null
  }

  return null;
}

console.log({ res: processingUnits(), os: os.cpus().length });

Travis (linux): { res: 2, os: 32 }
Travis (mac): { res: 2, os: 2 }
Appveyor (windows): { res: 2, os: 2 }
CircleCI (linux): { res: 32, os: 32 }

So I'm back at wanting to reach out to CI providers and ask them to stick it in an environment variable so we can read it.

Played around with it, and nproc works fine on Travis, but on Circle it reports 32.

So I'm back at wanting to reach out to CI providers and ask them to stick it in an environment variable so we can read it.

Seems like CircleCI is a serial offender on this issue:

Java, Erlang and any other languages that introspect the /proc directory for information about CPU count may require additional configuration to prevent them from slowing down when using the CircleCI 2.0 resource class feature. Programs with this issue may request 32 CPU cores and run slower than they would when requesting one core. Users of languages with this issue should pin their CPU count to their guaranteed CPU resources

From those issues, it appears that the correct course of action would not be to add an env var, but to make sure nproc returns the correct number.


I think for now, we should start using nproc as it even makes sense for local usage.
It's up to CircleCI as an outlier to fix their infrastructure.

nproc only works on linux, and having to spawn up a process sucks. It would be better if it was passed as config, IMO.

(although I agree nproc shouldn't lie 🙂)

nproc only works on linux, and having to spawn up a process sucks.

You could create a node C library to call the sched_getaffinity syscall.

It would be better if it was passed as config, IMO.

It can! nproc is the sensible default: but if you provide your own value, we have no reason to fetch the default.

same problem here, we added the --runInBand flag and we went from 17 mins to 4 mins 😞

Help landing a module which reports the actual cpus available would be awesome, then we can use that over os.cpus().length. In the meantime you can do -w=2 instead of -i (or call nproc to be safe)

I put together a repo testing it out: https://github.com/SimenB/available-cpu Help very much welcome!

I tried using http://npmjs.com/physical-cpu-count but that didn't help. Other ideas? The repo I linked above has CI set up, so feel free to open PRs to test ideas

@SimenB a simple way to figure out how is to work backwards from the syscalls.
For linux you need something that calls the sched_getaffinity syscall: what do you have at your disposal that can do that? (simple answer: nproc). As a syscall, it's not going to be directly callable from node, and would need to be built in or be shipped as a native extension. I grepped the node and libuv sources and there is nothing in there that mentions sched_getaffinity.
So your answers are: shell out to nproc or depend on a node C library.

nproc doesn't work on circle as mentioned, that's what it's using now (https://github.com/SimenB/available-cpu/blob/3a18d76a9984ba086805ab02af1db1c006d5c123/index.js#L8-L10). It reports 32, same as os.cpus().length

nproc doesn't work on circle

Okay? I'm not sure what I didn't cover in my response here: https://github.com/facebook/jest/issues/5989#issuecomment-392996021

I say we just use nproc. We could add an override for circle (detected via env vars). But otherwise I say it's up to circle ci to fix their own infrastructure.

I can shoot them an email asking about why nproc behaves weirdly.

EDIT: https://circleci.com/ideas/?idea=CCI-I-578

Is it due to running in a docker container? We've had similar issues to #5239 on https://buildkite.com running in a node:10 docker container

Yeah - when we ask for number of CPUs we get the underlying machine instead of the amount allocated to the vm/container. And there's no good way of getting the correct count

I seem to always get the number of CPUs assigned to docker when running nproc, not the number of CPUs my machine has, even when setting --ulimit nproc=1. (macOS host)

$ docker run --rm node:10-alpine nproc
2
$ docker run --rm --ulimit nproc=1 node:10-alpine nproc
2

Is that expected?

I don't think it's expected - it's a bit beyond what I know of ulimit, though. I (still) think the best would be if CI environments injected the number of available cores - inspecting it from the inside is not really predictable

It would be nice if this were in the Jest documentation as a behavior of CircleCI. I was getting the Error: spawn ENOMEM failures in CircleCI, but not locally, and the fix of running jest --maxWorkers=2 worked perfectly, but it took almost an hour for me to figure that out, and the suggestion to add that flag is buried deep in another (closed) bug about the memory problems.

I get similar behavior on my laptop with a hyperthreaded CPU (Intel(R) Core(TM) i7-9750H). Running Jest with defaults completely locks up all available CPU and slows the system to a crawl when it spawns 15 processes. I really think Jest should be using something like the logic in https://www.npmjs.com/package/physical-cpu-count as the default, not os.cpus().length.

@ChrisCrewdson physical-cpu-count reports 16 cores on Travis CI, which is 14 too many. Of course os.cpus().length reports 32, so it would be an improvement over what we have today, but still wrong. Might be worth it though

@SimenB We have the same issue, but the advantage is we control our own build agents, so we have an environment variable with the correct number of CPUs.

Are you open to controlling the --max-workers option with an environment variable as well? This way we can set it and all our consumers will automatically get it.
