Jest: Jest without --runInBand slow under Travis

Created on 15 Apr 2018 · 33 Comments · Source: facebook/jest

Do you want to request a _feature_ or report a _bug_?

Bug

What is the current behavior?

Tests run fine locally, or with -i under Travis.
However, running without -i on Travis, tests take much longer (timing out).

If the current behavior is a bug, please provide the steps to reproduce and
either a repl.it demo through https://repl.it/languages/jest or a minimal
repository on GitHub that we can yarn install and yarn test.

What is the expected behavior?

Tests shouldn't take > 20 minutes to run under Travis when they take < 10 seconds locally.

Please provide your exact Jest configuration

No configuration. See repository for any details. https://github.com/fengari-lua/fengari/compare/v0.1.1...ebf18e2

Run npx envinfo --preset jest in your project directory and paste the
results here

I don't think this is the output you wanted...

$ npx envinfo --preset jest
npx: installed 1 in 1.326s
(node:11650) UnhandledPromiseRejectionWarning: TypeError: Cannot read property '1' of null
    at e.darwin.process.platform.linux.process.platform.c.run.then.e (/home/daurnimator/.npm/_npx/11650/lib/node_modules/envinfo/dist/cli.js:2:94055)
    at <anonymous>
(node:11650) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:11650) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Help Wanted

Most helpful comment

It would be nice if this were in the Jest documentation as a behavior of CircleCI. I was getting the Error: spawn ENOMEM failures in CircleCI, but not locally, and the fix of running jest --maxWorkers=2 worked perfectly, but it took almost an hour for me to figure that out, and the suggestion to add that flag is buried deep in another (closed) bug about the memory problems.

All 33 comments

@daurnimator I've typically seen this happen when Jest tries to use too many workers (it sees the 32 CPUs on the machine, but only 2 or 4 are given to the VM). Afaik we don't have a way to see the number of CPUs given to the VM.

Try updating the settings to use --maxWorkers with the number of CPUs Travis gives you

Have you tried running with --maxWorkers=2 for example? We run tests in band on Travis and the difference is not that big (maybe it's some temporary shortage of resources that applies to your repository). Probably nothing we can take care of, unfortunately.

Afaik we don't have a way to see the number of CPUs given to the VM

Can you use native APIs (or /proc/cpuinfo) to count the number of available CPU cores? However, that might not be the full story, as the process could probably be ulimited too.

Have you tried running with --maxWorkers=2 for example?

I wonder if Jest should detect common resource-limited environments (like Travis) and automatically reduce the number of workers.

I wonder if Jest should detect common resource-limited environments (like Travis) and automatically reduce the number of workers.

Certainly something I would stamp.

Yeah, that would be awesome. Might wanna reach out to the big CIs (travis, circle, appveyor, jenkins etc) and ask how to best find this information. And potentially get it into a module like https://www.npmjs.com/package/env-ci

Have you tried running with --maxWorkers=2 for example?

That seemed to work. https://travis-ci.org/fengari-lua/fengari/jobs/366930759#L544

Can you use native APIs (or /proc/cpuinfo) to count the number of available CPU cores? However, that might not be the full story, as the process could probably be ulimited too.

No I don't think cpuinfo is correct. However I think the sched_getaffinity syscall would work.
I think nproc uses that syscall, which is part of coreutils, so that might be the easiest way?

I think nproc uses that syscall, which is part of coreutils, so that might be the easiest way?

Yep. --maxWorkers=$(nproc) was successful: https://travis-ci.org/fengari-lua/fengari/jobs/366937403

sysctl -n hw.physicalcpu should work for macOS builds. Unsure what the Windows equivalent is.

I still think we should reach out to CI providers and ask for this information to be stuck in the environment so we as consumers don't have to sniff out OS and then make system calls.

sysctl -n hw.physicalcpu should work for macOS builds. Unsure what the Windows equivalent is.

We don't want physical cpus (that's the issue in the first place); we want available cpus inside of the container/cgroup/limitation in use.
However for OSX, it looks like no such limiting ability exists (see e.g. http://jesperrasmussen.com/2013/03/07/limiting-cpu-cores-on-the-fly-in-os-x/), so we can probably get away with using the sysctl (though sysctl -n hw.logicalcpu might be a better choice).

I think this can be closed now.

I think this can be closed now.

How so?
It wasn't fixed.

I don't think there's anything sane we can do about it and there are known workarounds.

I don't think there's anything sane we can do about it

The default for maxWorkers should be the result of nproc (the number of available processors), rather than the current "number of processors".

Played around with it, and nproc works fine on Travis, but on Circle it reports 32.

My script, which worked fine on travis (mac+linux) and appveyor (windows), but as mentioned _not_ circle:

'use strict';

const cp = require('child_process');
const os = require('os');

function getFromPlatform() {
  switch (process.platform) {
    case 'linux': {
      const result = cp.spawnSync('nproc');
      return Number(result.stdout.toString().trim());
    }
    case 'darwin': {
      const result = cp.spawnSync('sysctl', ['-n', 'hw.logicalcpu']);
      return Number(result.stdout.toString().trim());
    }
    case 'win32': {
      return Number(process.env.NUMBER_OF_PROCESSORS);
    }
  }
}

function processingUnits() {
  try {
    const result = getFromPlatform();

    if (typeof result === 'number' && !Number.isNaN(result) && result > 0) {
      return result;
    }
  } catch (err) {
    // ignore spawn failures and fall through to null
  }

  return null;
}

console.log({ res: processingUnits(), os: os.cpus().length });

Travis (linux): { res: 2, os: 32 }
Travis (mac): { res: 2, os: 2 }
Appveyor (windows): { res: 2, os: 2 }
CircleCI (linux): { res: 32, os: 32 }

So I'm back at wanting to reach out to CI providers and ask them to stick it in an environment variable so we can read it.

Played around with it, and nproc works fine on Travis, but on Circle it reports 32.

So I'm back at wanting to reach out to CI providers and ask them to stick it in an environment variable so we can read it.

Seems like CircleCI is a serial offender on this issue:

Java, Erlang and any other languages that introspect the /proc directory for information about CPU count may require additional configuration to prevent them from slowing down when using the CircleCI 2.0 resource class feature. Programs with this issue may request 32 CPU cores and run slower than they would when requesting one core. Users of languages with this issue should pin their CPU count to their guaranteed CPU resources

From those issues, it appears that the correct course of action would not be to add an env var, but to make sure nproc returns the correct number.


I think for now, we should start using nproc as it even makes sense for local usage.
It's up to CircleCI as an outlier to fix their infrastructure.

nproc only works on linux, and having to spawn up a process sucks. It would be better if it was passed as config, IMO.

(although I agree nproc shouldn't lie 🙂)

nproc only works on linux, and having to spawn up a process sucks.

You could create a node C library to call the sched_getaffinity syscall.

It would be better if it was passed as config, IMO.

It can! nproc is the sensible default: but if you provide your own value, we have no reason to fetch the default.

same problem here, we added the --runInBand flag and we went from 17 mins to 4 mins 😞

Help landing a module which reports the actual cpus available would be awesome, then we can use that over os.cpus().length. In the meantime you can do -w=2 instead of -i (or call nproc to be safe)

I put together a repo testing it out: https://github.com/SimenB/available-cpu Help very much welcome!

I tried using http://npmjs.com/physical-cpu-count but that didn't help. Other ideas? The repo I linked above has CI set up, so feel free to open PRs to test ideas

@SimenB a simple way to figure out how is to work backwards from the syscalls.
For linux you need something that calls the sched_getaffinity syscall: what do you have at your disposal that can do that? (simple answer: nproc). As a syscall, it's not going to be directly callable from node, and would need to be built in or be shipped as a native extension. I grepped the node and libuv sources and there is nothing in there that mentions sched_getaffinity.
So your answers are: shell out to nproc or depend on a node C library.

nproc doesn't work on circle as mentioned, that's what it's using now (https://github.com/SimenB/available-cpu/blob/3a18d76a9984ba086805ab02af1db1c006d5c123/index.js#L8-L10). It reports 32, same as os.cpus().length

nproc doesn't work on circle

Okay? I'm not sure what I didn't cover in my response here: https://github.com/facebook/jest/issues/5989#issuecomment-392996021

I say we just use nproc. We could add an override for circle (detected via env vars). But otherwise I say it's up to circle ci to fix their own infrastructure.

I can shoot them an email asking about why nproc behaves weirdly.

EDIT: https://circleci.com/ideas/?idea=CCI-I-578

Is it due to running in a docker container? We've had similar issues to #5239 on https://buildkite.com running in a node:10 docker container

Yeah - when we ask for number of CPUs we get the underlying machine instead of the amount allocated to the vm/container. And there's no good way of getting the correct count

I seem to always get the number of CPUs assigned to docker when running nproc, not the number of CPUs my machine has, even when setting --ulimit nproc=1. (macOS host)

$ docker run --rm node:10-alpine nproc
2
$ docker run --rm --ulimit nproc=1 node:10-alpine nproc
2

Is that expected?

I don't think it's expected - it's a bit beyond what I know of ulimit, though. I (still) think the best would be if CI environments injected the number of available cores - inspecting it from the inside is not really predictable

It would be nice if this were in the Jest documentation as a behavior of CircleCI. I was getting the Error: spawn ENOMEM failures in CircleCI, but not locally, and the fix of running jest --maxWorkers=2 worked perfectly, but it took almost an hour for me to figure that out, and the suggestion to add that flag is buried deep in another (closed) bug about the memory problems.

I get similar behavior on my laptop with a hyperthreaded CPU (Intel(R) Core(TM) i7-9750H). Running Jest with defaults completely locks up all available CPU and slows the system to a crawl when it spawns 15 processes. I really think Jest should be using something like the logic in https://www.npmjs.com/package/physical-cpu-count as the default, not os.cpus().length.

@ChrisCrewdson physical-cpu-count reports 16 cores on Travis CI, which is 14 too many. Of course os.cpus().length reports 32, so it would be an improvement over what we have today, but still wrong. Might be worth it though

@SimenB We have the same issue, but the advantage is we control our own build agents, so we have an environment variable with the correct number of CPUs.

Are you open to controlling the --max-workers option with an environment variable as well? This way we can set it and all our consumers will automatically get it.
