I feel like this kind of issue has been referenced many times before, but even after trying all the solutions I've found, we still continue to run into this problem. We've been seeing this issue for many months now, with early versions of Cypress 4.x and even with the most recent version today.
Long-running (multi-hour) test suites often "stall": the output stops appearing, but nothing actually fails or crashes. We actually have to rely on the timeout mechanism in Jenkins to kill the task because Cypress is just stuck.
I have enabled debug logs and honestly I don't see anything helpful in them. I'll upload the relevant portion next, but there are no mentions of failures, crashes, or even running out of memory. We're already following all the guidance in the Cypress docs about disabling /dev/shm usage (--disable-dev-shm-usage) and using the correct IPC setting (ipc: host).
Tests should either run to completion or Cypress should fail with a clear error.
Cypress: 4.12.1
OS: Linux
Browser: Chrome 83 (headless)
Here's an abridged version of the full Cypress debug log: cypress-debug-abridged.log
In our Jenkins output, this is the last message I see:
06:43:15   ✓ should render a read & write view for Image Management (3909ms)
You can see this mocha test at the top of the logfile. But after that, no more mocha tests are triggered for some reason. Nothing runs out of memory, nothing crashes, nothing fails, nothing exits. Everything just stops responding and Cypress seems to be waiting for an event that'll never happen.
I can furnish the full debug logfile (13.2 MB) upon request.
We are facing a similar issue: Cypress just hangs indefinitely during UI test runs, with no error and no logs. The only option is to close and run the tests again. We are running them locally; we have not integrated with CI yet.
Also seeing this: about 1 in every 50 test runs, Cypress will hang, occasionally for 12 hours(!) if no one notices.
The debug logs don't indicate anything that we can go off of. We'll need a way to reproduce this in order to investigate.
I'm also experiencing the same issue. It seems to be recent: I've hit it with Cypress 4.11 and also 4.12.x. I even had to write a script that cancels a test run if it's been going for more than 30 minutes and triggers it again. In our particular case we have been running 3 instances of Cypress in parallel on the same machine, and that had been working for months. But now, as someone said, from time to time one of the tests hangs indefinitely, with no output indicating whether there was a failure or anything else. I will try to gather some info on what's going on and post it here.
About our environment: we are not running Jenkins or Docker. It's just a Linux machine running tests against an external website.
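For reference, here's a rough sketch of what such a watchdog can look like, as a Node wrapper around the Cypress CLI. This is not our actual script; the 30-minute cutoff, the retry cap, and the cypress run flags are placeholders:

const { spawn } = require('child_process')

// Kill a run that hasn't finished within 30 minutes, then retry.
const TIMEOUT_MS = 30 * 60 * 1000

function runOnce() {
  return new Promise((resolve) => {
    // Placeholder invocation; pass whatever flags/specs you normally use.
    const child = spawn('npx', ['cypress', 'run', '--browser', 'chrome', '--headless'], {
      stdio: 'inherit',
    })
    const timer = setTimeout(() => {
      child.kill('SIGKILL') // assume it's hung
      resolve('timeout')
    }, TIMEOUT_MS)
    child.on('exit', (code) => {
      clearTimeout(timer)
      resolve(code === 0 ? 'pass' : 'fail')
    })
  })
}

;(async () => {
  for (let attempt = 1; attempt <= 3; attempt++) {
    const result = await runOnce()
    if (result !== 'timeout') process.exit(result === 'pass' ? 0 : 1)
    console.error(`Attempt ${attempt} appeared hung and was killed; retrying...`)
  }
  process.exit(1) // gave up after 3 hung attempts
})()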
> The debug logs don't indicate anything that we can go off of. We'll need a way to reproduce this in order to investigate.
Run multi-hour test suites in a Jenkins/Docker environment until you start to see Cypress hanging. 😉
On a serious note, it sounds like several people are seeing this problem, and we all have access to such environments, so maybe you'll have to just leverage that rather than trying to repro yourself. Is there any other logging or debugging instrumentation we should turn on that would be helpful?
I've also experienced this.
I don't have a long-running, multi-hour test suite though; I only have 2 test files so far.
The tests have always passed locally. But when running inside a Docker container, the tests hung most of the time, without any feedback, and I had to manually stop the Jenkins pipeline.
After a few days of trying things out, making these 3 changes helped me:

1. Removed --browser chrome. When I removed it, I saw a helpful output, at the same stage where the tests used to hang.
2. Added ipc to the docker-compose file, as everyone was advising.
3. Added shm_size to the docker-compose file. (Had done it before, but seemed to have worked only after not using Chrome.)

version: '3.5'
services:
  app-build:
    ipc: host
    shm_size: '2gb'
Please note I'm just sharing what seemed to have worked for me, but I don't consider this a solution. It sounds like an issue that should be looked into.
Interesting. To the best of my knowledge, we've seen the opposite trend: using Electron was worse and switching to Chrome has slightly improved things. But we still run into this hanging problem with both browsers. (This would make sense if the problem is actually with mocha, which is kind of what the logs would suggest.)
FWIW, we were seeing this nearly every test suite execution (>90%) in our CI runner (AWS CodeBuild with parallelization, using Chrome), with no issues running locally. Today we upgraded to v5.0.0 and the hanging/stalling has seemingly stopped.
@bradyolson Good to know! We're attempting to upgrade Cypress this week, so maybe in a few days I can report back with our results.
I have an update on this:
I noticed that Cypress tests were downloading a few static assets in a continuous stream. The site logo, a font, etc. They would be downloaded hundreds of times in a single test, and in some cases completely overwhelm the browser, causing the test to fail. This happened locally too.
I downgraded to version 4.9.0 from 4.12.0 and the issue went away. I hope this gives some context to help.
For some other context: this happened in end-to-end mode and in mock mode (we run our tests in both modes) AND in a new mode using mock service worker, but NOT when running the app in development or in production through a regular Chrome browser.
We finally got Cypress 5 to run and hit the same exact problem. No improvement.
I spent the afternoon trying to run our tests with Firefox (which I realize is in "beta") and noticed that this problem seems a lot worse with Firefox. It's the same symptoms as before: console output stops appearing, mocha events appear to stop coming in, but the debug log keeps chugging along with memory and CPU updates.
When trying to run our usual battery of ~50 test suites, we couldn't get further than 5 suites, sometimes failing at even the first one. With Chrome, we could always get at least 25 or 30 suites in before the stall happens.
I've tried a new approach/workaround in our project: instead of passing all our test suites to Cypress at once, I pass them one at a time.
Before:
cypress run --browser chrome --headless --spec foo.spec.js,bar.spec.js,baz.spec.js
After:
cypress run --browser chrome --headless --spec foo.spec.js
cypress run --browser chrome --headless --spec bar.spec.js
cypress run --browser chrome --headless --spec baz.spec.js
This seems to have helped a little, though last night we hit an error even with this approach. About 2 hours into the testing, a test suite stalled after running for only a couple of minutes, and Jenkins had to kill the task a couple of hours later.
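In case it's useful to anyone, here's a rough sketch of scripting that one-spec-per-run loop with the Cypress module API. The glob pattern and flags are assumptions about your project layout, and a hang inside a single spec will still stall this loop, so you'd want an outer timeout (like Jenkins's) on top:

const cypress = require('cypress')
const glob = require('glob')

;(async () => {
  // Assumed spec location; adjust to your project.
  const specs = glob.sync('cypress/integration/**/*.spec.js')
  for (const spec of specs) {
    // Each iteration is a fresh Cypress run, so a hang in one spec
    // can be killed without losing the results of earlier specs.
    const results = await cypress.run({ browser: 'chrome', headless: true, spec })
    if (results.failures || (results.totalFailed || 0) > 0) {
      process.exitCode = 1
    }
  }
})()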
We encountered this failure again last night with the same symptoms. Now that we're running our test suites one at a time, the logs are smaller and easier to digest. Still seems like Cypress is at fault here.
In the Jenkins output, we can see that the test suite stalled after executing for only a minute:
03:20:20 (Run Starting)
03:20:20
03:20:20 ┌────────────────────────────────────────────────────────────┐
03:20:20 │ Cypress:    5.0.0                                          │
03:20:20 │ Browser:    Chrome 84 (headless)                           │
03:20:20 │ Specs:      1 found (topology/overlay.spec.js)             │
03:20:20 │ Searched:   cypress/integration/topology/overlay.spec.js   │
03:20:20 └────────────────────────────────────────────────────────────┘
03:20:20
03:20:20
03:20:20 ────────────────────────────────────────────────────────────────────────────────────────────────────
03:20:20
03:20:20 Running: topology/overlay.spec.js (1 of 1)
03:20:20 Adding --disable-dev-shm-usage...
03:20:20 Adding --disable-gpu...
03:20:20 Adding --window-size=1920,1080...
03:20:38
03:20:38
03:20:38 Topology overlay tests
03:21:24   ✓ should navigate to Topology application (40059ms)
03:21:24   ✓ should have the right options in the select (352ms)
04:01:08 Cancelling nested steps due to timeout
04:01:08 Sending interrupt signal to process
04:01:20 Terminated
In the full debug logfile, I can see that this is the last event we get from mocha:
2020-09-03T10:21:18.281Z cypress:server:reporter got mocha event 'hook end' with args: [ { id: 'r5', title: '"before each" hook', hookName: 'before each', hookId: 'h2', body: ...
There are no more mocha events and the test never starts execution. This is also verified by the video file that Cypress writes out. I can see in the video that the sidebar never "expands" for the doomed test. Instead of seeing the list of get and assert calls, the spinner next to the test simply spins forever.
I can furnish the Jenkins log, Cypress debug log, and Cypress video upon request (in a DM or some other private channel).
We have the same issue, and we are using Cypress 5.2.0 since it seems to bring in several performance fixes. We have been using Cypress since the 2.x days, and I feel that this is getting worse over time.
Any update on this issue?
Any update on this issue?
+1. Also waiting for updates. We continue to run into this issue often, even with the latest version (5.2.0). I've offered to furnish full logs, but no one's taken me up on it. Kind of feels like the Cypress folks are hoping this issue will just magically disappear.
We are facing the same issue. We tried updating to 5.2.0 and all kinds of workarounds, but this is still happening.
Similar issues here. There has been a runner change as well as an upgrade to Cypress 5.2.0. We have parallelization as well, and we are using the cypress/included Docker image.
Firefox tests hang indefinitely. With Electron we get the same crashes, though occasionally the tests do work.
The trend seems to be tests with UI login. For the time being, we've disabled them in CI.
This might help someone.
const ignoreOnEnvironment = (environment: 'linux' | 'darwin' | 'win32', fn) => {
  if (Cypress.platform !== environment) {
    fn();
  }
};
Then, in the test, wrap any describe/context/it block:
ignoreOnEnvironment('linux', () => {
  describe(....)
});
If you are running Chrome or Chromium, it can be more useful to pass --disable-dev-shm-usage as a flag rather than trying to set the --shm-size for Docker or Docker-Compose, especially when running in Docker-in-Docker on Jenkins or other CI providers.
https://developers.google.com/web/tools/puppeteer/troubleshooting#tips
https://github.com/cypress-io/cypress/issues/5336#issue-505290346
> If you are running Chrome or Chromium, it can be more useful to pass --disable-dev-shm-usage as a flag rather than trying to set the --shm-size for Docker or Docker-Compose, especially when running in Docker-in-Docker on Jenkins or other CI providers.
This helped. Thank you!
This info can also be found in the Cypress docs. If your tests are hanging in CI, take a look.
There are plenty of questions about configuring the plugins/index.js file. Here is a more complex example that doesn't even use the new --config-file pattern, instead using the getConfigurationByFile pattern.
const fs = require('fs-extra')
const path = require('path')

// tasks go here
const percyHealthCheck = require('@percy/cypress/task')
const mailosaurTasks = require('./mailosaur-tasks')

// merge all tasks
const all = Object.assign({},
  percyHealthCheck,
  mailosaurTasks
)

function getConfigurationByFile(file) {
  const pathToConfigFile = path.resolve('cypress/config', `${file}.json`)
  return fs.readJson(pathToConfigFile)
}

module.exports = (on, config) => {
  on('task', all)

  // needed to address issues related to tests hanging in the CI
  on('before:browser:launch', (browser = {}, launchOptions) => {
    if (browser.family === 'chromium' && browser.name !== 'electron') {
      launchOptions.args.push('--disable-dev-shm-usage')
    }
    return launchOptions
  })

  const file = config.env.configFile || 'dev'
  return getConfigurationByFile(file)
}
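To select a configuration at runtime, you would then invoke Cypress with something like cypress run --env configFile=staging, assuming a matching cypress/config/staging.json exists.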