Selenium: Grid Node timeout value differs from value set on the hub

Created on 5 Nov 2016 · 19 Comments · Source: SeleniumHQ/selenium

Meta -

OS: Latest Docker Image
Selenium Version: 3.0.1
Browser: Chrome
Browser Version: 53.0.2785.101-1

Expected Behavior -

The timeout value on each node matches the timeout value set on the hub. If the hub timeout is set to 300000, I expect the node timeout to also be 300000.

Actual Behavior -

The timeout value for some nodes on the grid intermittently shows 1800, even though I set the timeout on the hub to 300000.

Steps to reproduce -

On the Grid console page, an affected node shows the following configuration (note the timeout value):
browserTimeout: 900000
debug: false
help: false
jettyMaxThreads: -1
port: 5555
role: node
timeout: 1800 _(should be 300000)_
cleanUpCycle: 5000
host: 659103bcc9f9
maxSession: 1
capabilities: Capabilities [{seleniumProtocol=WebDriver, browserName=chrome, maxInstances=1, platform=LINUX}]
hub: http://hub:4444/grid/register
id: http://659103bcc9f9:5555
hubHost: hub
hubPort: 4444
nodeStatusCheckTimeout: 5000
proxy: org.openqa.grid.selenium.proxy.DefaultRemoteProxy
register: true
registerCycle: 5000
remoteHost: http://659103bcc9f9:5555

On the same Grid console page, clicking the "view config" link shows that the hub has the correct values, including the value for timeout.
Config for the hub :
browserTimeout : 900000
debug : false
help : false
jettyMaxThreads : -1
port : 4444
role : hub
timeout : 300000
cleanUpCycle : 5000
host : 10.255.0.5
maxSession : 1
hubConfig : /opt/selenium/config.json
capabilityMatcher : org.openqa.grid.internal.utils.DefaultCapabilityMatcher
newSessionWaitTimeout : -1
throwOnCapabilityNotPresent : true
Config details :
hub launched with : -browserTimeout 900000 -debug false -help false -jettyMaxThreads -1 -port 4444 -role hub -timeout 300000 -cleanUpCycle 5000 -host 10.255.0.5 -maxSession 1 -hubConfig /opt/selenium/config.json -capabilityMatcher org.openqa.grid.internal.utils.DefaultCapabilityMatcher -newSessionWaitTimeout -1 -throwOnCapabilityNotPresent true
the final configuration comes from :
the default :
browserTimeout : 0
debug : false
help : false
jettyMaxThreads : -1
role : hub
timeout : 1800
cleanUpCycle : 5000
maxSession : 1
capabilityMatcher : org.openqa.grid.internal.utils.DefaultCapabilityMatcher
newSessionWaitTimeout : -1
throwOnCapabilityNotPresent : true

updated with params :
browserTimeout : 900000
debug : false
help : false
jettyMaxThreads : -1
role : hub
timeout : 300000
cleanUpCycle : 5000
maxSession : 1
capabilityMatcher : org.openqa.grid.internal.utils.DefaultCapabilityMatcher
newSessionWaitTimeout : -1
throwOnCapabilityNotPresent : true

C-server

testphreak

All 19 comments

Here's an example of two nodes on the same grid showing two different timeout values, as seen in the screenshot below. The node on the right shows the timeout I set (300000), but for some reason some nodes in the grid show the default timeout, which is 1800.

[Screenshot, 2016-11-05: two nodes on the same grid showing different timeout values]

testphreak on 6 Nov 2016

Likely what's happening here is the node is failing to communicate with the hub, early in its registration process, to retrieve the timeout value from the hub. Do you see a WARN with Exception in the node logs and/or stdout for the node which ends up with the default value (1800)?

mach6 on 6 Nov 2016

code in question here --
https://github.com/SeleniumHQ/selenium/blob/master/java/server/src/org/openqa/grid/internal/utils/SelfRegisteringRemote.java#L75-L88
https://github.com/SeleniumHQ/selenium/blob/master/java/server/src/org/openqa/grid/internal/utils/SelfRegisteringRemote.java#L326-L340

mach6 on 6 Nov 2016

side note related to the code I linked above;

IMO, browserTimeout _should be_ node specific and the hub has no business overriding a browserTimeout value specified by a node. That said, I've been resistant to make a change until I get consensus on this.

mach6 on 6 Nov 2016

Here's the log for the node with the wrong timeout value during initialization. For some reason I see a connection refused warning as you suggested.

appending selenium options: -host 28327d3e373b
 18:28:59.112 INFO - Selenium build info: version: '3.0.1', revision: '1969d75'
 18:28:59.113 INFO - Launching a Selenium Grid node
 18:29:00.374 WARN - error getting the parameters from the hub. The node may end up with wrong timeouts.Connect to hub:4444 [hub/10.0.0.2] failed: Connection refused
 2016-11-05 18:29:00.388:INFO::main: Logging initialized @1583ms
 18:29:00.418 INFO - Driver provider org.openqa.selenium.ie.InternetExplorerDriver registration is skipped:
  registration capabilities Capabilities [{ensureCleanSession=true, browserName=internet explorer, version=, platform=WINDOWS}] does not match the current platform LINUX
 18:29:00.419 INFO - Driver provider org.openqa.selenium.edge.EdgeDriver registration is skipped:
  registration capabilities Capabilities [{browserName=MicrosoftEdge, version=, platform=WINDOWS}] does not match the current platform LINUX
 18:29:00.419 INFO - Driver class not found: com.opera.core.systems.OperaDriver
 18:29:00.419 INFO - Driver provider com.opera.core.systems.OperaDriver registration is skipped:
 Unable to create new instances on this machine.
 18:29:00.419 INFO - Driver class not found: com.opera.core.systems.OperaDriver
 18:29:00.419 INFO - Driver provider com.opera.core.systems.OperaDriver is not registered
 18:29:00.420 INFO - Driver provider org.openqa.selenium.safari.SafariDriver registration is skipped:
  registration capabilities Capabilities [{browserName=safari, version=, platform=MAC}] does not match the current platform LINUX
 2016-11-05 18:29:00.449:INFO:osjs.Server:main: jetty-9.2.15.v20160210
 2016-11-05 18:29:00.470:INFO:osjsh.ContextHandler:main: Started o.s.j.s.ServletContextHandler@3cc2931c{/,null,AVAILABLE}
 2016-11-05 18:29:00.482:INFO:osjs.ServerConnector:main: Started ServerConnector@ef9296d{HTTP/1.1}{0.0.0.0:5555}
 2016-11-05 18:29:00.482:INFO:osjs.Server:main: Started @1677ms
 18:29:00.482 INFO - Selenium Grid node is up and ready to register to the hub
 18:29:00.491 INFO - Starting auto registration thread. Will try to register every 5000 ms.
 18:29:00.491 INFO - Registering the node to the hub: http://hub:4444/grid/register
 18:29:00.497 INFO - Couldn't register this node: Error sending the registration request: Connect to hub:4444 [hub/10.0.0.2] failed: Connection refused
 18:29:05.512 INFO - Registering the node to the hub: http://hub:4444/grid/register
 18:29:05.520 INFO - The node is registered to the hub and ready to use
 18:29:15.634 INFO - SessionCleaner initialized with insideBrowserTimeout 0 and clientGoneTimeout 1800000 polling every 180000

Messages of interest -
First time with a warning -

2016-11-06T01:29:00.374758143Z  18:29:00.374 WARN - error getting the parameters from the hub. The node may end up with wrong timeouts.Connect to hub:4444 [hub/10.0.0.2] failed: Connection refused

Then, about 120 ms later, the registration request is also refused. The node eventually registers, but with the default timeout instead of the timeout I set.

2016-11-06T01:29:00.497440999Z  18:29:00.497 INFO - Couldn't register this node: Error sending the registration request: Connect to hub:4444 [hub/10.0.0.2] failed: Connection refused

testphreak on 6 Nov 2016

I think I've been able to get around the issue. Now, as soon as I start the hub, I check whether /grid/console is reachable. I start adding nodes only after that request returns a 200.

Perhaps it's a timing issue with the hub not able to take traffic right after it's started and needs a few seconds before nodes can start connecting to it.
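
The workaround described above can be sketched roughly as follows. This is a minimal illustration, not the script actually used in this thread: the function names `hub_is_ready` and `wait_for_hub`, the retry count, and the delay are all assumptions, and the URL simply reuses the `hub:4444` host and port that appear in the node logs.

```python
import time
import urllib.error
import urllib.request


def hub_is_ready(url, timeout=2.0):
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


def wait_for_hub(url, attempts=30, delay=1.0, probe=hub_is_ready, sleep=time.sleep):
    """Poll `url` until it is ready. Return True on success, False if we give up."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            # Hypothetical fixed delay between polls; tune for your environment.
            sleep(delay)
    return False
```

A deployment script could call `wait_for_hub("http://hub:4444/grid/console")` and start the node containers only when it returns True. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running hub.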

testphreak on 6 Nov 2016

Unrelated, but it would be nice to have an API that returns the number of nodes registered in the grid and other useful grid stats, rather than only the console page.

testphreak on 7 Nov 2016

@testphreak You can write your own servlet and add it to the list of servlets in your GridHubConfiguration. In your servlet you can call getRegistry().getAllProxies() and getRegistry().getDesiredCapabilities() to get additional information. I do this to generate a JSON response that is then consumed by a few web pages to display things like the number of nodes registered to the hub, nodes currently in use (including browser instance), nodes available, requests in the hub's queue, etc.

schmidtkp on 7 Nov 2016

@testphreak @schmidtkp a custom servlet will definitely get the job done. That said, an api for this information is a request that comes up often. I'm hoping to get one into the main code in the near future.

mach6 on 7 Nov 2016

Thank you @schmidtkp, I didn't know we could add our own servlet.

@mach6, having apis we could call to get grid stats for hub and nodes would be awesome! There is a /grid/api/hub call, but it doesn't provide all the information.

For now, I have gone with a simple HTML parser that extracts the information I need (the number of nodes) from the console page.
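
A parser along those lines might look like the sketch below. It is an assumption-laden illustration rather than the parser actually used here: it assumes each registered node is rendered on the console page as an element whose class list contains `proxy`, which should be verified against the markup of your Grid version. `NodeCounter` and `count_nodes` are hypothetical names.

```python
from html.parser import HTMLParser


class NodeCounter(HTMLParser):
    """Count elements whose class attribute includes `node_class`."""

    def __init__(self, node_class="proxy"):
        super().__init__()
        self.node_class = node_class  # assumed marker class for one node
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.node_class in classes:
            self.count += 1


def count_nodes(console_html, node_class="proxy"):
    """Return how many node elements appear in the console page HTML."""
    parser = NodeCounter(node_class)
    parser.feed(console_html)
    parser.close()
    return parser.count
```

The class list is split on whitespace so that a similarly named class such as `proxyname` is not counted as a node.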

testphreak on 7 Nov 2016

Still happens with 3.6.0

sherlock1982 on 1 Nov 2017

👍4

Experienced this on Grid 3.12, too. Had to set the "timeout" value in the node's JSON to make it take effect.

wbh1 on 15 Jun 2018

@wbh1 I am currently experiencing this in node 3.12 as well. Can you please elaborate on what you did exactly in order to make this stick?

mikimichaeli on 1 Jul 2018

@wbh1 @mikimichaeli Can you please upgrade to 3.13 and share a node log? It now contains more information about the configuration, which can help us pinpoint the issue.

barancev on 1 Jul 2018

@barancev after some research I realized that my problem is not related to Selenium itself but to network errors.

mikimichaeli on 12 Jul 2018

After checking the code, I can see that the Node is actually fetching the Hub configuration and updating its own configuration, but only after registering.
The hub is only aware of the node configuration reported at the moment of registration, and since the configuration is updated after registering, the hub does not have the latest configuration. And whatever the hub has, is what is shown in the Grid Console.

This is only a problem for the configuration values shown in the Grid Console (which is what people actually check). Nevertheless, the values used on the node side for timeout and browserTimeout are the correct ones. If you run the node in debug mode, you'll see that the configuration is correct.

I've just changed the order so that the node fetches the hub configuration first, updates it locally, and then sends the registration request. The hub will therefore always have the latest configuration (which was the original behaviour), and it will be shown properly in the Grid Console. It should be available in the next release.
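
The ordering problem can be illustrated with a toy model. This is not Selenium's code: the `Hub` and `Node` classes and their method names are hypothetical, and the only values taken from this thread are the default timeout of 1800 and the hub timeout of 300000.

```python
DEFAULT_TIMEOUT = 1800  # default timeout value reported in this issue


class Hub:
    def __init__(self, timeout):
        self.timeout = timeout
        self.registered = {}  # node id -> config snapshot taken at registration

    def register(self, node_id, node_config):
        # The hub only keeps what the node reported at this moment,
        # and this snapshot is what a console would display.
        self.registered[node_id] = dict(node_config)


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.config = {"timeout": DEFAULT_TIMEOUT}

    def fetch_hub_config(self, hub):
        self.config["timeout"] = hub.timeout

    def start_register_first(self, hub):
        # Order described as the cause: register, then fetch.
        hub.register(self.node_id, self.config)
        self.fetch_hub_config(hub)

    def start_fetch_first(self, hub):
        # Order described as the fix: fetch, then register.
        self.fetch_hub_config(hub)
        hub.register(self.node_id, self.config)


hub = Hub(timeout=300000)
stale, fresh = Node("stale"), Node("fresh")
stale.start_register_first(hub)
fresh.start_fetch_first(hub)
print(hub.registered["stale"]["timeout"], stale.config["timeout"])
print(hub.registered["fresh"]["timeout"], fresh.config["timeout"])
```

In this model the node that registers first ends up using the hub's timeout locally while the hub's snapshot still holds the default, which matches the mismatch described for the Grid Console.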

diemol on 2 Sep 2018

@diemol are you sure node applies the config correctly?

I'm using docker-compose.yml to deploy to Docker Swarm:

version: '3.7'

services:
  hub:
    image: selenium/hub:3.14.0-krypton
    environment:
      GRID_BROWSER_TIMEOUT: 180
      GRID_TIMEOUT: 300
      GRID_DEBUG: "true"
      JAVA_OPTS: -Xmx512m
    ports:
      - "4444:4444"
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == manager

  chrome:
    image: selenium/node-chrome-debug:3.14.0-krypton
    environment:
      HUB_HOST: hub
      HUB_PORT: 4444
      GRID_DEBUG: "true"
      JAVA_OPTS: -Xmx512m
    ports:
      - "5900:5900"
    volumes:
      - /dev/shm:/dev/shm
    depends_on:
      - hub
    deploy:
      replicas: 2
      placement:
        constraints:
          - node.role == worker
    entrypoint: bash -c 'SE_OPTS="-host $$HOSTNAME" /opt/bin/entry_point.sh'

Debug logs from the node claim that it fetched the config from the hub and applied it, but the node still behaves like browserTimeout is set to 0 (which is the default). It never times out on browser crashes.

It times out on browser crash only when I change my node entry point to:
entrypoint: bash -c 'SE_OPTS="-timeout 300 -browserTimeout 180 -host $$HOSTNAME" /opt/bin/entry_point.sh'
This would mean the default config is not being replaced by the one from the hub.

pawelus on 14 Nov 2018

@pawelus yes, I am sure. But to be more specific, yes, the node fetches the configuration and sets it properly.

Nevertheless, browserTimeout was broken for a while, and it got fixed in the 3.141.0 release. So, to be safe, please use the latest release, 3.141.59.

diemol on 18 Nov 2018

So even if the configuration was correct, the browserTimeout parameter was not being used properly.

diemol on 18 Nov 2018
