OS:
Latest Docker Image
Selenium Version:
3.0.1
Browser:
Chrome
Browser Version:
53.0.2785.101-1
timeout value on node matches the timeout value set for hub. So if hub timeout is set to 300000, I expect the node timeout to also be 300000.
timeout value for each node on the grid intermittently shows 1800 even though I set the timeout on the hub to 300000.
On Grid console page on each node I see the following configuration (note the timeout value)
browserTimeout: 900000
debug: false
help: false
jettyMaxThreads: -1
port: 5555
role: node
timeout: 1800 _(should be 300000)_
cleanUpCycle: 5000
host: 659103bcc9f9
maxSession: 1
capabilities: Capabilities [{seleniumProtocol=WebDriver, browserName=chrome, maxInstances=1, platform=LINUX}]
hub: http://hub:4444/grid/register
id: http://659103bcc9f9:5555
hubHost: hub
hubPort: 4444
nodeStatusCheckTimeout: 5000
proxy: org.openqa.grid.selenium.proxy.DefaultRemoteProxy
register: true
registerCycle: 5000
remoteHost: http://659103bcc9f9:5555
On the same grid console page, clicking on view config link, I can see the hub has the correct values including value for timeout.
Config for the hub :
browserTimeout : 900000
debug : false
help : false
jettyMaxThreads : -1
port : 4444
role : hub
timeout : 300000
cleanUpCycle : 5000
host : 10.255.0.5
maxSession : 1
hubConfig : /opt/selenium/config.json
capabilityMatcher : org.openqa.grid.internal.utils.DefaultCapabilityMatcher
newSessionWaitTimeout : -1
throwOnCapabilityNotPresent : true
Config details :
hub launched with : -browserTimeout 900000 -debug false -help false -jettyMaxThreads -1 -port 4444 -role hub -timeout 300000 -cleanUpCycle 5000 -host 10.255.0.5 -maxSession 1 -hubConfig /opt/selenium/config.json -capabilityMatcher org.openqa.grid.internal.utils.DefaultCapabilityMatcher -newSessionWaitTimeout -1 -throwOnCapabilityNotPresent true
the final configuration comes from :
the default :
browserTimeout : 0
debug : false
help : false
jettyMaxThreads : -1
role : hub
timeout : 1800
cleanUpCycle : 5000
maxSession : 1
capabilityMatcher : org.openqa.grid.internal.utils.DefaultCapabilityMatcher
newSessionWaitTimeout : -1
throwOnCapabilityNotPresent : true
updated with params :
browserTimeout : 900000
debug : false
help : false
jettyMaxThreads : -1
role : hub
timeout : 300000
cleanUpCycle : 5000
maxSession : 1
capabilityMatcher : org.openqa.grid.internal.utils.DefaultCapabilityMatcher
newSessionWaitTimeout : -1
throwOnCapabilityNotPresent : true
Here's an example of 2 nodes on the same grid having 2 different timeout values as shown in the screenshot below. The node on the right with 300000 ms timeout is what I set, but for some reason, some nodes in the grid show the default timeout which is 1800 ms.

Likely what's happening here is the node is failing to communicate with the hub, early in its registration process, to retrieve the timeout value from the hub. Do you see a WARN with Exception in the node logs and/or stdout for the node which ends up with the default value (1800)?
side note related to the code I linked above;
browserTimeout _should be_ node specific and the hub has no business overriding a browserTimeout value specified by a node. That said, I've been resistant to make a change until I get consensus on this.Here's the log for the node with the wrong timeout value during initialization. For some reason I see a connection refused warning as you suggested.
appending selenium options: -host 28327d3e373b
18:28:59.112 INFO - Selenium build info: version: '3.0.1', revision: '1969d75'
18:28:59.113 INFO - Launching a Selenium Grid node
18:29:00.374 WARN - error getting the parameters from the hub. The node may end up with wrong timeouts.Connect to hub:4444 [hub/10.0.0.2] failed: Connection refused
2016-11-05 18:29:00.388:INFO::main: Logging initialized @1583ms
18:29:00.418 INFO - Driver provider org.openqa.selenium.ie.InternetExplorerDriver registration is skipped:
registration capabilities Capabilities [{ensureCleanSession=true, browserName=internet explorer, version=, platform=WINDOWS}] does not match the current platform LINUX
18:29:00.419 INFO - Driver provider org.openqa.selenium.edge.EdgeDriver registration is skipped:
registration capabilities Capabilities [{browserName=MicrosoftEdge, version=, platform=WINDOWS}] does not match the current platform LINUX
18:29:00.419 INFO - Driver class not found: com.opera.core.systems.OperaDriver
18:29:00.419 INFO - Driver provider com.opera.core.systems.OperaDriver registration is skipped:
Unable to create new instances on this machine.
18:29:00.419 INFO - Driver class not found: com.opera.core.systems.OperaDriver
18:29:00.419 INFO - Driver provider com.opera.core.systems.OperaDriver is not registered
18:29:00.420 INFO - Driver provider org.openqa.selenium.safari.SafariDriver registration is skipped:
registration capabilities Capabilities [{browserName=safari, version=, platform=MAC}] does not match the current platform LINUX
2016-11-05 18:29:00.449:INFO:osjs.Server:main: jetty-9.2.15.v20160210
2016-11-05 18:29:00.470:INFO:osjsh.ContextHandler:main: Started o.s.j.s.ServletContextHandler@3cc2931c{/,null,AVAILABLE}
2016-11-05 18:29:00.482:INFO:osjs.ServerConnector:main: Started ServerConnector@ef9296d{HTTP/1.1}{0.0.0.0:5555}
2016-11-05 18:29:00.482:INFO:osjs.Server:main: Started @1677ms
18:29:00.482 INFO - Selenium Grid node is up and ready to register to the hub
18:29:00.491 INFO - Starting auto registration thread. Will try to register every 5000 ms.
18:29:00.491 INFO - Registering the node to the hub: http://hub:4444/grid/register
18:29:00.497 INFO - Couldn't register this node: Error sending the registration request: Connect to hub:4444 [hub/10.0.0.2] failed: Connection refused
18:29:05.512 INFO - Registering the node to the hub: http://hub:4444/grid/register
18:29:05.520 INFO - The node is registered to the hub and ready to use
18:29:15.634 INFO - SessionCleaner initialized with insideBrowserTimeout 0 and clientGoneTimeout 1800000 polling every 180000
Messages of interest -
First time with a warning -
2016-11-06T01:29:00.374758143Z 18:29:00.374 WARN - error getting the parameters from the hub. The node may end up with wrong timeouts.Connect to hub:4444 [hub/10.0.0.2] failed: Connection refused
Then again a few ms later, a connection refused, but eventually registers with the default timeout instead of the timeout I set.
2016-11-06T01:29:00.497440999Z 18:29:00.497 INFO - Couldn't register this node: Error sending the registration request: Connect to hub:4444 [hub/10.0.0.2] failed: Connection refused
I think I've been able to get around the issue. Now as soon as I start the hub, I check if /grid/console is reachable. Only when I get a 200 back from the request, I start adding nodes.
Perhaps it's a timing issue with the hub not able to take traffic right after it's started and needs a few seconds before nodes can start connecting to it.
Unrelated, but would be nice if there was an api available that returns the number of nodes registered in the grid and other useful grid stats besides just the console page.
@testphreak You can write your own servlet and add it to the list of servlets in your GridHubConfiguration. In your servlet you can call the API's getRegistry().getAllProxies() and getRegistry().getDesiredCapabilities() to get additional information. I do this to generate a JSON response that is then consumed by a few web pages to display things like number of registered nodes to the hub, nodes currently in use (including browser instance), nodes available, requests in the hubs queue, etc.
@testphreak @schmidtkp a custom servlet will definitely get the job done. That said, an api for this information is a request that comes up often. I'm hoping to get one into the main code in the near future.
Thank you @schmidtkp, I didn't know we could add our our own servlet.
@mach6, having apis we could call to get grid stats for hub and nodes would be awesome! There is a /grid/api/hub call, but it doesn't provide all the information.
For now, I have gone ahead with an HTML parser, albeit simple solution, that extracts the information I need (number of nodes) from the console page.
Still happens with 3.6.0
Experienced this on Grid 3.12, too. Had to set the "timeout" value in the node's JSON to make it take effect.
@wbh1 I am currently experiencing this in node 3.12 as well. Can you please elaborate on what you did exactly in order to make this stick?
@wbh1 @mikimichaeli Can you please upgrade to 3.13 and share a node log, it contains more information about configuration now that can help us to pinpoint the issue.
@barancev after some research I realized tgat my problem not related to selenium itself but to network errors.
After checking the code, I can see that the Node is actually fetching the Hub configuration and updating its own configuration, but only after registering.
The hub is only aware of the node configuration reported at the moment of registration, and since the configuration is updated after registering, the hub does not have the latest configuration. And whatever the hub has, is what is shown in the Grid Console.
This is only a problem when the configuration values are shown in the Grid Console (where is actually what people check). Nevertheless, the values used on the node side to check timeout and browserTimeoutare the correct ones. If you run the node in debug node, you'll see that the configuration is correct.
I've just changed the order to fetch the hub configuration first, update it locally and then send the registration request, therefore the hub will have always the latest configuration (which was the original behaviour), and it will be properly shown in the Grid Console. It should be available in the next release.
@diemol are you sure node applies the config correctly?
I'm using docker-compose.yml to deploy to Docker Swarm:
version: '3.7'
services:
hub:
image: selenium/hub:3.14.0-krypton
environment:
GRID_BROWSER_TIMEOUT: 180
GRID_TIMEOUT: 300
GRID_DEBUG: "true"
JAVA_OPTS: -Xmx512m
ports:
- "4444:4444"
deploy:
replicas: 1
placement:
constraints:
- node.role == manager
chrome:
image: selenium/node-chrome-debug:3.14.0-krypton
environment:
HUB_HOST: hub
HUB_PORT: 4444
GRID_DEBUG: "true"
JAVA_OPTS: -Xmx512m
ports:
- "5900:5900"
volumes:
- /dev/shm:/dev/shm
depends_on:
- hub
deploy:
replicas: 2
placement:
constraints:
- node.role == worker
entrypoint: bash -c 'SE_OPTS="-host $$HOSTNAME" /opt/bin/entry_point.sh'
Debug logs from the node claim that it fetched the config from the hub and applied it, but the node still behaves like browserTimeout is set to 0 (which is the default). It never times out on browser crashes.
It times out on browser crash only when I change my node entry point to:
entrypoint: bash -c 'SE_OPTS="-timeout 300 -browserTimeout 180 -host $$HOSTNAME" /opt/bin/entry_point.sh'
Which would mean default config is not substituted with the one from hub.
@pawelus yes, I am sure. But to be more specific, yes, the node fetches the configuration and sets it properly.
Nevertheless, browserTimeout was broken for a while, and it got fixed in the 3.141.0 release. So to be safe, please use the latest release 3.141.59
So even if the configuration was correct, the browserTimeout parameter was not being used properly.
Most helpful comment
Still happens with 3.6.0