Having `ClientAliveCountMax` and/or `ClientAliveInterval` set in `/etc/ssh/sshd_config` can silently disconnect Packer during long-running tasks.
In the example in the Gist below, I am simply running `sleep 600` with `ClientAliveInterval` set to 300 and `ClientAliveCountMax` set to 0.
You can see in the output that after exactly 5 minutes packer skips to the next script without any errors.
A temporary fix is to use sed to raise the `ClientAlive*` values to something much higher and then reboot. (It seems Packer kept the session open during the service restart, and I didn't know how to force Packer to reconnect to pick up the new timeout values.)
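The sed step described above could look roughly like the sketch below. The helper name `raise_client_alive` and the replacement values 3600 and 10 are assumptions chosen for illustration, not values taken from the Gist. The patterns also assume the keywords appear with this exact casing and are separated from their values by whitespace. The demo runs against a scratch file rather than the live config.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the ClientAlive* lines in an sshd_config-style file.
# 3600 and 10 are arbitrary "much higher" placeholders, not recommendations.
raise_client_alive() {
    config="$1"
    tmp="$(mktemp)"
    # Write the edited copy to a temp file, then copy the contents back so the
    # original file keeps its inode, ownership, and permissions.
    sed \
        -e 's/^[[:space:]]*ClientAliveInterval[[:space:]].*/ClientAliveInterval 3600/' \
        -e 's/^[[:space:]]*ClientAliveCountMax[[:space:]].*/ClientAliveCountMax 10/' \
        "$config" > "$tmp" && cat "$tmp" > "$config"
    rm -f "$tmp"
}

# Demonstrate on a scratch file holding the settings from this report.
demo_config="$(mktemp)"
printf 'ClientAliveInterval 300\nClientAliveCountMax 0\n' > "$demo_config"
raise_client_alive "$demo_config"
cat "$demo_config"
```

On a real host the argument would be `/etc/ssh/sshd_config`, run as root, followed by the reboot mentioned above so that the new values take effect.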
Not sure if this is related to #3920, but it seems like it is.
Packer version from `packer version`
$> ./packer version
Packer v0.12.3
Your version of Packer is out of date! The latest version
is 1.0.0. You can update by downloading from www.packer.io
Host platform
$> cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
Gist of all the goodies needed to reproduce
The only thing in there that may be troublesome to reproduce is the base image, but you should be able to use a CentOS base image with the sshd_config settings listed above.
https://gist.github.com/necrolyte2/6d1909724cc5af35e8e4d5aebdba3ac8
Tested with v1.0.0 and had the same result
It might be that we don't send any keepalives at all, and thus get disconnected if nothing is sent during the period given by your `ClientAliveCountMax` * `ClientAliveInterval`.
That sounds about right. So should this be fixed in Packer, or left to the user to change the sshd config and then restart the service/host?
If we don't send keepalives we should probably add that; if we do, we could at least add some troubleshooting advice to the docs.
That would be great
Just ran into this as well. I didn't realise the Go ssh implementation doesn't send keepalives out of the box; it seems this has to be implemented on top of the package.
It would be useful to have keepalives implemented so long-running commands don't time out if the host machine does have `ClientAliveCountMax` and/or `ClientAliveInterval` set.
Also :+1: to adding troubleshooting documentation. It took me longer than I'd like to track this down; with no network interruption between me and the instance Packer is operating on, I hadn't thought to check the SSH keepalives! 🙃
I believe this is solved in #5830, but I would appreciate your help testing if you're able to build that branch.