Packer: sshd_config settings can disconnect packer and not be handled correctly

Created on 7 Apr 2017 · 8 Comments · Source: hashicorp/packer

Having ClientAliveCountMax and/or ClientAliveInterval set in /etc/ssh/sshd_config can silently disconnect packer during long-running tasks.

In the example listed in the Gist below I am simply running sleep 600 with the ClientAliveInterval set to 300 and the ClientAliveCountMax set to 0.

You can see in the output that after exactly 5 minutes packer skips to the next script without any errors.

A temporary fix is to use sed to raise the ClientAlive* values to something much higher and then reboot. (It seems packer kept the session open during the service restart, and I didn't know how to force packer to reconnect to pick up the new timeout values.)
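For anyone who wants to script that workaround, here is a minimal sketch of the same rewrite, done in Go instead of sed. The function name raiseClientAlive and the values 3600 and 3 are only illustrative assumptions, not anything from Packer or the Gist. It only touches active, whitespace-separated "Keyword value" lines and leaves commented-out defaults alone; sshd still has to be restarted (or the host rebooted, as described above) before the new values apply.

```go
package main

import (
	"fmt"
	"regexp"
)

// Match active (uncommented) ClientAlive* lines, case-insensitively,
// one line at a time. Lines starting with '#' are left untouched.
var (
	intervalRe = regexp.MustCompile(`(?mi)^[ \t]*ClientAliveInterval[ \t]+.*$`)
	countMaxRe = regexp.MustCompile(`(?mi)^[ \t]*ClientAliveCountMax[ \t]+.*$`)
)

// raiseClientAlive is a hypothetical helper: it returns the sshd_config
// text with any active ClientAliveInterval / ClientAliveCountMax lines
// replaced by the given values. It does not add missing settings.
func raiseClientAlive(config string, interval, countMax int) string {
	config = intervalRe.ReplaceAllString(config, fmt.Sprintf("ClientAliveInterval %d", interval))
	return countMaxRe.ReplaceAllString(config, fmt.Sprintf("ClientAliveCountMax %d", countMax))
}

func main() {
	// Sample input mirroring the settings from this report.
	in := "Port 22\nClientAliveInterval 300\nClientAliveCountMax 0\n#ClientAliveInterval 0\n"
	fmt.Print(raiseClientAlive(in, 3600, 3))
}
```

In a real provisioner you would read /etc/ssh/sshd_config, pass its contents through the function, and write the result back with root privileges.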

Not sure if this is related to #3920, but it seems like it is

Packer version from packer version

$> ./packer version
Packer v0.12.3

Your version of Packer is out of date! The latest version
is 1.0.0. You can update by downloading from www.packer.io

Host platform

$> cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)

Gist of all the goodies needed to reproduce
The only thing in there that may be troublesome to reproduce is the base image, but you should be able to use a CentOS base image with the sshd_config settings set as I listed above.

https://gist.github.com/necrolyte2/6d1909724cc5af35e8e4d5aebdba3ac8

communicator/ssh enhancement

necrolyte2

👍1

Most helpful comment

Just ran into this as well. I didn't realise the Go ssh implementation doesn't send keepalives out of the box; it seems they have to be implemented on top of the package.

It would be useful to have keepalives implemented so long-running commands don't time out if the host machine does have ClientAliveCountMax and/or ClientAliveInterval set.

Also :+1: to adding troubleshooting documentation. It took me a while longer than I'd like to track this down. Given there was no network interruption between me and the instance packer is operating on, I hadn't thought to check the ssh keepalives! 🙃

caius on 2 Jun 2017

👍2

All 8 comments

Tested with v1.0.0 and had the same result

necrolyte2 on 7 Apr 2017

It might be the case that we don't send any keepalives at all, and thus we get disconnected if nothing is sent during the period given by your ClientAliveCountMax * ClientAliveInterval.

rickard-von-essen on 7 Apr 2017

That sounds about right. So would that be a thing that should be fixed in packer or just left to the user to have to change the sshd config and then restart service/host?

necrolyte2 on 7 Apr 2017

If we don't send keepalives we should probably add that; if we do, we could at least add some troubleshooting advice to the docs.

rickard-von-essen on 7 Apr 2017

That would be great

necrolyte2 on 7 Apr 2017

I believe this is solved in #5830, but I would appreciate your help testing if you're able to build that branch.

mwhooker on 31 Jan 2018

👍1
