Packer version: 1.2.3
Host: Amazon EC2 Ubuntu 14
A similar issue used to happen:
https://github.com/hashicorp/packer/issues/3944
We have a Python process which kicks off a set of Packer builds (6) every day.
Some days 5 out of 6 will work; other days, 4 or 3. It's hard to reproduce: subsequent runs of the same template will NOT trigger the issue, until they randomly do again.
When it fails, the error is always this:
==> amazon-ebs: Connected to SSH!
==> amazon-ebs: Provisioning with shell script: /tmp/packer-shell284519433
==> amazon-ebs: Uploading compass.pex => /tmp/urbancompass/compass.pex
Build 'amazon-ebs' errored: unexpected EOF
==> Some builds didn't complete successfully and had errors:
--> amazon-ebs: unexpected EOF
It can also happen at almost any point in the build:
amazon-ebs: [localhost] out: spark-2.2.0-bin-hadoop2.7/data/mllib/sample_multiclass_classification_data.txt
amazon-ebs: [localhost] out: spark-2.2.0-bin-hadoop2.7/data/mllib/sample_linear_regression_data.txt
Build 'amazon-ebs' errored: unexpected EOF
==> Some builds didn't complete successfully and had errors:
--> amazon-ebs: unexpected EOF
I will modify the process to always run with PACKER_LOG=1 so that we can capture the issue, but I wanted to open this and get the conversation started.
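A minimal sketch of how a Python launcher might enable logging on every build. `PACKER_LOG` and `PACKER_LOG_PATH` are Packer's documented environment variables; the helper names, the `packer-logs` directory, and the one-log-per-template layout are assumptions for illustration, not the reporter's actual process.

```python
import os
import subprocess
from pathlib import Path


def packer_env(log_path, base_env=None):
    """Return a copy of the environment with Packer debug logging enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["PACKER_LOG"] = "1"                 # enable verbose logging
    env["PACKER_LOG_PATH"] = str(log_path)  # write the log to this file
    return env


def run_build(template, log_dir="packer-logs"):
    """Run `packer build` on one template; return (exit code, log path).

    The log directory name is hypothetical; one log file is kept per template.
    """
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    log_path = Path(log_dir) / (Path(template).stem + ".log")
    result = subprocess.run(
        ["packer", "build", template],
        env=packer_env(log_path),
    )
    return result.returncode, log_path
```

Keeping a separate log file per template means a failing build's log can be pulled out without wading through the other five.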
For example, I just ran the first failing build again (12 hours later) and it worked perfectly.
This will definitely be much easier to troubleshoot once we have logs, and ideally the simplest possible packer template you can use to reproduce.
Ok, so I was able to get the error to happen (many times, but finally one that I was able to isolate).
There are 3 things worth mentioning:
1) The log is 5500 lines long and contains some items I'd prefer only to share with limited people. Perhaps you have another way I could send you this log?
2) the packer template itself is fairly simple (also, I'd like to send that separately) - just a single builder of type amazon-ebs and 3 shell provisioners and a file provisioner (creating and resizing an ebs volume, copying over a file, and finally, running a versioned install suite in our codebase)
3) that last bit - the install suite in our codebase - is what makes it hard - I can share that we install the following:
with_java=True,
with_launcher=True,
with_fail2ban=True,
with_base=False,
with_memcached=True,
with_nginx=False,
with_nodejs=True,
with_thrift=False,
with_users=False,
with_oauth2_proxy=False,
with_spark=True,
with_test_pool=True,
mongodb_installation_type='standalone-ci'
It also appears to fail fairly consistently while Spark is installing.
Feel free to email them to me at [email protected] if you're uncomfortable sharing in a gist, but you're limiting the people who can work on this to me, which means you're basically guaranteeing it'll take a while to get solved. If you absolutely cannot share the full debug logs, please at least give me a good chunk (10-20 lines on either side) around the failure as well as the chunk at the end containing the stack trace. Hopefully that'll make it easier to redact anything you're worried about. Basically I need to see the stack trace, where the error occurs, and enough info to orient myself.
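For a 5500-line log, pulling out that excerpt can be scripted. The sketch below is one way to do it; the function name and the default search string (`unexpected EOF`, taken from the error output above) are illustrative choices, and the result still needs manual redaction before sharing.

```python
def excerpt_around(lines, needle="unexpected EOF", context=20):
    """Return the lines within `context` lines of the first match of `needle`.

    Returns an empty list when `needle` does not appear in any line.
    """
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - context)
            return lines[start:i + context + 1]
    return []
```

Usage would look like `excerpt_around(open(log_path).read().splitlines())`, where `log_path` is wherever the debug log was written. The end of the log, which holds any stack trace, can be taken separately with a plain slice such as `lines[-50:]`.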
I believe I am experiencing the same issue.
Packer: v1.2.4
Host: Amazon Linux v1 (latest)
Sometimes the build fails. The point at which it fails seems completely random.
I ran this build 3 times in a row, and they all failed. I then restarted the server and ran the build again, which worked. I have had a rebuild work without a restart before, so I don't think a restart is required.
Also, it does not unmount the volume.
Logs:
2018/06/28 04:35:32 packer: 2018/06/28 04:35:29 [ERR] yamux: keepalive failed: i/o deadline reached
2018/06/28 04:35:32 packer: 2018/06/28 04:35:32 [INFO] 37525 bytes written for 'stdout'
2018/06/28 04:35:32 packer: 2018/06/28 04:35:32 [INFO] 6326 bytes written for 'stderr'
2018/06/28 04:35:32 packer: 2018/06/28 04:35:29 [ERR] yamux: keepalive failed: i/o deadline reached
2018/06/28 04:35:32 packer: 2018/06/28 04:35:32 waiting for all plugin processes to complete...
2018/06/28 04:35:32 packer: 2018/06/28 04:35:29 [ERR] yamux: keepalive failed: i/o deadline reached
2018/06/28 04:35:32 packer: 2018/06/28 04:35:32 waiting for all plugin processes to complete...
2018/06/28 04:35:32 packer: 2018/06/28 04:35:29 [ERR] yamux: keepalive failed: i/o deadline reached
2018/06/28 04:35:32 packer: 2018/06/28 04:35:32 waiting for all plugin processes to complete...
2018/06/28 04:35:33 packer: 2018/06/28 04:35:29 [ERR] yamux: keepalive failed: i/o deadline reached
2018/06/28 04:35:33 packer: 2018/06/28 04:35:32 waiting for all plugin processes to complete...
2018/06/28 04:35:33 /home/ec2-user/packer: plugin process exited
2018/06/28 04:35:33 /home/ec2-user/packer: plugin process exited
2018/06/28 04:35:33 [WARN] yamux: failed to send ping reply: session shutdown
2018/06/28 04:35:33 [WARN] yamux: failed to send ping reply: session shutdown
2018/06/28 04:35:33 packer: 2018/06/28 04:35:29 [ERR] yamux: keepalive failed: i/o deadline reached
2018/06/28 04:35:33 packer: 2018/06/28 04:35:32 waiting for all plugin processes to complete...
2018/06/28 04:35:33 /home/ec2-user/packer: plugin process exited
2018/06/28 04:35:33 packer: 2018/06/28 04:35:29 [ERR] yamux: keepalive failed: i/o deadline reached
2018/06/28 04:35:33 packer: 2018/06/28 04:35:32 Removing: /mnt/packer-amazon-chroot-volumes/xvdf/etc/resolv.conf
2018/06/28 04:35:33 /home/ec2-user/packer: plugin process exited
2018/06/28 04:35:33 [ERR] yamux: keepalive failed: connection write timeout
2018/06/28 04:35:33 packer: 2018/06/28 04:35:33 [ERR] Error decoding response stream 5: EOF
2018/06/28 04:35:33 packer: 2018/06/28 04:35:33 waiting for all plugin processes to complete...
2018/06/28 04:35:33 [ERR] yamux: Failed to write header: write unix @->/tmp/packer-plugin664685839: use of closed network connection
2018/06/28 04:35:33 [ERR] yamux: Failed to write header: write unix @->/tmp/packer-plugin005817504: use of closed network connection
2018/06/28 04:35:33 [ERR] yamux: keepalive failed: session shutdown
2018/06/28 04:35:33 [ERR] yamux: Failed to write header: write unix @->/tmp/packer-plugin365804909: write: broken pipe
2018/06/28 04:35:33 [ERR] yamux: Failed to write header: write unix @->/tmp/packer-plugin905352658: write: broken pipe
2018/06/28 04:35:33 [ERR] yamux: keepalive failed: session shutdown
2018/06/28 04:35:33 [ERR] yamux: Failed to write header: write unix @->/tmp/packer-plugin788154816: write: broken pipe
2018/06/28 04:35:33 [ERR] yamux: keepalive failed: session shutdown
2018/06/28 04:35:33 [INFO] 37525 bytes written for 'stdout'
2018/06/28 04:35:33 [INFO] 6326 bytes written for 'stderr'
2018/06/28 04:35:33 [ERR] Error decoding response stream 18: EOF
2018/06/28 04:35:33 [INFO] RPC endpoint: Communicator ended with: 123
Build 'amazon-chroot' errored: unexpected EOF
2018/06/28 04:35:33 [INFO] (telemetry) ending amazon-chroot
2018/06/28 04:35:33 ui error: Build 'amazon-chroot' errored: unexpected EOF
==> Some builds didn't complete successfully and had errors:
--> amazon-chroot: unexpected EOF
==> Builds finished but no artifacts were created.
2018/06/28 04:35:33 Builds completed. Waiting on interrupt barrier...
2018/06/28 04:35:33 machine readable: error-count []string{"1"}
2018/06/28 04:35:33 ui error:
==> Some builds didn't complete successfully and had errors:
2018/06/28 04:35:33 machine readable: amazon-chroot,error []string{"unexpected EOF"}
2018/06/28 04:35:33 ui error: --> amazon-chroot: unexpected EOF
2018/06/28 04:35:33 ui:
==> Builds finished but no artifacts were created.
2018/06/28 04:35:33 [ERR] yamux: keepalive failed: session shutdown
2018/06/28 04:35:33 [ERR] yamux: keepalive failed: session shutdown
2018/06/28 04:35:33 [INFO] (telemetry) ending shell
2018/06/28 04:35:33 [ERR] yamux: keepalive failed: session shutdown
2018/06/28 04:35:33 [ERR] yamux: Failed to write header: write unix @->/tmp/packer-plugin014531274: write: broken pipe
2018/06/28 04:35:33 [ERR] yamux: Failed to write header: write unix @->/tmp/packer-plugin419990748: write: broken pipe
2018/06/28 04:35:33 /home/ec2-user/packer: plugin process exited
2018/06/28 04:35:33 /home/ec2-user/packer: plugin process exited
2018/06/28 04:35:33 [INFO] (telemetry) Finalizing.
2018/06/28 04:35:35 packer: 2018/06/28 04:35:35 Error in Ui RPC call: connection is shut down
2018/06/28 04:35:35 packer: 2018/06/28 04:35:35 Error in Ui RPC call: connection is shut down
2018/06/28 04:35:35 packer: 2018/06/28 04:35:35 Error in Ui RPC call: connection is shut down
2018/06/28 04:35:35 packer: 2018/06/28 04:35:35 Waiting for state to become: detached
2018/06/28 04:35:35 packer: 2018/06/28 04:35:35 Using 2s as polling delay (change with AWS_POLL_DELAY_SECONDS)
2018/06/28 04:35:35 packer: 2018/06/28 04:35:35 Allowing 300s to complete (change with AWS_TIMEOUT_SECONDS)
2018/06/28 04:35:35 [WARN] (telemetry) Error finalizing report. This is safe to ignore. Post https://checkpoint-api.hashicorp.com/v1/telemetry/packer: context deadline exceeded
2018/06/28 04:35:35 waiting for all plugin processes to complete...
2018/06/28 04:35:35 /home/ec2-user/packer: plugin process exited
I believe I might have identified the issue! I think it occurs when the packer server's memory is exhausted.
@fenichelar awesome! We generally start to see yamux errors under resource contention.
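If memory pressure on the build host is the trigger, a launcher could check available memory before starting a build. The sketch below parses `/proc/meminfo` text on Linux; the function names and the 1 GiB threshold are assumptions for illustration, not a value Packer documents.

```python
def available_kib(meminfo_text):
    """Parse /proc/meminfo text and return available memory in KiB."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    # MemAvailable is absent on older kernels; fall back to MemFree.
    return fields.get("MemAvailable", fields.get("MemFree", 0))


def enough_memory(meminfo_text, min_kib=1024 * 1024):
    """Return True if at least `min_kib` KiB is available.

    The default of 1 GiB is an arbitrary, hypothetical threshold.
    """
    return available_kib(meminfo_text) >= min_kib
```

On the build host this would be called as `enough_memory(open("/proc/meminfo").read())`. Delaying or serializing builds when it returns False is one possible mitigation, though the right threshold depends on what the provisioners install.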
Going to close this since there's nothing more we can do on this without logs.
@mwhooker I caught this because I was making an AMI in one shell while Go was installing dependencies to build Packer from source in another :) solved this and https://github.com/hashicorp/packer/issues/6434 at the same time!
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.