While debugging the creation of my container with shell scripts, I am experiencing a problem with incomplete output. I am using set -x,
which shows the steps of the script. I expect the output to stop where the error occurs, but that's not the case. The output stops at random places before it reaches the problematic line.
Have you experienced that before?
@i0rek Packer runs different processes concurrently and streams their output back to the main packer process, which shows it to you. As a result, the messages shown in packer's output appear in non-deterministic order. This is just a quirk of packer's internals.
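As a rough illustration of the ordering behaviour (this is not packer's actual code, just a toy Go program), several workers streaming lines into one channel get printed in whatever order they arrive, which changes from run to run:

package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	lines := make(chan string)

	// Three "builders" stream their output concurrently.
	for i := 1; i <= 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for step := 1; step <= 3; step++ {
				lines <- fmt.Sprintf("builder %d: step %d", id, step)
			}
		}(i)
	}

	// Close the channel once every worker is done.
	go func() {
		wg.Wait()
		close(lines)
	}()

	// The main process prints whatever arrives first, so the
	// interleaving is different on every run.
	for line := range lines {
		fmt.Println(line)
	}
}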
Beyond this, I'm not sure what, if anything, might be broken in packer in this case. If this is something you think we should fix, can you share an example log / config that we can use to reproduce this?
@cbednarski thanks for your response. As far as I understand, when a script is running its output should arrive in order. I found that the output is redirected into a file. When the script fails for me, the output doesn't always reach the problematic line, but stops before it. That makes debugging very hard. My guess is that it has to do with flushing the output to the file, or maybe the tailing stops too early, or something like that.
I do think this is something that is broken in packer. I will try to provide an example. I am running into this all the time. When I run packer build with the failing script over and over again, the last line of script output is different pretty often.
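To make my guess a bit more concrete, here is a minimal Go sketch of the kind of race I have in mind (this is not packer's code, all names are made up): the reader stops as soon as the command is reported done, while the last lines of output are still sitting in a buffer and only reach the file afterwards.

package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	f, err := os.CreateTemp("", "script-output")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())

	stop := make(chan struct{})

	// The "script": its output goes through a buffer, so the last lines
	// may still be in memory when the command is already reported done.
	go func() {
		w := bufio.NewWriter(f)
		for i := 1; i <= 100; i++ {
			fmt.Fprintf(w, "+ step %d\n", i)
		}
		close(stop) // the command "exits" here ...
		w.Flush()   // ... but the buffered tail only hits the file now
	}()

	// The "tailer": stops reading as soon as it is told the command is done.
	<-stop
	data, _ := os.ReadFile(f.Name())
	fmt.Printf("reader stopped with %d bytes in the file; the rest was still buffered\n", len(data))
}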
@cbednarski Atm I am looking at provisioner/shell/provisioner.go and builder/docker/communicator.go. This is where I suspect the cause, but I am not sure about that at all.
@i0rek Thanks for clarifying! This does sound like something we can improve.
I am running into this all the time. When I run packer build with the failing script over and over again, the last line of script output is different pretty often.
I can rerun to repro this if you can share a packer config that fails sometimes.
@cbednarski I was able to reproduce the problem: https://github.com/i0rek/packer_test_2391. While working on that I made an interesting observation. When the script has regular output like echo, packer does a great job and shows the relevant line. When there is only output from set -x, the problem occurs.
When I dig into the container and check the file that contains the output of the script, everything is in there. So far so good.
I tried to work around it with dark bash magic. It didn't work. I think packer stops tailing too early.
I think there is a problem with the tailer.
When I remove the tailer and just print the content of the file at the end, it works great. It is not as dynamic that way, but at least it works reliably.
diff --git a/builder/docker/communicator.go b/builder/docker/communicator.go
index 6fedf27..323df5c 100644
--- a/builder/docker/communicator.go
+++ b/builder/docker/communicator.go
@@ -272,7 +272,7 @@ func (c *Communicator) run(cmd *exec.Cmd, remote *packer.RemoteCmd, stdin_w io.W
return
case line := <-tail.Lines:
if remote.Stdout != nil {
- remote.Stdout.Write([]byte(line.Text + "\n"))
+ // remote.Stdout.Write([]byte(line.Text + "\n"))
} else {
log.Printf("Command stdout: %#v", line.Text)
}
@@ -282,6 +282,13 @@ func (c *Communicator) run(cmd *exec.Cmd, remote *packer.RemoteCmd, stdin_w io.W
// out of the tailed file.
select {
case <-stopTailCh:
+ content, err := ioutil.ReadFile(outputFile.Name())
+ if err != nil {
+ fmt.Println(err)
+ }
+ if remote.Stdout != nil {
+ remote.Stdout.Write([]byte(content))
+ }
return
default:
}
Would you accept a PR which removes the tailer and replaces it with printing the output at the end? I looked into the tailer, but that's a rabbit hole I don't want to dig into.
The workaround doesn't always work. :( This is confusing.
@cbednarski any news regarding this issue?
@i0rek Thanks for looking into this. I have not had a chance to look into this yet.
Just ran into this. Rather unpleasant. Basically the script was exiting non-zero later on, but since all of the output had not yet come back, it just looked like it died mid-script. Sent me on a wild goose chase thinking I had a problem in the middle of the script when it was in fact at the end.
Found it by throwing sleeps around the script; that seemed to let the output catch up and helped me find the problem. packer 0.8.1, docker 1.7.0 using boot2docker on Yosemite, FWIW.
:panda_face:
@schubert @i0rek thanks for researching this! We have some code in the UI that drains channels from stdout/stdin after a program terminates. This helps us finish reading data that may have been sent but not read before a program terminates.
I have a suspicion we may need to do something similar for the remote communicator as well. This is likely a timing issue so it can be hard to track down by running it. I will take a look at this soonish.
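Very roughly, the draining idea looks like the sketch below (this is not the actual UI code, just the pattern): on the exit signal, keep pulling from the output channel until it is empty instead of returning right away.

package main

import (
	"fmt"
	"io"
	"os"
)

// drainOutput forwards lines until the producer closes the channel.
// After the "exited" signal it keeps draining whatever is still queued
// instead of returning immediately. Only a sketch of the idea.
func drainOutput(lines <-chan string, exited <-chan struct{}, out io.Writer) {
	for {
		select {
		case line, ok := <-lines:
			if !ok {
				return
			}
			fmt.Fprintln(out, line)
		case <-exited:
			for {
				select {
				case line, ok := <-lines:
					if !ok {
						return
					}
					fmt.Fprintln(out, line)
				default:
					return // queue is empty, safe to stop now
				}
			}
		}
	}
}

func main() {
	lines := make(chan string, 8)
	exited := make(chan struct{})

	// Simulate a command that produced output right before exiting.
	lines <- "last line of script output"
	close(exited)

	// Without the drain step this last line could be lost; with it, it prints.
	drainOutput(lines, exited, os.Stdout)
}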
Thank you @cbednarski! :heart_eyes_cat:
I see this in the packer logs: 2015/07/19 20:33:34 packer-provisioner-shell: 2015/07/19 20:33:34 [ERR] 'stdout' copy error: io: read/write on closed pipe
which makes me wonder if the problem lies in builder/docker/exec.go.
My workaround atm is to change packer to not delete the logs and to tail them outside of packer. :(
I had some difficulty reproducing this with the 50-iteration example @i0rek provided. I only repro'd once, and that was with the echo example. In that case there was simply no output from the provisioner at all.
When I bumped the iterations up to 5000 I can get it more consistently. I'm speculating that the frequency with which this happens depends on system load and scheduling, so if your system load is low you will need more output to see this.
2015/08/07 09:05:51 ui: ^[[0;32m docker: + for n in '$(seq 5000)'^[[0m
2015/08/07 09:05:51 ui: ^[[0;32m docker: + touch 4989^[[0m
2015/08/07 09:05:51 packer-provisioner-shell: 2015/08/07 09:05:51 [INFO] 188611 bytes written for 'stdout'
2015/08/07 09:05:51 packer-provisioner-shell: 2015/08/07 09:05:51 [ERR] 'stdout' copy error: io: read/write on closed pipe
2015/08/07 09:05:51 ui: ^[[0;32m docker: + for n in '$(seq 5000)'^[[0m
2015/08/07 09:05:51 ui: ^[[0;32m docker: + touch 4990^[[0m
Also:
Still investigating.
I think this error is raised in packer/rpc/communicator.go:277, so it may be caused by the shell provisioner plugin process terminating before the parent process has received all of the output. Speculating at this point, though.
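For what it's worth, that exact error message is easy to provoke with a plain io.Pipe. The sketch below is only an illustration of the failure mode, not packer's code: if the reading side goes away while the writer is still pushing data, the writer gets "io: read/write on closed pipe" and whatever it had left is never delivered.

package main

import (
	"fmt"
	"io"
)

func main() {
	r, w := io.Pipe()

	done := make(chan struct{})
	go func() {
		defer close(done)
		// The writer (think: plugin stdout) is still pushing data ...
		for i := 0; i < 1000; i++ {
			if _, err := fmt.Fprintf(w, "line %d\n", i); err != nil {
				// ... and fails once the reader has gone away.
				fmt.Println("copy error:", err)
				return
			}
		}
	}()

	// The reader (think: the parent packer process) takes one chunk and
	// then closes the pipe, as if the connection were torn down mid-stream.
	buf := make([]byte, 64)
	r.Read(buf)
	r.Close()

	<-done
}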
$ ~/code/gotip/bin/go test --race ./packer/rpc
...
Found 22 data race(s)
FAIL github.com/mitchellh/packer/packer/rpc 1.192s
:disappointed:
I think I will need to defer this one until I have more time to sift through the race conditions.
thanks for giving it a spin @cbednarski!
If it helps at all, I ran into a similar problem with the salt provisioner. There would be an error, but it wouldn't print all of the output from the salt highstate, so it was difficult to track down. It looks to have printed about half of the output in my case.
I think this error is raised in packer/rpc/communicator.go:277, so it may be caused by the shell provisioner plugin process terminating before the parent process has received all of the output. Speculating at this point, though.
There was a race condition here that is now fixed. In fact all the race conditions I know about are fixed at this point. If you are still experiencing this can you try with packer built from master?
@cbednarski thanks for your work! I will give it a spin as soon as possible.
Was facing the same issue (ansible + docker) with v0.8.6. A build from master works for me :+1:
@cbednarski I tested it. It seems way better now, but I still run into the problem. I am running docker in coreos with xhyve locally. I built master with git pull; make updatedeps; make; make dev.
$ packer version
Packer v0.8.7.dev (25108c8d13912434d0f32faaf1ea13cdc537b21e+CHANGES)
Since I just ran into this issue again today, I built packer from HEAD and it definitely improves the situation. I received the complete STDOUT I was expecting from a build script and was able to troubleshoot the issue.
So, as someone who originally experienced the issue (incomplete output from shell scripts), my use case seems fixed.
Hey guys,
I'm still seeing this, but my use case is probably pretty unique (I run a Saltstack highstate that takes A LOT of time, like 3 hours).
Here is my version:
$ packer version
Packer v0.8.7.dev (6587926a2bfdae673f4cecbd36968d718cd40926)
2016/01/27 04:59:36 ui: ==> virtualbox-ovf: Provisioning with shell script: files/saltstack_highstate.sh
==> virtualbox-ovf: Provisioning with shell script: files/saltstack_highstate.sh
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 Opening files/saltstack_highstate.sh for reading
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 opening new ssh session
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 Starting remote scp process: scp -vt /tmp
2016/01/27 04:59:36 [INFO] 2667 bytes written for 'uploadData'
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 [INFO] 2667 bytes written for 'uploadData'
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 Started SCP session, beginning transfers...
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 Copying input data into temporary file so we can read the length
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 [DEBUG] Uploading script_1903.sh: perms=C0644 size=2667
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 SCP session complete, closing stdin pipe.
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 Waiting for SSH session to complete.
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 scp stderr (length 32): Sink: C0644 2667 script_1903.sh
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 opening new ssh session
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 starting remote command: chmod 0755 /tmp/script_1903.sh
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 remote command exited with '0': chmod 0755 /tmp/script_1903.sh
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 [INFO] RPC endpoint: Communicator ended with: 0
2016/01/27 04:59:36 [INFO] RPC client: Communicator ended with: 0
2016/01/27 04:59:36 [INFO] RPC endpoint: Communicator ended with: 0
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 [INFO] RPC client: Communicator ended with: 0
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 opening new ssh session
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 starting remote command: chmod +x /tmp/script_1903.sh; PACKER_BUILD_NAME='virtualbox-ovf' PACKER_BUILDER_TYPE='virtualbox-ovf' /tmp/script_1903.sh
// LOTS OF OUTPUT
2016/01/27 06:59:44 ui: virtualbox-ovf: [INFO ] Fetching file from saltenv 'appserver', ** done ** u'|prince/files/dtd/svg11/svg-opacity-attrib.mod'
virtualbox-ovf: [INFO ] Fetching file from saltenv 'appserver', ** done ** u'|prince/files/dtd/svg11/svg-opacity-attrib.mod'
2016/01/27 07:00:00 packer: 2016/01/27 07:00:00 remote command exited with '0': chmod +x /tmp/script_1903.sh; PACKER_BUILD_NAME='virtualbox-ovf' PACKER_BUILDER_TYPE='virtualbox-ovf' /tmp/script_1903.sh
2016/01/27 07:00:00 packer: 2016/01/27 07:00:00 [INFO] RPC endpoint: Communicator ended with: 0
2016/01/27 07:00:00 [INFO] 0 bytes written for 'stderr'
2016/01/27 07:00:00 [INFO] 1225089 bytes written for 'stdout'
2016/01/27 07:00:00 [INFO] RPC client: Communicator ended with: 0
2016/01/27 07:00:00 [INFO] RPC endpoint: Communicator ended with: 0
As you can see, the script ends in the middle of the highstate after running for ~2 hours.
Here is my builder config:
"builders": [
{
"type": "virtualbox-ovf",
"source_path": "../../flavors/FreeBSD-10.1_amd64/FreeBSD-10.1_amd64.ovf",
"ssh_username": "vagrant",
"ssh_password": "vagrant",
"ssh_pty": true,
"ssh_wait_timeout": "10000s",
"shutdown_command": "echo 'vagrant' | sudo -S -E -H sh -c \"sync && shutdown -p now\"",
"headless": true,
"vm_name": "{{user `vm_prefix_name`}}_{{user `vm_version`}}.{{user `build_number`}}",
"export_opts": ["--manifest", "--vsys", "0", "--vendor", "{{user `vendor`}}", "--vendorurl", "{{user `vendor_url`}}", "--version", "{{user `vm_version`}}.{{user `build_number`}}", "--description", "{{user `vm_description`}}"],
"vboxmanage": [
[
"modifyvm",
"{{.Name}}",
"--memory",
"6144"
],
[
"modifyvm",
"{{.Name}}",
"--cpus",
"8"
]
]
}
],
For the record, this doesn't exit at the same point in the highstate script, and on a second run it started at 07:55:13 and finished at 09:47:54. Not sure if it's a timeout thing or a race condition.
Some time ago I pushed all highstate output to a log file as a workaround for the race conditions, but it was hard to debug because I had to copy the log outside the VM to analyze it. Also, I'm starting the build via Jenkins so I can keep track of every build.
Thank you!
@i0rek I've tested this using a slightly tweaked version of your problem.json with rerun and the latest packer master, and I am unable to repro. 50 rounds of the build pass. I'm sure there are other race conditions lurking, but I think this particular code path is solid.
@amontalban If you can reproduce this problem with packer master, please open a new issue and provide a repro case. There may be some other code path in the salt plugin that has a different race condition but as far as I can tell the original issue here is fixed.
If you are still experiencing this problem with packer built from master, make sure you are using the vendored dependencies. In Go 1.5 you will need to set GO15VENDOREXPERIMENT=1. In Go 1.6 this behavior is automatic.
I'm closing this as fixed but if you have a reliable repro case please open a new issue. Thanks for the feedback, everyone!
I'm still seeing this. I run salt manually using the remote shell provisioner. Build is Go 1.6, master as of right meow.
Yep, it is still there.
Still repros in:
/usr/local/packer --version
2016/06/24 23:21:30 [INFO] Packer version: 0.10.1
2016/06/24 23:21:30 Packer Target OS/Arch: linux amd64
2016/06/24 23:21:30 Built with Go Version: go1.6.2
The qemu builder looks like:
2016/06/24 21:09:18 packer: 2016/06/24 21:09:18 [INFO] 16710 bytes written for 'stdout'
2016/06/24 21:09:18 packer: 2016/06/24 21:09:18 [ERR] 'stdout' copy error: io: read/write on closed pipe
2016/06/24 21:09:18 ui: qemu: -
qemu: -
2016/06/24 21:09:18 ui: qemu: -# Turning this off breaks GSSAPI used
qemu: -# Turning this off breaks GSSAPI used
2016/06/24 21:09:18 packer: 2016/06/24 21:09:18 opening new ssh session
2016/06/24 21:09:18 packer: 2016/06/24 21:09:18 starting remote command: rm -f /tmp/script_5202.sh
2016/06/24 21:09:18 packer: 2016/06/24 21:09:18 remote command exited with '0': rm -f /tmp/script_5202.sh
2016/06/24 21:09:18 packer: 2016/06/24 21:09:18 [INFO] RPC endpoint: Communicator ended with: 0
2016/06/24 21:09:18 [INFO] RPC client: Communicator ended with: 0
2016/06/24 21:09:18 [INFO] RPC endpoint: Communicator ended with: 0
2016/06/24 21:09:18 packer: 2016/06/24 21:09:18 [INFO] RPC client: Communicator ended with: 0
Whereas the virtualbox-iso builder looks like:
2016/06/24 21:21:35 packer: 2016/06/24 21:21:35 [INFO] 33094 bytes written for 'stdout'
2016/06/24 21:21:35 packer: 2016/06/24 21:21:35 [ERR] 'stdout' copy error: io: read/write on closed pipe
2016/06/24 21:21:35 ui: virtualbox-iso:
virtualbox-iso:
2016/06/24 21:21:35 ui: virtualbox-iso: -# This sshd was compiled with PATH=/usr/local/bin:/usr/bin
virtualbox-iso: -# This sshd was compiled with PATH=/usr/local/bin:/usr/bin
2016/06/24 21:21:35 ui: virtualbox-iso: +# Den
virtualbox-iso: +# Den
2016/06/24 21:21:35 packer: 2016/06/24 21:21:35 opening new ssh session
2016/06/24 21:21:35 packer: 2016/06/24 21:21:35 starting remote command: rm -f /tmp/script_1125.sh
2016/06/24 21:21:35 packer: 2016/06/24 21:21:35 remote command exited with '0': rm -f /tmp/script_1125.sh
2016/06/24 21:21:35 packer: 2016/06/24 21:21:35 [INFO] RPC endpoint: Communicator ended with: 0
2016/06/24 21:21:35 [INFO] RPC client: Communicator ended with: 0
2016/06/24 21:21:35 [INFO] RPC endpoint: Communicator ended with: 0
2016/06/24 21:21:35 packer: 2016/06/24 21:21:35 [INFO] RPC client: Communicator ended with: 0
Notably, the qemu builder pretty consistently dies around ~16000 bytes.
These are all related:
It is still there in packer 0.12:
{
"type": "powershell",
"inline": [
"c:\\tmp\\bin\\unzip.exe c:\\tmp\\chef.zip -d c:\\"
]
},
Notice "cookboo" end of the output.
amazon-ebs: inflating: c:/chef/cookbooks/windows/resources/printer_port.rb
amazon-ebs: warning: skipped "../" path component(s) in ../../chef/cookbooks/windows/resources/zipfile.rb
amazon-ebs: inflating: c:/chef/cookbooks/windows/resources/zipfile.rb
amazon-ebs: warning: skipped "../" path component(s) in ../../chef/cookboo
==> amazon-ebs: Terminating the source AWS instance...
==> amazon-ebs: Cleaning up any extra volumes...
==> amazon-ebs: No volumes to clean up, skipping
==> amazon-ebs: Deleting temporary keypair...
Running into this as well.
amazon-ebs: [2017-01-12T12:19:15+00:00] INFO: Processing adcom_supervisor_service[uts-cache-read] action start (dmp_thrift_cache::deploy_uts_cache_read line 55)
amazon-ebs: [2017-01-12T12:19:16+00:00] INFO: Processing execute[supervisorctl update] action nothing (/var/chef/cache/cookbooks/adcom_supervisor/providers/service.rb line 89)
amazon-ebs: [2017-01-12T12:19:16+00:00] INFO: Processing file[/data/app/supervisord/ini/uts-cache-read.ini] action create (/var/chef/cache/cookbooks/adcom_supervisor/providers/service.rb line 94)
amazon-ebs: [2017-01-12T12:19:16+00:00] INFO: Processing template[/data/app/supervisord/ini/uts-cache-read.ini] action create (/var/chef/cache/c
==> amazon-ebs: Provisioning with shell script: scripts/delete-node-from-chef.sh
Is it possible to have another look at this issue please?
FWIW, I am using sleep 120 at the end of my inline script as a simple workaround for the incomplete output.
Bumping, as I have run into the same issue.
Going to lock this because this is a 2 year old issue already marked as fixed. It's likely that any issues people are seeing now are not related to the original issue, and so the discussion here may be misleading.
I recognize that some people are having real issues here, and I would love to be able to help, but I think the best way to do that is by starting a new discussion with accurate information and a reliable way to reproduce (I can't fix what I can't see). Thanks!