While debugging the creation of my container with shell scripts, I am experiencing a problem with incomplete output. I am using set -x,
which shows the steps of the script. I expect the output to stop where the error occurs, but that's not the case. The output stops at random places before it reaches the problematic line.
Have you experienced that before?
@i0rek Packer runs different processes concurrently and streams their output back to the main packer process, which shows it to you. As a result, the messages shown in packer's output appear in non-deterministic order. This is just a quirk of packer's internals.
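As a rough illustration of the ordering behaviour (this is not packer's actual code, just a toy Go program), several workers streaming lines into one channel get printed in whatever order they arrive, which changes from run to run:

package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	lines := make(chan string)

	// Three "builders" stream their output concurrently.
	for i := 1; i <= 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for step := 1; step <= 3; step++ {
				lines <- fmt.Sprintf("builder %d: step %d", id, step)
			}
		}(i)
	}

	// Close the channel once every worker is done.
	go func() {
		wg.Wait()
		close(lines)
	}()

	// The main process prints whatever arrives first, so the
	// interleaving is different on every run.
	for line := range lines {
		fmt.Println(line)
	}
}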
Beyond this, I'm not sure what, if anything, might be broken in packer in this case. If this is something you think we should fix, can you share an example log / config that we can use to reproduce this?
@cbednarski thanks for your response. As far as I understand, when a script is running its output should arrive in order. I found that the output is redirected into a file. When the script fails for me, the output doesn't always reach the problematic line, but stops before it. That makes debugging very hard. My guess is that it has to do with flushing the output to the file, or maybe the tailing stops too early, or something like that.
I do think this is something that is broken in packer. I will try to provide an example. I am running into this all the time. When I run packer build with the failing script over and over again, the last line of script output is different pretty often.
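To make my guess a bit more concrete, here is a minimal Go sketch of the kind of race I have in mind (this is not packer's code, all names are made up): the reader stops as soon as the command is reported done, while the last lines of output are still sitting in a buffer and only reach the file afterwards.

package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	f, err := os.CreateTemp("", "script-output")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())

	stop := make(chan struct{})

	// The "script": its output goes through a buffer, so the last lines
	// may still be in memory when the command is already reported done.
	go func() {
		w := bufio.NewWriter(f)
		for i := 1; i <= 100; i++ {
			fmt.Fprintf(w, "+ step %d\n", i)
		}
		close(stop) // the command "exits" here ...
		w.Flush()   // ... but the buffered tail only hits the file now
	}()

	// The "tailer": stops reading as soon as it is told the command is done.
	<-stop
	data, _ := os.ReadFile(f.Name())
	fmt.Printf("reader stopped with %d bytes in the file; the rest was still buffered\n", len(data))
}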
@cbednarski Atm I am looking at provisioner/shell/provisioner.go and builder/docker/communicator.go. This is where I suspect the cause, but I am not sure about that at all.
@i0rek Thanks for clarifying! This does sound like something we can improve.
I am running into this all the time. When I run packer build with the failing script over and over again, the last line of script output is different pretty often.
I can rerun to repro this if you can share a packer config that fails sometimes.
@cbednarski I was able to reproduce the problem: https://github.com/i0rek/packer_test_2391. While working on that I made an interesting observation. When the script has regular output like echo, packer does a great job and shows the relevant line. When there is only output from set -x, the problem occurs.
When I dig into the container and check the file that contains the output of the script, everything is in there. So far so good.
I tried to work around it with dark bash magic. It didn't work. I think packer stops tailing too early.
I think there is a problem with the tailer.
When I remove the tailer and just print the content of the file at the end, it works great. It is not as dynamic that way, but at least it works reliably.
diff --git a/builder/docker/communicator.go b/builder/docker/communicator.go
index 6fedf27..323df5c 100644
--- a/builder/docker/communicator.go
+++ b/builder/docker/communicator.go
@@ -272,7 +272,7 @@ func (c *Communicator) run(cmd *exec.Cmd, remote *packer.RemoteCmd, stdin_w io.W
return
case line := <-tail.Lines:
if remote.Stdout != nil {
- remote.Stdout.Write([]byte(line.Text + "\n"))
+ // remote.Stdout.Write([]byte(line.Text + "\n"))
} else {
log.Printf("Command stdout: %#v", line.Text)
}
@@ -282,6 +282,13 @@ func (c *Communicator) run(cmd *exec.Cmd, remote *packer.RemoteCmd, stdin_w io.W
// out of the tailed file.
select {
case <-stopTailCh:
+ content, err := ioutil.ReadFile(outputFile.Name())
+ if err != nil {
+ fmt.Println(err)
+ }
+ if remote.Stdout != nil {
+ remote.Stdout.Write([]byte(content))
+ }
return
default:
}
Would you accept a PR which removes the tailer and replaces it with printing the output at the end? I looked into the tailer, but that's a rabbit hole I don't want to dig into.
The workaround doesn't always work. :( This is confusing.
@cbednarski any news regarding this issue?
@i0rek Thanks for looking into this. I have not had a chance to look into this yet.
Just ran into this. Rather unpleasant. Basically the script was exiting non-zero later on, but since all of the output had not yet come back, it just looked like it died mid-script. Sent me on a wild goose chase thinking I had a problem in the middle of the script when it was in fact at the end.
Found it by throwing sleeps around the script; that seemed to let the output catch up and helped me find the problem. packer 0.8.1, docker 1.7.0 using boot2docker on Yosemite, FWIW.
:panda_face:
@schubert @i0rek thanks for researching this! We have some code in the UI that drains channels from stdout/stdin after a program terminates. This helps us finish reading data that may have been sent but not read before a program terminates.
I have a suspicion we may need to do something similar for the remote communicator as well. This is likely a timing issue so it can be hard to track down by running it. I will take a look at this soonish.
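Very roughly, the draining idea looks like the sketch below (this is not the actual UI code, just the pattern): on the exit signal, keep pulling from the output channel until it is empty instead of returning right away.

package main

import (
	"fmt"
	"io"
	"os"
)

// drainOutput forwards lines until the producer closes the channel.
// After the "exited" signal it keeps draining whatever is still queued
// instead of returning immediately. Only a sketch of the idea.
func drainOutput(lines <-chan string, exited <-chan struct{}, out io.Writer) {
	for {
		select {
		case line, ok := <-lines:
			if !ok {
				return
			}
			fmt.Fprintln(out, line)
		case <-exited:
			for {
				select {
				case line, ok := <-lines:
					if !ok {
						return
					}
					fmt.Fprintln(out, line)
				default:
					return // queue is empty, safe to stop now
				}
			}
		}
	}
}

func main() {
	lines := make(chan string, 8)
	exited := make(chan struct{})

	// Simulate a command that produced output right before exiting.
	lines <- "last line of script output"
	close(exited)

	// Without the drain step this last line could be lost; with it, it prints.
	drainOutput(lines, exited, os.Stdout)
}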
Thank you @cbednarski! :heart_eyes_cat:
I see this in the packer logs: 2015/07/19 20:33:34 packer-provisioner-shell: 2015/07/19 20:33:34 [ERR] 'stdout' copy error: io: read/write on closed pipe
which makes me wonder if the problem lies in builder/docker/exec.go.
My workaround atm is to change packer to not delete the logs and to tail them outside of packer. :(
I had some difficulty reproducing this with the 50-iteration example @i0rek provided. I only repro'd once, and that was with the echo example. In that case there was simply no output from the provisioner at all.
When I bumped the iterations up to 5000 I can get it more consistently. I'm speculating that the frequency with which this happens depends on system load and scheduling, so if your system load is low you will need more output to see this.
2015/08/07 09:05:51 ui: ^[[0;32m docker: + for n in '$(seq 5000)'^[[0m
2015/08/07 09:05:51 ui: ^[[0;32m docker: + touch 4989^[[0m
2015/08/07 09:05:51 packer-provisioner-shell: 2015/08/07 09:05:51 [INFO] 188611 bytes written for 'stdout'
2015/08/07 09:05:51 packer-provisioner-shell: 2015/08/07 09:05:51 [ERR] 'stdout' copy error: io: read/write on closed pipe
2015/08/07 09:05:51 ui: ^[[0;32m docker: + for n in '$(seq 5000)'^[[0m
2015/08/07 09:05:51 ui: ^[[0;32m docker: + touch 4990^[[0m
Also:
Still investigating.
I think this error is raised in packer/rpc/communicator.go:277, so it may be caused by the shell provisioner plugin process terminating before the parent process has received all of the output. Speculating at this point, though.
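For what it's worth, that exact error message is easy to provoke with a plain io.Pipe. The sketch below is only an illustration of the failure mode, not packer's code: if the reading side goes away while the writer is still pushing data, the writer gets "io: read/write on closed pipe" and whatever it had left is never delivered.

package main

import (
	"fmt"
	"io"
)

func main() {
	r, w := io.Pipe()

	done := make(chan struct{})
	go func() {
		defer close(done)
		// The writer (think: plugin stdout) is still pushing data ...
		for i := 0; i < 1000; i++ {
			if _, err := fmt.Fprintf(w, "line %d\n", i); err != nil {
				// ... and fails once the reader has gone away.
				fmt.Println("copy error:", err)
				return
			}
		}
	}()

	// The reader (think: the parent packer process) takes one chunk and
	// then closes the pipe, as if the connection were torn down mid-stream.
	buf := make([]byte, 64)
	r.Read(buf)
	r.Close()

	<-done
}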
$ ~/code/gotip/bin/go test --race ./packer/rpc
...
Found 22 data race(s)
FAIL github.com/mitchellh/packer/packer/rpc 1.192s
:disappointed:
I think I will need to defer this one until I have more time to sift through the race conditions.
thanks for giving it a spin @cbednarski!
If it helps at all, I ran into a similar problem with the salt provisioner. There would be an error, but it wouldn't print all of the output from the salt highstate, so it was difficult to track down. It looks to have printed about half of the output in my case.
I think this error is raised in packer/rpc/communicator.go:277, so it may be caused by the shell provisioner plugin process terminating before the parent process has received all of the output. Speculating at this point, though.
There was a race condition here that is now fixed. In fact all the race conditions I know about are fixed at this point. If you are still experiencing this can you try with packer built from master?
@cbednarski thanks for your work! I will give it a spin as soon as possible.
Was facing the same issue (ansible + docker) with v0.8.6. A build from master works for me :+1:
@cbednarski I tested it. It seems way better now, but I still run into the problem. I am running docker in coreos with xhyve locally. I built master with git pull; make updatedeps; make; make dev.
$ packer version
Packer v0.8.7.dev (25108c8d13912434d0f32faaf1ea13cdc537b21e+CHANGES)
Since I just ran into this issue again today, I built packer from HEAD and it definitely improves the situation. I received the complete STDOUT I was expecting from a build script and was able to troubleshoot the issue.
So, as someone who originally experienced the issue (incomplete output from shell scripts), my use case seems fixed.
Hey guys,
I'm still seeing this, but my use case is probably pretty unique (I run a Saltstack highstate that takes A LOT of time, like 3 hours).
Here is my version:
$ packer version
Packer v0.8.7.dev (6587926a2bfdae673f4cecbd36968d718cd40926)
2016/01/27 04:59:36 ui: ==> virtualbox-ovf: Provisioning with shell script: files/saltstack_highstate.sh
==> virtualbox-ovf: Provisioning with shell script: files/saltstack_highstate.sh
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 Opening files/saltstack_highstate.sh for reading
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 opening new ssh session
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 Starting remote scp process: scp -vt /tmp
2016/01/27 04:59:36 [INFO] 2667 bytes written for 'uploadData'
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 [INFO] 2667 bytes written for 'uploadData'
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 Started SCP session, beginning transfers...
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 Copying input data into temporary file so we can read the length
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 [DEBUG] Uploading script_1903.sh: perms=C0644 size=2667
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 SCP session complete, closing stdin pipe.
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 Waiting for SSH session to complete.
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 scp stderr (length 32): Sink: C0644 2667 script_1903.sh
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 opening new ssh session
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 starting remote command: chmod 0755 /tmp/script_1903.sh
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 remote command exited with '0': chmod 0755 /tmp/script_1903.sh
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 [INFO] RPC endpoint: Communicator ended with: 0
2016/01/27 04:59:36 [INFO] RPC client: Communicator ended with: 0
2016/01/27 04:59:36 [INFO] RPC endpoint: Communicator ended with: 0
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 [INFO] RPC client: Communicator ended with: 0
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 opening new ssh session
2016/01/27 04:59:36 packer: 2016/01/27 04:59:36 starting remote command: chmod +x /tmp/script_1903.sh; PACKER_BUILD_NAME='virtualbox-ovf' PACKER_BUILDER_TYPE='virtualbox-ovf' /tmp/script_1903.sh
// LOTS OF OUTPUT
2016/01/27 06:59:44 ui: virtualbox-ovf: [INFO ] Fetching file from saltenv 'appserver', ** done ** u'|prince/files/dtd/svg11/svg-opacity-attrib.mod'
virtualbox-ovf: [INFO ] Fetching file from saltenv 'appserver', ** done ** u'|prince/files/dtd/svg11/svg-opacity-attrib.mod'
2016/01/27 07:00:00 packer: 2016/01/27 07:00:00 remote command exited with '0': chmod +x /tmp/script_1903.sh; PACKER_BUILD_NAME='virtualbox-ovf' PACKER_BUILDER_TYPE='virtualbox-ovf' /tmp/script_1903.sh
2016/01/27 07:00:00 packer: 2016/01/27 07:00:00 [INFO] RPC endpoint: Communicator ended with: 0
2016/01/27 07:00:00 [INFO] 0 bytes written for 'stderr'
2016/01/27 07:00:00 [INFO] 1225089 bytes written for 'stdout'
2016/01/27 07:00:00 [INFO] RPC client: Communicator ended with: 0
2016/01/27 07:00:00 [INFO] RPC endpoint: Communicator ended with: 0
As you can see, the script ends in the middle of the highstate after running for ~2 hours.
Here is my builder config:
"builders": [
{
"type": "virtualbox-ovf",
"source_path": "../../flavors/FreeBSD-10.1_amd64/FreeBSD-10.1_amd64.ovf",
"ssh_username": "vagrant",
"ssh_password": "vagrant",
"ssh_pty": true,
"ssh_wait_timeout": "10000s",
"shutdown_command": "echo 'vagrant' | sudo -S -E -H sh -c \"sync && shutdown -p now\"",
"headless": true,
"vm_name": "{{user `vm_prefix_name`}}_{{user `vm_version`}}.{{user `build_number`}}",
"export_opts": ["--manifest", "--vsys", "0", "--vendor", "{{user `vendor`}}", "--vendorurl", "{{user `vendor_url`}}", "--version", "{{user `vm_version`}}.{{user `build_number`}}", "--description", "{{user `vm_description`}}"],
"vboxmanage": [
[
"modifyvm",
"{{.Name}}",
"--memory",
"6144"
],
[
"modifyvm",
"{{.Name}}",
"--cpus",
"8"
]
]
}
],
For the record, this doesn't exit at the same point in the highstate script, and on a second run it started at 07:55:13 and finished at 09:47:54. Not sure if it's a timeout thing or a race condition.
Some time ago I pushed all highstate output to a log file as a workaround for the race conditions, but it was hard to debug because I had to copy the log outside the VM to analyze it. Also, I'm starting the build via Jenkins so I can keep track of every build.
Thank you!
@i0rek I've tested this using a slightly tweaked version of your problem.json with rerun and the latest packer master, and I am unable to repro. 50 rounds of the build pass. I'm sure there are other race conditions lurking, but I think this particular code path is solid.
@amontalban If you can reproduce this problem with packer master, please open a new issue and provide a repro case. There may be some other code path in the salt plugin that has a different race condition but as far as I can tell the original issue here is fixed.
If you are still experiencing this problem with packer built from master, make sure you are using the vendored dependencies. In Go 1.5 you will need to set GO15VENDOREXPERIMENT=1. In Go 1.6 this behavior is automatic.
I'm closing this as fixed but if you have a reliable repro case please open a new issue. Thanks for the feedback, everyone!
I'm still seeing this. I run salt manually using the remote shell provisioner. Build is Go 1.6, master as of right meow.
Yep, it is still there.
Still repros in:
/usr/local/packer --version
2016/06/24 23:21:30 [INFO] Packer version: 0.10.1
2016/06/24 23:21:30 Packer Target OS/Arch: linux amd64
2016/06/24 23:21:30 Built with Go Version: go1.6.2
The qemu builder looks like:
2016/06/24 21:09:18 packer: 2016/06/24 21:09:18 [INFO] 16710 bytes written for 'stdout'
2016/06/24 21:09:18 packer: 2016/06/24 21:09:18 [ERR] 'stdout' copy error: io: read/write on closed pipe
2016/06/24 21:09:18 ui: qemu: -
qemu: -
2016/06/24 21:09:18 ui: qemu: -# Turning this off breaks GSSAPI used
qemu: -# Turning this off breaks GSSAPI used
2016/06/24 21:09:18 packer: 2016/06/24 21:09:18 opening new ssh session
2016/06/24 21:09:18 packer: 2016/06/24 21:09:18 starting remote command: rm -f /tmp/script_5202.sh
2016/06/24 21:09:18 packer: 2016/06/24 21:09:18 remote command exited with '0': rm -f /tmp/script_5202.sh
2016/06/24 21:09:18 packer: 2016/06/24 21:09:18 [INFO] RPC endpoint: Communicator ended with: 0
2016/06/24 21:09:18 [INFO] RPC client: Communicator ended with: 0
2016/06/24 21:09:18 [INFO] RPC endpoint: Communicator ended with: 0
2016/06/24 21:09:18 packer: 2016/06/24 21:09:18 [INFO] RPC client: Communicator ended with: 0
Whereas the virtualbox-iso builder looks like:
2016/06/24 21:21:35 packer: 2016/06/24 21:21:35 [INFO] 33094 bytes written for 'stdout'
2016/06/24 21:21:35 packer: 2016/06/24 21:21:35 [ERR] 'stdout' copy error: io: read/write on closed pipe
2016/06/24 21:21:35 ui: virtualbox-iso:
virtualbox-iso:
2016/06/24 21:21:35 ui: virtualbox-iso: -# This sshd was compiled with PATH=/usr/local/bin:/usr/bin
virtualbox-iso: -# This sshd was compiled with PATH=/usr/local/bin:/usr/bin
2016/06/24 21:21:35 ui: virtualbox-iso: +# Den
virtualbox-iso: +# Den
2016/06/24 21:21:35 packer: 2016/06/24 21:21:35 opening new ssh session
2016/06/24 21:21:35 packer: 2016/06/24 21:21:35 starting remote command: rm -f /tmp/script_1125.sh
2016/06/24 21:21:35 packer: 2016/06/24 21:21:35 remote command exited with '0': rm -f /tmp/script_1125.sh
2016/06/24 21:21:35 packer: 2016/06/24 21:21:35 [INFO] RPC endpoint: Communicator ended with: 0
2016/06/24 21:21:35 [INFO] RPC client: Communicator ended with: 0
2016/06/24 21:21:35 [INFO] RPC endpoint: Communicator ended with: 0
2016/06/24 21:21:35 packer: 2016/06/24 21:21:35 [INFO] RPC client: Communicator ended with: 0
Notably, the qemu builder pretty consistently dies around ~16000 bytes.
These are all related:
It is still there in packer 0.12:
{
"type": "powershell",
"inline": [
"c:\\tmp\\bin\\unzip.exe c:\\tmp\\chef.zip -d c:\\"
]
},
Notice "cookboo" end of the output.
amazon-ebs: inflating: c:/chef/cookbooks/windows/resources/printer_port.rb
amazon-ebs: warning: skipped "../" path component(s) in ../../chef/cookbooks/windows/resources/zipfile.rb
amazon-ebs: inflating: c:/chef/cookbooks/windows/resources/zipfile.rb
amazon-ebs: warning: skipped "../" path component(s) in ../../chef/cookboo
==> amazon-ebs: Terminating the source AWS instance...
==> amazon-ebs: Cleaning up any extra volumes...
==> amazon-ebs: No volumes to clean up, skipping
==> amazon-ebs: Deleting temporary keypair...
Running into this as well.
amazon-ebs: [2017-01-12T12:19:15+00:00] INFO: Processing adcom_supervisor_service[uts-cache-read] action start (dmp_thrift_cache::deploy_uts_cache_read line 55)
amazon-ebs: [2017-01-12T12:19:16+00:00] INFO: Processing execute[supervisorctl update] action nothing (/var/chef/cache/cookbooks/adcom_supervisor/providers/service.rb line 89)
amazon-ebs: [2017-01-12T12:19:16+00:00] INFO: Processing file[/data/app/supervisord/ini/uts-cache-read.ini] action create (/var/chef/cache/cookbooks/adcom_supervisor/providers/service.rb line 94)
amazon-ebs: [2017-01-12T12:19:16+00:00] INFO: Processing template[/data/app/supervisord/ini/uts-cache-read.ini] action create (/var/chef/cache/c
==> amazon-ebs: Provisioning with shell script: scripts/delete-node-from-chef.sh
Is it possible to have another look at this issue please?
FWIW, I am using sleep 120 at the end of my inline script as a simple workaround for the incomplete output.
Bumping, as I have run into the same issue.
Going to lock this because this is a 2 year old issue already marked as fixed. It's likely that any issues people are seeing now are not related to the original issue, and so the discussion here may be misleading.
I recognize that some people are having real issues here, and I would love to be able to help, but I think the best way to do that is by starting a new discussion with accurate information and a reliable way to reproduce (I can't fix what I can't see). Thanks!