Packer: bufio.Scanner: token too long in ansible remote provisioner

Created on 26 Feb 2016  路  26Comments  路  Source: hashicorp/packer

Hi everyone,

I am running packer 0.9.0 on OS X 10.9.5 and tried to use the new ansible remote provisioner when I ran into an issue during my ansible playbook was running.

Debug output: https://gist.github.com/msteinhoff/64777ac9174dcdcc425a

The problem seems to lie here:

https://github.com/mitchellh/packer/blob/da1e9f4976f4651697748a91f3edc77c4d6ce883/provisioner/ansible/provisioner.go#L272-L294

You can see in the log file that before the error occurs there are 591057 bytes written to the stdout pipe:

2016/02/26 00:10:11 packer: 2016/02/26 00:10:11 [INFO] 591057 bytes written for 'stdout'

But the default buffer in bufio.Scanner can only fit 65536 bytes.

Packer then crashes and I have to shut it down with ctrl+c.

bug provisioneansible-local

Most helpful comment

Please, make a release with this fix.

All 26 comments

Interesting. Thanks for tracking this down! We probably need to change this to use a channel instead of a buffer + wait group.

I'd be glad to help fix this bug. I have never programmed in go but if you could point me in the right direction what to do I'd give it a shot.

@msteinhoff Great! Go is really easy to get started with, but maybe this fix is not the easiest to get started with dependning on what you usally do when coding. A hint is to read getting started and the tour specifically to fix this read about channels.

@rickard-von-essen thank you, I'll give it a read later today and then we'll see what happens. I wanted to learn go anyway :)

So I looked a bit deeper into the issue:

exec.Command provides a ReaderCloser for stdout and stderr form which we can read data.
Packer provides the Ui interface which allows us to push individual messages to the user.

The Ansible provisioner implements a go routine where a Scanner is used to read data from the ReaderCloser and tokenize that into individual lines which can be sent to packer Ui.

Problem: The Scanner tokens are limited to lines of 65536 bytes and Ansible sometimes creates large lines, e.g. when it buffers output. Thus, for some Ansible output, the Scanner will just go belly up.

When I look at ReaderCloser, it provides a ReadString function which does not seem to have the same limitation as Scanner.

So if I understand this correctly, I could replace the Scanner with a call to ReadString and the problem should go away.

@cbednarski With all this, how would go channels help here? Am I missing something?

@msteinhoff A channel works as a pipe (a.la. unix pipes). We use channel patterns elsewhere in packer to capture stderr/stdin because it has the benefit of being able to stream an unlimited amount of output. The scanner is reading line-by-line but has an upper limit, as you noted.

I think ReadString will work but will probably block until the entire Ansible operation completes, which means you will not see any output during the build (only after it has completed or failed). I have only skimmed through this, though, so you might have seen something I missed.

Thanks for clarification.

I have searched on exec.Command for something that returns me a channel but did not find anything. Could you point me to an example where channels are used to capture stdout?

Streaming an unlimited amount of output sounds great, but I looked at the Ui interface and only found functions to print messages, so I think we want to separate that byte stream between new lines anyway to push them with Ui.message().

ReadString itself would read until the next line (or to be more specific, to the delimiter that we must provide). Ansible blocks until it has executed a command and returns that output on the next line. So I'd say that blocking would be okay in this case.

I searched a bit in the code and found this:

https://github.com/mitchellh/packer/blob/024132a698092b0d950d0aaffdb8821616de6a72/packer/rpc/communicator_test.go#L124-L133

Which still uses a Reader and ReadString() but replaces the WaitGroup with a channel.

I'll give it a try.

Hm, before I reinvent the weel, why not just use this:

https://github.com/mitchellh/packer/blob/23d5e506006e674bdb1b0d4b9b26ff02a09af6aa/builder/qemu/driver.go#L202-L215

It looks like that does exactly what we want here. Is there some place of common functionality where I could move that function? Or would it be better to copy it into the ansible provisoner?

@msteinhoff Oops, I misremembered this as a channel rather than a pipe. Thanks for linking to that. Feel free to copy-paste!

Okay!

@cbednarski Question: Why is the WaitGroup used at all? Shouldn't it be sufficient to call cmd.Wait()?

@msteinhoff It's been awhile since I've been spelunking here but IIRC it's because we want to capture stderr and stdout independently rather than as a combined stream.

Once we've detached stdout and stderr into their own go routines (using the go keyword) we have to block until we finish reading all the data. If we don't, Wait() might return because the process has exited but before the UI thread has finished reading the output.

If we return control to the caller while log data is still in flight it may be truncated or displayed in the wrong place (e.g. after the process has already completed).

Ah I see.

Id introduce a bool channel then and just send a true value on the end of each go routine.

Is there an more elegant than just e.g. doing this:

playbookDone = make(chan bool)

go logReader(stdout)
go logReader(stderr)

cmd.Start()
<- playbookDone
<- playbookDone
cmd.Wait()

And then calling playbookDone <- true at the end of the logReader go routine?

I fixed the issue without using channels:
https://github.com/msteinhoff/packer/commit/48841ef664d3d7c4368b545d3239ace3799f8ee6

How should I go about a pull request? The guidelines say to provide tests and so on, but I am not sure how to provide such tests in this case. Its more of a integration test that would need a full VM and some ansible playbooks that must be able to produce very long lines.

@cbednarski @rickard-von-essen _ping_

@msteinhoff Sorry I missed your reply from Friday. Please go ahead and open a PR.

Id introduce a bool channel then and just send a true value on the end of each go routine.

I use this trick too, definitely a legit pattern. mitchellh always reminds me that a struct{} will take up less memory than a bool, but mainly a style thing.

The guidelines say to provide tests and so on, but I am not sure how to provide such tests in this case. Its more of a integration test that would need a full VM and some ansible playbooks that must be able to produce very long lines.

For something like this it may well be the case that we need to use an integration test. I'm OK with this. If you can provide a simple Ansible playbook and a template that reproduces the problem that should suffice.

For instance maybe you could use a simple Ansible playbook that cats a file that has more than 65536 characters on one line.

The way I generally run these is using the VMX builder, so it only takes about 30 seconds. Still quite long but not as long as a complete build, and sometimes the only reasonable way to test things.

Can you also indicate what steps need to be setup on the host to run Ansible remote? I can eventually turn this into a nested virt packer build so it can be run in CI but it will take some time so I will test this manually in the mean time. Thanks!

Hey @cbednarski, I saw your reply but was busy with business-things and forgot to answer. Sorry about that.

The boolean channel stuff did not worked out. I decided to leave the WaitGroup as-is, and all ansible output (including long lines) is now forwarded completely to the packer UI on my machine.

A test case is very easy:

- hosts: all
  tasks:
  - debug: msg="[... 65536 characters ...]"

But where would I put that file in the packer tree?

And I am not sure about the rest of the test environment and what you mean with the VMX builder.

The complete test case for now requires

  • ansible on the testing machine (I tested it with v2.0.0.2)
  • a full packer config that spins up a new VM with SSH enabled
  • a ssh host key for the packer provisioner's SSH proxy
  • a ssh public key configured on the VM and in the ansible provisioner

Spinning up such a VM takes really long.

Maybe it would be sufficient for this test to have SSH available on the testing machine, but I don't if testing such things in isolation is possible.

Also there do not seem any integration tests available for the provisioner. How do you test such things in general?

Hi guys! What about that issue? @cbednarski can you merge that PR?

Hi guys, great job! Would really appreciate moving this forward.

Same here, just found this issue after enable ansible verbose mode and running apt upgrades :)

Hey @noose @teu @ricardclau https://github.com/mitchellh/packer/issues/3392 is waiting on tests; have you compiled this patch and found it fixes the problem for you? If so would you be able to contribute a test to the PR?

Hi @cbednarski

I can confirm the patch fixes the problem for me after testing the same template with both current stable Packer (0.10.1 which crashes) and #3392 (which works perfectly)

In order to reproduce the error I am running apt-get upgrade with -vvvv using Ansible in a bare Ubuntu 14.04 which has a quite verbose output. Of course, a simple Ansible task with a debug msg of > 65535 characters will have the same effect but I wanted to test the exact case that was failing for me.

I am happy to give the tests a go but I would like some orientation about how to properly test it (functional test? injecting mocked objects to ansible/provisioner.go?) and if you prefer a functional test what would be the best way to code it so that it can be added to the CI process

Thanks for testing this, @ricardclau.

Of course, a simple Ansible task with a debug msg of > 65535 characters will have the same effect...

This seems to be the most straightforward repro. I am going to merge this but it would be great if you could contribute a test like this!

I am happy to give the tests a go but I would like some orientation a bout how to properly test it (functional test? injecting mocked objects to ansible/provisioner.go?)

A functional test / acceptance test is fine. For acceptance tests we generally gate these by way of an environment variable PACKER_ACC=1. If the ENV is set we run the extra tests.

For provider-specific tests I usually run these in VMware or Docker so I can run them locally and it's fast. Unfortunately we don't run these in CI since our CI environment does not have VM, VirtualBox, etc.

I see, thanks for the orientation, will definitely try to contribute a test later this week

Thanks for merging the patch as well, now we will be able to see Ansible in all its verbosity! :)

Hey @ricardclau, thanks for the test you provided. Glad to see this is merged now :)

Please, make a release with this fix.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

wduncanfraser picture wduncanfraser  路  3Comments

Tensho picture Tensho  路  3Comments

shantanugadgil picture shantanugadgil  路  3Comments

frezbo picture frezbo  路  3Comments

mwhooker picture mwhooker  路  3Comments