packer version: 1.2.5
host platform: CentOS 7.5 on Azure
The vagrant-cloud post-processor times out when uploading a box ~5GB in size. Uploading the same box with curl succeeds. This happens every time for me. I'm using an Azure VM with ~30MB/s of bandwidth.
May be related to #5360
How does it fail?
It takes about 2 hours to build the box, so I don't have logs handy. It's this message:
Error uploading box! Will retry in 10 seconds. Error: %s
But I can't remember what %s was exactly. I'll have to get that for you.
It would be great to have a log, at least the post-processor part.
@double16 I've run into variations of this bug many, many times. If you compile your own packer version, try this sed command:
sed -i -e "s/common.Retry(10, 10, 3/common.Retry(10, 10, 20/g" post-processor/vagrant-cloud/step_upload.go
Which will modify packer to retry the upload 20 times before failing. With this change, you might run into this error:
magma-developer-virtualbox (vagrant-cloud): Error uploading box! Will retry in 10 seconds. Status: 503
That 503 appears to mean the previous attempt actually uploaded successfully, but packer didn't receive the success status code within the allotted time.
From what I can tell, there is a delay between when the upload finishes and when the server sends the HTTP 100, followed by another delay before it sends an HTTP 200... and the larger the box, the longer the delays, so the greater the likelihood a 503 will ultimately occur.
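If you want to try the same tweak without waiting on an official change, here's a rough sketch of patching and rebuilding packer (it assumes a working Go toolchain and the GOPATH layout packer used at the time; the make target comes from the project's Makefile):
# Fetch the packer source into the expected GOPATH location (paths are illustrative).
git clone https://github.com/hashicorp/packer.git "$GOPATH/src/github.com/hashicorp/packer"
cd "$GOPATH/src/github.com/hashicorp/packer"
# Bump the upload retry count from 3 to 20 before compiling.
sed -i -e "s/common.Retry(10, 10, 3/common.Retry(10, 10, 20/g" post-processor/vagrant-cloud/step_upload.go
make dev   # builds a patched binary into bin/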
@ladar if the upload is successful but validation fails, does it re-upload entirely the next try around, or does it just see that it's uploaded already and succeed?
@SwampDragons I believe the box gets uploaded during the retries, if the previous attempt fails... until the 503 errors appear, at which point I don't believe it actually uploads the data. I just finished a full build and wiped all my logs, or I'd send you the raw data.
I originally thought this might be a timeout problem, until I noticed the issue happens more frequently when I have all 5 robots building boxes. So I started looking into a rate/size limit with Vagrant Cloud. I opened a ticket with HashiCorp tech support. They said the quota wasn't the problem, but it took them a while to get back to me, and by then I was pegged to my v1.2.4 fork (waiting on the split function merge).
So I haven't looked into this issue too deeply as of late. With the recent merge, I'm once again trying to re-sync with the HEAD, so I'll probably spend more time investigating, and type up a full bug report when I have the time.
It's worth noting, I recently wrote a crude bash script to upload the boxes that fail when using packer, and that got me thinking it's a timeout issue (again). At least a timeout problem when it comes to the 503 errors. I have no theories on why the uploads sometimes fail before I get the 503.
To date, whenever I've gotten a 503 and then gone and downloaded the box to compare checksums, they've matched. Whereas if the upload fails before I start getting the 503 errors, the box doesn't appear to be available for download.
It may also be worth mentioning that this bug first appeared when I switched from the atlas upload post-processor to the vagrant-cloud post-processor.
Interesting. Might be worth looking at whether the code for the client we used in atlas uploads had some kind of error handling that the vagrant-cloud post-processor doesn't.
Nothing really jumps out at me as different enough to account for this; I wonder if our vagrant cloud servers are doing something differently.
It's been so long that I could be remembering it wrong, but as I recall the Atlas uploader used a different set of URLs and API semantics, which made them difficult to compare, at least from the perspective of the log files. I don't think I ever tried comparing the code.
I've reached out to the vagrant cloud folks to see if they have any insights.
As it turns out, I still had one log file left from my robox run last week, and it contained a textbook example of the error. Note that this log/error was generated using packer v1.2.4, and like I said, I haven't done very much testing against HEAD.
In the debug log, I've redacted most of the URL/cookie strings, just in case they contain something I shouldn't be posting on GitHub. Anywhere you see abcdefghijklmnopqrstuvwxyz, there was a redaction.
This packer binary was built with a retry count of 8 and an increasing pause of up to 300 seconds (the max allowed by the retry code), by running:
sed -i -e "s/common.Retry(10, 10, 3/common.Retry(10, 300, 8/g" post-processor/vagrant-cloud/step_upload.go
First, an abbreviated, reformatted log with the most relevant lines, as the debug output is hard to follow. The first 6 attempts failed with an EOF error, while the last 2 attempts returned a 503.
magma-developer-virtualbox (vagrant-cloud): Uploading box: output/magma-developer-virtualbox-1.8.28.box
magma-developer-virtualbox (vagrant-cloud): Uploading box, attempt 1
magma-developer-virtualbox (vagrant-cloud): Error uploading box! Will retry in 10 seconds. Error: Put https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz: EOF
magma-developer-virtualbox (vagrant-cloud): Uploading box, attempt 2
magma-developer-virtualbox (vagrant-cloud): Error uploading box! Will retry in 10 seconds. Error: Put https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz: EOF
magma-developer-virtualbox (vagrant-cloud): Uploading box, attempt 3
magma-developer-virtualbox (vagrant-cloud): Error uploading box! Will retry in 10 seconds. Error: Put https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz: EOF
magma-developer-virtualbox (vagrant-cloud): Uploading box, attempt 4
magma-developer-virtualbox (vagrant-cloud): Error uploading box! Will retry in 10 seconds. Error: Put https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz: EOF
magma-developer-virtualbox (vagrant-cloud): Uploading box, attempt 5
magma-developer-virtualbox (vagrant-cloud): Error uploading box! Will retry in 10 seconds. Error: Put https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz: EOF
magma-developer-virtualbox (vagrant-cloud): Uploading box, attempt 6
magma-developer-virtualbox (vagrant-cloud): Error uploading box! Will retry in 10 seconds. Error: Put https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz: EOF
magma-developer-virtualbox (vagrant-cloud): Uploading box, attempt 7
magma-developer-virtualbox (vagrant-cloud): Error uploading box! Will retry in 10 seconds. Status: 503
magma-developer-virtualbox (vagrant-cloud): Uploading box, attempt 8
magma-developer-virtualbox (vagrant-cloud): Error uploading box! Will retry in 10 seconds. Status: 503
If you dig into the debug output below, you'll see the upload failures show a response of "<nil>":
Post-Processor Vagrant Cloud API Upload: output/magma-developer-virtualbox-1.8.28.box https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz
Post-Processor Vagrant Cloud Upload Response:
packer:
packer: <nil>
Based on what I know of the Vagrant Cloud architecture, boxes appear to get uploaded to a HashiCorp server before being transferred to Amazon S3 for permanent storage. If the server is waiting for the latter to happen before responding, that would explain the timeout. The full debug log file follows, for your reading pleasure:
2018/08/30 04:04:28 [INFO] (telemetry) Starting post-processor vagrant
2018/08/30 04:04:28 ui: ==> magma-developer-virtualbox (vagrant): Creating Vagrant box for 'virtualbox' provider
2018/08/30 04:04:28 ui: magma-developer-virtualbox (vagrant): Copying from include: tpl/lavabit/info.json
2018/08/30 04:04:28 ui: magma-developer-virtualbox (vagrant): Copying from artifact: output/magma-developer-virtualbox/magma-developer-virtualbox-disk001.vmdk
2018/08/30 04:04:31 ui: magma-developer-virtualbox (vagrant): Copying from artifact: output/magma-developer-virtualbox/magma-developer-virtualbox.ovf
2018/08/30 04:04:31 ui: magma-developer-virtualbox (vagrant): Renaming the OVF to box.ovf...
2018/08/30 04:04:31 packer: 2018/08/30 04:04:31 Looking for OVF to rename...
2018/08/30 04:04:31 packer: 2018/08/30 04:04:31 Looking for OVF in artifact...
2018/08/30 04:04:31 packer: 2018/08/30 04:04:31 Renaming: '/tmp/packer115534990/magma-developer-virtualbox.ovf' => box.ovf
2018/08/30 04:04:31 packer: 2018/08/30 04:04:31 Looking for OVF for base mac address...
2018/08/30 04:04:31 packer: 2018/08/30 04:04:31 Looking for OVF in artifact...
2018/08/30 04:04:31 packer: 2018/08/30 04:04:31 Base mac address: 080027496564
2018/08/30 04:04:31 ui: magma-developer-virtualbox (vagrant): Using custom Vagrantfile: tpl/developer.rb
2018/08/30 04:04:31 packer: 2018/08/30 04:04:31 Turning dir into box: /tmp/packer115534990 => output/magma-developer-virtualbox-1.8.28.box
2018/08/30 04:04:31 packer: 2018/08/30 04:04:31 Compressing with gzip compression level: 9
2018/08/30 04:04:31 packer: 2018/08/30 04:04:31 Skipping directory '/tmp/packer115534990' for box 'output/magma-developer-virtualbox-1.8.28.box'
2018/08/30 04:04:31 packer: 2018/08/30 04:04:31 Box add: '/tmp/packer115534990/Vagrantfile' to 'output/magma-developer-virtualbox-1.8.28.box'
2018/08/30 04:04:31 ui: magma-developer-virtualbox (vagrant): Compressing: Vagrantfile
2018/08/30 04:04:31 packer: 2018/08/30 04:04:31 Box add: '/tmp/packer115534990/box.ovf' to 'output/magma-developer-virtualbox-1.8.28.box'
2018/08/30 04:04:31 ui: magma-developer-virtualbox (vagrant): Compressing: box.ovf
2018/08/30 04:04:31 packer: 2018/08/30 04:04:31 Box add: '/tmp/packer115534990/info.json' to 'output/magma-developer-virtualbox-1.8.28.box'
2018/08/30 04:04:31 ui: magma-developer-virtualbox (vagrant): Compressing: info.json
2018/08/30 04:04:31 packer: 2018/08/30 04:04:31 Box add: '/tmp/packer115534990/magma-developer-virtualbox-disk001.vmdk' to 'output/magma-developer-virtualbox-1.8.28.box'
2018/08/30 04:04:31 ui: magma-developer-virtualbox (vagrant): Compressing: magma-developer-virtualbox-disk001.vmdk
2018/08/30 04:04:55 ui: magma-developer-virtualbox (vagrant): Compressing: metadata.json
2018/08/30 04:04:55 packer: 2018/08/30 04:04:55 Box add: '/tmp/packer115534990/metadata.json' to 'output/magma-developer-virtualbox-1.8.28.box'
2018/08/30 04:04:56 [INFO] (telemetry) ending vagrant
2018/08/30 04:04:56 ui: ==> magma-developer-virtualbox: Running post-processor: vagrant-cloud
2018/08/30 04:04:56 [INFO] (telemetry) Starting post-processor vagrant-cloud
2018/08/30 04:04:56 ui: ==> magma-developer-virtualbox (vagrant-cloud): Verifying box is accessible: lavabit/magma-developer
2018/08/30 04:04:56 packer: 2018/08/30 04:04:56 Post-Processor Vagrant Cloud API GET: https://vagrantcloud.com/api/v1/box/lavabit/magma-developer?access_token=ACCESS_TOKEN
2018/08/30 04:04:59 packer: 2018/08/30 04:04:59 Post-Processor Vagrant Cloud API Response:
2018/08/30 04:04:59 packer:
2018/08/30 04:04:59 packer: &{Status:200 OK StatusCode:200 Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[Connection:[keep-alive] X-Content-Type-Options:[nosniff] Content-Type:[application/json; charset=utf-8] X-Request-Id:[abcdefghijklmnopqrstuvwxyz] X-Runtime:[2.036468] Date:[Thu, 30 Aug 2018 09:04:58 GMT] X-Frame-Options:[SAMEORIGIN] Strict-Transport-Security:[max-age=31536000; includeSubDomains; preload] Via:[1.1 vegur] Server:[Cowboy] X-Xss-Protection:[1; mode=block] Etag:[W/"abcdefghijklmnopqrstuvwxyz"] Cache-Control:[max-age=0, private, must-revalidate] Set-Cookie:[_atlas_session_data=abcdefghijklmnopqrstuvwxyz; path=/]] Body:0xc42015b880 ContentLength:-1 TransferEncoding:[chunked] Close:false Uncompressed:false Trailer:map[] Request:0xc420172100 TLS:0xabcdefghijklmnopqrstuvwxyz}
2018/08/30 04:04:59 ui: magma-developer-virtualbox (vagrant-cloud): Box accessible and matches tag
2018/08/30 04:04:59 ui: ==> magma-developer-virtualbox (vagrant-cloud): Creating version: 1.8.28
2018/08/30 04:04:59 ui: magma-developer-virtualbox (vagrant-cloud): Version exists, skipping creation
2018/08/30 04:04:59 ui: ==> magma-developer-virtualbox (vagrant-cloud): Creating provider: virtualbox
2018/08/30 04:04:59 packer: 2018/08/30 04:04:59 Post-Processor Vagrant Cloud API POST: https://vagrantcloud.com/api/v1/box/lavabit/magma-developer/version/1.8.28/providers?access_token=ACCESS_TOKEN.
2018/08/30 04:04:59 packer:
2018/08/30 04:04:59 packer: Body: {"provider":{"name":"virtualbox"}}
2018/08/30 04:05:00 packer: 2018/08/30 04:05:00 Post-Processor Vagrant Cloud API Response:
2018/08/30 04:05:00 packer:
2018/08/30 04:05:00 packer: &{Status:200 OK StatusCode:200 Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[Content-Type:[application/json; charset=utf-8] Etag:[W/"abcdefghijklmnopqrstuvwxyz"] X-Request-Id:[abcdefghijklmnopqrstuvwxyz] X-Runtime:[0.772967] Server:[Cowboy] X-Frame-Options:[SAMEORIGIN] X-Content-Type-Options:[nosniff] Set-Cookie:[_atlas_session_data=abcdefghijklmnopqrstuvwxyz; path=/] Date:[Thu, 30 Aug 2018 09:04:59 GMT] Connection:[keep-alive] X-Xss-Protection:[1; mode=block] Strict-Transport-Security:[max-age=31536000; includeSubDomains; preload] Cache-Control:[max-age=0, private, must-revalidate] Via:[1.1 vegur]] Body:0xc42015bcc0 ContentLength:-1 TransferEncoding:[chunked] Close:false Uncompressed:false Trailer:map[] Request:0xc420172200 TLS:0xabcdefghijklmnopqrstuvwxyz}
2018/08/30 04:05:00 ui: ==> magma-developer-virtualbox (vagrant-cloud): Preparing upload of box: output/magma-developer-virtualbox-1.8.28.box
2018/08/30 04:05:00 packer: 2018/08/30 04:05:00 Post-Processor Vagrant Cloud API GET: https://vagrantcloud.com/api/v1/box/lavabit/magma-developer/version/1.8.28/provider/virtualbox/upload?access_token=ACCESS_TOKEN
2018/08/30 04:05:00 packer: 2018/08/30 04:05:00 Post-Processor Vagrant Cloud API Response:
2018/08/30 04:05:00 packer:
2018/08/30 04:05:00 packer: &{Status:200 OK StatusCode:200 Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[Connection:[keep-alive] X-Frame-Options:[SAMEORIGIN] X-Runtime:[0.563452] Server:[Cowboy] Strict-Transport-Security:[max-age=31536000; includeSubDomains; preload] Via:[1.1 vegur] Date:[Thu, 30 Aug 2018 09:05:00 GMT] X-Xss-Protection:[1; mode=block] Cache-Control:[max-age=0, private, must-revalidate] Set-Cookie:[_atlas_session_data=abcdefghijklmnopqrstuvwxyz; path=/] X-Content-Type-Options:[nosniff] Content-Type:[application/json; charset=utf-8] Etag:[W/"abcdefghijklmnopqrstuvwxyz"] X-Request-Id:[abcdefghijklmnopqrstuvwxyz]] Body:0xc420062340 ContentLength:-1 TransferEncoding:[chunked] Close:false Uncompressed:false Trailer:map[] Request:0xc4201b8500 TLS:0xabcdefghijklmnopqrstuvwxyz}
2018/08/30 04:05:00 ui: ==> magma-developer-virtualbox (vagrant-cloud): Uploading box: output/magma-developer-virtualbox-1.8.28.box
2018/08/30 04:05:00 ui: magma-developer-virtualbox (vagrant-cloud): Depending on your internet connection and the size of the box,
magma-developer-virtualbox (vagrant-cloud): this may take some time
2018/08/30 04:05:00 ui: magma-developer-virtualbox (vagrant-cloud): Uploading box, attempt 1
2018/08/30 04:05:00 packer: 2018/08/30 04:05:00 Post-Processor Vagrant Cloud API Upload: output/magma-developer-virtualbox-1.8.28.box https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz
2018/08/30 04:06:37 packer: 2018/08/30 04:06:37 Post-Processor Vagrant Cloud Upload Response:
2018/08/30 04:06:37 packer:
2018/08/30 04:06:37 packer: <nil>
2018/08/30 04:06:37 ui: magma-developer-virtualbox (vagrant-cloud): Error uploading box! Will retry in 10 seconds. Error: Put https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz: EOF
2018/08/30 04:06:47 ui: magma-developer-virtualbox (vagrant-cloud): Uploading box, attempt 2
2018/08/30 04:06:47 packer: 2018/08/30 04:06:47 Post-Processor Vagrant Cloud API Upload: output/magma-developer-virtualbox-1.8.28.box https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz
2018/08/30 04:08:57 packer: 2018/08/30 04:08:57 Post-Processor Vagrant Cloud Upload Response:
2018/08/30 04:08:57 packer:
2018/08/30 04:08:57 packer: <nil>
2018/08/30 04:08:57 ui: magma-developer-virtualbox (vagrant-cloud): Error uploading box! Will retry in 10 seconds. Error: Put https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz: EOF
2018/08/30 04:09:17 ui: magma-developer-virtualbox (vagrant-cloud): Uploading box, attempt 3
2018/08/30 04:09:17 packer: 2018/08/30 04:09:17 Post-Processor Vagrant Cloud API Upload: output/magma-developer-virtualbox-1.8.28.box https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz
2018/08/30 04:10:54 packer: 2018/08/30 04:10:54 Post-Processor Vagrant Cloud Upload Response:
2018/08/30 04:10:54 packer:
2018/08/30 04:10:54 packer: <nil>
2018/08/30 04:10:54 ui: magma-developer-virtualbox (vagrant-cloud): Error uploading box! Will retry in 10 seconds. Error: Put https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz: EOF
2018/08/30 04:11:34 ui: magma-developer-virtualbox (vagrant-cloud): Uploading box, attempt 4
2018/08/30 04:11:34 packer: 2018/08/30 04:11:34 Post-Processor Vagrant Cloud API Upload: output/magma-developer-virtualbox-1.8.28.box https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz
2018/08/30 04:13:13 packer: 2018/08/30 04:13:13 Post-Processor Vagrant Cloud Upload Response:
2018/08/30 04:13:13 packer:
2018/08/30 04:13:13 packer: <nil>
2018/08/30 04:13:13 ui: magma-developer-virtualbox (vagrant-cloud): Error uploading box! Will retry in 10 seconds. Error: Put https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz: EOF
2018/08/30 04:14:33 ui: magma-developer-virtualbox (vagrant-cloud): Uploading box, attempt 5
2018/08/30 04:14:33 packer: 2018/08/30 04:14:33 Post-Processor Vagrant Cloud API Upload: output/magma-developer-virtualbox-1.8.28.box https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz
2018/08/30 04:16:10 packer: 2018/08/30 04:16:10 Post-Processor Vagrant Cloud Upload Response:
2018/08/30 04:16:10 packer:
2018/08/30 04:16:10 packer: <nil>
2018/08/30 04:16:10 ui: magma-developer-virtualbox (vagrant-cloud): Error uploading box! Will retry in 10 seconds. Error: Put https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz: EOF
2018/08/30 04:18:50 ui: magma-developer-virtualbox (vagrant-cloud): Uploading box, attempt 6
2018/08/30 04:18:50 packer: 2018/08/30 04:18:50 Post-Processor Vagrant Cloud API Upload: output/magma-developer-virtualbox-1.8.28.box https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz
2018/08/30 04:20:25 packer: 2018/08/30 04:20:25 Post-Processor Vagrant Cloud Upload Response:
2018/08/30 04:20:25 packer:
2018/08/30 04:20:25 packer: <nil>
2018/08/30 04:20:25 ui: magma-developer-virtualbox (vagrant-cloud): Error uploading box! Will retry in 10 seconds. Error: Put https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz: EOF
2018/08/30 04:25:25 ui: magma-developer-virtualbox (vagrant-cloud): Uploading box, attempt 7
2018/08/30 04:25:25 packer: 2018/08/30 04:25:25 Post-Processor Vagrant Cloud API Upload: output/magma-developer-virtualbox-1.8.28.box https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz
2018/08/30 04:25:26 packer: 2018/08/30 04:25:26 Post-Processor Vagrant Cloud Upload Response:
2018/08/30 04:25:26 packer:
2018/08/30 04:25:26 packer: &{Status:503 Service Unavailable StatusCode:503 Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[Connection:[keep-alive] Server:[Cowboy] Date:[Thu, 30 Aug 2018 09:25:25 GMT] Content-Length:[506] Content-Type:[text/html; charset=utf-8] Cache-Control:[no-cache, no-store]] Body:0xc420164500 ContentLength:506 TransferEncoding:[] Close:false Uncompressed:false Trailer:map[] Request:0xc420172100 TLS:0xabcdefghijklmnopqrstuvwxyz}
2018/08/30 04:25:26 packer: 2018/08/30 04:25:26 bad HTTP status: 503
2018/08/30 04:25:26 ui: magma-developer-virtualbox (vagrant-cloud): Error uploading box! Will retry in 10 seconds. Status: 503
2018/08/30 04:30:26 ui: magma-developer-virtualbox (vagrant-cloud): Uploading box, attempt 8
2018/08/30 04:30:26 packer: 2018/08/30 04:30:26 Post-Processor Vagrant Cloud API Upload: output/magma-developer-virtualbox-1.8.28.box https://archivist.vagrantup.com/v1/object/abcdefghijklmnopqrstuvwxyz
2018/08/30 04:30:26 packer: 2018/08/30 04:30:26 Post-Processor Vagrant Cloud Upload Response:
2018/08/30 04:30:26 packer:
2018/08/30 04:30:26 packer: &{Status:503 Service Unavailable StatusCode:503 Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[Connection:[keep-alive] Server:[Cowboy] Date:[Thu, 30 Aug 2018 09:30:25 GMT] Content-Length:[506] Content-Type:[text/html; charset=utf-8] Cache-Control:[no-cache, no-store]] Body:0xc4201644c0 ContentLength:506 TransferEncoding:[] Close:false Uncompressed:false Trailer:map[] Request:0xc420172200 TLS:0xabcdefghijklmnopqrstuvwxyz}
2018/08/30 04:30:26 packer: 2018/08/30 04:30:26 bad HTTP status: 503
2018/08/30 04:30:26 ui: magma-developer-virtualbox (vagrant-cloud): Error uploading box! Will retry in 10 seconds. Status: 503
2018/08/30 04:35:26 ui: ==> magma-developer-virtualbox (vagrant-cloud): Cleaning up provider
2018/08/30 04:35:26 ui: magma-developer-virtualbox (vagrant-cloud): Deleting provider: virtualbox
2018/08/30 04:35:26 packer: 2018/08/30 04:35:26 Post-Processor Vagrant Cloud API DELETE: https://vagrantcloud.com/api/v1/box/lavabit/magma-developer/version/1.8.28/provider/virtualbox?access_token=ACCESS_TOKEN
2018/08/30 04:35:27 packer: 2018/08/30 04:35:27 Post-Processor Vagrant Cloud API Response:
2018/08/30 04:35:27 packer:
2018/08/30 04:35:27 packer: &{Status:200 OK StatusCode:200 Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[Strict-Transport-Security:[max-age=31536000; includeSubDomains; preload] Cache-Control:[max-age=0, private, must-revalidate] X-Request-Id:[abcdefghijklmnopqrstuvwxyz] X-Runtime:[0.810870] Server:[Cowboy] X-Content-Type-Options:[nosniff] Content-Type:[application/json; charset=utf-8] Etag:[W/"abcdefghijklmnopqrstuvwxyz"] Connection:[keep-alive] Set-Cookie:[_atlas_session_data=abcdefghijklmnopqrstuvwxyz; path=/] Via:[1.1 vegur] Date:[Thu, 30 Aug 2018 09:35:26 GMT] X-Frame-Options:[SAMEORIGIN] X-Xss-Protection:[1; mode=block]] Body:0xc42015a7c0 ContentLength:-1 TransferEncoding:[chunked] Close:false Uncompressed:false Trailer:map[] Request:0xc420172300 TLS:0xabcdefghijklmnopqrstuvwxyz}
2018/08/30 04:35:27 [INFO] (telemetry) ending vagrant-cloud
2018/08/30 04:35:27 Deleting original artifact for build 'magma-developer-virtualbox'
2018/08/30 04:35:27 ui error: Build 'magma-developer-virtualbox' errored: 1 error(s) occurred:
* Post-processor failed: Function never succeeded in Retry
2018/08/30 04:35:27 Waiting on builds to complete...
2018/08/30 04:35:27 Builds completed. Waiting on interrupt barrier...
2018/08/30 04:35:27 machine readable: error-count []string{"1"}
2018/08/30 04:35:27 ui error:
==> Some builds didn't complete successfully and had errors:
2018/08/30 04:35:27 machine readable: magma-developer-virtualbox,error []string{"1 error(s) occurred:\n\n* Post-processor failed: Function never succeeded in Retry"}
2018/08/30 04:35:27 ui error: --> magma-developer-virtualbox: 1 error(s) occurred:
* Post-processor failed: Function never succeeded in Retry
2018/08/30 04:35:27 ui:
==> Builds finished but no artifacts were created.
@double16 can you confirm that you're seeing the same error/problem?
As you can see from my logs, adding additional retries with that sed find/replace (aka code tweak) increases the likelihood that an upload attempt will eventually make it through.
If you're not set up to compile packer yourself, I can point you to a script I wrote which will do it for you automatically, or I can post binaries here.
@ladar Sorry for the delay, I had issues rebuilding the box. A recent packer (1.3.2-dev) with your sed tweak shows the same as above. The response is <nil> a few times, then 503.
https://gist.github.com/double16/28919b72b8840803fe69bc24bbe456e8#file-packer2-log
@double16 after getting the 503... if you wait a reasonable period, say an hour, is the box available from the Vagrant Cloud?
If not, try the upload.sh script posted below. Note it requires curl and jq, and assumes a *nix/bash platform. If you have that, just add your VAGRANT_CLOUD_TOKEN to a file called .credentialsrc, and run the script like so:
./upload.sh double16 linux-dev-workstation virtualbox 201809.1 box/virtualbox/linux-dev-workstation-201809.1.box
If you have the time, record the script execution as an animated GIF and post it here. I've been meaning to do that, but haven't found the time. I believe the duration of the post-upload pause would be illuminating, as I believe it is that pause which is causing the problem.
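The credentials file the script sources is just a shell fragment with the token in it; something like this (the value is a placeholder, and the path the script looks for may need adjusting):
# .credentialsrc -- sourced by the upload script before it talks to the API.
export VAGRANT_CLOUD_TOKEN="xxxxxxxxxxxxxxxxxxxxxxxx"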
#!/bin/bash
ORG="$1"
NAME="$2"
PROVIDER="$3"
VERSION="$4"
FILE="$5"
CURL=/opt/vagrant/embedded/bin/curl
export LD_PRELOAD="/opt/vagrant/embedded/lib/libcrypto.so:/opt/vagrant/embedded/lib/libssl.so"
# Cross platform scripting directory plus munchie madness.
pushd `dirname $0` > /dev/null
BASE=`pwd -P`
popd > /dev/null
# The jq tool is needed to parse JSON responses.
if [ ! -f /usr/bin/jq ]; then
tput setaf 1; printf "\n\nThe 'jq' utility is not installed.\n\n\n"; tput sgr0
exit 1
fi
# Ensure the credentials file is available.
if [ -f $BASE/../../.credentialsrc ]; then
source $BASE/../../.credentialsrc
else
tput setaf 1; printf "\nError. The credentials file is missing.\n\n"; tput sgr0
exit 2
fi
if [ -z ${VAGRANT_CLOUD_TOKEN} ]; then
tput setaf 1; printf "\nError. The vagrant cloud token is missing. Add it to the credentials file.\n\n"; tput sgr0
exit 2
fi
printf "\n\n"
# Assume the position, while you create the version.
${CURL} \
--tlsv1.2 \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $VAGRANT_CLOUD_TOKEN" \
"https://app.vagrantup.com/api/v1/box/$ORG/$NAME/versions" \
--data "
{
\"version\": {
\"version\": \"$VERSION\",
\"description\": \"A build environment for use in cross platform development.\"
}
}
"
printf "\n\n"
# Create the provider, while becoming one with your inner child.
${CURL} \
--tlsv1.2 \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $VAGRANT_CLOUD_TOKEN" \
https://app.vagrantup.com/api/v1/box/$ORG/$NAME/version/$VERSION/providers \
--data "{ \"provider\": { \"name\": \"$PROVIDER\" } }"
printf "\n\n"
# Prepare an upload path, and then extract that upload path from the JSON
# response using the jq command.
UPLOAD_PATH=`${CURL} \
--tlsv1.2 \
--silent \
--header "Authorization: Bearer $VAGRANT_CLOUD_TOKEN" \
https://app.vagrantup.com/api/v1/box/$ORG/$NAME/version/$VERSION/provider/$PROVIDER/upload | jq -r .upload_path`
# Perform the upload, and see the bits boil.
${CURL} --tlsv1.2 --include --max-time 7200 --expect100-timeout 7200 --request PUT --output "$FILE.upload.log.txt" --upload-file "$FILE" "$UPLOAD_PATH"
printf "\n-----------------------------------------------------\n"
tput setaf 5
cat "$FILE.upload.log.txt"
tput sgr0
printf -- "-----------------------------------------------------\n\n"
# Release the version, and watch the party rage.
${CURL} \
--tlsv1.2 \
--silent \
--header "Authorization: Bearer $VAGRANT_CLOUD_TOKEN" \
https://app.vagrantup.com/api/v1/box/$ORG/$NAME/version/$VERSION/release \
--request PUT | jq '.status,.version,.providers[]' | grep -vE "hosted|hosted_token|original_url|created_at|updated_at|\}|\{"
printf "\n\n"
I'm using a patched 1.3.2 build. Looks like it deletes the provider if it fails. So no, the box didn't show up later. Attached is the recording using the script. (Note for myself next time, turn off transparency).
@double16 the GIF captures what I've been seeing. Should anybody investigate this further, note the "hang" at the end of the upload. In this case the pause was relatively short, but I've seen it last several minutes. I suspect that when it stretches long enough (say more than 60 seconds), packer gives up and returns the EOF error shown above.
I've also noticed that newer versions of packer seem to be deleting failed uploads (aka the 503 no longer indicates the box is already uploaded). I don't know if this is a packer change or a cloud change.
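If anybody wants to measure that hang independently of packer, curl can report its own timing; a rough sketch (the object URL is a placeholder you'd get from the /upload API call):
# --verbose shows the Expect/100-continue exchange; the --write-out variables give a
# rough picture of how long the server took before returning the final response.
curl --verbose --upload-file output/example.box \
     --write-out 'starttransfer: %{time_starttransfer}s  total: %{time_total}s\n' \
     "https://archivist.vagrantup.com/v1/object/PLACEHOLDER"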
I captured more data on the issue. I believe the reason a 503 no longer results in the box being available is that packer deletes the box (technically it's deleting the provider) when the upload fails.
This screen shows the box available on the Vagrant Cloud website while packer is getting the 503 errors:
Yet, when packer is done, and the browser is refreshed, the box is gone:
It's hard to see because of the zoom level, but at the end of the log, on the right side of the image, it says "Cleaning up provider" ... followed by "Deleting provider: libvirt" ... that logic must be new.
@SwampDragons can you ask the Vagrant Cloud programmers what is triggering the 503 errors?
This is a higher quality, accelerated GIF showing the upload process. Note the 1 minute pause after the file is uploaded, before the server acknowledges it. As I said previously, I think the timeouts are a result of that pause lasting more than 1 minute. While creating the upload.sh script, I ran into this issue because curl times out at 60 seconds by default. The solution with curl was adding --expect100-timeout 7200, which fixed the issue. Perhaps we can increase the time packer is willing to wait for a response as well?
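For reference, the workaround boils down to these two curl flags (UPLOAD_PATH being the URL returned by the provider upload endpoint, as in the script above):
# The long --max-time and --expect100-timeout values are what kept curl from giving
# up during the post-upload pause in my testing.
curl --tlsv1.2 --request PUT \
     --max-time 7200 --expect100-timeout 7200 \
     --upload-file "output/example.box" "$UPLOAD_PATH"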
I managed to conduct a test where I downloaded the box while packer was showing a 503 error and compared the hashes with the local box file; as suspected, they matched. The question is, can packer separate the malformed uploads which should be deleted from the 503 errors indicating a previous attempt worked? (Of course the 503 issue is moot if we can fix the timeout problem.)
$ curl --location --user-agent "${AGENT}" "https://app.vagrantup.com/lineageos/boxes/lineage/versions/1.8.38/providers/libvirt.box" | sha256sum ; sha256sum lineageos-libvirt-1.8.38.box
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 300 0 300 0 0 438 0 --:--:-- --:--:-- --:--:-- 438
100 470 100 470 0 0 492 0 --:--:-- --:--:-- --:--:-- 2447
100 1551M 100 1551M 0 0 32.1M 0 0:00:48 0:00:48 --:--:-- 33.5M
49101f52e4708b4b18534b725186a07989094a4f8329f6c09d8fa980c8991f18 -
49101f52e4708b4b18534b725186a07989094a4f8329f6c09d8fa980c8991f18 lineageos-libvirt-1.8.38.box
$ curl --location --user-agent "${AGENT}" "https://app.vagrantup.com/lineageos/boxes/lineage/versions/1.8.38/providers/libvirt.box" | sha256sum ; sha256sum lineageos-libvirt-1.8.38.box
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 40 0 40 0 0 112 0 --:--:-- --:--:-- --:--:-- 112
a50aa56a19be12e2329e0b217e25d33a0eb81c206c31c6949f32a7214f75456c -
49101f52e4708b4b18534b725186a07989094a4f8329f6c09d8fa980c8991f18 lineageos-libvirt-1.8.38.box
I finally gave up looking for a packer-based fix, and removed the post-processor upload step. I now handle the uploads using a curl script (similar to the one above). It's more reliable, and I can ensure the boxes are uploaded one at a time, even when I have 5+ robots building boxes.
If anyone on the HashiCorp side wants to troubleshoot this issue let me know. I can trigger the failures at will. All I have to do is use my shell script to start uploading boxes in parallel. The more boxes in parallel, and the larger the box files, the easier it is to trigger the bug, suggesting a bottleneck on the backend. I've even found that if I upload large boxes sequentially, over a fast enough connection (gigabit speeds), I'll trigger the bug.
Sorry for dropping the ball on this one; I do intend to get back to it but I'm glad you have a workaround in the meantime.
@SwampDragons I was trying to say the upload errors appear to be an issue with the Vagrant Cloud backend. You might be able to find a workaround for the timeout issue, but ultimately, if the boxes are too large, or there are too many simultaneous uploads, the cloud will have problems. If anybody at HashiCorp wants to troubleshoot, have them reach out to me. As I said, I can trigger the issue at will.
That said, as @double16 mentioned, it appears that packer will delete any uploaded artifact from the cloud if a 503 error is returned. I think you should revert this change. As I've mentioned, it seems that when a 503 error is returned following an upload attempt, the uploaded artifact usually becomes available for download after a few minutes.
If you were to make any change, it should be retrying an upload until it succeeds, or a 503 is returned. If a 503 is detected, abort the upload with a warning message saying the upload couldn't be verified, but that the box may become available in a few minutes.
I've never seen a 503 returned, and then seen packer successfully upload a box. I presume because it exists already.
The real question, to me, is whether a 503 should also cause the release step to be skipped?
I should have said recent packer versions delete the upload upon getting a 503. The old logic, where the artifact was left intact, was better.
If your concern is corrupted uploads, then perhaps a new post-processor which downloads the box and verifies the integrity of the upload via a hash would be ideal. Even better if the Vagrant Cloud API offered a way of checking a box hash without downloading it (I'm not aware of a way to do this currently).
If downloading seems extreme, and a hash can't be retrieved via the API, a less extreme option would be comparing the size of the uploaded object.
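A rough sketch of that size comparison, using only the tools already used in this thread (the download URL pattern follows the checksum test above; purely illustrative):
# Compare the remote object size (via a HEAD request) against the local box file.
REMOTE=$(curl --silent --head --location \
  "https://app.vagrantup.com/ORG/boxes/NAME/versions/VERSION/providers/PROVIDER.box" \
  | awk 'tolower($1) == "content-length:" { size = $2 } END { print size }' | tr -d '\r')
LOCAL=$(stat -c %s "output/NAME-PROVIDER-VERSION.box")
[ "$REMOTE" = "$LOCAL" ] && echo "sizes match" || echo "sizes differ"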
I'll take a look at better handling for the 503 when I get back in January.
Hello! I will try to see what I can do to help here.
So first, I've read that bumping the upload timeout seems to help, so I have set the vagrant client timeout to 120 minutes, like you suggested @ladar. Here's a bunch of binaries containing the patch; can you please tell me if that helps?
packer_linux_amd64.zip packer_windows_amd64.zip packer_darwin_amd64.zip
The cleanup part of the vagrant build hasn't changed in ~4 years, so this is probably a change on the Vagrant Cloud side. From your comments my wild guess is that there is a single-task S3 uploader, and that in moments of rush it takes more time than usual to catch up.
Hi @azr ... I've played around with the timeouts rather extensively. As I left things, I was using this patch:
https://github.com/lavabit/robox/blob/master/res/providers/packer-upload-timeout.patch
I also used variations of this sed command to tweak the retry count/delay:
sed -i -e "s/common.Retry(10, 10, 3/common.Retry(10, 300, 10/g" \
post-processor/vagrant-cloud/step_upload.go
The version above retried uploads 10 times, increasing the delay between attempts from 10 seconds up to 5 minutes. I think 5 minutes was the maximum allowed by the retry function, but it's been a _very_ long time since I looked at that code. Either way, the print statement doesn't print the actual delay (a minor bug that doesn't matter when the delay is fixed at 10 seconds). I believe tweaking the code to use a minimum delay of 10 seconds, growing to a max of say 60 seconds, might be a good thing to try.
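If anybody wants to experiment with that, it's the same one-line tweak with different numbers (assuming the arguments are still the minimum delay, maximum delay, and attempt count):
sed -i -e "s/common.Retry(10, 10, 3/common.Retry(10, 60, 10/g" \
  post-processor/vagrant-cloud/step_upload.go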
I believe you're correct, there is a bottleneck on the server side. I don't think casual users will experience it, but when I have 3+ robots building and uploading boxes, the odds of 2+ robots trying to upload large box files at the same time are high enough to cause major delays/problems.
I found that my robots would get stuck for 30+ minutes (re)trying to upload box files. About 50% of the time, the box would eventually get through (the percentage dropped dramatically the more robots I used). This 50% was with the higher timeouts, longer delays, and 10 or 20 retries. The failed boxes would keep failing with a 4xx error and retry, or get a 503 and simply fail. I never saw packer get a 503 and then succeed. I believe the 503 error was the server's way of saying it had the box, and didn't need another copy. Early versions of packer ignored 503 errors, and based on my testing, I'd see the box available via the Vagrant Cloud after 10 to 20 minutes (depending on box size, load, etc). At some point the logic was changed so packer started deleting uploads when it hit a 503, forcing me to upload the box again manually.
This started happening often enough that I eventually gave up on using packer to upload the boxes. I ended up writing a script using bash, jq, and curl to handle uploads. I'm still improving that script, but I've gotten better results using this strategy, particularly if I throttle the number of simultaneous uploads.
When using curl, what I found was that boxes would upload, but if the server was under load, curl would return indicating an HTTP 100 code. I believe this was misleading. Based on my research, I think what happens is curl gets an HTTP 100 response when it starts the upload, and if it doesn't get a response quickly enough after the upload, it uses that "continue" response code as the result. I never figured out how to force curl to wait longer. I haven't figured out whether curl is closing the connection despite my command line timeout value, or whether the server is unilaterally closing the connection and curl is simply using the last response it got (the 100) as the result.
Long story short, I can trigger the issue at will. If someone at HashiCorp would like to coordinate with me, I can set up a time to upload box files, so they can try to debug things on the server.
What I found was that when I had 3-6 boxes uploading at once, curl would start returning 100 codes, but the boxes would still be available for download a few seconds later. Unfortunately the only way to saturate my gigabit lines was to kick off 6+ uploads at a time, but if I did, and the box files were large enough, the uploads would eventually get 5xx errors (I think it was a 523, but I don't recall precisely). I eventually found it best if I started the upload process manually, and only uploaded 1-2 boxes at a time. Of course this method means that uploads take 6-24 hours, but if I'm patient, and don't try to rush it (by kicking off too many simultaneous uploads), it does work reliably, albeit slowly.
I plan to create a meta upload function using the parallel command, which should allow me to control the number of simultaneous uploads with more precision, but I haven't done it yet.
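Something along these lines is what I have in mind; a sketch only, where upload-one.sh is a hypothetical wrapper that derives the org/name/provider/version from the file name and retries until it gets an HTTP 200:
# Keep at most two uploads in flight at a time.
parallel --jobs 2 ./upload-one.sh ::: output/*.box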
I've rambled on long enough. Let me know if you need more info.
I've nudged upstream folks and I guess they've got a repro but need to find time to dig in, so we're gonna have to bump this again. Sorry, all.
I just finished uploading v1.9.4. Uploading all of the boxes single file took ~8 hours, and out of the 453 box uploads, 31 retries had to be performed.
A retry means resending the file, so it's a lot of wasted bandwidth. More importantly, the inability to upload boxes in parallel hurts. When I run 8 upload jobs at a time, the overall speed goes up. Based on my experiments, I'd estimate the overall time could be cut to ~2 or ~3 hours, if it worked. Unfortunately the errors/retries go way, way up, with some boxes failing to succeed even after 10 retries, so I try to keep the job number limited. After several failed attempts, I did this release single file, and it finally worked.
If anybody is curious, here is a gist with the upload script output. It shows the HTTP error codes, frequency of errors, curl error output (e.g. curl: (52) Empty reply from server), and the size of the boxes that failed.
So I ran @ladar's script above (because vagrant cloud started failing again for me), and this is what I got from my curl response when it uploaded the vagrant box.
curl: (56) GnuTLS recv error (-110): The TLS connection was non-properly terminated.
This probably "failed", but it seems the upload mostly succeeded and the connection just did not get closed properly.
I don't know if that helps at all or not.
For a while vagrant cloud was uploading my big kali boxes (~4.2G) with no problem, but within the past week it stopped uploading properly. You can check my CI logs from when it started happening for me here: https://github.com/elreydetoda/packer-kali_linux/issues/10#issuecomment-507760281
@elreydetoda you might want to try using a version of curl built against OpenSSL. Based on your logs, it appears you are installing vagrant, which contains an embedded version of curl built using OpenSSL. On my systems, it's available at:
/opt/vagrant/embedded/bin/curl
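On systems with Vagrant installed, using it looks roughly like this (the library paths match the ones in the script earlier; adjust for your install):
# Preload Vagrant's bundled OpenSSL libraries so its embedded curl uses them at runtime.
export LD_PRELOAD="/opt/vagrant/embedded/lib/libcrypto.so:/opt/vagrant/embedded/lib/libssl.so"
/opt/vagrant/embedded/bin/curl --version   # should report OpenSSL rather than GnuTLS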
I've been tweaking my upload script, and have gotten it to work reliably. My current version automatically retries failed uploads until it works. But if I kick off more than 1 upload at a time, it inevitably runs into errors and has to re-upload the box. The more jobs I kick off, the more retries, so I've resorted to single-file uploads (a pain when you have 478 boxes to upload).
The latest version of my upload script is here. Unlike the version above, this script parses the box params from the file name, and looks for a .credentialsrc file in the robox repo root. It should be easily adaptable though.
@ladar, thank you so much! I really appreciate it! I will dig into it, and you can also pick pieces from my adaptation if you want. I added a help function in case people just try to execute it without reading the code, and a create-new-box section for when the box hasn't been created before. Going to merge it to master for my repo soon, but I'm still working on integrating it fully.
@elreydetoda you're welcome, but it looks like you based your uploader off my upload.sh script, which is the noisy, and slightly outdated, version.
All of my recent improvements are in the silent.sh script: it only prints a message when the upload is finished (hence silent), it doesn't release the box after uploading, and most importantly, it auto-retries the file upload step if it fails (aka doesn't result in an HTTP 200 code). There may be a few other tweaks (like using the newer curl version from vagrant if available, etc).
If desired, there is a separate release.sh script which releases the version (I run that when all the providers are finished uploading), and a delete.sh script, which will delete the box version entirely. I create the box via the web, since in my case that's a relatively rare event.
Just to tell you I had the same problem in my loops. I tried everything to solve it: using the curl from Vagrant, retrying 10 times, Keep-Alive, and none of them solved it. My loops perform a test of the box right after it gets uploaded, and if the test succeeds (vagrant box add, init and up) then I finally release it. Otherwise, I delete the version.
The thing is... ignoring the failure on upload causes my test to fail, since apparently the box isn't available for download yet. But I noticed that a few minutes after the failed upload, the box appears on Vagrant Cloud, so I just added a sleep after this step and everything goes green now.
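In rough pseudo-shell, the flow is (timings and names are illustrative, not the exact deploy script):
# upload_box is a stand-in for the API upload step; it may report a failure even
# though the box eventually lands, so give Vagrant Cloud a few minutes to catch up.
upload_box "$FILE" || true
sleep 300
vagrant box add --force "org/name" --box-version "$VERSION"
vagrant init "org/name" && vagrant up
# Release the version via the API only if the test above succeeded; otherwise delete it.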
Check what I'm saying here:
https://travis-ci.com/felipecassiors/ubuntu1804-4dev/builds/142033043#L6536
https://github.com/felipecassiors/ubuntu1804-4dev/blob/v1.0.0/scripts/deploy.sh#L104