Packer version: 1.2.5
Host platform: CentOS 7 (though I'm pretty sure it doesn't matter)
The simplest example template: the most basic vacuous image for the amazon-ebs builder you can conceive of, but behind a brutal proxy
Problem statement: packer does not seem to pay heed to any of the {http, https}_proxy or {HTTP, HTTPS}_PROXY environment variables when it comes to generating an AWS EC2 AMI. In exemplar form:
Given I am running packer in EC2/VPC with no igw egress to the public internet
And I have set the gamut of http(s)_proxy/i variables to a proxy to gain access to the public internet
When I try to build an AMI with packer
Then packer 1.2.5 hangs as it tries to access the public EC2 service endpoint (to do anything)
More background:
We did a good bit of archaeological digging, and I'm pretty sure this goes all the way back to the switch from goamz to aws-sdk-go (in late 2015?). v0.7.5 was the last version with goamz, and going back to 0.3.3 there was a patch to goamz to pay heed to the proxy env vars. After the move to aws-sdk-go, I can't find any code in the builder that defines the HTTPClient based upon http.ProxyFromEnvironment (looking in builder/amazon/common/access_config.go). v1 of aws-sdk-go looks like it requires an explicit call to define an HTTPClient with a proxy.
(As an aside go-aws-sdk v2 seems to pay heed to these environment variables)
Even if it's dirt simple, a template will help us reproduce, as will debug logs (set the env var PACKER_LOG=1)
I'll have to get permission on the logs but... I will reproduce a simple template in a few
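In the meantime, for anyone else reproducing this, a vacuous amazon-ebs template along these lines should do it (the region, source AMI, and SSH username are placeholders; substitute your own):

```json
{
  "builders": [
    {
      "type": "amazon-ebs",
      "region": "us-east-1",
      "source_ami": "ami-xxxxxxxx",
      "instance_type": "t2.micro",
      "ssh_username": "ec2-user",
      "ami_name": "packer-proxy-repro {{timestamp}}"
    }
  ]
}
```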
I can confirm that Packer is not honoring the http_proxy, https_proxy, HTTP_PROXY, or HTTPS_PROXY vars when exported in the environment. Additionally, running packer build -var 'https_proxy=https://domain.name' does not work either.
If you are using a proxy to access AWS endpoints, most likely you will not be able to publicly release log files.
This can be reproduced by requiring proxy access to hit AWS endpoints, and then configuring that proxy via env vars.
Facing the same situation here...
I just did some digging around in the aws sdk.
Here's where we obtain the session: https://github.com/hashicorp/packer/blob/master/builder/amazon/common/access_config.go#L71
and if you follow the trail of breadcrumbs here's where aws grabs the default config:
https://github.com/aws/aws-sdk-go/blob/586c9ba6027a527800564282bb843d7e6e7985c9/aws/defaults/defaults.go#L58
If you drill down, it's using the net/http library's DefaultClient, which, according to the docs, uses DefaultTransport, which according to that same page does use ProxyFromEnvironment.
DefaultTransport is the default implementation of Transport and is used by DefaultClient. It establishes network connections as needed and caches them for reuse by subsequent calls. It uses HTTP proxies as directed by the $HTTP_PROXY and $NO_PROXY (or $http_proxy and $no_proxy) environment variables.
It's entirely possible I've read this code wrong -- where did you get the impression that v1 of the SDK required an explicit call? I can always make a patch to access_config with the WithHTTPClient(http.DefaultClient) call if you're willing to test whether that'll fix things for you.
Otherwise, I'm afraid the SDK may be a false lead, which raises the question of why this isn't actually working for you. I'll try to get a repro in the morning.
Yeah, this issue may be trickier than I originally thought. I was not able to reproduce it with a vanilla CentOS 7 behind a very simple implementation of "tinyproxy".
In the production deployment, I am using a "hardened" CentOS 7, and I'm behind a brutal heavy-duty forward proxy and multiple firewalls. I am not in a position to export those logs or to run that particular hardened CentOS 7 behind my simple tinyproxy to do some better testing.
All that said: when I patch the Packer code to FORCE the client to use ProxyFromEnvironment, it DOES indeed work on that hardened CentOS 7 behind the brutal proxy.
So at this point I'm leaning more toward something funky in a system library or OS configuration that is interfering with the uptake of the proxy env vars into the HTTP client.
Are there some targeted debug/print statements I could inject into my patched Packer running in the production environment that might help get to the bottom of what is going on?
Yeah, we can figure out some logs for you to patch into your Packer. I know you can't share full logs, but is there a chance you can share the particular error you're receiving, and maybe the last log line before that error occurs? That may help us narrow down where in the builder to focus our efforts.
It would also help if you can run with the -debug command line flag and let me know what "step" it's failing at.
@SwampDragons any updates on this? I'm trying to deploy the packer build of an AMI internally but our entire company is behind a proxy :/
I'm still waiting to get more information from users experiencing this; with no logs or repro case my hands are tied.
@SwampDragons I followed the instructions to give a basic trace, but I don't have the aforementioned patches to put into packer itself to help you with it. Is it just a basic trace with PACKER_LOG=1 that you need or something with a patch and a trace? If so I can give you what I have.
I haven't made any patches yet because I don't know where the problem is occurring with enough detail to write them yet. I need the basic trace.
@SwampDragons Here is the basic trace. I removed some of the TCP addresses with "X.X.X.X" for a bit less identifiable info. All of our web traffic goes through a main corporate proxy, and we have an http_proxy and https_proxy requirement on most of our systems.
2018/09/05 14:37:09 ui: ==> amazon-ebs: Prevalidating AMI Name: packer-example 1536176229
==> amazon-ebs: Prevalidating AMI Name: packer-example 1536176229
2018/09/05 14:37:12 [ERR] Checkpoint error: Get https://checkpoint-api.hashicorp.com/v1/check/packer?arch=amd64&os=darwin&signature=704fccce-945e-867f-e8e8-8fefdc7d7783&version=1.2.5: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/09/05 14:37:12 packer: 2018/09/05 14:37:12 [ERR] Checkpoint error: Get https://checkpoint-api.hashicorp.com/v1/check/packer?arch=amd64&os=darwin&signature=704fccce-945e-867f-e8e8-8fefdc7d7783&version=1.2.5: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
==> amazon-ebs: Error querying AMI: RequestError: send request failed
2018/09/05 14:39:09 ui error: ==> amazon-ebs: Error querying AMI: RequestError: send request failed
==> amazon-ebs: caused by: Post https://ec2.us-east-2.amazonaws.com/: dial tcp X.X.X.X:443: i/o timeout
==> amazon-ebs: caused by: Post https://ec2.us-east-2.amazonaws.com/: dial tcp X.X.X.X:443: i/o timeout
2018/09/05 14:39:09 [INFO] (telemetry) ending amazon-ebs
2018/09/05 14:39:09 ui error: Build 'amazon-ebs' errored: Error querying AMI: RequestError: send request failed
caused by: Post https://ec2.us-east-2.amazonaws.com/: dial tcp X.X.X.X:443: i/o timeout
2018/09/05 14:39:09 Waiting on builds to complete...
2018/09/05 14:39:09 Builds completed. Waiting on interrupt barrier...
2018/09/05 14:39:09 machine readable: error-count []string{"1"}
2018/09/05 14:39:09 ui error:
==> Some builds didn't complete successfully and had errors:
2018/09/05 14:39:09 machine readable: amazon-ebs,error []string{"Error querying AMI: RequestError: send request failed\ncaused by: Post https://ec2.us-east-2.amazonaws.com/: dial tcp X.X.X.X:443: i/o timeout"}
Build 'amazon-ebs' errored: Error querying AMI: RequestError: send request failed
2018/09/05 14:39:09 ui error: --> amazon-ebs: Error querying AMI: RequestError: send request failed
caused by: Post https://ec2.us-east-2.amazonaws.com/: dial tcp X.X.X.X:443: i/o timeout
2018/09/05 14:39:09 ui:
==> Builds finished but no artifacts were created.
2018/09/05 14:39:09 [INFO] (telemetry) Finalizing.
caused by: Post https://ec2.us-east-2.amazonaws.com/: dial tcp X.X.X.X:443: i/o timeout
@triskadecaepyon if you can share... can you describe any salient info about your use of DNS and VPC routing and endpoints?
@triskadecaepyon actually one more pointed question - are you behind a Blue Coat proxy?
@erickascic Some updates here: I've been working with IT to figure out some issues with our VPN, and it turns out we had some special settings that were regional (I had to change my http_proxy and https_proxy settings to regional variants). I still have one remaining issue, but I'm unsure if it is because of the proxy or via my own misunderstanding of packer.
I now get to the point where the instance is made, but fails to connect through tcp:
2018/09/06 13:59:38 ui: ==> amazon-ebs: Waiting for SSH to become available...
==> amazon-ebs: Waiting for SSH to become available...
2018/09/06 13:59:53 packer: 2018/09/06 13:59:53 [DEBUG] TCP connection to SSH ip/port failed: dial tcp X.X.X.X:22: i/o timeout
2018/09/06 14:00:13 packer: 2018/09/06 14:00:13 [DEBUG] TCP connection to SSH ip/port failed: dial tcp X.X.X.X:22: i/o timeout
Tried both with and without the keys settings, and the local ssh agent override. Thoughts?
@triskadecaepyon do you have a bastion host? If so you may need to add some bastion-specific information to your ssh config. Check out https://www.packer.io/docs/templates/communicator.html#ssh-communicator.
@erickascic
All that said: when I patch the Packer code to FORCE the client to use ProxyFromEnvironment, it DOES indeed work on that hardened CentOS 7 behind the brutal proxy.
Can you share your patch?
@SwampDragons We aren't using a bastion-style SSH setup, but I did try as many of the combinations as I could; I couldn't get any of them to work. I'll try it again at home without a proxy as a litmus test of my template and setup.
@erickascic
Seconded... :+1:
Please do share the patch if possible -- we are encountering the same issues with our restrictive corporate proxy
Here is a cobbled-together patch against v1.3.1 that fixes http proxy compatibility for the ebs builder: http.ProxyFromEnvironment is defined for the ec2conn client in ebs/builder.go.
Tested by setting the environment variable http_proxy=http://proxy_host
Note: I am not sure what the root cause is... Maybe this can help someone:
https://github.com/hashicorp/packer/compare/v1.3.1...n888:force_http_proxy_ebs_builder
Prior to the patch, it would hang on the following, and eventually time out as it tried to connect to the AWS API without the http proxy.
2018/10/26 23:56:52 Running builder: amazon-ebs
2018/10/26 23:56:52 [INFO] (telemetry) Starting builder amazon-ebs
2018/10/26 23:56:52 packer: 2018/10/26 23:56:52 Found region us-west-2
2018/10/26 23:56:52 packer: 2018/10/26 23:56:52 [INFO] AWS Auth provider used: "EnvConfigCredentials"
2018/10/26 23:56:52 packer: 2018/10/26 23:56:52 [INFO] Finding AZ and VpcId for the given subnet 'subnet-cbbcb0ad'
Note: this patch only worked on v1.3.1. On v1.3.2, the same patch hangs at the following line (same behavior pre- and post-patch):
[INFO] AWS Auth provider used: "EnvConfigCredentials"
Edit: also had to do a similar change in: builder/amazon/common/step_create_tags.go
I seem to be having the same issue. I did a dev build off of master and am getting the following:
2018/12/07 10:48:59 packer: 2018/12/07 10:48:59 [INFO] AWS Auth provider used: "StaticProvider"
2018/12/07 10:51:57 [INFO] (telemetry) ending amazon-ebs
2018/12/07 10:51:57 ui error: Build 'amazon-ebs' errored: error validating regions: RequestError: send request failed
caused by: Post https://ec2.us-east-1.amazonaws.com/: dial tcp 54.239.29.8:443: connect: connection refused
2018/12/07 10:51:57 Waiting on builds to complete...
When I used the latest production binary build (1.3.2), I think it just hangs.
We have an authenticated proxy that does SSL decryption, but I do have our root certs set and am setting AWS_CA_BUNDLE to point to the updated certs.
Also, what ports does Packer use? 443 and 22? Anything else?
I tried applying the proxy fix to 1.3.1, but it's still not working.
Oddly enough, the checkpoint request fails, but it works with curl using our proxy settings.
@tsporthd is your environment variable http_proxy set prior to running Packer with the proxy fix?
You should also see "-ebsproxyfix" when passing the version arg:
$ packer version
Packer v1.3.1-ebsproxyfix (ada697956)
I've opened PR #7226 with an extension of @n888's patch. I've attached patched builds of Packer to that PR; I'd really appreciate if any of you who have experienced this issue could test the patch out and verify that manually setting the proxyfromenvironment flag works.
@SwampDragons I'll give a test later this week and report back. Thanks for providing the test builds!
@SwampDragons here are the results (surprise, it worked!):
New version of packer in #7226 :
==> amazon-ebs: Waiting for instance (i-xxxxx) to become ready...
==> amazon-ebs: Using ssh communicator to connect: xx.xx.xxx.xxx
==> amazon-ebs: Waiting for SSH to become available...
==> amazon-ebs: Connected to SSH!
==> amazon-ebs: Provisioning with shell script: ./build_idp3_core.sh
amazon-ebs: PREFIX=/home/ubuntu/conda
amazon-ebs: installing: python-2.7.15-h1571d57_0 ...
amazon-ebs: Python 2.7.15 :: Anaconda, Inc.
It works! No more failing on the SSH connection!
Old version:
==> amazon-ebs: Waiting for instance (i-xxxxx) to become ready...
==> amazon-ebs: Waiting for SSH to become available...
==> amazon-ebs: Timeout waiting for SSH.
==> amazon-ebs: Terminating the source AWS instance...
==> amazon-ebs: Cleaning up any extra volumes...
==> amazon-ebs: No volumes to clean up, skipping
==> amazon-ebs: Deleting temporary security group...
==> amazon-ebs: Deleting temporary keypair...
Build 'amazon-ebs' errored: Timeout waiting for SSH.
Notes:
Now just waiting until you merge that PR :)
Hi folks,
While creating an image using Packer, I'm coming across this error. Could you please look into why the SSH connection is not getting established with the Ubuntu t2.micro instance?
==> amazon-ebs: Adding tags to source instance
amazon-ebs: Adding tag: "Name": "Packer Builder"
amazon-ebs: Instance ID: i-00ff77978e8ce2ae1
==> amazon-ebs: Waiting for instance (i-00ff77978e8ce2ae1) to become ready...
==> amazon-ebs: Using ssh communicator to connect: 52.77.243.171
==> amazon-ebs: Waiting for SSH to become available...
==> amazon-ebs: Timeout waiting for SSH.
==> amazon-ebs: Terminating the source AWS instance...
==> amazon-ebs: Error terminating instance, may still be around: RequestError: send request failed
==> amazon-ebs: caused by: Post https://ec2.ap-southeast-1.amazonaws.com/: dial tcp 52.95.15.29:443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
==> amazon-ebs: Cleaning up any extra volumes...
==> amazon-ebs: Error describing volumes: RequestError: send request failed
==> amazon-ebs: caused by: Post https://ec2.ap-southeast-1.amazonaws.com/: dial tcp 52.95.15.29:443: connectex: An established connection was aborted by the software in your host machine.
==> amazon-ebs: Deleting temporary security group...
==> amazon-ebs: Error cleaning up security group. Please delete the group manually: sg-0d96c856fd0de617711
==> amazon-ebs: Deleting temporary keypair...
Build 'amazon-ebs' errored: Timeout waiting for SSH.
==> Some builds didn't complete successfully and had errors:
--> amazon-ebs: Timeout waiting for SSH.
This issue was resolved 6 months ago; if you are having trouble establishing an SSH connection, you should reach out to the mailing list or community page.
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.