log: https://gist.github.com/andoriyu/8f9b6ca994fe6b3d73862a8f64966380#file-unset-vault_cacert
template: https://gist.github.com/andoriyu/8f9b6ca994fe6b3d73862a8f64966380#file-template-json
I forgot to export the VAULT_CACERT environment variable, and it broke the whole thing without any explanation.
It looks like Packer doesn't check whether any of the variables required for Vault are set before trying to use it.
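For illustration only, here is a rough sketch of the kind of up-front check I mean, before the vault template function ever reaches out to the server. The function and messages are hypothetical, not Packer's actual code:

```go
package main

import (
	"fmt"
	"os"
)

// checkVaultEnv is a hypothetical pre-flight check: fail fast with a clear
// message instead of letting the first Vault request hang or error cryptically.
func checkVaultEnv() error {
	if os.Getenv("VAULT_ADDR") == "" {
		return fmt.Errorf("vault template function used, but VAULT_ADDR is not set")
	}
	if os.Getenv("VAULT_TOKEN") == "" {
		return fmt.Errorf("vault template function used, but VAULT_TOKEN is not set")
	}
	// VAULT_CACERT is legitimately optional, but when the server uses a private
	// CA and it is missing, the later TLS handshake fails with a confusing error.
	return nil
}

func main() {
	if err := checkVaultEnv(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```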
Oof, sorry about that.
This one isn't as easy to solve as the other issue you opened because there may be a valid situation where there is no VAULT_CACERT set. Does this still hang infinitely for you if you set a VAULT_CLIENT_TIMEOUT? I'm wondering if we can always set that from inside packer to make this more likely to at least fail in a reasonable period of time.
@SwampDragons I haven't tried it with VAULT_CLIENT_TIMEOUT. What's interesting is that after ~30 minutes it actually does reach out to Vault.
Shouldn't this be a quick error path, since Packer can't establish a secure connection with Vault?
I assumed so too, but on a quick read of the code it looks like the golang vault api bindings are using a pretty robust request retry wrapper, which I bet is the source of this hang. That retry wrapper looks like it'll keep retrying for a long time if the server response code is within the 500 range, unless the timeout has been set.
It's a little surprising to me that failing to set the CACERT would be returning a 5xx error.
I'll need to figure out how to set up an environment to directly reproduce this so I can verify that's what is going on here.
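To illustrate what I mean, here is a simplified sketch of that kind of retry policy (not the actual binding code): anything in the 500 range keeps getting retried, so without a client timeout the overall call can run for a very long time before surfacing an error.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// doWithRetry retries on transport errors and 5xx responses, roughly the
// behavior described above. Without an overall client timeout, a server that
// keeps answering in the 500 range keeps the loop (and the caller) busy.
func doWithRetry(client *http.Client, req *http.Request, maxRetries int) (*http.Response, error) {
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		resp, err := client.Do(req)
		if err == nil && resp.StatusCode < 500 {
			return resp, nil // success, or a non-retryable 4xx the caller handles
		}
		if err != nil {
			lastErr = err
		} else {
			resp.Body.Close()
			lastErr = fmt.Errorf("server error: %s", resp.Status)
		}
		time.Sleep(time.Duration(attempt+1) * time.Second) // simple linear backoff
	}
	return nil, fmt.Errorf("giving up after %d attempts: %w", maxRetries+1, lastErr)
}

func main() {
	req, _ := http.NewRequest("GET", "https://localhost:8200/v1/secret/hello", nil)
	_, err := doWithRetry(http.DefaultClient, req, 5)
	fmt.Println(err)
}
```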
Well, it won't be a 500; this would be the client being unable to validate the server certificate without the private CA certificate.
Basically, my Vault server uses my own CA for its certificates, and clients need that CA certificate to validate the server's identity, because the server certificate is signed by that CA.
Here is how curl fails in this scenario:
* About to connect() to blahblah port 443 (#0)
* Trying x.x.x.x...
* Connected to blahblah (x.x.x.x) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* Server certificate:
* subject: OU=Vault,O=blahblah,L=Los Angeles,ST=CA,C=US
* start date: Dec 07 22:01:00 2018 GMT
* expire date: Dec 04 22:01:00 2028 GMT
* common name: (nil)
* issuer: CN=blahblah,L=Los Angeles,ST=CA,C=US
* NSS error -8179 (SEC_ERROR_UNKNOWN_ISSUER)
* Peer's Certificate issuer is not recognized.
* Closing connection 0
curl: (60) Peer's Certificate issuer is not recognized.
More details here: http://curl.haxx.se/docs/sslcerts.html
curl performs SSL certificate verification by default, using a "bundle"
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn't adequate, you can specify an alternate file
using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
the -k (or --insecure) option.
However, if I pass --cacert, it works without issues.
Earlier I said that it works after ~30 minutes, but I think I was wrong; the error could have been swallowed by the issue you fixed earlier.
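For reference, here is the client-side equivalent of curl's --cacert, sketched with the github.com/hashicorp/vault/api client. The address, CA path, and secret path are placeholders, and it assumes a token is already exported; exporting VAULT_CACERT should have roughly the same effect as the ConfigureTLS call.

```go
package main

import (
	"log"

	vaultapi "github.com/hashicorp/vault/api"
)

func main() {
	cfg := vaultapi.DefaultConfig()
	cfg.Address = "https://blahblah:443" // placeholder address

	// Trust the private CA that signed the server's certificate, the same
	// thing curl's --cacert (or exporting VAULT_CACERT) does.
	if err := cfg.ConfigureTLS(&vaultapi.TLSConfig{CACert: "/path/to/my-ca.pem"}); err != nil {
		log.Fatal(err)
	}

	client, err := vaultapi.NewClient(cfg)
	if err != nil {
		log.Fatal(err)
	}

	secret, err := client.Logical().Read("secret/hello")
	if err != nil {
		// Without the CA configured, this is where the x509
		// "certificate signed by unknown authority" error shows up.
		log.Fatal(err)
	}
	log.Println(secret)
}
```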
I was finally able to cobble together an appropriate dev environment to reproduce this, and I can happily confirm that with a build of master, after merging the fix for the other issue you opened, this one is resolved too. Instead of hanging for half an hour and then silently failing, Packer correctly errors with:
Error initializing core: template: root:1:3: executing "root" at <vault /secret/hellofoo>: error calling vault: Error reading vault secret: Get https://localhost:8200/v1/secret/hello: x509: certificate signed by unknown authority
The mechanism wasn't the one I initially thought: it was a retry built into Packer's template interpolation code that we weren't aborting properly when we got a legitimate error from a template function.
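A hypothetical sketch of the shape of that fix: a retry loop around interpolation that gives up immediately on a genuine error from a template function instead of retrying it. Names here are illustrative, not Packer's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errFatal marks errors that retrying cannot fix, e.g. a vault template
// function that failed TLS verification.
var errFatal = errors.New("fatal template error")

// interpolateWithRetry retries transient failures but aborts immediately on a
// genuine error from a template function, instead of looping until a deadline.
func interpolateWithRetry(render func() (string, error), attempts int) (string, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		out, err := render()
		if err == nil {
			return out, nil
		}
		if errors.Is(err, errFatal) {
			return "", err // abort right away; retrying will not help
		}
		lastErr = err
		time.Sleep(time.Second)
	}
	return "", fmt.Errorf("interpolation failed after %d attempts: %w", attempts, lastErr)
}

func main() {
	_, err := interpolateWithRetry(func() (string, error) {
		return "", fmt.Errorf("error calling vault: %w", errFatal)
	}, 3)
	fmt.Println(err)
}
```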
I'm going to close since the fix is merged. Thanks for reporting this and bearing with me as I worked through it.
@SwampDragons thank you for fixing this so quickly!
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.