proposal: port forwarding via SSH tunnel
I'd like to start adding port forwarding via SSH tunnels to terraform.
This is useful when you want to use terraform with systems that are only accessible via a jump host, e.g. company-internal systems.
Right now terraform already ships with a bunch of providers which might need to
talk to internal systems (e.g. postgres/ mysql/ influxdb/…).
The status quo is to create an SSH tunnel beforehand, or, in cases where the
entire infrastructure is created from scratch, to split the terraform scripts into multiple stages with glue code outside.
E.g. one might set up a private cluster with a jump host, open an
SSH tunnel via bash, and then run a different terraform script using the newly created
tunnel to access private systems, all wrapped in a single setup.sh
script.
Assuming that the SSH tunnel is required for all resources of a given provider,
I suggest adding connection settings to the terraform providers as well, like this:
provider "consul" {
  address    = "localhost:80"
  datacenter = "nyc1"

  # run "ssh -L localhost:80:demo.consul.io:80" for any resources of this provider
  connection {
    user = "private-user"
    host = "private.jump-host.io"

    forward {
      remote_host = "demo.consul.io"
      remote_port = 80
      local_port  = 80
    }
  }
}
# Access a key in Consul; consul is only available via SSH tunnel
resource "consul_keys" "app" {
  key {
    name    = "ami"
    path    = "service/app/launch_ami"
    default = "ami-1234"
  }
}
Looking forward to any feedback before I head off and add something like this to terraform… ;)
Related: #4442, #4775
Hi @nicolai86! I'm certainly not opposed to this, though I'm not sure exactly what it would look like. Going to cc @phinze or @mitchellh here for a second opinion on this.
This sounds reasonable to me. My only comment so far would be to name it something like local_forward to align it with the actual type of forwarding being done (-L), and leave room in case we find a need for remote_forward (-R) later on.
This is an interesting approach. I have some feedback, but really just exploring the idea:
Given that the connection doesn't really "belong to" the provider, I wonder if we should hoist it out to the top level, and add some interpolation variables for it like this:
provider "consul" {
  # expands to the local listen address and port that the "connection" created
  address    = "${connection.consul_tunnel.local_address}"
  datacenter = "nyc1"
}

connection "consul_tunnel" {
  type = "ssh"
  user = "private-user"
  host = "private.jump-host.io"

  forward {
    remote_host = "demo.consul.io"
    remote_port = 80
    local_port  = 80
  }
}
Presumably for real use the user would sometimes need to provide some credentials in the connection
block (either a private key or a password), so the ability to interpolate from variables would be useful to avoid hard-coding those credentials in the config.
It could also be nice to make the local port optional and have Terraform just allocate any arbitrary open port and expose it via the interpolation variable, so the user doesn't need to think about what port is likely to be open on all machines where Terraform might be run.
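The "allocate any open port" idea is simple to implement portably: binding a socket to port 0 asks the OS for an arbitrary free port. A minimal Python sketch of the mechanism (the function name is mine, not anything Terraform exposes):

```python
import socket

def allocate_local_port(host="127.0.0.1"):
    """Ask the OS for an arbitrary free TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]

port = allocate_local_port()
print(port)  # an OS-chosen ephemeral port
```

Note that closing the probe socket before the tunnel binds the port leaves a small race window; in practice the tunnel listener itself would bind port 0 and then report the port it got via the interpolation variable.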
Wondering if maybe it would be more intuitive to invert the nesting, so that the forwarder is the primary object and it takes a connection as part of its configuration, similar to how resources and provisioners work:
port_forward "consul_tunnel" {
  remote_host = "demo.consul.io"
  remote_port = 80

  connection {
    # Now the connection block is the same as in other contexts, as long
    # as the selected connection type supports port forwarding.
    type = "ssh"
    user = "private-user"
    host = "private.jump-host.io"
  }
}

provider "consul" {
  address = "${port_forward.consul_tunnel.local_address}"
}
I think exposing the port forwarding as a primitive is a good idea in terms of reuse between multiple resources, and it might also help with code reuse given that the connection attribute already exists on resources. I'm also hoping for a clean integration into the execution graph.
It seems that the general theme is "this is a worthwhile addition" and the questions are mostly minor details. Since I have no idea at all about the terraform core internals I'll take a deep dive and report back in a couple of days…
@nicolai86 I would suggest giving @phinze and/or @mitchellh a chance to respond since they know Terraform (and its roadmap) the best and are likely to give more detailed feedback. Of course, that doesn't mean you can't dig in and start learning about Terraform core. :grinning:
don't worry, I just want to start learning about terraform core internals. Did I sound like I was going to go off and build it already? 😅
Reflecting on this a while later...
At work we took the approach of running Terraform on a host within the network it's being deployed to, and running it with an automation tool.
This has been working out really well for us: the automation tool runs Terraform from the HEAD of our git repo, and so we don't need to expose SSH access to these Terraform machines, bastion or otherwise.

So with all of that said, while it'd be _great_ to have a feature like what was proposed here in the long run so that Terraform can be flexible to run in a variety of different environments, in the short term I'd wholeheartedly recommend that folks consider this alternative approach, which has worked out very well for us.
AFAIK such a setup is not possible with Atlas today, in which case I would also suggest that it would be a great feature to be able to use the Atlas UI to control "agents" running within a private network over a secure channel as an alternative to running Terraform on Hashicorp-run infrastructure, which would then enable the above configuration with Atlas as the orchestration tool.
I think running Terraform on a server within the VPC is a nice workaround for this problem, but it has a bootstrapping issue. Where does this server come from initially? Terraform. It means admitting that you have to split your infrastructure management and cannot stand the entire thing up with one run of Terraform.
I also have multiple VPCs that are managed from one Terraform source repository. Applying changes now involves connecting to multiple Terraform nodes and running the updates. And splitting the code out.
All of that is possible, and I can even automate it with Fabric or Bash, but I don't like adding more tools when Terraform is supposed to be the tool. Also, I'm layering scripted automation on top of my very nice declarative automation, which just makes me feel a little gross.
For me, I added the SSH tunnel step to a plan and apply shell wrapper for now.
Yes, it is the case that we had to bootstrap the environment from outside and that there is one Terraform config that requires custom effort to apply because it affects the deploy workers themselves. A temporary extra machine booted manually from the same AMI as the deploy workers addresses that problem, but I certainly won't claim that this is super convenient. It's just a compromise that we tolerate because we apply this configuration relatively infrequently compared to the others that deal with our applications themselves.
Hi guys,
just wanted to add my two cents and try to revive this topic.
From my perspective, moving the tunnel out of the provider looks smart, but it has a severe disadvantage.
With a remote-exec or file provisioner, the SSH connection is closed afterwards, so there is nothing to clean up; it simply exits, even if Terraform crashes. That is not the case if you implement a tunnel this way:
port_forward "consul_tunnel" {
  remote_host = "demo.consul.io"
  remote_port = 80

  connection {
    # Now the connection block is the same as in other contexts, as long
    # as the selected connection type supports port forwarding.
    type = "ssh"
    user = "private-user"
    host = "private.jump-host.io"
  }
}
You need a destructor in the code that can reliably be triggered to tear the tunnel down again.
Following that logic, extending the existing connection block and adding it to certain providers or resources would be the safer route to go.
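The lifecycle concern can be shown in miniature: a port forwarder is a long-lived listener that must be shut down explicitly, unlike a provisioner's one-shot SSH session. Below is a hedged Python sketch (plain TCP stands in for the SSH channel, purely to illustrate the lifecycle) where the "destructor" is the context manager's cleanup step:

```python
import socket
import threading
from contextlib import contextmanager

def pipe(src, dst):
    """Copy bytes one way until the source side closes."""
    try:
        while (chunk := src.recv(4096)):
            dst.sendall(chunk)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass

@contextmanager
def forward(target_host, target_port):
    """Listen on an ephemeral local port and forward each connection to the target."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    local_port = listener.getsockname()[1]

    def serve():
        while True:
            try:
                client, _ = listener.accept()
            except OSError:  # listener was closed: time to exit
                return
            upstream = socket.create_connection((target_host, target_port))
            threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
            threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()

    threading.Thread(target=serve, daemon=True).start()
    try:
        yield local_port
    finally:
        listener.close()  # the "destructor": without this, the tunnel lingers
```

The point is exactly the one raised above: Terraform would need a guaranteed path to that `finally` block, including on crashes, which is what makes a standalone tunnel primitive harder than the existing per-provisioner connections.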
Any progress on this? We have to open an SSH tunnel every time we run terraform as it manages our RDS instances that are private only.
This is a major blocker for us as well.
What we are thinking of as a workaround (though of course it doesn't help everyone) is to use a Kubernetes job to run terraform plan/apply.
As it runs in the cluster, it has access to the private resources, and it's easy to run for everyone (using a web interface for Kubernetes) without needing manually set-up tunnels, credentials for those, and so on. The idea is to use a remote tfstate on S3 (or something else).
I'll update if we have the time to go more on this path. But, of course, will only help people also running kubernetes clusters :)
i mostly just (right now) want to be able to provision a VM with Docker and forward the docker.sock so that terraform can deploy containers onto it without having to set up a TCP listener (because i won't want it later anyway.)
Any progress on this? It's almost a year now… The mentioned Terraform gurus were asked for an opinion but didn't reply. Is this issue abandoned?
Bastion hosts are quite common, and relying on external scripts to create an SSH tunnel before Terraform can operate sucks. It makes the whole process way more complicated since there are more steps that you must remember, makes your project far more difficult to maintain if you have multiple resources that require such a feature (Redis, MySQL, ElasticSearch, Consul, …), and can be _very_ dangerous if you're working with multiple environments (it's kinda easy to launch terraform apply on dev when you still have your tunnel pointing to the production database, and vice versa). I definitely can't see why this issue is considered so low priority.
Hi @fquffio!
Before I respond I should explain that at the time of my last comments I was an outside open source contributor, but in the mean time I've become a Hashicorp employee working on Terraform.
It is not that this issue is considered _low_ priority, but rather that there are many issues that are all considered important. There remains design work to do to figure out exactly how this will work, and then non-trivial implementation work to get it actually done.
Believe me that I really want to see this feature too, and we'll get there. We're working through the feature request backlog as fast as we can while also keeping up with bug fixes, etc. I understand the frustration and I can only ask for continued patience.
At this time, my hope is to move forward with a configuration structure somewhat like the following, taken from my comment above:
port_forward "consul_tunnel" {
  target_host = "demo.consul.io"
  target_port = 80

  connection {
    # Now the connection block is the same as in other contexts, as long
    # as the selected connection type supports port forwarding.
    type = "ssh"
    user = "private-user"
    host = "private.jump-host.io"
  }
}

provider "consul" {
  address = "${port_forward.consul_tunnel.local_address}"
}
It'll take a little more prototyping to figure out the details of this, such as how we can wire the connection creation and shutdown into the graph, whether the existing connection
mechanism can be extended to support tunnels in this way, etc. We'll have more to say here when we are able to complete that prototyping.
I'm also interested in this and suggest something along these lines: using a connection block inside the provider:
provider "consul" {
  address    = "${aws_route53_record.elb_consul.fqdn}"
  datacenter = "dc1"

  connection {
    type        = "tunnel"
    host        = "${aws_instance.bastion_1.public_ip}"
    port        = "8500"
    private_key = "${file("${var.local_ssh_key_path}")}"
    user        = "${var.ssh_user}"
  }
}
While I like this approach and think it is sensible for the long term, I have to wonder if it would not be easier to get bastion support, as it exists today for aws_instance and other resources, added to resources like postgres_database, etc., so that people can start using it today.
Either way, I'm a big +1 for supporting bastion hosts on more resources.
+1 I think this same pattern could be good for supporting VPN access to resources. Having the ssh tunnel be a resource which depends on other resources (Like the bastion instance for example) would solve any ordering issues on first run.
@apparentlymart it is time to fix this. You have been dancing around the issue for too long. Either fix it or close it but you have kept us waiting for too long.
@vmendoza That comment seems a little out of line for a free open source project. If you feel so strongly about it... dig in and write some code.
I would also request that if/when this is implemented there be a remote_command portion. I specifically want to forward a port to a service that I want to launch as I make the connection.
ssh -L my_port:target_port host some_service_providing_access_on_target_port
Hello everyone. I decided to try to tackle this myself by building a custom provider. And I'm happy to say that I'm quite pleased with the result. It works by declaring a data source (basically, what you want is a local port that you want to be forwarded somewhere via SSH).
While I am sure there are many things that can be improved, what is great about my solution is that it is usable right now.
I'd like to invite everyone who is having this issue to try it out. Here's the repository: https://github.com/stefansundin/terraform-provider-ssh
Please be careful and do not use in production quite yet. If it breaks something you can keep both pieces. :)
As always, suggestions for improvements are welcome! Thanks all!
@stefansundin nice!
but there is an issue: tunnel is not recreated on apply - stefansundin/terraform-provider-ssh#1
Hey @apparentlymart. I've been trying to figure out the issue that @jaymecd reported, but I couldn't find any good solution. Any chance you could take a quick look and say whether or not it is even solvable (or impossible as of right now). There is more info here: https://github.com/stefansundin/terraform-provider-ssh/issues/1
Thanks!
A simple solution to this and similar issues is to provide an option to use a local OpenSSH client binary instead of Go's native ssh implementation. This would allow us to use ProxyCommand to create whatever kind of tunneling we need. See #4523.
I think the developers of Docker Machine got this one right - they use the local 'ssh' binary if present and only fall back on the native Go crypto/ssh implementation when no binary is available (or is explicitly requested - see https://docs.docker.com/machine/reference/ssh/).
OpenSSH is ubiquitous and highly configurable - is there really any benefit in attempting to re-implement some of its features in Terraform?
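The Docker Machine style fallback described above is easy to sketch: probe PATH for an ssh binary and only fall back to a native implementation when none is found. An illustrative Python snippet (the function and strategy names are mine, not anything Terraform or Docker Machine exposes):

```python
import shutil

def pick_ssh_strategy(binary="ssh"):
    """Prefer the local OpenSSH client (which brings ProxyCommand, ~/.ssh/config,
    agent support, etc. for free); fall back to a built-in SSH implementation
    when no binary is found on PATH."""
    path = shutil.which(binary)
    return ("openssh", path) if path else ("native", None)
```

With the "openssh" strategy selected, the tunnel itself could be as simple as spawning `ssh -N -L <local>:<remote_host>:<remote_port> <jump-host>`, inheriting whatever the user already configured for that host.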
Checking back in on this. Would love to see this someday!
Also, I can confirm the plugin works great as a stopgap measure until native support is added! Great work @stefansundin!
If you want to use standard tools like curl with a SOCKS5h proxy via SSH tunnel, then you are in luck!
I've found a working solution for docker containers to access services via socks5h://. See my comment & diagram in issue #17754.
This works with the local-exec provisioner!
An example use case was that I needed to bootstrap Vault Server CA certificates for use later in other Terraform resources. However, this Vault server was only accessible inside our secure VPC behind a bastion host. Additionally, it was in a private hosted Route53 zone that is only resolvable from within the VPC, so the SOCKS5h protocol was important for DNS to resolve!
For example:
# Note: 172.16.222.111 is the alias IP for the host laptop running terraform in docker container
# See diagram in issue comment for #17754 above for clarification!
resource "null_resource" "vault-web-ca" {
  triggers {
    id = "${uuid()}"
  }

  provisioner "local-exec" {
    command = <<EOF
ALL_PROXY="socks5h://172.16.222.111:${var.socks_proxy_port}";
HTTP_PROXY="$${ALL_PROXY}";
HTTPS_PROXY="$${ALL_PROXY}";
export ALL_PROXY HTTP_PROXY HTTPS_PROXY;
echo '${data.aws_ssm_parameter.vault-ca-crt.value}' > /tmp/vault-ca.crt && \
sync && \
curl -s -k -o - https://vault-${var.env}.${local.private_dns_zone_name}/v1/ca/web/ca_chain > ${path.module}/generated/vault-web-ca-chain.crt && \
curl -s --cacert /tmp/vault-ca.crt -o - https://vault-${var.env}.${local.private_dns_zone_name}/v1/ca/web/ca/pem > ${path.module}/generated/vault-web-ca.crt && \
sync
EOF
  }
}
# Now we can read in these generated cert files and use them later in Terraform
data "local_file" "vault-web-ca-chain" {
depends_on = ["null_resource.vault-web-ca"]
filename = "${path.module}/generated/vault-web-ca-chain.crt"
}
data "local_file" "vault-web-ca" {
depends_on = ["null_resource.vault-web-ca"]
filename = "${path.module}/generated/vault-web-ca.crt"
}
The current problem is that Terraform itself does not support socks5h://. This is possibly due to an upstream bug in Golang regarding socks5h:// support in x/net/proxy (golang/go#13454). If this is ever fixed, perhaps Terraform providers and code that use the standard x/net/proxy library will _just work_!
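The `h` suffix is the crux here: per curl's proxy semantics, `socks5h://` (like `socks4a://`) sends the hostname to the proxy so DNS resolution happens on the far side of the tunnel, while plain `socks5://` resolves locally, which fails for names that only exist in the private Route53 zone. A tiny sketch of that distinction (the helper function is hypothetical, only the scheme semantics are real):

```python
from urllib.parse import urlparse

# Schemes where the proxy, not the local machine, performs DNS resolution
# (per curl's proxy scheme conventions).
REMOTE_DNS_SCHEMES = {"socks5h", "socks4a"}

def dns_resolved_by_proxy(proxy_url):
    """Return True if hostnames are resolved on the proxy side of the tunnel."""
    return urlparse(proxy_url).scheme.lower() in REMOTE_DNS_SCHEMES

print(dns_resolved_by_proxy("socks5h://172.16.222.111:1080"))  # True
print(dns_resolved_by_proxy("socks5://172.16.222.111:1080"))   # False
```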
Here's my workaround for this, in case it helps. I'm using Terragrunt, so here's the terragrunt.hcl; a hook is enough to keep it in the workflow without changing any habits.
Note: I tried various combinations of nohup, (fork), ((double fork)) and &; only screen did the trick.
include {
  path = find_in_parent_folders()
}

terraform {
  # source = "git::git@github.com:terraform-aws-modules/terraform-aws-rds.git//modules/db_instance?ref=v2.5.0"
  source = "."

  before_hook "open_tunnel_through_bastion" {
    commands = ["plan", "apply", "show", "destroy"]
    execute  = ["screen", "-d", "-m", "ssh", "-L", "12345:${dependency.instance.outputs.this_db_instance_address}:${dependency.instance.outputs.this_db_instance_port}", dependency.bastion.outputs.hostname, "sleep", "60"]
  }
}

dependency "bastion" {
  config_path = "../../../bastion/"

  mock_outputs = {
    hostname = "localhost"
  }
}

dependency "instance" {
  config_path = "../../instance/"

  mock_outputs = {
    this_db_instance_address  = "localhost"
    this_db_instance_port     = 12345
    this_db_instance_username = "mockup_user"
  }
}

inputs = {
  host              = "localhost"
  port              = "12345"
  postgres_user     = dependency.instance.outputs.this_db_instance_username
  postgres_password = "REDACTED"
  db_name           = "REDACTED"
  db_password       = "REDACTED"
  db_extensions     = ["uuid-ossp", "pgcrypto"]
}
Note: at some point, I don't remember exactly why, I had to move the command to a script and call that script from the execute attribute.
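One fragile part of any before_hook tunnel like the one above is the race between backgrounding ssh (via screen) and Terraform's first connection attempt. A small, hedged Python helper that any wrapper script could use to poll until the forwarded port actually accepts connections (the timeout values are arbitrary):

```python
import socket
import time

def wait_for_port(host, port, timeout=10.0, interval=0.1):
    """Poll until a TCP connect to (host, port) succeeds, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the tunnel's local listener is up.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A wrapper would call `wait_for_port("localhost", 12345)` between starting the screen/ssh session and invoking terragrunt, instead of relying on the tunnel winning the race.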
Any progress here with this?
I'm wondering what the hold up on this is, found this through #4775 and it's nearly 4 years old. Doesn't similar code to accomplish this already exist in provisioners? Or are there other blockers?
That feature would help a lot for readability when Terraform is interacting with private cloud resources. I currently use null_resource.local-exec to spawn proxies through my bastion host, but it's definitely not clean code.
@rgarrigue Thanks for sharing your approach, this looks clean. I think I'll give it a try.
@apparentlymart Do you know if this idea is still in sight?
Also, has anyone considered whether it would be worthwhile for this feature to forward ports to a Kubernetes container directly, like running kubectl port-forward in the background?
It could be nice to use the Terraform database, Consul or Vault providers when those services are running inside Kubernetes.
This is still a possible future feature, but at this time nobody on the Terraform team at HashiCorp is working on this due to priorities being elsewhere.
In the meantime, we hear that users are employing some other strategies to address this problem within Terraform's current capabilities:
- Run Terraform on a system on the other side of the bastion, which has direct network access to the services Terraform is managing.
- Use a general IP VPN rather than an SSH tunnel to give the system running Terraform access to the services it will manage, so that the indirection is invisible to individual applications like Terraform.
@apparentlymart the problem with that strategy is that it requires a VPN to already be set up before a terraform run, or for the connection to be set up during the run. That's fine if you already have previously existing infrastructure, but in my specific case I would like to be able to provision, and destroy, everything from scratch. That includes possible VPN servers, and using a managed VPN solution, while possible with some local-exec magic to connect to it, would be costly in the long run since terraform doesn't support (by design) creating and destroying an ephemeral resource in the same run.
I'm going to investigate possible workarounds.
Neither of these solutions is compatible with Terraform Cloud, which is a major downside. A Terraform-native option (e.g. simply the option to provide an SSH private key, SSH user and host to first initiate the connection) would be really great.
Now that Boundary is a thing, could Terraform support connecting to services through it?