Hey,
Since the release of v0.10.3 we've been unable to run plans in our CI system on Terraform configurations that contain `helm_release` resources. It does work when run manually, so I'm not convinced there is a bug in the provider, but I would appreciate any help in tracking down the root cause.
Example Terraform is below. When run via Jenkins (in a Docker container with Terraform installed), Terraform quits with the incredibly useful `Error: error installing: Unauthorized`. This happens during plan for pre-existing resources, or during apply for new ones.
If the provider is forced to version 0.10.2, it works as expected. If I run the same terraform on my machine, with the same credentials and the 0.10.3 provider, it also works fine.
- Terraform v0.12.10
- Kubernetes cluster version v1.13.11-eks-5876d6, running Helm v2.15.1
```hcl
provider "aws" {
  region = "us-east-1"
}

# Get cluster data
data "aws_eks_cluster" "cluster" {
  name = "test_cluster"
}

provider "aws" {
  region = "us-east-1"
  alias  = "eks_assume_role"
  assume_role {
    role_arn = "arn:aws:iam::xxxxxx:role/eks-k8s-role"
  }
}

# Get cluster auth with assumed role
data "aws_eks_cluster_auth" "cluster" {
  provider = aws.eks_assume_role
  name     = data.aws_eks_cluster.cluster.name
}

provider "helm" {
  service_account = "helm"
  namespace       = "kube-system"
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.cluster.token
  }
}

data "helm_repository" "stable" {
  name = "stable"
  url  = "https://kubernetes-charts.storage.googleapis.com"
}

resource "helm_release" "example" {
  name       = "my-redis-release"
  repository = data.helm_repository.stable.metadata[0].name
  chart      = "redis"
  version    = "6.0.1"
}
```
To reproduce, run `terraform apply`.

As mentioned, this seems to be environment-specific. I can only reproduce it in the Docker container that runs our Terraform builds, and I'm not sure why that would make a difference.
Note that the cluster auth uses an assumed role, so its permissions should be the same whether it runs locally or via Jenkins.
Again, I'm not sure this is a provider bug, since it only reproduces in one specific environment. Still, something about the changes in 0.10.3 is affecting it, and I don't know enough to spot the cause.
Thanks,
I ran into similar issues with version 0.12 too. For example, after a `terraform destroy`, I sometimes got the following on the next apply:

```
Error: rpc error: code = Unknown desc = Unauthorized

  on xxx.tf line 36, in resource "helm_release" "xxx":
```
Confirmed still an issue in 0.10.4
I can confirm this. Downgrading to 0.10.2 made our CI pass again. It only reproduces on our CI pods, not locally.
It affects `terraform plan` as well. I haven't had a chance to turn on the authenticator logs in EKS to see what's going on there, but I hope to find time to dig deeper this week. Pinning to 0.10.2 is working for me for now.
Update: I'm seeing a similar issue with the kubernetes provider as well, and that gave me insight into what is going on here. Somehow the provider uses the service account token from the in-cluster configuration instead of the configuration it was given.
Any news on this? We need a fix for it.
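In Terraform 0.12, the pin can be written as a version constraint in the `terraform` block. Below is a minimal sketch. The exact-match `=` constraint is one choice, and Terraform 0.12 also accepts the same constraint as `version = "= 0.10.2"` inside the `provider "helm"` block:

```hcl
terraform {
  required_providers {
    # Exact-version constraint: keeps the helm provider on 0.10.2,
    # avoiding 0.10.3+ where the CI-only "Unauthorized" errors were reported.
    helm = "= 0.10.2"
  }
}
```

Re-run `terraform init` after changing the constraint so that the matching provider plugin is installed.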
Check the logs of the Tiller pod. Tiller's Kubernetes service account may have been removed, which leaves Tiller unable to make queries. That is what happened in my case:
Terraform plan:

```
Error: rpc error: code = Unknown desc = Unauthorized
```

Tiller logs:

```
[storage/driver] 2020/01/16 18:00:39 query: failed to query with labels: Unauthorized
```

Restarting the pod failed with:

```
Error creating: pods "tiller-deploy-7b489d95c4-" is forbidden: error looking up service account kube-system/tiller: serviceaccount "tiller" not found
```
Adding the service account back fixed the problem.
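If Tiller's service account has gone missing, one option is to manage it from Terraform with the kubernetes provider's `kubernetes_service_account` and `kubernetes_cluster_role_binding` resources. The sketch below makes three assumptions: the account is named `tiller` in `kube-system`, matching the error above; a `provider "kubernetes"` is configured with the same EKS endpoint, CA and token as the helm provider's `kubernetes` block; and binding to `cluster-admin` is acceptable for your cluster. A narrower role may be preferable.

```hcl
# Service account Tiller runs as (name taken from the "serviceaccount \"tiller\" not found" error)
resource "kubernetes_service_account" "tiller" {
  metadata {
    name      = "tiller"
    namespace = "kube-system"
  }
}

# Grants the service account cluster-wide permissions (assumption: cluster-admin is acceptable)
resource "kubernetes_cluster_role_binding" "tiller" {
  metadata {
    name = "tiller"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }

  subject {
    kind      = "ServiceAccount"
    name      = kubernetes_service_account.tiller.metadata[0].name
    namespace = "kube-system"
  }
}
```

Keeping the account in Terraform state means a later `terraform plan` would show it as missing if it were deleted again.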
I took a look at the Tiller pods, and this is all I'm seeing:
```
[main] 2020/01/20 19:07:08 Starting Tiller v2.13.1 (tls=false)
[main] 2020/01/20 19:07:08 GRPC listening on :44134
[main] 2020/01/20 19:07:08 Probes listening on :44135
[main] 2020/01/20 19:07:08 Storage driver is ConfigMap
[main] 2020/01/20 19:07:08 Max history per release is 0
^C
```
If I run the helm command from the command line, it works. It seems the problem is how Terraform/Helm authenticates with Kubernetes when an IAM role is used.
You have to add the role to the AWS authenticator:

```hcl
kubeconfig_aws_authenticator_additional_args = ["-r", aws_iam_role.kubernetesAdmin.arn]
```
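For context, `kubeconfig_aws_authenticator_additional_args` is an input of the community `terraform-aws-modules/eks` module. That is an assumption, since the comment does not name the module, and the input only exists in older releases of it. The input appends extra arguments to the `aws-iam-authenticator` call in the generated kubeconfig, and `-r` tells the authenticator to assume the given role when it requests a token. Below is a minimal sketch. `var.vpc_id` and `var.subnet_ids` are hypothetical placeholders, and `aws_iam_role.kubernetesAdmin` is assumed to be defined elsewhere, as in the comment.

```hcl
variable "vpc_id" {
  type = string # hypothetical placeholder
}

variable "subnet_ids" {
  type = list(string) # hypothetical placeholder
}

module "eks" {
  source = "terraform-aws-modules/eks/aws"
  # Assumption: pin a module release that still exposes the kubeconfig_* inputs.

  cluster_name = "test_cluster"
  vpc_id       = var.vpc_id
  subnets      = var.subnet_ids

  # Extra arguments for aws-iam-authenticator in the generated kubeconfig;
  # "-r <role ARN>" makes it assume that role when generating a token.
  kubeconfig_aws_authenticator_additional_args = ["-r", aws_iam_role.kubernetesAdmin.arn]
}
```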
Closing this issue since it refers to a version based on Helm 2. If this is still valid for the master branch, please reopen it. Thanks.
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.