1. What kops version are you running? The command kops version will display this information.
Version 1.15.2 (git-ad595825a)
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running, or provide the Kubernetes version specified as a kops flag.
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:14:22Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
3. What cloud provider are you using?
AWS GovCloud
4. What commands did you run? What is the simplest way to reproduce this issue?
Creating a cluster in an existing VPC with existing subnets and an existing Route53 Zone. The zone was specified as a Zone ID.
5. What happened after the commands executed?
W0221 23:03:15.427827 12438 executor.go:130] error running task "IAMRolePolicy/<redacted>" (9m59s remaining to succeed): error creating/updating IAMRolePolicy: MalformedPolicyDocument: Partition "aws" is not valid for resource "arn:aws:route53:::hostedzone/<redacted>".
status code: 400, request id: <redacted>
W0221 23:03:15.427860 12438 executor.go:130] error running task "DNSName/<redacted>" (9m59s remaining to succeed): error creating ResourceRecordSets: NoSuchHostedZone: The specified hosted zone does not exist.
6. What did you expect to happen?
The Route53 ARN should use the partition associated with the region, which is aws-us-gov.
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2020-02-21T22:55:11Z"
  generation: 3
  name: <redacted>
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudLabels:
    Owner: Ops
    Team: Ops
  cloudProvider: aws
  configBase: s3://<redacted>/<redacted>
  dnsZone: <redacted>.
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: master-us-gov-west-1b-1
      name: "1"
    - instanceGroup: master-us-gov-west-1b-2
      name: "2"
    - instanceGroup: master-us-gov-west-1b-3
      name: "3"
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: master-us-gov-west-1b-1
      name: "1"
    - instanceGroup: master-us-gov-west-1b-2
      name: "2"
    - instanceGroup: master-us-gov-west-1b-3
      name: "3"
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - <redacted>
  kubernetesVersion: 1.15.9
  masterInternalName: api.internal.<redacted>
  masterPublicName: api.<redacted>
  networkCIDR: <redacted>
  networkID: vpc-<redacted>
  networking:
    weave:
      mtu: 8912
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - <redacted>
  subnets:
  - cidr: <redacted>
    id: subnet-<redacted>
    name: us-gov-west-1a
    type: Private
    zone: us-gov-west-1a
  - cidr: <redacted>
    id: subnet-<redacted>
    name: us-gov-west-1b
    type: Private
    zone: us-gov-west-1b
  - cidr: <redacted>
    id: subnet-<redacted>
    name: utility-us-gov-west-1a
    type: Utility
    zone: us-gov-west-1a
  - cidr: <redacted>
    id: subnet-<redacted>
    name: utility-us-gov-west-1b
    type: Utility
    zone: us-gov-west-1b
  topology:
    dns:
      type: Private
    masters: private
    nodes: private
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-02-21T22:55:11Z"
  labels:
    kops.k8s.io/cluster: <redacted>
  name: master-us-gov-west-1b-1
spec:
  additionalSecurityGroups:
  - sg-<redacted>
  image: ami-<redacted>
  machineType: t3.2xlarge
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-gov-west-1b-1
  role: Master
  subnets:
  - us-gov-west-1b
  tenancy: dedicated
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-02-21T22:55:11Z"
  labels:
    kops.k8s.io/cluster: <redacted>
  name: master-us-gov-west-1b-2
spec:
  additionalSecurityGroups:
  - sg-<redacted>
  image: ami-<redacted>
  machineType: t3.2xlarge
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-gov-west-1b-2
  role: Master
  subnets:
  - us-gov-west-1b
  tenancy: dedicated
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-02-21T22:55:11Z"
  labels:
    kops.k8s.io/cluster: <redacted>
  name: master-us-gov-west-1b-3
spec:
  additionalSecurityGroups:
  - sg-<redacted>
  image: ami-<redacted>
  machineType: t3.2xlarge
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-gov-west-1b-3
  role: Master
  subnets:
  - us-gov-west-1b
  tenancy: dedicated
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-02-21T22:55:11Z"
  labels:
    kops.k8s.io/cluster: <redacted>
  name: nodes
spec:
  additionalSecurityGroups:
  - sg-<redacted>
  image: ami-<redacted>
  machineType: t3.2xlarge
  maxSize: 3
  minSize: 3
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - us-gov-west-1a
  - us-gov-west-1b
  tenancy: dedicated
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
Relevant output:
I0221 23:11:41.338498 12819 iamrolepolicy.go:147] Creating IAMRolePolicy
I0221 23:11:41.338515 12819 iamrolepolicy.go:175] PutRolePolicy RoleName=masters.<redacted> PolicyName=masters.<redacted>: {
...
  {
    "Effect": "Allow",
    "Action": [
      "route53:ChangeResourceRecordSets",
      "route53:ListResourceRecordSets",
      "route53:GetHostedZone"
    ],
    "Resource": [
      "arn:aws:route53:::hostedzone/<redacted>"
    ]
  },
...
Other ARNs in the same output, such as for S3, properly include the aws-us-gov partition.
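For reference, the expected behavior is to derive the partition from the region instead of hard-coding "aws". A minimal sketch of that idea using the aws-sdk-go endpoints package is below; this is illustrative only, not the actual kops code, and the region and zone ID values are placeholders:

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws/endpoints"
)

// route53HostedZoneARN builds the hosted-zone ARN using the partition that
// matches the region, e.g. "aws-us-gov" for us-gov-west-1 rather than "aws".
func route53HostedZoneARN(region, zoneID string) string {
	partition := "aws" // fall back to the default commercial partition
	if p, ok := endpoints.PartitionForRegion(endpoints.DefaultPartitions(), region); ok {
		partition = p.ID()
	}
	// Route53 is a global service, so the ARN has no region or account fields.
	return fmt.Sprintf("arn:%s:route53:::hostedzone/%s", partition, zoneID)
}

func main() {
	// "Z0EXAMPLE" is a placeholder zone ID.
	fmt.Println(route53HostedZoneARN("us-gov-west-1", "Z0EXAMPLE"))
	// Output: arn:aws-us-gov:route53:::hostedzone/Z0EXAMPLE
}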
9. Anything else we need to know?
This was fixed in https://github.com/kubernetes/kops/issues/8359 and is included in kops 1.16, which will be released imminently. Are you able to try the latest kops 1.16 beta to confirm it is fixed?
I've downloaded 1.16.0-beta.2 and it seems to fix this ARN problem, but I suspect there is other code with a similar issue, as I now get this error:
I0221 23:27:39.849867 13631 executor.go:176] Executing task "DNSName/api.<redacted>": *awstasks.DNSName {"Name":"api.<redacted>","Lifecycle":"Sync","ID":null,"Zone":{"Name":"Z08249791WBSC69ZYDIJJ","Lifecycle":"Sync","DNSName":"<redacted>","ZoneID":"<redacted>","Private":true,"PrivateVPC":{"Name":"<redacted>","Lifecycle":"Sync","ID":"<redacted>","CIDR":"<redacted>","EnableDNSHostnames":null,"EnableDNSSupport":true,"Shared":true,"Tags":null}},"ResourceType":"A","TargetLoadBalancer":{"Name":"api.<redacted>","Lifecycle":"Sync","LoadBalancerName":"<redacted>","DNSName":"<redacted>","HostedZoneId":"<redacted>","Subnets":[{"Name":"utility-us-gov-west-1b.<redacted>","ShortName":"utility-us-gov-west-1b","Lifecycle":"Sync","ID":"subnet-8d240de9","VPC":{"Name":"<redacted>","Lifecycle":"Sync","ID":"<redacted>","CIDR":"<redacted>","EnableDNSHostnames":null,"EnableDNSSupport":true,"Shared":true,"Tags":null},"AvailabilityZone":"us-gov-west-1b","CIDR":"<redacted>","Shared":true,"Tags":{"SubnetType":"Utility","kubernetes.io/cluster/<redacted>":"shared","kubernetes.io/role/elb":"1"}},{"Name":"utility-us-gov-west-1a.<redacted>","ShortName":"utility-us-gov-west-1a","Lifecycle":"Sync","ID":"subnet-a2250fd4","VPC":{"Name":"<redacted>","Lifecycle":"Sync","ID":"<redacted>","CIDR":"<redacted>","EnableDNSHostnames":null,"EnableDNSSupport":true,"Shared":true,"Tags":null},"AvailabilityZone":"us-gov-west-1a","CIDR":"<redacted>","Shared":true,"Tags":{"SubnetType":"Utility","kubernetes.io/cluster/<redacted>":"shared","kubernetes.io/role/elb":"1"}}],"SecurityGroups":[{"Name":"api-elb.<redacted>","Lifecycle":"Sync","ID":"<redacted>","Description":"Security group for api ELB","VPC":{"Name":"<redacted>","Lifecycle":"Sync","ID":"<redacted>","CIDR":"<redacted>","EnableDNSHostnames":null,"EnableDNSSupport":true,"Shared":true,"Tags":null},"RemoveExtraRules":["port=443"],"Shared":null,"Tags":{"KubernetesCluster":"<redacted>","Name":"api-elb.<redacted>","kubernetes.io/cluster/<redacted>":"owned"}}],"Listeners":{"443":{"InstancePort":443,"SSLCertificateID":""}},"Scheme":null,"HealthCheck":{"Target":"SSL:443","HealthyThreshold":2,"UnhealthyThreshold":2,"Interval":10,"Timeout":5},"AccessLog":null,"ConnectionDraining":null,"ConnectionSettings":{"IdleTimeout":300},"CrossZoneLoadBalancing":{"Enabled":false},"SSLCertificateID":"","Tags":{"KubernetesCluster":"<redacted>","Name":"api.<redacted>","Owner":"Ops","Team":"Ops","kubernetes.io/cluster/<redacted>":"owned"}}}
I0221 23:27:39.850452 13631 request_logger.go:45] AWS request: route53/ListResourceRecordSets
I0221 23:27:39.887373 13631 dnsname.go:76] Found DNS resource "NS" "<redacted>."
I0221 23:27:39.888108 13631 dnsname.go:76] Found DNS resource "SOA" "<redacted>."
I0221 23:27:39.888532 13631 dnsname.go:76] Found DNS resource "A" "management.<redacted>."
I0221 23:27:39.888625 13631 dnsname.go:178] Updating DNS record "api.<redacted>"
I0221 23:27:39.888854 13631 request_logger.go:45] AWS request: route53/ChangeResourceRecordSets
W0221 23:27:39.908367 13631 executor.go:128] error running task "DNSName/api.<redacted>" (9m29s remaining to succeed): error creating ResourceRecordSets: NoSuchHostedZone: The specified hosted zone does not exist.
status code: 404, request id: <redacted>
I0221 23:27:39.908408 13631 executor.go:143] No progress made, sleeping before retrying 1 failed task(s)
It found the existing records but then failed to create a new record in the same zone.
Having just gotten off a call with Amazon Support, it turns out that Alias records are not supported in GovCloud at this time.
We are looking at augmenting the code to test the possibility of using a CNAME record instead of an Alias, and will report back ASAP.
When attempting to create the same record type via the API, we get the same error stating that the zone is not found. The zone in question is in fact the hosted zone of the internal ELB for the Kubernetes API, not the main hosted zone.
Ah, that's good to know, thanks for the update! We may be able to force kops to use CNAMEs rather than Aliases when in GovCloud.
Thanks @rifelpet! I will let you know this week if we can get it working with some augmented code. I may try it in a non-GovCloud region as well as a GovCloud region.
Out of curiosity, why is kops using Alias records and not CNAMEs? Is there also a reason why it is still using the classic ELB type rather than the more modern ALB/NLB architecture?
If I had to guess, it's because of the advantages of Alias records outlined here: mainly that they avoid an additional round-trip DNS lookup, avoid an intermediate TTL, support health checking of alias targets, don't expose the AWS resource's DNS name, etc.
I'm sure kops' Route53 support wasn't designed with the possibility in mind that Route53 would be supported but Alias records would not be. It may be a minor change; some quick searching reveals this code, but there may be others.
I don't recall the reasoning for keeping classic ELBs; it may just be that no one has put in the effort to switch. There are a few issues that have discussed it in the past; perhaps we could add opt-in support for NLBs sometime soon.
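To illustrate the change being discussed, here's a rough sketch of switching a ChangeResourceRecordSets call from an Alias record to a plain CNAME using the aws-sdk-go route53 client. This is not the actual kops task code; the function and its parameters (zoneID, recordName, elbDNSName, elbHostedZoneID) are hypothetical placeholders:

package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/route53"
)

// upsertAPIRecord upserts a record for the API endpoint, either as an Alias
// targeting the ELB or as a plain CNAME pointing at the ELB's DNS name.
func upsertAPIRecord(zoneID, recordName, elbDNSName, elbHostedZoneID string, useAlias bool) error {
	svc := route53.New(session.Must(session.NewSession()))

	rrs := &route53.ResourceRecordSet{Name: aws.String(recordName)}
	if useAlias {
		// Alias A record: resolves directly to the ELB, but its target uses the
		// ELB's canonical hosted zone ID, which is what GovCloud rejected here.
		rrs.Type = aws.String("A")
		rrs.AliasTarget = &route53.AliasTarget{
			DNSName:              aws.String(elbDNSName),
			HostedZoneId:         aws.String(elbHostedZoneID),
			EvaluateTargetHealth: aws.Bool(false),
		}
	} else {
		// Plain CNAME: one extra DNS hop and an explicit TTL, but no Alias
		// support required from the partition.
		rrs.Type = aws.String("CNAME")
		rrs.TTL = aws.Int64(60)
		rrs.ResourceRecords = []*route53.ResourceRecord{{Value: aws.String(elbDNSName)}}
	}

	_, err := svc.ChangeResourceRecordSets(&route53.ChangeResourceRecordSetsInput{
		HostedZoneId: aws.String(zoneID),
		ChangeBatch: &route53.ChangeBatch{
			Changes: []*route53.Change{{
				Action:            aws.String("UPSERT"),
				ResourceRecordSet: rrs,
			}},
		},
	})
	return err
}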
Good news: I was able to get this working with some manual intervention. I'll write up my override steps and post them here. There are some manual steps, but it is up and running and kops validate cluster returns positively. I'll try to write some automation if possible (likely an Ansible playbook).
@mgs4332 any updates you could provide would be helpful!
@mgs4332 Any update, as @weisjohn requested, would be much appreciated. I am in the same boat, as I need Alias records created.
@ksummersill2 I haven't had this issue since moving to the gossip protocol: https://github.com/kubernetes/kops/blob/master/docs/getting_started/aws.md#configure-dns
I am glad you messaged me so that I can update this. I wrote an article on doing this with gossip DNS, just as you said: https://medium.com/@ksummersill/setup-kops-and-calico-within-aws-gov-cloud-using-gossip-dns-cd6ed5cba36c
@ksummersill2 Glad you found a workaround with the gossip protocol. Apologies, as I changed jobs and no longer have access to the resources I was working on. The way I got around it was wrapping it in Ansible, creating standard CNAME records, and rebooting the worker nodes to have them automatically join the cluster after first boot.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.