Kops: Help with creating dns-free cluster

Created on 5 Jul 2018 · 15 comments · Source: kubernetes/kops

  1. kops version 1.9.1

  2. kubectl client version 1.11.0

  3. aws

  4. I was following the tutorial on how to use kops in aws-china, but in us-west-2 (I just wanted a dns-free cluster). Everything went smoothly until I reached the kops validate cluster step.

  5. ```
     Using cluster from kubectl context: xxxxxxxxx-example.k8s.local
     Validating cluster xxxxxxxxx-example.k8s.local

     unexpected error during validation: error listing nodes: Get https://internal-api-xxxxxxxxx-exam-6vnila-578615394.us-west-2.elb.amazonaws.com/api/v1/nodes: dial tcp xxx.xx.xx.xx:443: i/o timeout
     ```

  6. I expected everything to work like in the tutorial.
  7. Cluster manifest:
```
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-07-05T11:29:14Z
  name: xxxxxxxxx-example.k8s.local
spec:
  api:
    loadBalancer:
      type: Internal
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://xxxxxxxxx-example-com-state-store/xxxxxxxxx-example.k8s.local
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-west-2a
      name: a
    name: main
  - etcdMembers:
    - instanceGroup: master-us-west-2a
      name: a
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.9.6
  masterPublicName: api.xxxxxxxxx-example.k8s.local
  networkCIDR: 172.20.0.0/16
  networking:
    weave:
      mtu: 8912
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.20.32.0/19
    name: us-west-2a
    type: Private
    zone: us-west-2a
  - cidr: 172.20.0.0/22
    name: utility-us-west-2a
    type: Utility
    zone: us-west-2a
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-07-05T11:29:15Z
  labels:
    kops.k8s.io/cluster: xxxxxxxxx-example.k8s.local
  name: master-us-west-2a
spec:
  associatePublicIp: false
  image: kope.io/k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: m3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-west-2a
  role: Master
  subnets:
  - us-west-2a

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-07-05T11:29:16Z
  labels:
    kops.k8s.io/cluster: xxxxxxxxx-example.k8s.local
  name: nodes
spec:
  associatePublicIp: false
  image: kope.io/k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: t2.medium
  maxSize: 2
  minSize: 2
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - us-west-2a
```
Labels: area/documentation, good first issue, lifecycle/rotten

Most helpful comment

I can't share the manifest because I deleted it (it was only for a POC).
If the load balancer is only internal, then you won't have access to the cluster from outside.
You can perhaps try to create another machine inside the cluster's VPC and try to access the cluster from there...

All 15 comments

@eranreshef It looks like you have the DNS in the cluster spec set to public when you need private.

While adapting the instructions for AWS China in us-west-2 is quite ambitious, you may have more success following along with this guide.

Let me know if that gets you back on track!
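For reference, the switch being suggested lives in the cluster spec's topology block. A minimal sketch of the relevant fields, assuming the rest of the manifest stays as posted above:

```
spec:
  topology:
    dns:
      type: Private
    masters: private
    nodes: private
```

Applying it would typically be a kops edit cluster followed by kops update cluster --yes.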

The good starter issue here is cleaning up the Chinese docs: they need a run-through to make sure they still work, and then the references to old versions of kops should be updated.

I am facing the same issue. Any pointers would be really appreciated. I can provide more details if needed.

@sNathan13 I did eventually manage to get it working.
I followed the tutorial but changed the create command to be:

```
kops create cluster \
    --node-count 3 \
    --zones ${AWS_REGION}a,${AWS_REGION}b,${AWS_REGION}c \
    --master-zones ${AWS_REGION}a,${AWS_REGION}b,${AWS_REGION}c \
    --networking calico \
    ${NAME}
```
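For context, the dns-free (gossip) behavior comes entirely from the cluster name ending in .k8s.local. A rough sketch of the environment this command assumes follows; the bucket and cluster names below are placeholders, not values from this issue:

```
export AWS_REGION=us-west-2
export NAME=mycluster.k8s.local              # the .k8s.local suffix enables gossip / dns-free mode
export KOPS_STATE_STORE=s3://my-kops-state-store

# run the kops create cluster command above, then:
kops update cluster ${NAME} --yes
kops validate cluster
```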

@eranreshef Thanks for replying :) Would it be possible to share your cluster manifest, or is it the same as the one you mentioned above? When I have the load balancer config set to public, I am able to validate the cluster, but I want it to be internal.

I can't share the manifest because I deleted it (it was only for a POC).
If the load balancer is only internal, then you won't have access to the cluster from outside.
You can perhaps try to create another machine inside the cluster's VPC and try to access the cluster from there...
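If you do try a machine inside the VPC, a rough sketch of what that could look like (the state-store bucket here is a placeholder):

```
# on an instance inside the cluster's VPC
export KOPS_STATE_STORE=s3://my-kops-state-store
kops export kubecfg xxxxxxxxx-example.k8s.local   # writes a kubeconfig pointing at the API load balancer
kubectl get nodes
```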

With an internal ELB, the kubectl command doesn't work either.
It seems a public ELB plus a security group is the best practice for managing the built-in service.
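If you do go with a public API ELB, the cluster spec posted above already has a knob for restricting who can reach it. A minimal sketch follows; the CIDR is an example office range, not a value from this issue:

```
spec:
  api:
    loadBalancer:
      type: Public
  kubernetesApiAccess:
  - 203.0.113.0/24   # example office range; kops applies this to the API ELB security group
```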

Hi NiuZhuang, can you verify whether the route tables have routes defined to access internal resources? Initially the internal ELB did not work for me either, but I fixed it by adding the necessary routes.

Hi sNathan13, thanks for your kind reply.

I have two route tables, both of which have internal access.
Public one:
172.20.0.0/16 - local
0.0.0.0/0 - igw ID
Private one:
172.20.0.0/16 - local
0.0.0.0/0 - NAT ID

My environment:
Private masters, private nodes on aws.
Local kubectl.

What necessary routes did you add?

Hi NiuZhuang, my setup had a VPN tunnel established between the AWS VPC and the corporate network. I also added the corporate IP range to the route table, in addition to the local route (172.x.x.x/x).
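For anyone checking the same thing, one possible way to inspect and add such routes with the AWS CLI is sketched below; the IDs and the corporate CIDR are placeholders:

```
# list the route tables in the cluster's VPC
aws ec2 describe-route-tables --filters Name=vpc-id,Values=vpc-0123456789abcdef0

# route the corporate range via the virtual private gateway on the VPN side
aws ec2 create-route \
    --route-table-id rtb-0123456789abcdef0 \
    --destination-cidr-block 10.0.0.0/8 \
    --gateway-id vgw-0123456789abcdef0
```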

OK, I get it.
Perhaps a VPN is the only way to go with an internal ELB.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
