I have created a service using the following:
apiVersion: v1
kind: Service
metadata:
  name: nginx-example
  labels:
    name: nginx-example
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "0.0.0.0/0"
spec:
  selector:
    app: nginx-example
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: http-web
The ELB launches, but the nodes are reported as unhealthy. It also looks like the security groups are not set up properly to allow the ELB access to the NodePort Kubernetes assigned (in this case 30892). I had not modified any security groups as part of creating the cluster.
I used kops 1.8.1 for Kubernetes 1.9.3 - anyone got any ideas what's going on?
Do you have a pod running that matches the selector?
Here is the deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-example
  labels:
    app: nginx-example
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
Couple of errors:
Error from server (BadRequest): error when creating "nginx.yaml": Deployment in version "v1" cannot be handled as a Deployment: no kind "Deployment" is registered for version "apps/v1"
You are also using a selector of "app: nginx-example" in your service definition, but your pods are labelled "app: nginx".
I've tweaked your definition:
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: nginx-example
  labels:
    app: nginx-example
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-example
  labels:
    name: nginx-example
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "0.0.0.0/0"
spec:
  selector:
    app: nginx
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: http-web
Was the error on the apiVersion then?
There were two things I changed.
First was the apiVersion as I got this error:
Error from server (BadRequest): error when creating "nginx.yaml": Deployment in version "v1" cannot be handled as a Deployment: no kind "Deployment" is registered for version "apps/v1"
The second was your selector in your service. Here's how I debugged it
Created from your config (with the apiVersion changed)
$ kubectl create -f nginx.yaml
deployment "nginx-example" created
service "nginx-example" created
Check the pods are running
$ kubectl get pods | grep nginx
nginx-example-569477d6d8-gfksb 1/1 Running 0 39s
nginx-example-569477d6d8-p92k9 1/1 Running 0 39s
Check your selector
$ kubectl describe service nginx-example | grep -i selector | awk '{print $2}'
app=nginx-example
Find pods matching your label
$ kubectl get pods -l $(kubectl describe service nginx-example | grep -i selector | awk '{print $2}')
No resources found.
No pods found, so the service won't pass anything to the pods, even though they are running....
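(A quicker equivalent check, as a sketch using the service name above: the service's Endpoints object shows <none> when the selector matches no pods. Output will look something like this.)
$ kubectl get endpoints nginx-example
NAME            ENDPOINTS   AGE
nginx-example   <none>      1m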
So let's delete and recreate
$ kubectl delete -f nginx.yaml
deployment "nginx-example" deleted
service "nginx-example" deleted
$ kubectl create -f nginx-patched.yaml
deployment "nginx-example" created
service "nginx-example" created
Check the pods are running
$ kubectl get pods | grep nginx
nginx-example-569477d6d8-vxc5s 1/1 Running 0 5s
nginx-example-569477d6d8-zqhbv 1/1 Running 0 5s
Check your selector
$ kubectl describe service nginx-example | grep -i selector | awk '{print $2}'
app=nginx
Find pods matching your label
$ kubectl get pods -l $(kubectl describe service nginx-example | grep -i selector | awk '{print $2}')
NAME READY STATUS RESTARTS AGE
nginx-example-569477d6d8-vxc5s 1/1 Running 0 37s
nginx-example-569477d6d8-zqhbv 1/1 Running 0 37s
So now the service should be passing the requests on to the pods. You can verify this using the Kubernetes dashboard (if you have it installed on your cluster):
https://[cluster api host]/api/v1/namespaces/kube-system/services/http:kubernetes-dashboard:/proxy/#!/service/default/nginx-example?namespace=default
Does your internal ELB have subnets configured in the same availability zones as your nodes?
Are you tagging the private subnets with kubernetes.io/role/internal-elb, as described in aws.go?
// TagNameSubnetInternalELB is the tag name used on a subnet to designate that
// it should be used for internal ELBs
const TagNameSubnetInternalELB = "kubernetes.io/role/internal-elb"
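(For reference, that constant describes a tag on the subnets. If the subnets existed before the cluster was created, as when you pass id: values in the kops spec, they may not have been tagged automatically; a sketch of adding the tag by hand, with a placeholder subnet ID:)
# Mark a subnet as usable for internal ELBs; a value of "1" is the common convention
$ aws ec2 create-tags \
    --resources subnet-xxxxxxxx \
    --tags Key=kubernetes.io/role/internal-elb,Value=1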
@amalucelli what should be annotated with kubernetes.io/role/internal-elb?
The Kubernetes nodes?
For internal ELBs, you tag with the annotation
service.beta.kubernetes.io/aws-load-balancer-internal: "0.0.0.0/0"
When I tested your yaml earlier, it did correctly create an internal-only ELB, so that is fine.
@huang-jy did it connect to the nodes as well?
@darrenhaken yes, it picked up the pods as per your config. I use a similar method for my corporate test cluster.
The only catch you have to remember is that the ELB will resolve to a private IP and not a public IP if you use that annotation.
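(A quick way to confirm which you got is to resolve the ELB hostname; an internal ELB returns private RFC1918 addresses. A sketch with a placeholder hostname:)
$ dig +short internal-xxxxxxxx.eu-west-1.elb.amazonaws.com
# expect 10.x.x.x style addresses for an internal ELB, public addresses otherwise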
@huang-jy did you have any issues with the Security Groups it creates?
At the moment the instance nodes seem to have a default node-xx security group attached, which restricts incoming traffic to the master. I'm wondering if the issue I have is SGs blocking connectivity.
@darrenhaken your yaml does not create any new security groups -- or do you mean the security groups kops creates at the cluster creation?
I mean the security groups Kops creates.
No, I had no issues. I did add additional groups as part of the cluster creation, though, to allow the cluster to talk to other parts of my AWS estate.
How did you add additional groups?
Assuming the groups are already created:
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-01-04T10:25:26Z
  labels:
    kops.k8s.io/cluster: [clustername]
  name: master-eu-west-1a
spec:
  additionalSecurityGroups:
  - sg-xxxxxxxx
  - sg-yyyyyyyy
  - sg-zzzzzzzz
  ....
  ....
And can you add additional security groups to the ELB that Kubernetes creates for a Service?
I haven't tried it myself, but additionalSecurityGroups doesn't work in a Service definition, since it's a kops InstanceGroup field rather than part of the Kubernetes Service spec:
error: error validating "test.yaml": error validating data: ValidationError(Service.spec): unknown field "additionalSecurityGroups" in io.k8s.api.core.v1.ServiceSpec; if you choose to ignore these errors, turn validation off with --validate=false
Do you have anything set on cloudConfig for your Kops manifest?
I noticed I had this set:
cloudConfig:
  disableSecurityGroupIngress: true
  elbSecurityGroup: sg-xxxxxxx
I wondered if that's causing the problem; I'm redeploying the cluster to see.
@huang-jy So I did curl {IP_OF_NODE}:{NODEPORT}, where the NodePort is the one the ELB is trying to connect to, and I get connection refused.
I have added an SG to the node to allow all inbound/outbound traffic, but I still get the same problem. Are there any kubectl commands to verify that the ports are open?
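(As an aside, the allocated NodePort can be read straight from the service; a sketch assuming the service is still named nginx-example. Whether that port is actually reachable is then down to security groups rather than anything kubectl can show.)
$ kubectl get svc nginx-example -o jsonpath='{.spec.ports[0].nodePort}'
30892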
@darrenhaken you don't normally have to hit the node directly. You have created a load balancer as part of the service; you should be using that.
Since you have created an internal load balancer, which therefore resolves to internal IPs, you can only access it if you are connected into the VPC via Direct Connect, or you are going through a bastion. Please confirm you are doing one of those. If not, take off the internal annotation to make the ELB public.
Assuming you are able to access it using the internal IP, I found another error.
Your service definition describes how to port map the requests. You described this:
ports:
- name: http
  port: 80
  targetPort: http-web
But you have not declared a port named http-web anywhere.
This leads to this message when trying to curl the ELB:
curl: (52) Empty reply from server
Change that to:
ports:
- name: http
  port: 80
  targetPort: 80
And it should work
$ curl -IL [elb-host-name]
HTTP/1.1 200 OK
Server: nginx/1.7.9
Date: Thu, 22 Feb 2018 17:43:06 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 23 Dec 2014 16:25:09 GMT
Connection: keep-alive
ETag: "54999765-264"
Accept-Ranges: bytes
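(As an alternative, targetPort: http-web would also have worked, provided the container port in the Deployment had been given that name; Kubernetes resolves a named targetPort against the container port names of each pod. A sketch of that variant, not what was actually deployed here:)
        ports:
        - name: http-web
          containerPort: 80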
For summary, here's the full config I used
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-example
  labels:
    app: nginx-example
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-example
  labels:
    name: nginx-example
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "0.0.0.0/0"
spec:
  selector:
    app: nginx
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: 80
I tried connecting directly to the node, rather than through the ELB, to verify that I could at least reach the node itself.
I do have a VPN to the VPC and can access other resources in the account.
Below are all the files I have used to create the cluster
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-02-21T11:51:01Z
  name: k8.dev.k8s.local
spec:
  api:
    loadBalancer:
      type: Internal
  authorization:
    alwaysAllow: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://bucket/k8.dev.k8s.local
  dnsZone: dns.fake.zone.com
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: a
    name: main
  - etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: a
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.9.3
  masterPublicName: api.k8.dev.k8s.local
  networkCIDR: 10.90.0.0/16
  networkID: vpc-4c4dca2b
  networking:
    weave:
      mtu: 8912
  nonMasqueradeCIDR: 100.xx.x.x/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 10.xxx.0.0/22
    id: subnet-id1
    name: subnet-x1
    type: Private
    zone: eu-west-1a
  - cidr: 10.xxx.16.0/22
    id: subnet-id2
    name: private-b
    type: Private
    zone: eu-west-1b
  - cidr: 10.xxx.32.0/22
    id: subnet-id3
    name: private-c
    type: Private
    zone: eu-west-1c
  topology:
    dns:
      type: Public
    masters: private
    nodes: private
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-02-21T11:51:01Z
  labels:
    kops.k8s.io/cluster: k8.dev.k8s.local
  name: master-eu-west-1a
spec:
  associatePublicIp: false
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-01-14
  machineType: m3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1a
  role: Master
  subnets:
  - private-a
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-02-21T11:51:01Z
  labels:
    kops.k8s.io/cluster: k8.dev.k8s.local
  name: nodes
spec:
  associatePublicIp: false
  additionalSecurityGroups:
  - sg-xxxxx
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-01-14
  machineType: t2.medium
  maxSize: 6
  minSize: 3
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - sg-xx1
  - sg-xx2
  - sg-xx3
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-example
spec:
  selector:
    matchLabels:
      app: nginx-example
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx-example
    spec:
      containers:
      - name: nginx-example
        image: nginx:1.7.9
        ports:
        - name: http-port
          containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-example
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "0.0.0.0/0"
spec:
  selector:
    app: nginx-example
  type: LoadBalancer
  ports:
  - name: nginx-example
    port: 80
    targetPort: 80
Can you take off the internal ELB annotation then, and confirm whether it works?
@huang-jy Slight update: if I change the service type to NodePort, I now get a response from a node's IP on the allocated port. Will try taking the internal annotation off now...
@huang-jy without the annotation no Load Balancer is created
@darrenhaken the internal annotation only affects the ELB, and only when the service type is LoadBalancer. It makes no difference with the NodePort type.
@huang-jy I put it back to LoadBalancer and I can still curl the node directly, but the ELB still reports the nodes as out of service. Any info you need about the ELB?
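(An aside: the health check state and the failure reason can also be read from the AWS side; a sketch for a classic ELB, with a placeholder load balancer name:)
$ aws elb describe-instance-health --load-balancer-name my-elb-name
# each instance is listed with State (InService/OutOfService), ReasonCode and Description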
If you curl the ELB, do you get a reply like I did?
@huang-jy I just made some progress. If I change the health check manually from TCP to HTTP it works. Eh!?
Bear in mind I didn't change the health check on the ELB.
@huang-jy if I curl the ELB it doesn't work. If I change the health check to HTTP, it immediately detects the nodes as healthy, and then curl works.
That is strange.
What security groups do you have attached on the nodes?
@huang-jy I was wrong, it was related to the security groups. I added a security group to the instance group which is basically an allow-all SG, and then healthy nodes appeared. It was added to the instance group via:
additionalSecurityGroups:
- sg-xxxxx
Does that make sense?
I don't know how to automatically add the same SG to the ELB when it's created, do you?
Alternatively, can I simply remove the additionalSecurityGroups and it will 'just work'?
Here is what the SG allows (inbound and outbound):
All traffic | All | All | 0.0.0.0/0
All traffic | All | All | 0.0.0.0/0
Okay can I propose this:
Remove all "additionalSecurityGroups" from your worker ig definition so that the ONLY security group you have on the nodes is the nodes.kubernetes one.
kubectl delete your nginx example deployment and service, then kubectl create them again with the internal annotation.
See if that works. If it does, kubectl delete again, re-add the security groups you took off to additionalSecurityGroups and then kubectl create again.
If this time it doesn't work, then one or more of the security groups you just re-added is conflicting.
@huang-jy Isn't additionalSecurityGroups part of the cluster and not the service manifest?
Still remove it. We essentially want to get as close to a vanilla sg configuration as possible.
And, no, they're ig specific.
https://github.com/kubernetes/kops/issues/4486#issuecomment-367722850
Alright, @huang-jy I think I know what fixed it, but I will also rebuild the cluster with all SGs removed to get back to vanilla. I'll reply to this issue tomorrow with the result and keep you posted (it's getting late here and a rebuild takes a long time).
The last thing I removed before testing this again was:
cloudConfig:
  disableSecurityGroupIngress: true
  elbSecurityGroup: sg-33cb4449
This meant that when an ELB was created, it used a security group generated by kops. I think that has allowed it access to the nodes; it was just a little slow to connect at first.
By the way, my goal in adding SGs to both the ELB and the nodes was to restrict access on the NodePort range to the ELB only. Are you aware of a way to achieve something like this? It's quite a common pattern.
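(For illustration, that pattern usually comes down to an ingress rule on the node security group that allows the NodePort range, 30000-32767 by default, only from the ELB's security group; a sketch with placeholder group IDs:)
# sg-xxxxxxxx = node security group, sg-yyyyyyyy = ELB security group (placeholders)
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-xxxxxxxx \
    --protocol tcp \
    --port 30000-32767 \
    --source-group sg-yyyyyyyy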
Yes, I have a corporate base sg attached to all the nodes, and masters via the instance group definitions, so it's not uncommon at all.
How do you apply that to the ELB as well, so that it is permitted to communicate with the nodes?
Are you comfortable sharing it as Terraform or CloudFormation, so I can see it as code showing the ports (scrubbing IPs etc.)?
We don't use Terraform or CloudFormation for the Kubernetes cluster; we use kops. And it's not production-ready yet :p
It's essentially the comment I put earlier: on all the master IGs and the node IG, I have a spec.additionalSecurityGroups section with the additional corporate SGs that need to go onto the boxes.
I haven't added anything additional to the ELBs yet.
Ah ok so you add it to the master IG too.
If you haven't added anything to the ELB, how do you ensure the port restrictions are allowed by the ELB?
By the way, I've found out you can add an annotation to the service to attach an SG to the ELB it creates.
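(Presumably that is the extra-security-groups annotation in the in-tree AWS cloud provider; a hedged sketch of what the service metadata would look like, with a placeholder group ID:)
metadata:
  name: nginx-example
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "0.0.0.0/0"
    service.beta.kubernetes.io/aws-load-balancer-extra-security-groups: "sg-xxxxxxxx"
That would attach sg-xxxxxxxx to the ELB in addition to the group Kubernetes creates for it.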
The service defines what ports you allow in.
The ELB is created as part of the Service declaration, and in there you define what ports to allow in and where to map them to on the pod (in your example, you mapped 80 -> 80).
Thanks for all the help on this, you've been awesome!
Can we close?
I think we can. @darrenhaken ?
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
I had a similar issue with a service:
Service in version "v1" cannot be handled as a Service: v1.Service: Spec: v1.ServiceSpec: Selector: ReadString: expects " or n, but found 2, error found in #10 byte of
We use YAML files for manifests, and from what I've read, Kubernetes translates YAML to JSON before execution. In my case, I have a build process that populates a version label with the last 8 characters of the commit SHA. This was the first time I'd hit an all-numeric last 8 characters, and we don't quote values in YAML because YAML doesn't require quotes.
The Kubernetes JSON schema it validates against requires label values to be strings, but somewhere along the way the JSON parser treated my label as a number instead of a string and threw that error.
WRAP YOUR YAML STRINGS IN QUOTES! Especially if they are dynamically populated!
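(For example, with an illustrative all-numeric label value rather than the actual manifest:)
# Fails validation: YAML parses the unquoted all-numeric value as an integer,
# but Kubernetes label values must be strings.
labels:
  version: 41203217
# Works: quoting forces the value to be parsed as a string.
labels:
  version: "41203217"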