Hi, I have a problem with a Kubernetes cluster created with kops.
I'm using this command to create the cluster on AWS:
kops create cluster \
--bastion="true" \
--master-count=3 \
--node-count=4 \
--master-zones eu-west-1a,eu-west-1b,eu-west-1c \
--zones eu-west-1a,eu-west-1b,eu-west-1c \
--node-size t2.medium \
--master-size t2.small \
--dns private \
--dns-zone MY_DNS_ZONE \
--vpc MY_VPC \
--topology private \
--networking kopeio-vxlan \
--target=terraform \
${NAME}
Everything works fine until I try to create a PVC backed by EBS.
My pods are stuck in the "Pending" state, and in kubectl get events I see the following message:
0/7 nodes are available: 3 node(s) had taints that the pod didn't tolerate, 4 node(s) had no available volume zone.
When I change the cluster from multiple AZs to a single AZ, everything works fine.
As I understand it, pods are not able to connect to PVs in a different AZ than the pods.
Can you help me with this case or point me to someone who can? I'm new to Kubernetes; to run my project I used docker-compose, kops and kompose (https://github.com/kubernetes/kompose).
kubectl version:
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:39:04Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.11", GitCommit:"637c7e288581ee40ab4ca210618a89a555b6e7e9", GitTreeState:"clean", BuildDate:"2018-11-26T14:25:46Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
docker-compose version:
docker-compose version 1.23.1, build b02f1306
docker-py version: 3.5.0
CPython version: 3.6.7
OpenSSL version: OpenSSL 1.1.0f 25 May 2017
kops version:
Version 1.10.0 (git-8b52ea6d1)
kompose version:
1.17.0 (a74acad)
Here are examples of my project's database Kubernetes files, including the claims:
database deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.17.0 (a74acad)
  creationTimestamp: null
  labels:
    io.kompose.service: database
  name: database
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: database
    spec:
      containers:
      - image: mysql:5.7
        name: DATABASE_NAME
        ports:
        - containerPort: 3306
        resources: {}
        volumeMounts:
        - mountPath: /var/lib/mysql
          name: database-claim0
        - mountPath: /etc/mysql/conf.d
          name: database-claim1
      restartPolicy: Always
      volumes:
      - name: database-claim0
        persistentVolumeClaim:
          claimName: database-claim0
      - name: database-claim1
        persistentVolumeClaim:
          claimName: database-claim1
status: {}
database claim:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  creationTimestamp: null
  labels:
    io.kompose.service: database-claim0
  name: database-claim0
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
status: {}
Hello @wszychta
As I understand it, pods are not able to connect to PVs in a different AZ than the pods.
If you are specifically using AWS EBS volumes for the PVs, this is true, yes.
This is currently a hard limitation of AWS EBS volumes; you can read a bit about that in the AWS documentation: "An EBS volume and the instance to which it attaches must be in the same Availability Zone."
Kubernetes itself supports many other storage backends that could be used zone independently, but of course with different properties (like performance, pricing, cloud provider support, ...). For example there is AWS EFS that can be used in any AZ within an AWS region but with its own tradeoffs (e.g. https://github.com/kubernetes-incubator/external-storage/issues/1030).
AWS EBS is definitely a common option for Kubernetes PVs, and if you know the limitations you can build your application deployments around them.
One workaround, which you already mentioned, is to put the whole cluster into a single AZ, but that is usually insufficient, as you lose important high-availability properties.
The other option I would recommend for Kubernetes 1.10/1.11 is to control where your volumes are created and where your pods are scheduled:
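A minimal sketch of what I mean (the class name and zone are placeholders, nothing kops creates for you): a dedicated StorageClass that only provisions EBS volumes in one zone, which your PVCs then reference via storageClassName, combined with scheduling the pods into the same zone, e.g. via a nodeSelector on the zone label:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2-eu-west-1a        # placeholder name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  # provision EBS volumes only in this AZ
  zone: eu-west-1a
With that in place, the volume and the pod end up in the same AZ by construction.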
You can also read a bit about this here: https://github.com/kubernetes/kubernetes/issues/34583
There are some bigger improvements coming with Kubernetes 1.13 (and 1.12 as a beta feature), which you can read about here: https://kubernetes.io/blog/2018/10/11/topology-aware-volume-provisioning-in-kubernetes/
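Roughly, the improvement there boils down to a StorageClass with volumeBindingMode: WaitForFirstConsumer, so the volume is only provisioned once the pod has been scheduled and therefore lands in the pod's zone. A minimal sketch (the class name is a placeholder, and it needs at least Kubernetes 1.12):
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2-wait-for-consumer   # placeholder name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
# delay provisioning until a pod using the PVC is scheduled,
# so the EBS volume is created in that pod's AZ
volumeBindingMode: WaitForFirstConsumer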
From the Kops side, I think there is little that can be done at the moment to change anything about this situation (except for working towards 1.13 😉).
You may be able to work around this by pinning deployments to one AZ for now (a sketch follows below); I am looking forward to not having to do that in the future. If you ensure that deployments happen in one AZ, things should be OK. Match on the label failure-domain.beta.kubernetes.io/zone=ap-northeast-1d and things will work.
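A minimal sketch of that pinning applied to the database Deployment from above (I used one of your eu-west-1 zones instead of the ap-northeast-1d from my own cluster; the value has to match the AZ where the EBS volume actually lives, and the second volume is omitted for brevity):
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: database
spec:
  replicas: 1
  template:
    metadata:
      labels:
        io.kompose.service: database
    spec:
      # schedule the pod only onto nodes in the AZ where the EBS volume lives
      nodeSelector:
        failure-domain.beta.kubernetes.io/zone: eu-west-1a
      containers:
      - image: mysql:5.7
        name: database
        volumeMounts:
        - mountPath: /var/lib/mysql
          name: database-claim0
      volumes:
      - name: database-claim0
        persistentVolumeClaim:
          claimName: database-claim0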
Thank you for your answers.
I think (for now) I'm going to use only one AZ and wait for kops to support Kubernetes 1.13 :)
The WaitForFirstConsumer binding mode mentioned in the 1.13 blog post works well when you first spawn your containers.
But if you ever need to upgrade the underlying servers, for example to switch to another instance type, and you roll them out with an instance refresh, the container running on them will be killed and may be rescheduled elsewhere, in an AZ where its PV does not exist. (At that point the PVC has not been destroyed, so no fresh volume is created, and we don't benefit from WaitForFirstConsumer here. It will just fail.)
Has anyone experienced this?
Strange, how does that make sense? The documentation says:
When persistent volumes are created, the PersistentVolumeLabel admission controller automatically adds zone labels to them. The scheduler (via the VolumeZonePredicate predicate) will then ensure that pods that claim a given volume are only placed into the same zone as that volume, as volumes cannot be attached across zones.
https://kubernetes.io/docs/setup/best-practices/multiple-zones/#functionality
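To illustrate, a dynamically provisioned EBS-backed PV should then end up with zone labels roughly like this (the PV name, volume ID and zone are made-up examples, just to show the shape):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0a1b2c3d-example          # generated name, hypothetical
  labels:
    # added by the PersistentVolumeLabel admission controller
    failure-domain.beta.kubernetes.io/region: eu-west-1
    failure-domain.beta.kubernetes.io/zone: eu-west-1a
spec:
  capacity:
    storage: 100Mi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  awsElasticBlockStore:
    volumeID: aws://eu-west-1a/vol-0123456789abcdef0   # hypothetical volume ID
    fsType: ext4
The scheduler should then only place pods that claim this volume onto nodes carrying the matching zone label.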
@guyelia Interesting. I don't know "why" it happens, but it sure happened to me.
I'll try to verify tomorrow if the "zone label" is actually added