1. What kops version are you running?
1.14.0
2. What Kubernetes version are you running?
1.15.4
3. What cloud provider are you using?
AWS
4. What commands did you run?
kops create -v 10 \
--name cluster1.somedomain.io \
--state "s3://cluster1.somedomain.io" \
-f cluster_config.yaml
kops create -v 10 secret sshpublickey admin \
--name cluster1.somedomain.io \
--state "s3://cluster1.somedomain.io" \
--config cluster_config.yaml \
-i cluster1.somedomain.io.pub
kops -v 10 update cluster cluster1.somedomain.io \
--state "s3://cluster1.somedomain.io" \
--yes
5. What is the simplest way to reproduce this issue?
Create a cluster and set spec.kubeScheduler.usePolicyConfigMap: true
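For reference, a minimal sketch of the relevant part of the cluster manifest (the cluster name is a placeholder, the apiVersion may differ between kops releases, and only the field that triggers the problem is shown):
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  name: cluster1.somedomain.io
spec:
  kubeScheduler:
    usePolicyConfigMap: true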
6. What happened after the commands executed?
NotReady state.
7. What did you expect to happen?
Ready state without manual intervention.
8. Please provide your cluster manifest.
9. Please run the commands with most verbose logging by adding the -v 10 flag.
10. Anything else do we need to know?
This is not a new issue. I've been working around it since K8s 1.13, maybe earlier.
There are two different issues causing the kube-scheduler pods to crash.
Issue 1: the system:kube-scheduler clusterrole doesn't grant access to configmaps.
Related log messages:
I1003 05:31:39.339583 1 server.go:161] Starting Kubernetes Scheduler version v1.15.4
couldn't get policy config map kube-system/scheduler-policy: configmaps "scheduler-policy" is forbidden: User "system:kube-scheduler" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
Manual fix: edit the system:kube-scheduler clusterrole and append the following:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch
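For what it's worth, here is a sketch of the same fix applied non-interactively with kubectl patch (assumes kubectl access to the cluster; note that bootstrap RBAC reconciliation may revert manual edits to system: roles, so this is a workaround rather than a durable fix):
kubectl patch clusterrole system:kube-scheduler --type=json \
  -p='[{"op": "add", "path": "/rules/-", "value": {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get", "list", "watch"]}}]'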
Issue 2: the scheduler-policy configmap contains invalid predicates.
Related log messages:
F1003 06:02:19.748546 1 plugins.go:240] Invalid configuration: Predicate type not found for CheckNodeMemoryPressure
...
F1003 06:17:53.461767 1 plugins.go:240] Invalid configuration: Predicate type not found for CheckNodeDiskPressure
...
F1003 06:22:57.394597 1 plugins.go:240] Invalid configuration: Predicate type not found for CheckNodeCondition
...
F1003 06:43:25.075534 1 plugins.go:240] Invalid configuration: Predicate type not found for NoVolumeNodeConflict
scheduler-policy configmap [original]
scheduler-policy configmap [working]
scheduler-policy configmap [tuned]
Manual fix: edit the scheduler-policy configmap and remove the troublesome predicates (CheckNodeMemoryPressure, CheckNodeDiskPressure, CheckNodeCondition, NoVolumeNodeConflict)
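For illustration only (this is not the kops-shipped configmap; the predicate and priority names below are just a plausible minimal example), a policy.cfg with the four predicates removed has this general shape:
apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-policy
  namespace: kube-system
data:
  policy.cfg: |
    {
      "kind": "Policy",
      "apiVersion": "v1",
      "predicates": [
        {"name": "PodFitsResources"},
        {"name": "PodToleratesNodeTaints"},
        {"name": "MatchInterPodAffinity"}
      ],
      "priorities": [
        {"name": "LeastRequestedPriority", "weight": 1}
      ]
    }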
I suspect what's going on is that, since the TaintNodesByCondition feature gate has been enabled by default since K8s 1.12.0, the troublesome predicates were removed from the scheduler (src), but the configmap resource (src) doesn't reflect this change, so it won't work when used here.
It looks like issue 2 is a pretty straightforward fix, but I'm not sure how to handle patching the clusterrole in issue 1.
Related to #6579
This feature is DEPRECATED in the latest Kubernetes, so I do not see why it should be fixed / used anymore.
--policy-configmap string
DEPRECATED: name of the ConfigMap object that contains scheduler's policy configuration. It must exist in the system namespace before scheduler initialization if --use-legacy-policy-config=false. The config must be provided as the value of an element in 'Data' map with the key='policy.cfg'
--policy-configmap-namespace string     Default: "kube-system"
DEPRECATED: the namespace where policy ConfigMap is located. The kube-system namespace will be used if this is not provided or is empty.
https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/
@zetaab
The command which starts the scheduler on a kops-managed cluster:
Command:
/bin/sh
-c
mkfifo /tmp/pipe; (tee -a /var/log/kube-scheduler.log < /tmp/pipe & ) ; exec /usr/local/bin/kube-scheduler --kubeconfig=/var/lib/kube-scheduler/kubeconfig --leader-elect=true --policy-configmap=scheduler-policy --policy-configmap-namespace=kube-system --v=2 > /tmp/pipe 2>&1
It uses the --kubeconfig flag, which is deprecated as well. It looks like a config file should now be used instead of the flags.
# kube-scheduler --help
...
Misc flags:
--config string
The path to the configuration file. Flags override values in this file.
The --write-config-to flag can be used to produce a sample configuration file:
# kube-scheduler --kubeconfig=/var/lib/kube-scheduler/kubeconfig --leader-elect=true --policy-configmap=scheduler-policy --policy-configmap-namespace=kube-system --v=2 --write-config-to /scheduler_config.yaml
Which produces
algorithmSource:
  provider: DefaultProvider
apiVersion: kubescheduler.config.k8s.io/v1alpha1
bindTimeoutSeconds: 600
clientConnection:
  acceptContentTypes: ""
  burst: 100
  contentType: application/vnd.kubernetes.protobuf
  kubeconfig: /var/lib/kube-scheduler/kubeconfig
  qps: 50
disablePreemption: false
enableContentionProfiling: false
enableProfiling: false
failureDomains: kubernetes.io/hostname,failure-domain.beta.kubernetes.io/zone,failure-domain.beta.kubernetes.io/region
hardPodAffinitySymmetricWeight: 1
healthzBindAddress: 0.0.0.0:10251
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: true
  leaseDuration: 15s
  lockObjectName: kube-scheduler
  lockObjectNamespace: kube-system
  renewDeadline: 10s
  resourceLock: endpoints
  retryPeriod: 2s
metricsBindAddress: 0.0.0.0:10251
percentageOfNodesToScore: 0
schedulerName: default-scheduler
The KubeSchedulerConfiguration schema: the structure has a Plugins field, which contains a Score list. The plugins can be configured via the PluginConfig field of the KubeSchedulerConfiguration structure.
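A sketch of how those fields look in a config file (illustrative only; the plugin name is hypothetical and the exact apiVersion and available plugins depend on the Kubernetes release):
apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /var/lib/kube-scheduler/kubeconfig
leaderElection:
  leaderElect: true
plugins:
  score:
    enabled:
    - name: ExampleScorePlugin   # hypothetical plugin name
pluginConfig:
- name: ExampleScorePlugin       # hypothetical plugin name
  args: {}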
Kops should use the configuration file instead of flags and allow it to be configured.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
@rtluckie shall this really be closed? The issues didn't go away; I just had to apply the same manual workaround on our clusters.
We should not close this issue. Existing functionality is broken :(
While this is indeed deprecated, if you need to use it to access scheduler settings that aren't yet exposed in kops, you can use this RBAC policy:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kube-scheduler-configmap
  namespace: kube-system
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["scheduler-policy"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: kube-scheduler-configmap
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-scheduler-configmap
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:kube-scheduler
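Assuming the Role and RoleBinding above are saved to a file (the filename here is arbitrary), they can be applied before enabling usePolicyConfigMap:
kubectl apply -f kube-scheduler-configmap-rbac.yaml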