Bug Description
Istiod is not responding to new namespaces. I created a new namespace and labelled it istio-injection=enabled. I deployed one app to the namespace and the pod is stuck in the init stage; the istio-proxy is not able to mount the configmap "istio-ca-root-cert".
Unable to mount volumes for pod "azure-vote-front-5bc759676c-7hg5t_am-dev(a45f7c3a-d546-4461-af15-c5442ae39de9)": timeout expired waiting for volumes to attach or mount for pod "am-dev"/"azure-vote-front-5bc759676c-7hg5t".
Last log from istiod -
2020-03-25T10:04:33.132573Z warn k8s.io/[email protected]/tools/cache/reflector.go:105: watch of *v1.ConfigMap ended with: too old resource version: 2764317 (2764517)
list of unmounted volumes=[istiod-ca-cert]. list of unattached volumes=[default-token-58r9k istio-envoy podinfo istio-token istiod-ca-cert]
Warning FailedMount 3m57s (x33 over 54m) kubelet, aks-linuxpool02-21909056-vmss000000 MountVolume.SetUp failed for volume "istiod-ca-cert" : configmap "istio-ca-root-cert" not found
[x] Configuration Infrastructure
[ ] Docs
[ ] Installation
[x] Networking
[ ] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[x] Developer Infrastructure
The new namespace should have the configmap "istio-ca-root-cert" and the app should deploy.
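A quick way to confirm what is missing, using the namespace from the report above:
kubectl -n am-dev get configmap istio-ca-root-cert
# expected: the configmap holding the Istio CA root certificate; here it returns "not found"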
Steps to reproduce the bug
This happens intermittently. In one of the namespaces I was able to attach the proxy, but once the problem started, no further namespace worked.
Version (include the output of istioctl version --remote, kubectl version, and helm version if you used Helm)
Istioctl - client version: 1.5.0
kubectl - server: v1.15.7, client: v1.17.0
How was Istio installed?
istioctl
Environment where bug was observed (cloud vendor, OS, etc)
Azure Kubernetes Engine
I'm facing the same issue. I have deleted and recreated the namespace but still have the same problem.
@sonujose @mani0070 To reproduce the problem, can you share detailed instructions (e.g., the commands you use) of how you install Istio and how you deploy an example application? Meanwhile, besides the Azure Kubernetes Engine platform mentioned in the issue, do you also notice this problem on other platforms?
I have the same issue as well on Azure Kubernetes Engine (v1.15.7).
# istioctl version
client version: 1.5.0
control plane version: 1.5.0
data plane version: 1.5.0 (17 proxies)
Same log in istiod.
2020-03-30T12:46:54.099810Z warn k8s.io/[email protected]/tools/cache/reflector.go:105: watch of *v1.ConfigMap ended with: too old resource version: 3142588 (3143468)
I created the namespace with these commands; Istio does not seem to add the configmap istio-ca-root-cert in this namespace. Other namespaces are working as intended.
# kubectl create namespace keycloak
namespace/keycloak created
# kubectl label namespace keycloak istio-injection=enabled
namespace/keycloak labeled
# kubectl describe namespaces keycloak
Name: keycloak
Labels: istio-injection=enabled
Annotations: <none>
Status: Active
No resource quota.
No LimitRange resource.
# kubectl get configmaps -n keycloak
No resources found in keycloak namespace.
# kubectl get configmaps -n default
NAME DATA AGE
istio-ca-root-cert 1 10d
@lei-tang Deployed Istio using the istioctl tool. Istioctl version (1.5.0) and Kubernetes cluster version (v1.15.7)
istioctl manifest apply \
--set values.grafana.enabled=true \
--set values.prometheus.enabled=true \
--set values.kiali.enabled=true --set values.kiali.createDemoSecret=true \
--set values.global.proxy.accessLogFile="/dev/stdout"
I also deployed the same Istio 1.5 in two different AKS clusters, and this issue was observed in only one of them. The whole problem started after istiod logged this warning - k8s.io/[email protected]/tools/cache/reflector.go:105: watch of *v1.ConfigMap ended with: too old resource version: 3142588
Istiod pod restart fixed the issue
I was able to fix this issue by deleting the istiod pod. After the restart everything started working fine; no problems so far. After the istiod restart the configmap was automatically created for all new namespaces.
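For anyone else hitting this, a minimal sketch of that workaround (the app=istiod label is assumed from a default Istio 1.5 install; verify with kubectl -n istio-system get pods --show-labels):
kubectl -n istio-system delete pod -l app=istiod
# the Deployment recreates the pod, and the new istiod should re-create istio-ca-root-cert in injected namespaces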
I'm hitting this issue as well... created 2 random namespaces, and none of them has any configmap created.
cc @howardjohn
I found this one when I pulled the 1.5 release branch on 3/27. I went back to this commit and the ConfigMap was getting created again.
@GregHanson that PR seems entirely unrelated. A restart of istiod also fixes that; do you think it was just "fixed" by the restart, not by that PR?
Confirmed: a restart of istiod has helped populate the istio-ca-root-cert cm for my non-default namespaces.
@howardjohn - I didn't mean this commit was the culprit - just that it must have come in to the release branch sometime after it. I hit the problem on Friday when I pulled, reverted to that commit, deleted and recreated my namespaces, and I wasn't able to reproduce
@lei-tang I have installed istio using
apiVersion: install.istio.io/v1alpha2
kind: IstioControlPlane
spec:
  # Use the default profile as the base
  # More details at: https://istio.io/docs/setup/additional-setup/config-profiles/
  profile: default
  values:
    global:
      # Ensure that the Istio pods are only scheduled to run on Linux nodes
      defaultNodeSelector:
        beta.kubernetes.io/os: linux
      # Enable mutual TLS for the control plane
      controlPlaneSecurityEnabled: true
      mtls:
        # Require all service to service communication to have mtls
        enabled: false
    grafana:
      # Enable Grafana deployment for analytics and monitoring dashboards
      enabled: true
      security:
        # Enable authentication for Grafana
        enabled: true
    kiali:
      # Enable the Kiali deployment for a service mesh observability dashboard
      enabled: true
    tracing:
      # Enable the Jaeger deployment for tracing
      enabled: true
and ran the command below (Istio 1.5.1):
istioctl manifest apply -f istio_controlplane_config.yaml
The namespace is labelled with istio-injection=enabled, and I deployed the application using Helm. When the label is removed, I can deploy the application.
Below is the error message
6m25s Warning FailedMount pod/vote-register-dev-78588cfcf6-kc568 MountVolume.SetUp failed for volume "istiod-ca-cert" : configmap "istio-ca-root-cert" not found
6m17s Warning FailedMount pod/vote-register-dev-78588cfcf6-kc568 Unable to mount volumes for pod "vote-register-dev-78588cfcf6-kc568_vote-dev(f4b4b9b1-766b-428c-a970-62a59c497a66)": timeout expired waiting for volumes to attach or mount for pod "vote-dev"/"vote-register-dev-78588cfcf6-kc568". list of unmounted volumes=[istiod-ca-cert]. list of unattached volumes=[default-token-brkz8 istio-envoy podinfo istio-token istiod-ca-cert]
too old resource version: 3142588
This means the cluster's resources update frequently, before all updates have been pushed to the client side. When the watch fails with this error, the reflector will re-list and re-watch.
@howardjohn It should not be related to the resync period.
BTW, in recent Kubernetes a feature called watch bookmarks was introduced to mitigate "too old resource version" errors.
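For reference, a hedged note: the watch bookmark behaviour is controlled by a kube-apiserver feature gate in Kubernetes versions of this era, so whether it helps depends on the cluster version; if it is not on by default, it would be enabled via a flag like:
- --feature-gates=WatchBookmark=true   # in the kube-apiserver manifest; availability depends on your k8s version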
The current implementation of the namespace controller is complicated, and it is difficult to reason about its correctness. The PR https://github.com/istio/istio/pull/22613 was created to simplify the implementation and may solve the bugs in this issue. With https://github.com/istio/istio/pull/22613, the implementation becomes very simple.
I have been facing this issue for a while. It was happening very regularly for me before. Moving from 2 to 1 istiod replica seems to have helped quite a bit. A restart of istiod always seems to resolve the issue.
Is everyone facing this issue on AKS? It seems 4 of the people that have reported it are on Azure?
GKE @howardjohn
This problem is caused by SDS; you can disable SDS to work around it.
When downloading and installing Istio 1.5.1 on the day of release, I found that using istioctl manifest apply --set profile=demo works normally, but using istioctl manifest generate does not. Through kubectl -n istio-system get ev I saw the related errors "list of unmounted volumes = [istiod-ca-cert]. list of unattached volumes = [default-token-brkz8 istio-envoy podinfo istio-token istiod-ca-cert]", and I did not find a specific answer through Google. Finally, I found the blog https://istio.io/blog/2019/trustworthy-jwt-sds/. After testing, my problem was solved.
Solution:
vim /etc/kubernetes/manifests/kube-apiserver.yaml
Add
spec:
containers:
kubernetes release 1.14.9
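(The snippet above is truncated in the original comment. Based on the trustworthy-JWT blog linked above and the flags shown later in this thread, the addition is presumably along these lines; the file path and values are assumptions for a kubeadm-style cluster and must match your own PKI layout.)
spec:
  containers:
  - command:
    - kube-apiserver
    - --service-account-signing-key-file=/etc/kubernetes/pki/sa.key   # assumed path
    - --service-account-issuer=kubernetes.default.svc                 # example issuer
    - --api-audiences=api,istio-ca                                    # example audiences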
@tony-liuliu interesting. I just had this happen yesterday on GKE 1.15.9-gke.22
@howardjohn This is happening on IKS with the operator server pod and 1.5.1. Any other logs or info I can gather that has not already been provided here?
At this point I am not really sure what is happening; it seems like we are missing events or something. I have only seen one log from someone that had this issue, and there were some API server connectivity issues, so it may be useful to see more logs (from istiod) to check for common trends. Thanks
I am facing this issue too in minikube, using Istio and MetalLB.
Steps that took me to this error:
Labelled the namespace prometheus with istio-injection=false: kubectl label namespace prometheus istio-injection=false
Re-labelled it: kubectl label namespace prometheus istio-injection=true --overwrite
Re-injected the Grafana deployment:
kubectl get -n prometheus deployments $(kubectl get -n prometheus deployments -l app.kubernetes.io/name=grafana -o jsonpath='{.items[0].metadata.name}') --output yaml | istioctl kube-inject -f - | kubectl apply -n prometheus -f -
deployment.apps/prometheus-operator-grafana configured
Then I got this error:
Unable to attach or mount volumes: unmounted volumes=[istiod-ca-cert], unattached volumes=[storage sc-dashboard-volume podinfo sc-datasources-volume sc-dashboard-provider config prometheus-operator-grafana-token-lczb6 istiod-ca-cert istio-envoy]: timed out waiting for the condition
Afterwards I deleted the istiod pod from the istio-system namespace, and the deployment proceeded to end successfully.
@guillem-riera Which version are you testing on?
I am facing the same issue with 1.5.1, and doing a restart of the istiod deployment is a valid workaround in the meantime:
kubectl -n istio-system rollout restart deployment istiod
@hzxuzhonghu , I am using 1.5.1
istioctl version
client version: 1.5.1
control plane version: 1.5.1
data plane version: 1.5.1 (10 proxies)
I can confirm the issue with Istio 1.5.0 (AWS EKS 1.14) and 1.5.1 (AWS EKS 1.15). The issue happens between the 10th and 14th day after deploying. Bouncing istiod resolves it. Last time we had 5 istiod pods; bouncing one of them resolved the issue and pushed the cm into the target namespace. The configmap itself hadn't changed, so it was the same certificate that other namespaces had.
Also, existing namespaces which already had the configmap injected kept working without any issue.
I scaled down istiod to one to reproduce the issue, but it is sitting there for a week without any problems ¯_(ツ)_/¯.
I will post here if I get some results on the testing.
I faced this issue some 24 days ago, redeployed istiod, and it worked. Now I am having the same issue again in the AKS cluster (v1.15.7).
One more thing I noted: even after redeploying istiod, the configmap was not created for old namespaces that didn't have it. I had to manually copy the certificate and create the configmap in such cases; otherwise I had to delete and recreate the namespace.
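A hedged sketch of that manual copy (the data key root-cert.pem and the default source namespace are assumptions based on a standard install; my-old-namespace is a placeholder):
# extract the root certificate from a namespace where the configmap exists
kubectl -n default get configmap istio-ca-root-cert -o jsonpath='{.data.root-cert\.pem}' > root-cert.pem
# recreate the configmap in the affected namespace
kubectl -n my-old-namespace create configmap istio-ca-root-cert --from-file=root-cert.pem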
client version: 1.5.0
control plane version: 1.5.0
data plane version: 1.5.0 (14 proxies)
/cc @myidpt
There are a few PRs merged for this issue (e.g., https://github.com/istio/istio/pull/22598 and https://github.com/istio/istio/pull/22606). Does this issue occur for an Istio binary with these PRs?
Meanwhile, thanks @emedina for sharing a workaround, which restarts the istiod deployment through "kubectl -n istio-system rollout restart deployment istiod".
@lei-tang the resync period looks promising, can these be cherry picked into 1.5?
The PR (https://github.com/istio/istio/pull/22606) that adds the resync period has been cherry picked into the 1.5 branch as https://github.com/istio/istio/pull/22773 on 4-8-2020, which should be included in the next 1.5 release (1.5.2).
@lei-tang Has this eventually been included in 1.5.2? It's definitely not in the changes of the release notes: https://istio.io/news/releases/1.5.x/announcing-1.5.2/#changes
Yes, https://github.com/istio/istio/pull/22773 has been included in 1.5.2, which can be found in https://github.com/istio/istio/commits/1.5.2.
Hi,
We are facing the same issue with 1.5.2 on AKS; our pods are stuck in the Init phase:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 42m default-scheduler Successfully assigned nto-payment/customer-app-5c5ff5c5f-hxs9f to aks-default-16109932-vmss000000
Warning FailedMount 11m (x23 over 42m) kubelet, aks-default-16109932-vmss000000 MountVolume.SetUp failed for volume "istiod-ca-cert" : configmap "istio-ca-root-cert" not found
Warning FailedMount 98s (x18 over 40m) kubelet, aks-default-16109932-vmss000000 Unable to mount volumes for pod "customer-app-5c5ff5c5f-hxs9f_nto-payment(4b227c43-d6dd-42f4-aac5-78afef26c8e0)": timeout expired waiting for volumes to attach or mount for pod "nto-payment"/"customer-app-5c5ff5c5f-hxs9f". list of unmounted volumes=[istiod-ca-cert]. list of unattached volumes=[default-token-h5c9m istio-envoy podinfo istio-token istiod-ca-cert]
using this IstioOperator
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: default
  components:
    policy:
      enabled: true
    sidecarInjector:
      enabled: true
    citadel:
      enabled: true
    telemetry:
      enabled: true
  addonComponents:
    prometheus:
      enabled: false
  values:
    global:
      disablePolicyChecks: false
    telemetry:
      v1:
        enabled: true
      v2:
        enabled: false
It started working again after restarting the istiod deployment and our app pods.
@dosten Thanks for letting us know that this issue still occurs with 1.5.2 on AKS.
The PR #22613 is created to simplify the implementation of the namespace controller that manages the ConfigMap in this issue. It may solve the bugs in this issue, but is unverified. If anyone can try the PR #22613 and let us know the results, it will be helpful.
I also met the same issue with 1.5.2 on-premise. I needed to restart istiod, and then the istio-ca-root-cert configmap was auto-generated in namespaces labelled istio-injection=enabled.
@roywangtj can you please provide the full istiod logs?
Added the contributors of the pilot/pkg/bootstrap/namespacecontroller.go (@howardjohn, @hzxuzhonghu) to collectively diagnose this issue.
For what it's worth, the Istio 1.5 installation is working fine on AKS 1.16.7 and 1.17.3 (preview).
@howardjohn there are no istiod logs. Maybe the issue only happens on the first istiod installation; I tried to reproduce it but failed.
I will list my first-run scenario in the following:
Maybe the issue will appear. I have no new k8s cluster to reproduce it on, so feel free to test it.
Hello,
Same issue here, with istioctl 1.5.1, on AKS, version 1.15.7.
Florent
EDIT: I updated istio to 1.5.2 and it works fine.
For docker-desktop, trustworthy JWTs are still not enabled. Editing the API server pod config with:
- --service-account-key-file=/run/config/pki/sa.pub
+ - --service-account-signing-key-file=/run/config/pki/sa.key
+ - --service-account-issuer=api
+ - --api-audiences=api
can't be done as it's overwritten by the Docker Desktop application continuously.
However, since SDS moved into istiod now, it can't be shut off, despite configuring the IstioOperator CRD like so:
sds:
  enabled: false
# and further down
sds:
  enabled: false
  token:
    aud: istio-ca
    udsPath: ""
This error is new:
istiod-668dd9d86b-57nzg discovery {"level":"info","time":"2020-04-28T13:37:05.272378Z","scope":"validationController","msg":"Reconcile(enter): retry dry-run creation of invalid config"}
But then there's a new port that needs to be opened from GKE master -> 15017, or you'll get timeouts when deploying VirtualServices.
On the up-side, all tokens are generated and we got it running!!! 🔆
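For reference, a hedged sketch of opening that port on GKE (the rule name, network, and node tag are placeholders, and the source range must match your master CIDR):
gcloud compute firewall-rules create allow-master-istiod-webhook \
  --allow tcp:15017 \
  --network my-gke-network \
  --source-ranges 172.16.0.0/28 \
  --target-tags my-gke-node-tag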
Same issue with istioctl 1.5.2, on Docker for mac(edge 2.3.0.0), k8s version 1.16.5.
It works fine after restarting k8s.
@foxracle restarting k8s might be a touch excessive; you can resolve it by restarting the istiod pod(s).
Same issue with istioctl 1.5.2, on Docker Enterprise, RHEL 8.1, k8s version 1.14.8
It works fine after deleting the istiod pod.
I have the same issue with Istio 1.5.2 on GKE 1.15.9-gke.24. Restarting istiod worked.
@hari-dmartlabs can you show your istiod logs?
Unfortunately, I couldn't collect logs. But right after the restart, the istiod service just vanished. It was like someone had deleted it. And the deployments started failing since injection couldn't find the service.
I had to recreate the service to get things working.
Will get the logs if it repeats.
Quite a scare.
kubectl -n istio-system rollout restart deployment istiod
@lei-tang the rollout restart also worked for me
Deployed onto AWS using _kops_ with Istio 1.5.2 -- not an AKS managed cluster.
I noticed issues after a high WS saturation test with a shared _Gateway_ in _istio-system_ and many _VirtualService_ kinds within individual namespaces. During my initial investigation, I noticed every pod within _istio-system_ had been recreated; however, the corresponding terminated pods never cleared up.
I get it too on Istio 1.5.2, Kubernetes v1.17.5, deployed with kubespray.
If I delete a namespace and re-create it, the namespace doesn't have the secret created. Reproduced it twice in a row.
I should clarify: it's when creating the namespace and about 40 other resources in the namespace with kapp deploy.
I got it to work by creating the namespace by itself, pausing, then deploying everything else into the namespace
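A minimal sketch of that ordering workaround (the namespace and kapp app names are placeholders):
kubectl create namespace my-app
kubectl label namespace my-app istio-injection=enabled
# wait until istiod has populated the root-cert configmap
until kubectl -n my-app get configmap istio-ca-root-cert >/dev/null 2>&1; do sleep 2; done
# then deploy everything else into the namespace
kapp deploy -a my-app -f manifests/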
I have both a fix and a reproducer. Fix: https://github.com/istio/istio/pull/23783
To reproduce, modify the istio-leader configmap and change holder identity to something else. After a minute or so everything will be broken
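A hedged sketch of that reproduction (the annotation layout is shown in a later comment in this thread):
kubectl -n istio-system edit configmap istio-leader
# in the control-plane.alpha.kubernetes.io/leader annotation, change
# "holderIdentity":"<current istiod pod>" to some non-existent name, save, and wait a minute or so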
I have the same issue when I upgrade Istio from 1.5.0 (installed by applying Helm-generated YAML files) to 1.5.4 by applying the istioctl manifest generate YAML file.
I have both a fix and a reproducer. Fix: #23783
To reproduce, modify the istio-leader configmap and change holder identity to something else. After a minute or so everything will be broken
The explanation above gave me an idea, so I checked the istio-leader configmap.
When I looked at the istio-leader configmap in istio-system, I found the clue: the old pilot is still the istio-leader, so the new istiod does not become the namespace controller, and the configmap istio-ca-root-cert is not created in the namespaces that are injected.
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"istio-pilot-9979d9b5c-bw7sw","leaseDurationSeconds":30,"acquireTime":"2020-05-29T01:36:04Z","renewTime":"2020-05-29T02:50:45Z","leaderTransitions":0}'
  creationTimestamp: "2020-05-29T01:36:04Z"
  name: istio-leader
  namespace: istio-system
  resourceVersion: "41583"
  selfLink: /api/v1/namespaces/istio-system/configmaps/istio-leader
  uid: 4842a8ce-fb32-4224-91e2-9e2e51acf86b
In order to fix this, we should make the new istiod become the istio-leader: you can delete the old istio-pilot, or delete the istio-leader configmap in istio-system, and then check whether the new istiod has become the new leader.
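A hedged sketch of that fix (resource names are the ones seen in this thread; verify against your own cluster before deleting anything):
# remove the stale 1.4 control plane, if it is still present
kubectl -n istio-system delete deployment istio-pilot
# or force a fresh leader election
kubectl -n istio-system delete configmap istio-leader
# confirm the new holder is an istiod pod
kubectl -n istio-system get configmap istio-leader \
  -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'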
@mgxian thank you - this makes a lot of sense.
@howardjohn @ostromart should we delete the Istio-leader cm during upgrade from 1.4 to 1.5?
I don't think so. How come the old pilot is still around if you upgraded from 1.5 to 1.5.4? It should have been replaced.
@mgxian I noticed you used istioctl manifest generate to upgrade... in which case I'd assume the old pilot from 1.4 would still be around. istioctl upgrade should handle this properly in theory, pending tests.
@howardjohn in which release is your fix targeted ?
@howardjohn I think we need to revisit the leader election code again, since there are so many issues that may be related.
Same problem: kubectl -n istio-system rollout restart deployment istiod fixed the issue, but I'm not sure I want to use this trick every time I face it...
AKS: 1.16.7
istioctl version
client version: 1.6.0
control plane version: 1.5.2
data plane version: 1.5.2
@KIRY4 just update to 1.5.4 and it's fixed
@howardjohn is the fix already in 1.5.4? The linked PR seems to have been merged 8 days ago
@dosten my apologies, it is not in 1.5.4. Will be in 1.5.5 I guess. I forgot 1.5.4 was a security release so it was skipped
@howardjohn I'm hitting this on 1.6.1? Just following the basic install
istioctl manifest generate > $HOME/generated-manifest.yaml
kubectl create ns istio-system
kubectl apply -f $HOME/generated-manifest.yaml
kubectl -n istio-system describe pod istiod-5c99cfc846-b6wk8
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m38s default-scheduler Successfully assigned istio-system/istiod-5c99cfc846-b6wk8 to prod-minion1
Warning FailedMount 35s kubelet, prod-minion1 Unable to attach or mount volumes: unmounted volumes=[istio-token], unattached volumes=[inject istiod-service-account-token-8mzd2 config-volume istio-token local-certs cacerts]: timed out waiting for the condition
Warning FailedMount 28s (x9 over 2m37s) kubelet, prod-minion1 MountVolume.SetUp failed for volume "istio-token" : failed to fetch token: the server could not find the requested resource
kubectl -n istio-system describe pod istio-ingressgateway-74d4d8d459-q549t
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m35s default-scheduler Successfully assigned istio-system/istio-ingressgateway-74d4d8d459-q549t to prod-minion1
Warning FailedMount 92s kubelet, prod-minion1 Unable to attach or mount volumes: unmounted volumes=[istio-token istiod-ca-cert], unattached volumes=[istio-token podinfo ingressgateway-certs config-volume ingressgatewaysdsudspath istio-ingressgateway-service-account-token-k7rrt istio-envoy istiod-ca-cert ingressgateway-ca-certs]: timed out waiting for the condition
Warning FailedMount 87s (x9 over 3m34s) kubelet, prod-minion1 MountVolume.SetUp failed for volume "istiod-ca-cert" : configmap "istio-ca-root-cert" not found
Warning FailedMount 86s (x9 over 3m34s) kubelet, prod-minion1 MountVolume.SetUp failed for volume "istio-token" : failed to fetch token: the server could not find the requested resource
@dkirrane that is a different issue; the root cause is istiod not starting up, probably because of https://preliminary.istio.io/docs/ops/best-practices/security/#configure-third-party-service-account-tokens
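For reference, a quick way to check whether third-party (projected) service account tokens are enabled, assuming a kubeadm-style cluster where the apiserver runs as a static pod labelled component=kube-apiserver:
kubectl -n kube-system get pod -l component=kube-apiserver -o yaml \
  | grep -E 'service-account-issuer|service-account-signing-key-file|api-audiences'
# if these flags are missing, follow the best-practices link above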
@howardjohn Istio 1.5.5 is out; the release notes just talk about a security update. Is this change in?
Any update? I just got the same issue!
Istio version: 1.5.1
Restarting the istiod pod fixed the problem.
I think it's fixed in 1.5.4. See comment above https://github.com/istio/istio/issues/22463#issuecomment-638215549
@howardjohn @myidpt @zhaohuabing Hi, not sure why the issue has been closed without resolution; we are running into this issue on Istio 1.5.4. Can you please confirm whether it's fixed in 1.5.4, or let us know the version in which it is fixed?
I was facing the same problem in version 1.5.4, but restarting istiod fixed it. Before the restart, even though multiple namespaces were created, the configmap was not getting created for them.
@rajivml @Mdirfan6 Actually this is fixed in 1.5.6, please see the release announcement
https://istio.io/latest/news/releases/1.5.x/announcing-1.5.6/.
Thanks for the update @knight42
Still facing the same issue in 1.5.6
Still facing the same issue in 1.5.6
yes we too observed it on 1.5.6
I also worked around it with an istiod restart.
Istio 1.7.2
Seeing restarts in istiod and the ingressgateway, and also the failed mount.
It's really bad, and an istiod restart does not help.
Any workaround here?