Version:
v1.17.0+k3s.1
Describe the bug
The k3s logs are filled with these messages. I recently upgraded the cluster from v1.0.1 and only started noticing these errors after the upgrade.
The only reference I can find for this error message is here: https://github.com/rancher/kine/blob/master/pkg/server/types.go. Apologies if it is not k3s-related, but I am struggling to find any other reference.
Jan 12 15:55:31 k3s-0 k3s[20260]: time="2020-01-12T15:55:31.873615770Z" level=error msg="error while range on /registry/runtimeclasses/ /registry/runtimeclasses/: revision has been compact"
Jan 12 15:55:31 k3s-0 k3s[20260]: E0112 15:55:31.874460 20260 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 12 15:55:31 k3s-0 k3s[20260]: time="2020-01-12T15:55:31.890895443Z" level=error msg="error while range on /registry/networkpolicies/ /registry/networkpolicies/: revision has been compact"
Jan 12 15:55:31 k3s-0 k3s[20260]: E0112 15:55:31.891273 20260 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 12 15:55:31 k3s-0 k3s[20260]: time="2020-01-12T15:55:31.903446015Z" level=error msg="error while range on /registry/namespaces/ /registry/namespaces/: revision has been compact"
Jan 12 15:55:31 k3s-0 k3s[20260]: E0112 15:55:31.903896 20260 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 12 15:55:31 k3s-0 k3s[20260]: time="2020-01-12T15:55:31.910270374Z" level=error msg="error while range on /registry/services/specs/ /registry/services/specs/: revision has been compact"
Jan 12 15:55:31 k3s-0 k3s[20260]: E0112 15:55:31.910619 20260 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 12 15:55:31 k3s-0 k3s[20260]: time="2020-01-12T15:55:31.943914165Z" level=error msg="error while range on /registry/services/specs/ /registry/services/specs/: revision has been compact"
Jan 12 15:55:31 k3s-0 k3s[20260]: E0112 15:55:31.944332 20260 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 12 15:55:32 k3s-0 k3s[20260]: time="2020-01-12T15:55:32.059049239Z" level=error msg="error while range on /registry/services/specs/ /registry/services/specs/: revision has been compact"
Jan 12 15:55:32 k3s-0 k3s[20260]: E0112 15:55:32.059797 20260 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 12 15:55:32 k3s-0 k3s[20260]: time="2020-01-12T15:55:32.145574240Z" level=error msg="error while range on /registry/csidrivers/ /registry/csidrivers/: revision has been compact"
Additional context
I am running a 6 node cluster with a mix of AMD64 and ARM64 nodes. The nodes and pods are otherwise healthy.
NAME    STATUS   ROLES    AGE     VERSION
k3s-1   Ready    worker   9d      v1.17.0+k3s.1
pi4-a   Ready    worker   2d19h   v1.17.0+k3s.1
k3s-2   Ready    worker   9d      v1.17.0+k3s.1
pi4-c   Ready    worker   7d      v1.17.0+k3s.1
k3s-0   Ready    master   9d      v1.17.0+k3s.1
pi4-b   Ready    worker   2d18h   v1.17.0+k3s.1
I am noticing the same.
Same version of k3s.
My research also suggests it is something k3s-related.
At first I thought it happened when removing a node from the cluster, but it appears that it always happens, even on a single node.
Jan 12 20:27:46 node1 k3s[848]: E0112 20:27:46.862433 848 reflector.go:156] k8s.io/client-go/metadata/metadatainformer/informer.go:89: Failed to list *v1.PartialObjectMetadata: rpc error: code = Unknown desc = revision has been compact
Jan 12 20:27:46 node1 k3s[848]: time="2020-01-12T20:27:46.870017486+01:00" level=error msg="error while range on /registry/ceph.rook.io/cephclients/ /registry/ceph.rook.io/cephclients/: revision has been compact"
Jan 12 20:27:46 node1 k3s[848]: E0112 20:27:46.870321 848 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
So I did some digging in the logs tonight and found that two of the nodes were spamming the following.
Jan 13 21:41:22 k3s-1 k3s[868]: E0113 21:41:22.978783 868 reflector.go:156] k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: Failed to list *v1.Service: rpc error: code = Unknown desc = revision has been compact
Jan 13 21:41:23 k3s-1 k3s[868]: E0113 21:41:23.217184 868 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Service: rpc error: code = Unknown desc = revision has been compact
Jan 13 21:41:23 k3s-1 k3s[868]: E0113 21:41:23.650262 868 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.RuntimeClass: rpc error: code = Unknown desc = revision has been compact
Jan 13 21:41:23 k3s-1 k3s[868]: E0113 21:41:23.811611 868 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.CSIDriver: rpc error: code = Unknown desc = revision has been compact
Jan 13 21:41:23 k3s-1 k3s[868]: E0113 21:41:23.852121 868 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.NetworkPolicy: rpc error: code = Unknown desc = revision has been compact
Jan 13 21:41:23 k3s-1 k3s[868]: E0113 21:41:23.869329 868 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Namespace: rpc error: code = Unknown desc = revision has been compact
A restart of the k3s-agent on both hosts stopped the errors. I also found that these errors started after the same nodes got rebooted. As you can see from the Prometheus graph below, there were a high number of 503 errors after the reboot. Since restarting the agents, the errors have stopped.
Mine didn't stop after a reboot, though.
It has been spamming for over 24 hours now, even while everything else works fine.
Same for me. It has already happened twice. The first time I thought it was because I had renamed my master node (don't do that ;-), so I started completely fresh with a new installation 4 days ago. But the SQLite error (the error comes from kine) has reappeared and I have no idea how to fix it.
Besides spamming the logs, the other, more problematic symptom is that services are now broken: I get connection refused on services that worked before.
Please help! This is rendering the cluster totally useless.
I can confirm that restarting the agent (service k3s-agent restart) on the node helped. Thanks for the workaround, but please let's fix this, as it is not a corner-case issue. Let me know how I can help (e.g. provide more debugging info).
The services also magically start working again.
> Mine didn't stop after a reboot, though.

Rebooting doesn't help (it actually seems to be what causes the error), but restarting the agent again after the reboot did.
Just got the same thing after restarting the k3s server with systemctl restart k3s on Ubuntu. I had to restart k3s-agent on the other node in order to get everything working.
k3s server logs showed:
Jan 15 14:49:00 seago.khaus k3s[29963]: E0115 14:49:00.002177 29963 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 15 14:49:00 seago.khaus k3s[29963]: time="2020-01-15T14:49:00.004905271-08:00" level=error msg="error while range on /registry/services/specs/ /registry/services/specs/: revision has been compact"
Jan 15 14:49:00 seago.khaus k3s[29963]: E0115 14:49:00.005340 29963 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 15 14:49:00 seago.khaus k3s[29963]: E0115 14:49:00.006634 29963 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 15 14:49:00 seago.khaus k3s[29963]: E0115 14:49:00.008154 29963 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 15 14:49:00 seago.khaus k3s[29963]: time="2020-01-15T14:49:00.171927417-08:00" level=error msg="error while range on /registry/runtimeclasses/ /registry/runtimeclasses/: revision has been compact"
Jan 15 14:49:00 seago.khaus k3s[29963]: time="2020-01-15T14:49:00.174663746-08:00" level=error msg="error while range on /registry/csidrivers/ /registry/csidrivers/: revision has been compact"
Jan 15 14:49:00 seago.khaus k3s[29963]: E0115 14:49:00.175094 29963 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 15 14:49:00 seago.khaus k3s[29963]: E0115 14:49:00.175763 29963 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 15 14:49:00 seago.khaus k3s[29963]: time="2020-01-15T14:49:00.495781769-08:00" level=error msg="error while range on /registry/services/specs/ /registry/services/specs/: revision has been compact"
Jan 15 14:49:00 seago.khaus k3s[29963]: E0115 14:49:00.496953 29963 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 15 14:49:00 seago.khaus k3s[29963]: time="2020-01-15T14:49:00.996062826-08:00" level=error msg="error while range on /registry/networkpolicies/ /registry/networkpolicies/: revision has been compact"
Jan 15 14:49:00 seago.khaus k3s[29963]: E0115 14:49:00.996725 29963 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
k3s agent logs showed:
Jan 15 14:49:00 maersk.khaus k3s[1678]: E0115 14:49:00.009002 1678 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Pod: rpc error: code = Unknown desc = revision has been compact
Jan 15 14:49:00 maersk.khaus k3s[1678]: E0115 14:49:00.012194 1678 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Namespace: rpc error: code = Unknown desc = revision has been compact
Jan 15 14:49:00 maersk.khaus k3s[1678]: E0115 14:49:00.013557 1678 reflector.go:156] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: rpc error: code = Unknown desc = revision has been compact
Jan 15 14:49:00 maersk.khaus k3s[1678]: E0115 14:49:00.014981 1678 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Service: rpc error: code = Unknown desc = revision has been compact
Jan 15 14:49:00 maersk.khaus k3s[1678]: E0115 14:49:00.181475 1678 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.RuntimeClass: rpc error: code = Unknown desc = revision has been compact
Jan 15 14:49:00 maersk.khaus k3s[1678]: E0115 14:49:00.182540 1678 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.CSIDriver: rpc error: code = Unknown desc = revision has been compact
Jan 15 14:49:00 maersk.khaus k3s[1678]: E0115 14:49:00.503843 1678 reflector.go:156] k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: Failed to list *v1.Service: rpc error: code = Unknown desc = revision has been compact
Me too, same error; version is v1.17.0+k3s.1.
Same here, on a 3-node Raspberry Pi cluster. After restarting the agents on the non-master nodes, the errors are gone. I am not sure what will make them reappear; restarting k3s on the master afterwards didn't trigger it...
This seems to have something to do with kine's emulation of etcd compaction. etcd periodically compacts (deletes) old keyspace revisions to reduce storage utilization. kine emulates this by recording the revision ID as compacted and then deleting any rows with that revision from the database (rough sketch below), so this appears to be normal etcd behavior that kine is reproducing. I'm having a hard time deciphering the documentation, but etcd seems to auto-compact hourly and keep the last 1000 revisions. I can't tell exactly what kine is doing, but it seems to me like it might be compacting too aggressively?
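To make that concrete, here's a rough Go sketch of what I understand the emulation to be doing. The table layout and the `compact_rev_key` marker row are my guesses for illustration, not the actual kine schema:

```go
// Hypothetical sketch of the compaction kine emulates; the "kine" table
// layout used here is an assumption, not the exact rancher/kine schema.
package sketch

import "database/sql"

// compact records compactRev as the compacted revision, then deletes rows at
// or below it that have been superseded or deleted, mirroring etcd's behavior
// of dropping old keyspace revisions to reclaim storage.
func compact(db *sql.DB, compactRev int64) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback()

	// Record the high-water mark so reads at an older revision can be
	// rejected with a "compacted" error.
	if _, err := tx.Exec(
		`UPDATE kine SET prev_revision = ? WHERE name = 'compact_rev_key'`,
		compactRev,
	); err != nil {
		return err
	}

	// Delete superseded rows: anything at or below the compacted revision
	// that is a tombstone or has a newer version of the same key.
	if _, err := tx.Exec(
		`DELETE FROM kine
		 WHERE id <= ?
		   AND (deleted != 0 OR id IN
		        (SELECT prev_revision FROM kine WHERE prev_revision != 0))`,
		compactRev,
	); err != nil {
		return err
	}
	return tx.Commit()
}
```

If that delete runs too often or reaches too far forward, clients listing at an older revision would constantly hit the compacted error, which would match the log spam above.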
Or, alternately, it might be the fact that kine uses a different error for compacted revisions? Compare:
https://github.com/etcd-io/etcd/blob/master/etcdserver/api/v3rpc/rpctypes/error.go#L30
vs
https://github.com/rancher/kine/blob/master/pkg/server/types.go#L11
(see the sketch after the links)
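Here's a minimal sketch of the difference (my paraphrase, not the actual etcd or kine source). etcd returns a gRPC OutOfRange status that clients recognize as a compaction, while the logs above show kine returning code 2 (Unknown) with a different message, which the apiserver can't translate into a metav1.Status, so the reflectors just keep retrying the failing list:

```go
// Paraphrased sketch of the two error shapes, not the actual source.
package sketch

import (
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// etcd's rpctypes error for a compacted revision: gRPC code OutOfRange with a
// message that clients know how to handle (re-list at a newer revision).
var errEtcdCompacted = status.New(codes.OutOfRange,
	"etcdserver: mvcc: required revision has been compacted").Err()

// The shape seen in the logs above from kine: gRPC code Unknown (2) with a
// message the apiserver does not map to a metav1.Status, hence the
// "apiserver received an error that is not an metav1.Status" spam.
var errKineCompacted = status.New(codes.Unknown,
	"revision has been compact").Err()
```

If that's right, the fix would be for kine to return the same status code and message etcd uses, so the apiserver and reflectors recover by re-listing at a current revision instead of erroring repeatedly.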
There is a PR at https://github.com/rancher/kine/pull/21 which I think will help address this.
looks like a bingo to me
Fixed in v1.17.2-alpha3+k3s1. Logs no longer contain the mentioned messages. Closing the issue. Please feel free to re-open or create a new issue if there are any concerns.
This issue was seen when the master nodes were scaled down to zero in an HA setup and brought back up. When a new master was added to the NLB, the agent switched from NotReady to Ready, but since the endpoint controller was broken, the endpoints for the other controllers weren't updated.
Closing as the original issue is resolved.