K3s: "revision has been compact" repeated in the k3s log

Created on 12 Jan 2020 · 16 comments · Source: k3s-io/k3s

Version:
v1.17.0+k3s.1

Describe the bug

The k3s logs are filled with these messages. I recently upgraded the cluster from v1.0.1 and have only noticed these errors since the upgrade.

The only reference I can find for this error message is https://github.com/rancher/kine/blob/master/pkg/server/types.go. Apologies if it is not k3s-related, but I am struggling to find any other reference.

Jan 12 15:55:31 k3s-0 k3s[20260]: time="2020-01-12T15:55:31.873615770Z" level=error msg="error while range on /registry/runtimeclasses/ /registry/runtimeclasses/: revision has been compact"
Jan 12 15:55:31 k3s-0 k3s[20260]: E0112 15:55:31.874460   20260 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 12 15:55:31 k3s-0 k3s[20260]: time="2020-01-12T15:55:31.890895443Z" level=error msg="error while range on /registry/networkpolicies/ /registry/networkpolicies/: revision has been compact"
Jan 12 15:55:31 k3s-0 k3s[20260]: E0112 15:55:31.891273   20260 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 12 15:55:31 k3s-0 k3s[20260]: time="2020-01-12T15:55:31.903446015Z" level=error msg="error while range on /registry/namespaces/ /registry/namespaces/: revision has been compact"
Jan 12 15:55:31 k3s-0 k3s[20260]: E0112 15:55:31.903896   20260 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 12 15:55:31 k3s-0 k3s[20260]: time="2020-01-12T15:55:31.910270374Z" level=error msg="error while range on /registry/services/specs/ /registry/services/specs/: revision has been compact"
Jan 12 15:55:31 k3s-0 k3s[20260]: E0112 15:55:31.910619   20260 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 12 15:55:31 k3s-0 k3s[20260]: time="2020-01-12T15:55:31.943914165Z" level=error msg="error while range on /registry/services/specs/ /registry/services/specs/: revision has been compact"
Jan 12 15:55:31 k3s-0 k3s[20260]: E0112 15:55:31.944332   20260 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 12 15:55:32 k3s-0 k3s[20260]: time="2020-01-12T15:55:32.059049239Z" level=error msg="error while range on /registry/services/specs/ /registry/services/specs/: revision has been compact"
Jan 12 15:55:32 k3s-0 k3s[20260]: E0112 15:55:32.059797   20260 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 12 15:55:32 k3s-0 k3s[20260]: time="2020-01-12T15:55:32.145574240Z" level=error msg="error while range on /registry/csidrivers/ /registry/csidrivers/: revision has been compact"

Additional context

I am running a 6 node cluster with a mix of AMD64 and ARM64 nodes. The nodes and pods are otherwise healthy.

NAME    STATUS   ROLES    AGE     VERSION
k3s-1   Ready    worker   9d      v1.17.0+k3s.1
pi4-a   Ready    worker   2d19h   v1.17.0+k3s.1
k3s-2   Ready    worker   9d      v1.17.0+k3s.1
pi4-c   Ready    worker   7d      v1.17.0+k3s.1
k3s-0   Ready    master   9d      v1.17.0+k3s.1
pi4-b   Ready    worker   2d18h   v1.17.0+k3s.1

All 16 comments

I am noticing the same.
Same version of k3s.
My research also points to it being something k3s-related.
At first I thought it happened when removing a node from the cluster, but it appears that it always happens, even on a single node.

Jan 12 20:27:46 node1 k3s[848]: E0112 20:27:46.862433     848 reflector.go:156] k8s.io/client-go/metadata/metadatainformer/informer.go:89: Failed to list *v1.PartialObjectMetadata: rpc error: code = Unknown desc = revision has been compact
Jan 12 20:27:46 node1 k3s[848]: time="2020-01-12T20:27:46.870017486+01:00" level=error msg="error while range on /registry/ceph.rook.io/cephclients/ /registry/ceph.rook.io/cephclients/: revision has been compact"
Jan 12 20:27:46 node1 k3s[848]: E0112 20:27:46.870321     848 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}

So I did some digging in the logs tonight and found that two of the nodes were spamming the following.

Jan 13 21:41:22 k3s-1 k3s[868]: E0113 21:41:22.978783     868 reflector.go:156] k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: Failed to list *v1.Service: rpc error: code = Unknown desc = revision has been compact
Jan 13 21:41:23 k3s-1 k3s[868]: E0113 21:41:23.217184     868 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Service: rpc error: code = Unknown desc = revision has been compact
Jan 13 21:41:23 k3s-1 k3s[868]: E0113 21:41:23.650262     868 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.RuntimeClass: rpc error: code = Unknown desc = revision has been compact
Jan 13 21:41:23 k3s-1 k3s[868]: E0113 21:41:23.811611     868 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.CSIDriver: rpc error: code = Unknown desc = revision has been compact
Jan 13 21:41:23 k3s-1 k3s[868]: E0113 21:41:23.852121     868 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.NetworkPolicy: rpc error: code = Unknown desc = revision has been compact
Jan 13 21:41:23 k3s-1 k3s[868]: E0113 21:41:23.869329     868 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Namespace: rpc error: code = Unknown desc = revision has been compact

A restart of the k3s-agent on both hosts stopped the errors. I also found that these errors started after those same nodes were rebooted. As you can see from the Prometheus graph below, there was a high number of 503 errors after the reboot. Since restarting the agents, the errors have stopped.

[Screenshot: Prometheus graph showing the spike in 503 errors after the reboot]

Mine didn't stop after a reboot, though.
It has been spamming for over 24 hours now, even while everything else works fine.

The same for me; it has already happened twice. The first time I thought it was because I had renamed my master node (don't do that ;-), so I started completely fresh with a new installation 4 days ago. But the SQLite error (the error actually comes from kine) has re-appeared and I have no idea how to fix it.

Besides spamming the logs, the other, more problematic symptom is that services are now broken: I get connection refused on services that worked before.

Please help! This is rendering the cluster totally useless.

I can confirm that restarting the agent (service k3s-agent restart) on the node helped. Thanks for the workaround, but please let's fix this, as it is not a corner-case issue. Let me know how I can help (e.g. by providing more debugging info).

The services also magically start to work again.

Mine didn't stop after a reboot, though.

Not rebooting helps (rebooting actually seems to cause the error); restarting the agent again after a reboot is what helped.

Just got the same thing after restarting the k3s server with systemctl restart k3s on Ubuntu. I had to restart k3s-agent on the other node in order to get everything working again.

k3s server logs showed:

Jan 15 14:49:00 seago.khaus k3s[29963]: E0115 14:49:00.002177   29963 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 15 14:49:00 seago.khaus k3s[29963]: time="2020-01-15T14:49:00.004905271-08:00" level=error msg="error while range on /registry/services/specs/ /registry/services/specs/: revision has been compact"
Jan 15 14:49:00 seago.khaus k3s[29963]: E0115 14:49:00.005340   29963 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 15 14:49:00 seago.khaus k3s[29963]: E0115 14:49:00.006634   29963 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 15 14:49:00 seago.khaus k3s[29963]: E0115 14:49:00.008154   29963 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 15 14:49:00 seago.khaus k3s[29963]: time="2020-01-15T14:49:00.171927417-08:00" level=error msg="error while range on /registry/runtimeclasses/ /registry/runtimeclasses/: revision has been compact"
Jan 15 14:49:00 seago.khaus k3s[29963]: time="2020-01-15T14:49:00.174663746-08:00" level=error msg="error while range on /registry/csidrivers/ /registry/csidrivers/: revision has been compact"
Jan 15 14:49:00 seago.khaus k3s[29963]: E0115 14:49:00.175094   29963 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 15 14:49:00 seago.khaus k3s[29963]: E0115 14:49:00.175763   29963 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 15 14:49:00 seago.khaus k3s[29963]: time="2020-01-15T14:49:00.495781769-08:00" level=error msg="error while range on /registry/services/specs/ /registry/services/specs/: revision has been compact"
Jan 15 14:49:00 seago.khaus k3s[29963]: E0115 14:49:00.496953   29963 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jan 15 14:49:00 seago.khaus k3s[29963]: time="2020-01-15T14:49:00.996062826-08:00" level=error msg="error while range on /registry/networkpolicies/ /registry/networkpolicies/: revision has been compact"
Jan 15 14:49:00 seago.khaus k3s[29963]: E0115 14:49:00.996725   29963 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"revision has been compact", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}

k3s agent logs showed:

Jan 15 14:49:00 maersk.khaus k3s[1678]: E0115 14:49:00.009002    1678 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Pod: rpc error: code = Unknown desc = revision has been compact
Jan 15 14:49:00 maersk.khaus k3s[1678]: E0115 14:49:00.012194    1678 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Namespace: rpc error: code = Unknown desc = revision has been compact
Jan 15 14:49:00 maersk.khaus k3s[1678]: E0115 14:49:00.013557    1678 reflector.go:156] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: rpc error: code = Unknown desc = revision has been compact
Jan 15 14:49:00 maersk.khaus k3s[1678]: E0115 14:49:00.014981    1678 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Service: rpc error: code = Unknown desc = revision has been compact
Jan 15 14:49:00 maersk.khaus k3s[1678]: E0115 14:49:00.181475    1678 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.RuntimeClass: rpc error: code = Unknown desc = revision has been compact
Jan 15 14:49:00 maersk.khaus k3s[1678]: E0115 14:49:00.182540    1678 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.CSIDriver: rpc error: code = Unknown desc = revision has been compact
Jan 15 14:49:00 maersk.khaus k3s[1678]: E0115 14:49:00.503843    1678 reflector.go:156] k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: Failed to list *v1.Service: rpc error: code = Unknown desc = revision has been compact

Me too, same error; version is v1.17.0+k3s.1.

Same here, on a 3-node Raspberry Pi cluster. After restarting the agents on the non-master nodes, the errors are gone. I am not sure what will make them reappear; restarting k3s on the master afterwards didn't trigger it...

This seems to have something to do with kine's emulation of etcd compaction. etcd periodically compacts (deletes) old keyspace revisions to reduce storage utilization. kine emulates this by recording the revision ID as compacted and then deleting the corresponding rows from the database. This appears to be normal etcd behavior that kine is emulating. I'm having a hard time deciphering the documentation, but etcd seems to auto-compact hourly and keep the last 1000 revisions. I can't tell exactly what kine is doing, but it seems to me like it might be compacting too aggressively?
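
For reference, the emulation described above might look roughly like this. This is a minimal sketch only: the table, column, and function names are hypothetical, not kine's actual schema (the real logic lives in github.com/rancher/kine).

```go
// Package compaction sketches kine-style compaction: record the revision
// up to which history has been compacted, then delete superseded rows.
// Everything here is illustrative, not kine's real schema or code.
package compaction

import (
	"database/sql"

	_ "github.com/mattn/go-sqlite3" // SQLite driver; SQLite is the default k3s datastore
)

// compact records compactRev as the compaction point and deletes
// historical rows below it, keeping the newest row for every key.
func compact(db *sql.DB, compactRev int64) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op if the commit below succeeds

	// Remember how far we compacted, so later reads at an older revision
	// can be rejected with a "revision has been compact(ed)" error.
	if _, err := tx.Exec(`UPDATE meta SET compact_rev = ?`, compactRev); err != nil {
		return err
	}

	// Drop superseded history below the compaction point; the subquery
	// preserves the latest revision of each key so current state survives.
	if _, err := tx.Exec(
		`DELETE FROM kv
		 WHERE revision < ?
		   AND revision NOT IN (SELECT MAX(revision) FROM kv GROUP BY name)`,
		compactRev); err != nil {
		return err
	}
	return tx.Commit()
}
```

Any client that then asks for a revision below the compaction point gets the compaction error, which is exactly what the range errors in the logs above look like.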

Or, alternatively, it might be the fact that kine uses a different error for compacted revisions? Compare: https://github.com/etcd-io/etcd/blob/master/etcdserver/api/v3rpc/rpctypes/error.go#L30
vs
https://github.com/rancher/kine/blob/master/pkg/server/types.go#L11
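
Side by side, the two definitions look roughly like this (paraphrased from the files linked above, not verbatim copies; check the sources for the exact code):

```go
// Package errcompare contrasts the compaction errors returned by etcd
// and by kine, as linked above. Both declarations are paraphrased.
package errcompare

import (
	"errors"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// etcd (etcdserver/api/v3rpc/rpctypes/error.go): a typed gRPC status.
// Kubernetes clients recognize this and recover by re-listing from the
// current revision.
var ErrGRPCCompacted = status.New(codes.OutOfRange,
	"etcdserver: mvcc: required revision has been compacted").Err()

// kine (pkg/server/types.go): a plain Go error. Over gRPC it surfaces as
// code = Unknown (the Code:2 in the logs above), which clients don't
// treat as a compaction error, so the failed LIST just repeats against
// the same stale revision.
var ErrCompacted = errors.New("revision has been compact")
```

That would explain why the reflector logs show rpc error: code = Unknown rather than code = OutOfRange, and presumably also why restarting the agent clears the spam: a fresh reflector starts listing without the stale resourceVersion.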

There is a PR at https://github.com/rancher/kine/pull/21 which I think will help address this.

looks like a bingo to me

Fixed in v1.17.2-alpha3+k3s1. The logs no longer contain the messages mentioned above. Closing the issue. Please feel free to re-open or create a new issue if there are any concerns.

This issue was also seen when the master nodes in an HA setup were scaled down to zero and brought back up. When a new master was added to the NLB, the agents switched from not-ready to ready, but since the endpoint controller was broken, the other controllers' endpoints weren't updated.

Closing as the original issue is resolved.
