K9s: K9s is slow in large clusters

Created on 12 Apr 2020 · 30 Comments · Source: derailed/k9s

Is your feature request related to a problem? Please describe.
I'm trying to use k9s for my work, and it is very slow when connecting to a large cluster (2k deployments and around 4k pods, k8s version v1.14). I have set the refresh time to 10 seconds, but it hasn't changed anything. With kubectl (v1.17 locally) it takes around 2 seconds to retrieve all pods or deployments, but after launching k9s I wait even more than 10 seconds.

Describe the solution you'd like
I've read the docs both on GitHub and on the website, and I think it may be related to the k8s version (not sure how to debug this, though). It is said that k9s works best with the latest version of k8s, but the truth is that in production k8s will not always be the latest; rather, it will be a couple of versions behind, because nobody wants to mess with a working system. It may also be good to support earlier releases of k8s, or perhaps point out the releases that work best with a specific k8s version.

performance

fazilhero

👍3

All 30 comments

@fazilhero Thank you for this issue! Do you run metrics-server on your cluster? Also, does the perf improve if you filter your pods by namespace vs all-namespaces ie k9s -n fred?

derailed on 14 Apr 2020

I'm not sure about the metrics-server, but the issue is with opening k9s itself. I'm using the latest version at the moment. When I start k9s with a kubeconfig in readonly mode, it takes a lot of time to get pods etc. With kubectl commands it takes about 2 seconds to retrieve pods, but k9s takes more than 30 seconds, and sometimes I give up and close it. Once it is loaded it is somewhat okay, but the first part is quite slow. That also includes the xray part.

fazilhero on 14 Apr 2020

👍1

@fazilhero I can see that 4k pods raw load will take some time especially in light of having to get metrics. Have you tried starting k9s with the -n option to pre-select a namespace ie

k9s -n fred

Is K9s still slow loading for a given namespace? Do you still experience slowness while viewing deployments vs pods in all namespaces? Also, it would be useful to know if your cluster uses a metrics-server ie does kube-system/metrics-server exist? Any details here would be super useful so we can track this down ie k9s logs. Tx!

derailed on 25 Apr 2020

@derailed I have tried with one namespace, but I guess it is still too much; it didn't have much effect. Starting k9s and retrieving everything the first time is very slow. Say last time I checked pods and quit: the next time I open k9s it picks up where I left off, and that part takes a long time to load. I think we are using Prometheus for metrics, because I didn't see anything related to metrics-server.

fazilhero on 27 Apr 2020

@fazilhero Thank you for the update! Wondering if you load k9s using deployments vs pods ie

k9s -c dp -n MY_XXX_NAMESPACE

Is it still slow coming up? How many deployments/nodes exist on your cluster?

If your cluster does not run a metrics-server there should not be pod columns for CPU/MEM.
Is this what you are seeing?

derailed on 1 May 2020

I just executed the same command. Plain kubectl get deployments took around 3 seconds, while k9s took 15 seconds, although I was able to see the interface within 4 seconds. Retrieving deployments took around 10 seconds. We have 1557 deployments at the moment of execution within the one namespace, which is default. When I select a deployment to see its pods, I can also see the mem and cpu columns. Even selecting a deployment to retrieve its pods takes about 5 seconds or more (5 pods for the one I'm looking at). Hope it helps.

fazilhero on 1 May 2020
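
The kubectl side of the comparison above can be measured repeatably with a small timing helper. This is a minimal sketch, not part of k9s: `timed_run` is a hypothetical name, and the demo runs a harmless Python command so that it works without a cluster. k9s is an interactive terminal UI, so this helper only covers the kubectl side; the k9s numbers still have to be taken by hand.

```python
import subprocess
import sys
import time
from typing import List, Tuple


def timed_run(cmd: List[str]) -> Tuple[int, float]:
    """Run `cmd`, discard its output, and return (exit code, seconds elapsed)."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return proc.returncode, time.perf_counter() - start


# Harmless demo command so the sketch runs without a cluster.
code, seconds = timed_run([sys.executable, "-c", "pass"])
print(f"exit={code} elapsed={seconds:.2f}s")

# Against a real cluster one would instead call, for example:
#   timed_run(["kubectl", "get", "deployments", "-n", "default"])
```

Running the same call a few times and comparing the spread gives a steadier baseline than a single measurement.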

@fazilhero Thank you so much for this extra info! Every little bit helps... I think I've found one issue for sure... I'll try to address some of this in the next drop. Thank you for your patience!!

derailed on 5 May 2020

👍1

@fazilhero I've added a few things that should speed up some of this in v0.19.5. Not done yet so I'll leave this open. Please lmk if you see better results launching K9s in a single namespace. Tx!!

derailed on 16 May 2020

@derailed Much faster now! Thanks for your work 👍

rkiyanchuk on 16 May 2020

@zoresvit Thank you for reporting back and for your kind words!! So happy to hear this...

derailed on 16 May 2020

@derailed I also experienced a bit of a speed up with the release. I would like to add a couple of suggestions for speed improvement:

- We could start the k9s menu within milliseconds (if I'm not wrong): it would initially be empty, and as the data gets pulled we would see things appear on screen. Currently this is not the case with large clusters, as it takes around 5-10 seconds before I see an empty k9s screen, and shortly after (another 5 seconds or so) I see pods, dp, etc.
- I was thinking we could have filtering while starting k9s. This would also help loading time with large clusters, as I don't need to investigate all pods but only the ones I'm interested in. So, say, something like this:
  
  k9s -n default -c pod --filter INSERT_FILTER_HERE
  
  I would assume this would be fast because I'm interested only in pods that fit my filter. This would eliminate the need for pulling unnecessary data.

What do you think?

fazilhero on 31 May 2020
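
The two suggestions above (draw an empty screen at once, then fill it in as data arrives, keeping only rows that match a filter) can be sketched as a generator of screen snapshots. This is an illustration of the idea only, not how k9s is implemented: `progressive_view` and `fake_fetch` are hypothetical names, the batches stand in for paged API responses, and the `--filter` flag in the comment is the commenter's proposal.

```python
from typing import Callable, Iterable, Iterator, List


def progressive_view(
    fetch_batches: Callable[[], Iterable[List[str]]],
    name_filter: str = "",
) -> Iterator[List[str]]:
    """Yield successive snapshots of the rows to display.

    The first snapshot is empty so a UI could draw immediately; every later
    snapshot adds one fetched batch, keeping only names that contain
    `name_filter`.
    """
    rows: List[str] = []
    yield list(rows)  # empty screen, available before any data has arrived
    for batch in fetch_batches():
        rows.extend(name for name in batch if name_filter in name)
        yield list(rows)


def fake_fetch() -> Iterable[List[str]]:
    # Hypothetical stand-in for paged API responses; no cluster is contacted.
    yield ["api-1", "web-1"]
    yield ["api-2", "db-1"]


snapshots = list(progressive_view(fake_fetch, name_filter="api"))
print(snapshots)
```

Note that filtering on the client, as done here, still transfers every row; the saving the comment hopes for would only appear if the filter could be pushed to the server side of the request.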

@fazilhero Thank you for the update Fazil! I totally agree with your first point and have been thinking about that. I'll need to noodle on your second point. I think it may make sense. For the time being does k9s -c deploy -n fred improve k9s load time on your particular cluster?

derailed on 1 Jun 2020

@derailed Yes, it improves load time on a large cluster, but as you also indicated, it's only partial for now. What I also noticed is that with medium and small clusters, transitioning between views is easy. In large clusters, I sometimes experience a UI freeze for a second. For instance, I filter by typing my pod name and get, let's say, 10 results. Navigating into a pod to see its containers and coming back is a bit slower compared to small and medium clusters. I feel like when I go back we are pulling all the data from the cluster and then doing the filtering again. I am not sure what the cause is. Just my opinion.

fazilhero on 1 Jun 2020
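
If going back really does trigger a full re-fetch, as the comment above suspects, one generic remedy is to keep the last listing for a short time and reuse it. The sketch below is an assumption-laden illustration, not k9s code: `ListCache` and `fake_list` are hypothetical names, and the clock is injected so the expiry can be shown without sleeping.

```python
import time
from typing import Callable, Dict, List, Tuple


class ListCache:
    """Keep the last listing per key for `ttl` seconds."""

    def __init__(
        self,
        fetch: Callable[[str], List[str]],
        ttl: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, key: str) -> List[str]:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no round trip
        rows = self._fetch(key)
        self._entries[key] = (now, rows)
        return rows


calls: List[str] = []


def fake_list(namespace: str) -> List[str]:
    # Hypothetical stand-in for an expensive list call; records each call.
    calls.append(namespace)
    return [f"{namespace}/pod-{i}" for i in range(3)]


fake_now = [0.0]
cache = ListCache(fake_list, ttl=5.0, clock=lambda: fake_now[0])
first = cache.get("default")  # fetches
again = cache.get("default")  # served from the cache
fake_now[0] = 6.0             # pretend six seconds have passed
later = cache.get("default")  # entry expired, fetches again
print(len(calls))
```

The trade-off is staleness: a longer `ttl` makes navigation snappier but shows older data, so the value would need to be tuned against the refresh rate the user expects.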

@derailed
Maybe add an option to not connect to any cluster when starting with -c context? I think this might get faster startup time by eliminating any wait time related to data fetching.

thllxb on 7 Aug 2020

Hi @derailed, I still have slowness on large clusters with around 1k pods. I'm using version 0.22.1.
We have 2 GKE clusters, both in version v1.15.12-gke.2.
On the first cluster with around 1k pods on 35 nodes, k9s is slow to start (7s), and browsing is also slow (10s to enter a deployment). Other k9s users on this cluster have the same problem.
On the second cluster with 163 pods / 5 hosts, k9s is much faster and is usable.
I can provide more information in order to help you to solve this problem. We were very happy to use k9s as a replacement for the GCP console which is not as user-friendly as k9s, but unfortunately it is actually not usable.

slavogiez on 27 Sep 2020

@slavogiez Thank you for your report! You're specifying initial start of k9s for pods/dps. How many dps do you have in your large cluster? Also, once the pods/dps are initially loaded, do you still see slowness loading the corresponding views? If so, what do the timing numbers look like then for pods and deployments ie do they remain the same after initial load? Also, on the larger cluster, what does timing look like if you start k9s in a given namespace ie k9s -n fred -c dp

derailed on 27 Sep 2020

@derailed Thanks for your quick answer!
We have 250 deployments on the large cluster and 5370 replica sets.
Slowness is still present after initial loading. For example, it is slow to browse inside a pod/container to check logs, and still slow to browse back to the list of pods (When I press the escape key to go back, it takes around 5 sec to display the previous screen).
I tried to start k9s with the command you provided and it doesn't seem to change anything.

slavogiez on 28 Sep 2020

@slavogiez Thank you for reporting back! I've made some optimizations in the latest drop coming soon ie v0.23.0. Would you mind taking a peek and letting us know if we've moved the needle a bit in the right direction?? Tx!

derailed on 28 Oct 2020

👍1

I can confirm this is still an issue. I have two large clusters, each with 1k+ pods and averaging 70+ nodes. Performance was pretty consistent on 0.22.1, and quite usable. After upgrading to 0.23.x, it takes about 10s+ to do just about anything, including:

switching contexts
switching between resource types (pods, nodes, etc)
describing or editing resources

Downgrading to 0.22.1 immediately resolved all of the performance issues experienced in 0.23.x

Version info:

MacOS Catalina 10.15.7
Kubectl 1.17.3
K8s 1.16.13 (EKS)

goodemk on 2 Nov 2020

@goodemk Thank you for reporting this! I've taken a quick pass here and think I might have found some issues indeed with the v0.23.x drop. I'll push some changes next. Please let us know if we've moved the needle in the right direction ie better or worse. I am surprised the describe/edit is causing latency tho?? Thank you Michael!

derailed on 3 Nov 2020

@derailed I've also noticed the above slowness, when browsing our production cluster's node resources. Editing the resources didn't take too long though. Our cluster consists of about 60 nodes, with about 2000 pods altogether. Once the fix is out, I'll get back to you with the results.

szenti on 3 Nov 2020

@goodemk @szenti - Thank you for your kindness and help figuring this out! I did find a few perf issues with nodes/pods views relating to the newly introduced metric cols. Let's see if we're happier with v0.23.4 (Crossing fingers/toes). I have a better plan in the works but let's see if we can make this more manageable in the short term... Thank you for your patience!

derailed on 3 Nov 2020

@derailed Following up after trying v0.23.4

I did some more thorough testing on this version to try and pinpoint where the performance really hits the wall. It would seem that K9s really struggles with any actions related to node resources (viewing/editing/switching-to/etc).

Concerning all resources _excluding_ nodes:

Switching to pods: ~instant
Switching from pods: ~instant
Editing and describing: ~instant

Concerning _only_ nodes:

Switching to nodes: ~10s
Switching from nodes: ~10s
Editing a node: ~10s
Describing a node: ~45s+ (!)
Viewing pods on a specific node: ~5s
Backing out from edit/describe/pod view on a node: ~5-10s

goodemk on 4 Nov 2020

👍1

@goodemk Thank you so much Michael for this great report and details! I see where things went south now and will fix in the next drop v0.23.5 (coming up next!). The new resources columns on node views on bigger clusters are indeed creating the lags. I'll axe this feature til we can figure out a better way to handle larger volumes.
Please let me know if we're happier... Tx!!

derailed on 5 Nov 2020
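
The thread does not say how the node resource columns were computed. Purely as an illustration of how a per-row column can get expensive on bigger clusters, assume a column that needs an aggregate over pods for every node: scanning the full pod list once per node costs nodes × pods checks, while grouping the pods once costs a single pass. Both function names below are hypothetical, and the data is made up.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pod = Tuple[str, str]  # (pod name, node name)


def pods_per_node_naive(nodes: List[str], pods: List[Pod]) -> Dict[str, int]:
    # Scans the whole pod list once per node: len(nodes) * len(pods) checks.
    return {n: sum(1 for _, node in pods if node == n) for n in nodes}


def pods_per_node_indexed(nodes: List[str], pods: List[Pod]) -> Dict[str, int]:
    # One pass over the pods, then one lookup per node.
    counts = Counter(node for _, node in pods)
    return {n: counts.get(n, 0) for n in nodes}


# Made-up data: three nodes, five pods spread over the first two nodes.
nodes = [f"node-{i}" for i in range(3)]
pods = [(f"pod-{i}", f"node-{i % 2}") for i in range(5)]

naive = pods_per_node_naive(nodes, pods)
indexed = pods_per_node_indexed(nodes, pods)
print(indexed)
```

At the sizes reported in this thread (around 70 nodes and 1k+ pods) the naive form already does tens of thousands of comparisons per refresh, before any network cost is counted.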

@derailed Testing out changes in v0.23.7

Performance has improved significantly for describing a node, but remains slow for other node operations. Other non-node features and operations appear to be working as normal.

Switching to nodes: ~10s (no change)
Switching from nodes: ~10s (no change)
Editing a node: ~10s (no change)
Describing a node: ~5s
Viewing pods on a specific node: ~5s (no change)
Backing out from edit/describe/pod view on a node: ~5-10s (no change)

goodemk on 6 Nov 2020

@goodemk Thank you so very much Michael for reporting back!
I am not quite sure why edit is taking so long since k9s is just shelling out to kubectl?
Could you share some timing info for kubectl edit no xxx?
That said I think I've made some big improvements to the node view. Based on what I am seeing here at the ranch it should now be several factor(s) faster to render this view.
I'll drop v0.23.8 in hope you will have similar observations with your cluster. Mind you the initial node load will not be timely on a 70+ node cluster but I expect subsequent loads to be significantly faster than v0.23.7.

Crossing both fingers and toes on that deal as these made for a long day in the saddle...
Hoping we can put this perf issue to bed on this drop...

Thank you Michael!

derailed on 6 Nov 2020

FYI, on my side, I have a cluster with 150 nodes and 3000 pods on AKS.

on 0.23.4
Switching secret to pods: instant
Switching pods to nodes: 80s
Switching from nodes to pods: 97s
Editing a node: 75s
back to node view: 113s
Describing a node: 79s
back to nodes view: 95s
Viewing pods on a specific node: 64s
back to node view: 96s

on 0.23.8
Switching secret to pods: instant
Switching pods to nodes: instant
Switching from nodes to pods: instant
Editing a node: instant
back to node view: instant
Describing a node: instant
back to nodes view: instant
Viewing pods on a specific node: instant
back to node view: instant