K9s: K9s extremely slow since 0.9.3

Created on 6 Jan 2020 · 39 comments · Source: derailed/k9s

Describe the bug
I've compared the versions 1a9a83b34cdd0c9b4e793ed6b4b5c16ea1a949a0 (0.9.3) and fbc25e6c4a49e31f8017089656aa7b841fe06a5f (0.11.0).

I also cross-checked with 0.10.10, which is also very slow.

0.11.0 is extremely slow compared to 0.9.3: it takes ~4 seconds to switch views, while 0.9.3 takes ~0.5 seconds.

To Reproduce
Steps to reproduce the behavior:

1. Download both releases, 0.9.3 and 0.11.0.
2. Run both binaries and compare the speed of switching views, deleting stuff, ...

Expected behavior
Ideally an improvement in speed, or at most a small slowdown in exchange for new features, but definitely not such a huge drop in speed. Having to wait 5 seconds between each command makes k9s kinda unusable.

Screenshots
[Screenshot: k9s]

Versions (please complete the following information):

OS: Arch Linux, kernel 5.4.8-arch1-1
K9s: 0.9.3, 0.11.0, 0.10.10
K8s: 1.16.2

Additional context

performance


cwrau


Most helpful comment

@cwrau - Thank you for the heads up!! This is related to not running a metrics server. I'll fix it! Will push in a bit. Sorry, that's on me. My bad!!

@mycrEEpy LOL - Thank you!! I sure could use some positive vibes right now, burning the midnight oil. I can't tell you how awesome your comment is and how it makes me feel. Hopefully all will be better soon... so we can move on from the run-slow, crash-fast predicament we're currently in ;(

derailed on 9 Jan 2020


All 39 comments

@cwrau Thank you for this great issue! I have yet to dive into perf since the refactor in 0.10. 4 secs seems wrong. My expectation is that we should be a bit slower on first load, but subsequent loads should be much faster. Is this not the case? If you turn on debug logging, I believe I am tracking initial load times. Would you mind attaching the log to this issue so we can take a look? Thank you for this!!

derailed on 6 Jan 2020

Even when loading the same view again and again, it's still slow 😞

How do I turn on debug? Changing the logLevel to debug didn't visibly change anything.

cwrau on 6 Jan 2020


@cwrau you can run k9s info to find the logs location. Then run k9s -l debug.

derailed on 6 Jan 2020

@cwrau Also, is the perf issue only around dp and po, or do all other resources load slowly for you too? If so, please do pipe in with details. Thank you Chris!!

derailed on 6 Jan 2020

k9s-cwr.log

It seems to affect all resources except the CRDs.

cwrau on 6 Jan 2020

@cwrau Thank you for sending this in!

What is the output of time kubectl get po -A -o wide on the exact same cluster?

derailed on 6 Jan 2020

PATH= kubectl --context EU-2-admin@EU-2 get po -A -o wide 0.12s user 0.02s system 94% cpu 0.149 total

cwrau on 6 Jan 2020

Yikes, thanks Chris!! OK, I'll take a look...

derailed on 6 Jan 2020

@cwrau could you also attach the logs for viewing, say, a configmap, a secret, a persistentvolume or other slow resources besides pods? Thank you!!

derailed on 6 Jan 2020

@cwrau - please give 0.11.2 a shot and see if we're happier?? Tx!! If not, please attach the new logs for various resources. I'm not seeing anything close to a sec on either my local or remote clusters, so something about your env is throwing K9s off. Not sure if the lack of a metrics-server causes this issue; I'll have to research further if so...

derailed on 6 Jan 2020

k9s-cwr.log

Still very slow 😞

cwrau on 7 Jan 2020

Hum... How about configmaps and secrets? I've just built a brand new EKS cluster and I am seeing ~100ms refresh times with no metrics. I suspect something is hoarked with your cluster or you're getting throttled by AWS. Are you going thru a VPN or some proxy? Could you try another EKS cluster or a different network? Looking at the log, fetching the resource really puts the brakes on, so I am guessing some kind of network issue?? I'll keep digging but I'm not seeing anything obvious at the moment ;(

Could any folks using EKS + K9s pipe in here, as I am fresh out of ideas? Is K9s inherently slow on EKS?? If so, please add details here. Thank you!!

derailed on 7 Jan 2020

I'm not on aws or any cloud, just our local cluster.

But interestingly our second local cluster, which is even closer, is quite fast.

But even if our cluster is for some reason a little bit slower, version 0.9.3 works perfectly, so something must have changed on your end as well?

cwrau on 7 Jan 2020

@cwrau Sorry, my bad: I mixed up issues here and thought you were on EKS. This is exactly why I had asked you for the logs for secrets/cms, as k9s fetches these the same way v0.9 did and I was hoping to see a delta??

derailed on 7 Jan 2020

k9s-cwr.log

The times in the log don't reflect the real times, though.

In the log it's always 8ms or so, but the display refreshes only after ~2s.

cwrau on 7 Jan 2020

@cwrau Ah! ok that's much better and what I would expect.

Thank you for sending this, Chris. You rock!!

This is on 0.11.2 correct?

Now I have something to look at. You are right: we refresh the display on a 2sec interval by default. But something is toast on this particular cluster when loading the resources and populating the cache, as I would expect the initial access to be slower but subsequent accesses to be much, much faster. Still, that's a 2 sec load for very few resources... I have several clusters here at the ranch, but so far none of them exhibit these symptoms with K9s and all load resources in ms time, so this might be tough for me to track down. I'll take another look at the code and see if I've borked something, which is totally possible ;(

derailed on 7 Jan 2020

No problem! 😄

Yes, it's on 0.11.2

If fetching the secrets is so fast (why it's not that fast for the other resources, I don't know 🤷), why is the eventual refresh so slow? 🤔

I tried looking at the diff between 0.9.3 and 0.11.2, but I barely understand Go, let alone a completely new codebase 😅

If there is anything I can do to help, I'd be glad to!

cwrau on 8 Jan 2020

@cwrau - OK, let's try out 0.11.3 and see if we're happier. I think I've found a potential cause, but I'm not really sure since I wasn't able to repro in the first place. So this is a hunch at best; hopefully it's better. In any case, I've added more instrumentation, so hopefully we'll see something else.

Can you please send the new logs after cycling thru no, po, dp, cm, cronjob? Thanks for your help!
Fingers and toes crossed...

derailed on 8 Jan 2020

I'm terribly sorry to tell you that 0.11.3 crashes 😢

k9s-cwr.log

cwrau on 9 Jan 2020

@derailed just some positive vibes, 0.11.3 feels a lot faster than all previous releases, awesome work!

mycrEEpy on 9 Jan 2020

@cwrau - Thank you for the heads up!! This is related to not running a metrics server. I'll fix it! Will push in a bit. Sorry, that's on me. My bad!!

derailed on 9 Jan 2020


@derailed - No worries, you're doing an awesome job! K9s is, besides Chrome, my most used and loved software 🤗

cwrau on 10 Jan 2020

Wow, we had to install the metrics-server so we could use HPAs, and now k9s is blazingly fast!! Even faster than before! ⚡ And it doesn't crash! 🎉

So it's really the metrics you're fetching, I think...

cwrau on 10 Jan 2020

@cwrau Thank you!! I can't tell you how happy this makes me. This was a long week trying to figure out what the heck was up. I am so glad!! And mega bonus for no crashes.

I think I've fixed the metrics-server issue so that you don't have to run one, but I missed it in all that excitement.
Thank you Chris for hanging with me, and for your patience and your support!!

derailed on 10 Jan 2020

I think we can close this now. 😄

I haven't had any performance issues since then, and if someone without a metrics server still has this issue, they can reopen this.

Thanks for your hard work and this amazing project!!!!! 🎉

cwrau on 20 Jan 2020

@cwrau Woosh!! That's very good news... Thank you so much Chris for your patience and kindness!

derailed on 21 Jan 2020

I'm experiencing very slow k9s as well with any version > 0.9.3. I interact with a few clusters and it "appears" that only one is affected, but there it is so slow it is mostly unusable. Reverting back to 0.9.3, it's very fast again. I'm not sure how to troubleshoot this, or if it's even a k9s issue, since it seems to be specific to one cluster. Any thoughts on how I might debug it? Thanks

andrewhharmon on 22 Jan 2020

Do you have a metrics-server deployed?

You can check by running kubectl get pods.metrics.k8s.io -A.

If nothing gets returned, you don't have a metrics server.

cwrau on 22 Jan 2020
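Another way to check is to look for the metrics.k8s.io group in the API discovery document, which kubectl get --raw /apis prints as JSON (an APIGroupList with a groups array). A minimal Go sketch that parses that output; hasMetricsAPI and the trimmed-down struct are hypothetical. Note that a registered group does not prove the metrics-server behind it is healthy, as the following comments show.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiGroupList keeps only the fields needed from the APIGroupList
// document served at /apis.
type apiGroupList struct {
	Groups []struct {
		Name string `json:"name"`
	} `json:"groups"`
}

// hasMetricsAPI reports whether the metrics.k8s.io group is registered,
// which is what an installed metrics-server normally provides.
func hasMetricsAPI(discovery []byte) (bool, error) {
	var list apiGroupList
	if err := json.Unmarshal(discovery, &list); err != nil {
		return false, err
	}
	for _, g := range list.Groups {
		if g.Name == "metrics.k8s.io" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Trimmed sample; real output contains many more groups and fields.
	sample := []byte(`{"kind":"APIGroupList","groups":[{"name":"apps"},{"name":"metrics.k8s.io"}]}`)
	ok, err := hasMetricsAPI(sample)
	fmt.Println(ok, err) // true <nil>
}
```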

Yes, that returned a ton of values. And I do see metrics-server-v0.3.1-76b9c6489d-dm545 running in my kube-system namespace

andrewhharmon on 22 Jan 2020

My metrics-server pod seems to be under CPU/memory pressure; every few days it crashes. Could that cause k9s to slow down, if it's talking to the metrics server and the server is slow to respond?

andrewhharmon on 22 Jan 2020

First of all, your metrics-server is old, the latest version is 0.3.6.

But that aside, I think that could be the problem. My performance problem went away as soon as we had a metrics server. Our metrics-server is not under any significant load.

Could you try to raise its CPU/memory limits?

@derailed But maybe it's worth reconsidering the need for a perfectly running metrics-server? Maybe fetch the metrics with a short timeout and, if that fails, just skip them? I would assume a fast k9s is more important than displaying the CPU/memory usage?

cwrau on 22 Jan 2020

Yeah, unfortunately I'm not the cluster administrator. I doubt they will tweak these settings if the only reason is that one person's tool runs slowly :( Maybe I can find some other impact of a slow metrics server to tell them about :)

andrewhharmon on 22 Jan 2020

I mean, an old, constantly crashing pod is something to upgrade and adjust... 😉

cwrau on 22 Jan 2020

@andrewhharmon I think we need more details here so we can figure this out, i.e. is all of K9s slow, or only particular resources? Sadly, in v0.13.X I removed some of the instrumentation I had put in for @cwrau when he was having issues. So Andrew, could you download v0.12.0, turn on debug logs (i.e. k9s -l debug) and share them here, so we can hopefully see where the slowness comes from on that particular cluster? This could stem from so many areas that it's hard to tell without further details. Also, is kubectl slow on that cluster for the same resources you're trying to access?

derailed on 22 Jan 2020

k9s-andrew.log

So I launched v0.12 with debug. It took about 4s to display pods. I selected a pod and pressed Enter, which took about 4s to bring up the containers in the pod. Hitting ESC to go back also took about 4s.

kubectl is very fast.

andrewhharmon on 22 Jan 2020

So I'm running this in WSL on Windows. I decided to try k9s natively on Windows, which required a restart. Both Windows and k9s seem faster. Not sure if it'll degrade again after a while, but I'll keep you posted if it does.

andrewhharmon on 22 Jan 2020

@andrewhharmon Thank you for forwarding the logs! Excellent job!! I will need to instrument the code a bit more, but I do see an issue fetching the containers for your pod, i.e. a 6sec call. I suspect this hints at a metrics-server issue, as @cwrau pointed out. So either we're getting throttled or caught in a restart, not sure. Let's keep tabs here and see if you can get a strong repro. If so, I'll add more instrumentation and we can take it from there. Also, as with anything else in K8sland, the closer to the latest the better. Just point your admins to CVE reports; that should prompt them to upgrade ;) Once again, thank you for the great research and updates Andrew!

derailed on 22 Jan 2020

OK, thanks. BTW, great job on this tool. It's amazing!

andrewhharmon on 22 Jan 2020

@andrewhharmon Very kind of you to say and so much appreciated!! Humbled to see this tool helps you manage your clusters and, at times... (read: depending on the drop and how much sleep I've gotten 🐭) makes K8s life a bit better.

derailed on 22 Jan 2020
