Keda: [Epic] Enable KEDA to be reliably scaled out

Created on 22 May 2019 · 18 Comments · Source: kedacore/keda

Today KEDA runs as a single replica in the cluster. Ideally, before we go 1.0, we would have a way for KEDA to scale out and partition across the available ScaledObjects for better reliability. (What would happen if KEDA crashed? Does it gracefully recover?) Also, if I had hundreds of scaled objects each polling, I might need to scale out my KEDA controller. Ideally there would be some logic that lets multiple replicas each poll event sources without needlessly duplicating work.

Epic

jeffhollan

All 18 comments

That's a fair point actually. We should see if we need to:

Make sure our instances are spread across the nodes
Partition the ScaledObjects across instances

tomkerkhove on 22 May 2019

what would happen if KEDA crashed? Does it gracefully recover?

The pod will always get restarted on a crash. KEDA itself is stateless, so it always recovers.

Given the Kubernetes controller pattern of listing objects by type or by selectors on the objects themselves, what are the options for multi-instance controllers?

I was thinking we can let KEDA listen on all namespaces by default, but allow configuration to set the namespaces it listens to. That way you could partition your cluster however you like by namespace.

ahmelsayed on 23 May 2019

Another way to partition within a single namespace is to allow custom taints/tolerations on the ScaledObject, which would let users decide which ScaledObject goes to which KEDA controller instance.

yaron2 on 28 May 2019

Letting the scaled objects dictate partitions, though, might push more onus on the cloud operator when they don't know the scale we can support in KEDA per scale controller. I like partitioning on namespaces, though that might be too coarse-grained. Or can we auto-assign the labels that @yaron2 is mentioning, so we keep the partitioning logic internal? When a scale controller sees a new scaled object, it atomically assigns a label that dictates which scale controller is in charge of that ScaledObject.
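
As a rough illustration of the atomic label assignment idea, here is a minimal Python sketch that simulates it with an in-memory store. It assumes the store rejects a write that is based on a stale version, much as the Kubernetes API rejects updates carrying an outdated resourceVersion. `FakeStore`, `ConflictError`, `claim`, and the label key `example.com/owner` are hypothetical names for illustration only, not part of KEDA.

```python
import threading


class ConflictError(Exception):
    """Raised when an update is based on a stale version."""


class FakeStore:
    """In-memory stand-in for an API server with optimistic concurrency."""

    def __init__(self):
        self._lock = threading.Lock()
        self._objects = {}  # name -> (version, labels)

    def create(self, name):
        with self._lock:
            self._objects[name] = (1, {})

    def get(self, name):
        with self._lock:
            version, labels = self._objects[name]
            return version, dict(labels)

    def update(self, name, expected_version, labels):
        # The write only succeeds if nobody else has written since our read.
        with self._lock:
            version, _ = self._objects[name]
            if version != expected_version:
                raise ConflictError(name)
            self._objects[name] = (version + 1, dict(labels))


OWNER_LABEL = "example.com/owner"  # hypothetical label key


def claim(store, name, controller_id):
    """Try to claim `name`; return the id of whichever controller owns it."""
    while True:
        version, labels = store.get(name)
        if OWNER_LABEL in labels:
            return labels[OWNER_LABEL]  # already claimed, respect the owner
        labels[OWNER_LABEL] = controller_id
        try:
            store.update(name, version, labels)
            return controller_id
        except ConflictError:
            continue  # another controller wrote first; re-read and check
```

With this shape, every controller can call `claim` on a new object and only the first successful write sticks; the others read back the winner's id and leave the object alone.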

Aarthisk on 28 May 2019

Another possible option is a leader lock. The idea is that we set the replicas to > 1 on the KEDA deployment. The first controller to come up sets two annotations on the KEDA deployment. The first annotation is the unique GUID of the controller itself, e.g. leader: AAAA-AAA-AAAA-AA. The second annotation is the time last updated. If the annotations are already set, the other pods go into a sleep loop. The leader keeps refreshing the time-last-updated annotation on the deployment (heartbeat). Every 10 seconds the other pods check whether the two annotations are set and whether the time difference is > 10, i.e. has the leader checked in? If the leader hasn't, the first controller to determine that takes over the annotations.
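
To make the leader lock flow concrete, here is a small Python sketch of the acquire/heartbeat/takeover logic. It keeps the two annotations in a plain dict and takes the current time as an argument so the behaviour is easy to follow. `LeaderLock`, `try_acquire_or_renew`, the `lastUpdated` key, and treating the 10 as seconds are all assumptions made for illustration; a real implementation would also need the annotation write to be an atomic compare-and-set against the API server.

```python
LEASE_SECONDS = 10  # assumed unit for the "time difference > 10" check


class LeaderLock:
    """In-memory stand-in for the two annotations on the deployment."""

    def __init__(self):
        self.annotations = {}  # would live on the deployment in a cluster

    def try_acquire_or_renew(self, controller_id, now):
        """Return True if `controller_id` is the leader after this call."""
        leader = self.annotations.get("leader")
        last = self.annotations.get("lastUpdated")
        # The lease is expired if it was never set or the heartbeat is stale.
        expired = last is None or now - last > LEASE_SECONDS
        if leader is None or leader == controller_id or expired:
            # First to arrive, current leader heartbeating, or a takeover.
            self.annotations["leader"] = controller_id
            self.annotations["lastUpdated"] = now
            return True
        return False  # someone else holds a fresh lease; keep sleeping
```

Each pod would call `try_acquire_or_renew` on a timer and only run the scale loop while it returns True. The Kubernetes client-go library ships a `leaderelection` package built around a similar lease pattern, which may be a simpler route than hand-rolled annotations.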

patnaikshekhar on 18 Jul 2019

Seeing this as an issue already in our staging environments in testing.

For reference, we have about 200 queues across two different namespaces / RabbitMQ clusters.

Using the KEDA deployment works, but it's not keeping up with low-interval patterns.

Reliably scaling out the pieces needed to ensure the intervals are kept would be great.

Also note that the time to list the queuelength metrics grows linearly with the number of queues we have. Right now it's pushing 23s and climbing.

sc-chad on 2 Aug 2019

I was able to fix most of my performance issues by changing the default replica count from 1 to 8. Just in case anyone else runs into the same issue.

Also, note that the deployment from the Helm chart seems to be an older version than what is deployed via KedaScaleController.yaml.

sc-chad on 2 Aug 2019

Thanks @sc-chad - I think there is also a valid work item to do some load testing with KEDA to see how many queues a single replica can be expected to handle on something like a standard AKS / GKE node.

jeffhollan on 10 Sep 2019

@zroubalik - @anirudhgarg was going to look at some other ways we may be able to scale. Would be good if you could coordinate if you were looking at the namespace 'short term' fix

jeffhollan on 19 Sep 2019

An operator built with operator-sdk (I am currently working on this) listens to a single namespace by default; it can easily be changed (it is just a configuration change) to listen to all namespaces.

There might be an option to scale the number of goroutines handling reconciliations in the operator via MaxConcurrentReconciles.
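
As an analogy for what a setting like MaxConcurrentReconciles does, the Python sketch below runs a reconcile function over many items with a bounded number of workers. It is not the operator's code (the operator is written in Go, where this is a field on the controller-runtime controller options); `reconcile` and `reconcile_all` are hypothetical names, and the sleep stands in for polling an event source.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def reconcile(name):
    """Hypothetical reconcile step; the sleep stands in for a slow poll."""
    time.sleep(0.05)
    return name


def reconcile_all(names, max_concurrent_reconciles):
    """Reconcile every item using at most `max_concurrent_reconciles` workers."""
    with ThreadPoolExecutor(max_workers=max_concurrent_reconciles) as pool:
        # map() preserves input order even though the work runs concurrently.
        return list(pool.map(reconcile, names))
```

With one worker the items are handled strictly one after another, so total time grows linearly with the number of items; raising the worker count lets slow polls overlap, at the cost of more concurrent load on the event sources.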

zroubalik on 20 Sep 2019

@zroubalik just confirming: will WATCH_NAMESPACE work with KEDA now, given that we are running on the operator SDK?

jeffhollan on 11 Nov 2019

@jeffhollan yes, setting this env variable should do the job, and the operator will serve just that namespace. The variable appears in two places in the Deployment, once for the operator container and once for the metrics adapter container; you need to modify both.
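
For illustration, here is a minimal Python sketch of how a controller might interpret the WATCH_NAMESPACE variable. It assumes that an empty or unset value means "all namespaces" and that several namespaces could be given as a comma-separated list; both are assumptions for this sketch, so check the operator-sdk scope document linked below for the actual behaviour. `watched_namespaces` and `should_handle` are hypothetical helper names.

```python
import os


def watched_namespaces(environ=os.environ):
    """Return None to mean 'all namespaces', or a list of namespace names."""
    raw = environ.get("WATCH_NAMESPACE", "")
    # Assumption: a comma-separated list; blanks and stray spaces are ignored.
    names = [n.strip() for n in raw.split(",") if n.strip()]
    return names or None


def should_handle(namespace, watched):
    """Decide whether an object in `namespace` belongs to this instance."""
    return watched is None or namespace in watched
```

For example, an instance started with `WATCH_NAMESPACE=team-a` would skip a ScaledObject in `team-b`, leaving it to whichever instance watches that namespace.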

You can run multiple instances of KEDA on the cluster with this setting, but you need to modify the other resources (i.e. ClusterRoles, APIService, ...) to avoid conflicts. In particular, change the namespace of the operator deployment and rename the conflicting ClusterRoles (or convert them to namespaced Roles). It is a very simple change.

https://github.com/operator-framework/operator-sdk/blob/master/doc/operator-scope.md

zroubalik on 13 Nov 2019

Thanks @zroubalik - I'm going to keep open until we get something like a helm chart or some deployment yamls that make what you described a bit easier to pull off.

jeffhollan on 17 Nov 2019

This might be relevant (in case the issue is confirmed)

470

zroubalik on 18 Nov 2019

If we want to run multiple KEDA controllers in the cluster, we will have to redesign the metrics adapter and decouple it from the KEDA operator, i.e. we will have one metrics adapter in the cluster and multiple KEDA controllers pointing to that one metrics adapter.
I am curious whether the performance problems are still relevant, since the operator was rewritten with the operator-sdk framework and a lot of the code (locks, ...) was removed during the refactoring.

@sc-chad could you please confirm whether you are still hitting the performance issues with v1.0?

zroubalik on 25 Nov 2019

Absolutely. Was waiting on 1.0 to retest my original setup. Thanks guys.

sc-chad on 25 Nov 2019

Thanks for re-testing @sc-chad!

tomkerkhove on 26 Nov 2019

Ok to close?

tomkerkhove on 4 May 2020
