A clear and concise description of what the bug is.
When there messages in the queue and the HPA is scaled down to 0, the scaler should create at least one instance of a pod to create the subscription with STAN.
When the loop runs, The logic checks if the pending count of messages is greater than 0 and that the queue names match. It never takes into account if the subscription does not exist.
ScaledObject that can scale to 0apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
name: test-scaledobject
labels:
deploymentName: test
spec:
pollingInterval: 10 # Optional. Default: 30 seconds
cooldownPeriod: 30 # Optional. Default: 300 seconds
minReplicaCount: 0 # Optional. Default: 0
maxReplicaCount: 5 # Optional. Default: 100
scaleTargetRef:
deploymentName: test
triggers:
- type: stan
metadata:
natsServerMonitoringEndpoint: "stan.default:8222"
queueGroup: "place_indexer"
durableName: "ImDurable"
subject: myQueue
lagThreshold: "100"
{
"name": "myQueue",
"msgs": 500,
"bytes": 0,
"first_seq": 0,
"last_seq": 0,
"subscriptions": []
msgs field because there are no active subscriptions.minReplicaCount is set to 1, scaling works normally.Thanks @cwoolum for spotting this one. Its strange that I didn't see this one, sure tested it will min replicas 0. Anyway thanks!
I think i know how i missed this one. My initial deployment has a replica of 1. So atleast there is a registered subscriber. Once you get past that it can scale from 0 afterwards.
If it scales to zero though, the subscription will go away if you are draining and closing the connection. Even if you don't drain and close, the subscription will drop off after a short period(5 minutes?) And that prevents it from scaling back up.
I was testing it with this program https://github.com/balchua/gonuts. Subscriber seems to stick around.
https://github.com/balchua/gonuts/blob/c0ce3d6448fc1dafeaf5695136d008a04a696873/sub/main.go#L150
You are calling sc.Close() but not sc.Unsubscribe() when your subscriber app ends.
If your service scales up to 20 instances, you'll have 20 subscribers in STAN. Now say you scale back down to 1, each of those instance need to unsubscribe so that STAN doesn't keep trying to deliver messages to them, right?
I could be thinking about this the wrong way though.
Thanks for the explanation.馃憤 I remember setting the maxinflight to 1 so it round robins it to different subscribers. If the original subscriber goes down, i think stan will try to deliver it to the same subscriber until certain period of time. But i maybe wrong here too.
I do use MaxInFlight but set it to 50 since each one of my pods can handle that number of concurrent requests.
I saw that one of the newer versions of STAN/NATS does have a keepalive that it sends periodically to make sure a subscriber is still listening but I'm not sure of the relationship between a connection being open and a subscription existing for it. I'm going to open up an issue in the STAN repo and see if the team can provide some best practices on it.
I was able to find the docs for this. It looks like I was doing it wrong. For durable subscriptions, you don't want to Unsubscribe but only Close the connection.
The main difference between a queue subscription and a durable one, is that when the last member leaves the group, the state of the group will be maintained by the server. Later, when a member rejoins the group, the delivery will resume.
Note: For a durable queue subscription, the last member to unsubscribe (not simply close) causes the group to be removed from the server.
I'll try removing the Unsubscribe call from my code and see how that affects it.
@cwoolum thanks for checking the docs. Did it work for your when you removed the Unsubscribe?
I see the subscriptions are still there but I'm not sure if the messages are getting delivered successfully now. I'm seeing a large number of pending_count on the pods that were taken down as part of the scale down.
{
"name": "place.inserted",
"msgs": 653382,
"bytes": 358537710,
"first_seq": 1,
"last_seq": 653382,
"subscriptions": [
{
"client_id": "xxxplaceinserter-5595885df6-v57vq",
"inbox": "_INBOX.AQ77TV2TOF1DVJUUW109X8",
"ack_inbox": "_INBOX.INczX1devhkKAX12cfD3HS",
"queue_name": "UpdatedDurable:place_inserter",
"is_durable": true,
"is_offline": false,
"max_inflight": 100,
"ack_wait": 60,
"last_sent": 653379,
"pending_count": 0,
"is_stalled": false
},
{
"client_id": "xxxplaceinserter-7b79867b48-zf6js",
"inbox": "_INBOX.ZQCLS4F06A4AMM1V78KYZ8",
"ack_inbox": "_INBOX.INczX1devhkKAX12cfD3Qt",
"queue_name": "UpdatedDurable:place_inserter",
"is_durable": true,
"is_offline": false,
"max_inflight": 100,
"ack_wait": 60,
"last_sent": 653382,
"pending_count": 110,
"is_stalled": false
},
{
"client_id": "xxxplaceinserter-7b79867b48-tjwzt",
"inbox": "_INBOX.SD4GQBUAJY4L8SPRIUYNGH",
"ack_inbox": "_INBOX.INczX1devhkKAX12cfD4Dx",
"queue_name": "UpdatedDurable:place_inserter",
"is_durable": true,
"is_offline": false,
"max_inflight": 100,
"ack_wait": 60,
"last_sent": 653382,
"pending_count": 110,
"is_stalled": false
},
{
"client_id": "xxxplaceinserter-7b79867b48-2cd2v",
"inbox": "_INBOX.XAERJTARSEMGR9N3ZUHZ1J",
"ack_inbox": "_INBOX.INczX1devhkKAX12cfD44W",
"queue_name": "UpdatedDurable:place_inserter",
"is_durable": true,
"is_offline": false,
"max_inflight": 100,
"ack_wait": 60,
"last_sent": 653382,
"pending_count": 110,
"is_stalled": false
},
{
"client_id": "xxxplaceinserter-7b79867b48-mv9fr",
"inbox": "_INBOX.7PWDZ4ZLPWOA311VJKKC6V",
"ack_inbox": "_INBOX.INczX1devhkKAX12cfD3yr",
"queue_name": "UpdatedDurable:place_inserter",
"is_durable": true,
"is_offline": false,
"max_inflight": 100,
"ack_wait": 60,
"last_sent": 653376,
"pending_count": 110,
"is_stalled": false
},
{
"client_id": "xxxplaceinserter-7b79867b48-xzz2s",
"inbox": "_INBOX.DSA21NRUGOLG1DBW8CIP52",
"ack_inbox": "_INBOX.INczX1devhkKAX12cfD40k",
"queue_name": "UpdatedDurable:place_inserter",
"is_durable": true,
"is_offline": false,
"max_inflight": 100,
"ack_wait": 60,
"last_sent": 653378,
"pending_count": 110,
"is_stalled": false
}
]
}
After the scale down occurred, the number of subscribers dropped to 1. I'm not sure the time it took for that to happen though.
I opened an issue to ask the best practices for this.
https://github.com/nats-io/nats-streaming-server/issues/1001
As far as i know stan will direct the pending messages to available subscribers. But how long before it takes that decision.
I'm going to keep testing to make sure none of the messages are getting stuck. I keep running into the error Operation cannot be fulfilled on scaledobjects.keda.k8s.io \"mytest-scaledobject\": the object has been modified; please apply your changes to the latest version and try again. I've tried to delete both the HPA and ScaledObject multiple times but it keeps occurring. Any idea?
When this error shows up does it stop scaling? Care to show what kubectl get hpa looks like? And also your scaledobject yaml.
Thanks!
@cwoolum could you please decribe steps leading to this error + more details on ScaledObject and HPA. thanks
Okay, the modified issue was due to an old CPU based HPA conflicting with the Keda Scaler but I was able to recreate the issue with the pod not scaling from 0. Here are the requested items.
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
creationTimestamp: '2020-01-03T06:52:29Z'
finalizers:
- finalizer.keda.k8s.io
generation: 3
labels:
deploymentName: turnouttplaceindexer
name: turnouttplaceindexer-scaledobject
namespace: api
spec:
cooldownPeriod: 30
maxReplicaCount: 5
minReplicaCount: 0
pollingInterval: 10
scaleTargetRef:
deploymentName: turnouttplaceindexer
triggers:
- metadata:
durableName: UpdatedDurable
lagThreshold: '100'
natsServerMonitoringEndpoint: 'stan.default:8222'
queueGroup: place_indexer
subject: place.updated
type: stan
status:
externalMetricNames:
- lagThreshold
lastActiveTime: '2020-01-03T15:56:05Z'
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
name: keda-hpa-turnouttplaceindexer
namespace: api
selfLink: >-
/apis/autoscaling/v2beta1/namespaces/api/horizontalpodautoscalers/keda-hpa-turnouttplaceindexer
uid: 8a0bd442-13df-43db-9897-0b96ffae7e4f
resourceVersion: '25270090'
creationTimestamp: '2020-01-03T15:52:05Z'
ownerReferences:
- apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
name: turnouttplaceindexer-scaledobject
uid: 8f9f9d7f-022a-4f95-ba70-c30cb155337c
controller: true
blockOwnerDeletion: true
spec:
scaleTargetRef:
kind: Deployment
name: turnouttplaceindexer
apiVersion: apps/v1
minReplicas: 1
maxReplicas: 5
metrics:
- type: External
external:
metricName: lagThreshold
metricSelector:
matchLabels:
deploymentName: turnouttplaceindexer
targetAverageValue: '100'
status:
currentReplicas: 0
desiredReplicas: 0
currentMetrics: null
conditions:
- type: AbleToScale
status: 'True'
lastTransitionTime: '2020-01-03T15:52:20Z'
reason: SucceededGetScale
message: the HPA controller was able to get the target's current scale
- type: ScalingActive
status: 'False'
lastTransitionTime: '2020-01-03T15:57:12Z'
reason: ScalingDisabled
message: scaling is disabled since the replica count of the target is zero
- type: ScalingLimited
status: 'True'
lastTransitionTime: '2020-01-03T15:55:40Z'
reason: TooFewReplicas
message: >-
the desired replica count is increasing faster than the maximum scale
rate
Here is the response from STAN:
{
"name": "place.updated",
"msgs": 1000000,
"bytes": 131655679,
"first_seq": 1752831,
"last_seq": 2752830,
"subscriptions": [
{
"client_id": "turnouttplaceindexer-68c66f7545-dhwqc",
"inbox": "_INBOX.6Z7TV1D0B84HNLMDPIM8IW",
"ack_inbox": "_INBOX.INczX1devhkKAX12cfuiaw",
"queue_name": "UpdatedDurable:place_indexer",
"is_durable": true,
"is_offline": true,
"max_inflight": 100,
"ack_wait": 60,
"last_sent": 2751730,
"pending_count": 0,
"is_stalled": false
}
]
}
And here are the logs from KEDA
2020-01-03T15:54:24.630417316Z {"level":"info","ts":1578066864.6303,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
2020-01-03T15:54:24.730667858Z {"level":"info","ts":1578066864.7304275,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"scaledobject-controller"}
2020-01-03T15:54:24.830825399Z {"level":"info","ts":1578066864.8306682,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"scaledobject-controller","worker count":1}
2020-01-03T15:54:25.156227482Z {"level":"info","ts":1578066865.1561325,"logger":"controller_scaledobject","msg":"Reconciling ScaledObject","Request.Namespace":"api","Request.Name":"turnouttplaceindexer-scaledobject"}
2020-01-03T15:54:25.156241982Z {"level":"info","ts":1578066865.1561651,"logger":"controller_scaledobject","msg":"Detecting ScaleType from ScaledObject","Request.Namespace":"api","Request.Name":"turnouttplaceindexer-scaledobject"}
2020-01-03T15:54:25.156247382Z {"level":"info","ts":1578066865.1561697,"logger":"controller_scaledobject","msg":"Detected ScaleType = Deployment","Request.Namespace":"api","Request.Name":"turnouttplaceindexer-scaledobject"}
2020-01-03T15:54:25.181735746Z {"level":"info","ts":1578066865.1815946,"logger":"controller_scaledobject","msg":"Updated HPA according to ScaledObject","Request.Namespace":"api","Request.Name":"turnouttplaceindexer-scaledobject","HPA.Namespace":"api","HPA.Name":"keda-hpa-turnouttplaceindexer"}
2020-01-03T15:57:10.682897812Z {"level":"info","ts":1578067030.6824272,"logger":"controller_scaledobject","msg":"Reconciling ScaledObject","Request.Namespace":"api","Request.Name":"turnouttplaceindexer-scaledobject"}
2020-01-03T15:57:10.684467522Z {"level":"info","ts":1578067030.682496,"logger":"controller_scaledobject","msg":"Detecting ScaleType from ScaledObject","Request.Namespace":"api","Request.Name":"turnouttplaceindexer-scaledobject"}
2020-01-03T15:57:10.684494823Z {"level":"info","ts":1578067030.6825025,"logger":"controller_scaledobject","msg":"Detected ScaleType = Deployment","Request.Namespace":"api","Request.Name":"turnouttplaceindexer-scaledobject"}
2020-01-03T15:57:10.695899496Z {"level":"info","ts":1578067030.6957793,"logger":"controller_scaledobject","msg":"Updated HPA according to ScaledObject","Request.Namespace":"api","Request.Name":"turnouttplaceindexer-scaledobject","HPA.Namespace":"api","HPA.Name":"keda-hpa-turnouttplaceindexer"}
2020-01-03T15:57:10.718530941Z {"level":"info","ts":1578067030.7182388,"logger":"scalehandler","msg":"Successfully scaled deployment to 0 replicas","ScaledObject.Namespace":"api","ScaledObject.Name":"turnouttplaceindexer-scaledobject","ScaledObject.ScaleType":"deployment","Deployment.Namespace":"api","Deployment.Name":"turnouttplaceindexer"}
There are messages pending for this subscription but Keda doesn't seem to be picking them up. The pods do call close on the NATS and STAN connections but do not Unsubscribe.
@cwoolum i think i know what the issue is, the stan scaler actually uses the channels' msg attribute and subtract that using the subscriber's last_sent. Unfortunately last_sent is a sequence number.
Looking at the STAN response you provided, msg=1000000 and last_sent=2751730 the resulting pending lag will be negative.
I think the better approach is to use the channel information's last_seq instead of msg.
Wdyt?
I think that makes sense. I created PR #533 to make the change. It also adds additional logging to help diagnose issues.
@cwoolum does PR https://github.com/kedacore/keda/issues/519 addresses this issue?
Yes, I just did a full deploy using the master image tag and scale up/down is working for me.
Thanks @cwoolum