Faas: OpenFaaS function is scaled down to zero when min and max replicas are set to 1

Created on 24 Oct 2018 · 18 Comments · Source: openfaas/faas

Hi,

I have a problem with OpenFaaS when I set both the min and max replicas to 1 to prevent the function from scaling.

labels:
  com.openfaas.scale.min: "1"
  com.openfaas.scale.max: "1"

However, the serving pod is scaled down to zero roughly once every minute.

Expected Behaviour

The serving pod of the function shouldn't be scaled down to zero.

Current Behaviour

The serving pod of the function is scaled down to zero roughly every minute.

Output of kubectl get deployments -n openfaas-fn -w:
NAME     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
myfunc   0         1         1            1           116m
myfunc   0         1         1            1           116m
myfunc   0         0         0            0           116m
myfunc   1         0         0            0           116m
myfunc   1         0         0            0           116m
myfunc   1         0         0            0           116m
myfunc   1         1         1            0           116m
myfunc   1         1         1            1           116m
myfunc   0         1         1            1           117m
myfunc   0         1         1            1           117m
myfunc   0         0         0            0           117m
myfunc   1         0         0            0           118m
myfunc   1         0         0            0           118m
myfunc   1         0         0            0           118m
myfunc   1         1         1            0           118m
myfunc   1         1         1            1           118m

Possible Solution

I investigated the OpenFaaS code and found the following lines in faas/gateway/handlers/alerthandler.go:

// CalculateReplicas decides what replica count to set depending on current/desired amount
func CalculateReplicas(status string, currentReplicas uint64, maxReplicas uint64, minReplicas uint64, scalingFactor uint64) uint64 {
    newReplicas := currentReplicas
    step := uint64((float64(maxReplicas) / 100) * float64(scalingFactor))

    if status == "firing" {
        if currentReplicas == 1 {
            newReplicas = step
        } else {
            if currentReplicas+step > maxReplicas {
                newReplicas = maxReplicas
            } else {
                newReplicas = currentReplicas + step
            }
        }
    } else { // Resolved event.
        newReplicas = minReplicas
    }

    return newReplicas
}

When maxReplicas is 1 and the scaling factor is not set, the factor defaults to 20% (according to https://docs.openfaas.com/architecture/autoscaling/). The step is then (1 / 100) * 20 = 0.2, which becomes 0 when the result is converted to uint64.
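The truncation can be checked in isolation. The sketch below copies only the step expression from CalculateReplicas into a hypothetical helper, stepFor, and prints the step for a few max/scaling-factor combinations.

```go
package main

import "fmt"

// stepFor mirrors the step expression in CalculateReplicas: a percentage of
// maxReplicas, truncated towards zero by the uint64 conversion.
// stepFor is a hypothetical helper used only for this illustration.
func stepFor(maxReplicas, scalingFactor uint64) uint64 {
	return uint64((float64(maxReplicas) / 100) * float64(scalingFactor))
}

func main() {
	fmt.Println(stepFor(1, 20))  // 0.2 truncates to 0 (the setup in this report)
	fmt.Println(stepFor(4, 20))  // 0.8 also truncates to 0
	fmt.Println(stepFor(4, 50))  // 2
	fmt.Println(stepFor(10, 20)) // 2
}
```

Any combination where maxReplicas * scalingFactor is below 100 gives a step of 0.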

Because of the following lines, newReplicas is set to 0 in my case:
    if currentReplicas == 1 {
        newReplicas = step
    }

A possible solution is to set newReplicas to 1 when currentReplicas is 1 and step is 0, as happens when maxReplicas is 1.
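To confirm the diagnosis, the function quoted above can be run on its own. The sketch below copies it unchanged into a standalone program and calls it with the values from this report: min 1, max 1, and the 20% default factor described in the docs.

```go
package main

import "fmt"

// CalculateReplicas is copied unchanged from gateway/handlers/alerthandler.go
// as quoted in this report (before any fix).
func CalculateReplicas(status string, currentReplicas uint64, maxReplicas uint64, minReplicas uint64, scalingFactor uint64) uint64 {
	newReplicas := currentReplicas
	step := uint64((float64(maxReplicas) / 100) * float64(scalingFactor))

	if status == "firing" {
		if currentReplicas == 1 {
			newReplicas = step
		} else {
			if currentReplicas+step > maxReplicas {
				newReplicas = maxReplicas
			} else {
				newReplicas = currentReplicas + step
			}
		}
	} else { // Resolved event.
		newReplicas = minReplicas
	}

	return newReplicas
}

func main() {
	// A firing alert at 1 replica: step is 0, so the function is scaled to 0.
	fmt.Println(CalculateReplicas("firing", 1, 1, 1, 20))
	// The resolved alert restores minReplicas.
	fmt.Println(CalculateReplicas("resolved", 0, 1, 1, 20))
}
```

This prints 0 and then 1, the same down-and-up cycle seen in the deployment watch output.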

Steps to Reproduce (for bugs)

Add the following labels to the function's YAML file and redeploy the function:
labels:
  com.openfaas.scale.min: "1"
  com.openfaas.scale.max: "1"

Once you start calling the function repeatedly (with curl, ab, or similar), the serving pod is scaled down to zero.

Context

I wanted to see how OpenFaaS behaves without autoscaling.

Your Environment

  • FaaS-CLI version ( Full output from: faas-cli version ): 0.7.7

  • Docker version docker version (e.g. Docker 17.0.05 ): 18.06.1-ce

  • Are you using Docker Swarm or Kubernetes (FaaS-netes)? FaaS-netes

  • Operating System and version (e.g. Linux, Windows, MacOS): Ubuntu 16.04, Kernel 4.4.0-134-generic

Labels: question, support

All 18 comments

Hello @szefoka. The faas-idler is responsible for this behaviour. In the values.yaml file there is a faasIdler: section, which should have dryRun set to true so that functions are not actually scaled down to 0 when they are not invoked. Set it to true if it isn't already.

Derek add label: question

Derek add label: support

Hi, let me ping @Templum. Simon, PTAL?

Martin, can you please see if you can reproduce this issue?

Alex


Hi @martindekov! Actually, dryRun was already set to true, so I think it might be a different issue.

I will try to reproduce it. However, I would like to know whether faas-idler is part of the setup, as it could also play a role.

I have deployed faas-idler with dryRun set to true. The behaviour is the same for me with or without it.

I watched the faas-idler logs and got this:

{{200 myfunc} [1.540401506699e+09 2.301694915254237]}
{{502 myfunc} [1.540401506699e+09 0]}
2018/10/24 17:18:26 Skip: myfunc due to missing label

{{200 myfunc} [1.540401536704e+09 2.301694915254237]}
{{502 myfunc} [1.540401536704e+09 0]}
2018/10/24 17:18:56 Skip: myfunc due to missing label

{{200 myfunc} [1.540401566709e+09 3.7016949152542367]}
{{502 myfunc} [1.540401566709e+09 0.003389830508474576]}
2018/10/24 17:19:26 Skip: myfunc due to missing label

{{200 myfunc} [1.540401596717e+09 4.098305084745762]}
{{502 myfunc} [1.540401596717e+09 0.003389830508474576]}
2018/10/24 17:19:56 Skip: myfunc due to missing label

I looked into the faas-idler code (https://github.com/openfaas-incubator/faas-idler/blob/master/main.go). As I understand it, it checks whether the "com.openfaas.scale.zero" label is set to true. If the value is false or not 1, the log lines above are printed.
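A minimal sketch of the check as described above, to show why faas-idler skips this function. It assumes the labels are available as a map and that only "true" or "1" opt a function in; scaleToZeroEnabled is a hypothetical name, and this is not faas-idler's actual code.

```go
package main

import "fmt"

// scaleToZeroEnabled reports whether a function opted in to scale-to-zero.
// Hypothetical helper: the label name comes from this thread; the accepted
// values ("true" or "1") are an assumption based on the description above.
func scaleToZeroEnabled(labels map[string]string) bool {
	v, ok := labels["com.openfaas.scale.zero"]
	if !ok {
		return false // label absent: the function is not opted in
	}
	return v == "true" || v == "1"
}

func main() {
	// Labels of myfunc as deployed in this report.
	myfunc := map[string]string{
		"com.openfaas.scale.min": "1",
		"com.openfaas.scale.max": "1",
	}
	if !scaleToZeroEnabled(myfunc) {
		fmt.Println("Skip: myfunc")
	}
}
```

Under this reading, faas-idler never touches myfunc, which points the investigation back at the gateway.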

I also added 'com.openfaas.scale.zero: "false"' to the function YAML just to be sure, although the scale-down to zero also happened when faas-idler ran with dryRun set to true.

Thank you for providing the debug info. Do you also have the logs from the gateway?

Here is the part of the log related to my function being scaled down to zero and then back up to one.

2018/10/24 20:58:13 Forwarded [POST] to /function/myfunc - [200] - 0.046548 seconds
2018/10/24 20:58:13 Forwarded [POST] to /function/myfunc - [200] - 0.047080 seconds
2018/10/24 20:58:13 Forwarded [POST] to /function/myfunc - [200] - 0.044991 seconds
2018/10/24 20:58:13 Alert received.
2018/10/24 20:58:13 {"receiver":"scale-up","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"APIHighInvocationRate","function_name":"myfunc","monitor":"faas-monitor","service":"gateway","severity":"major","value":"16.4"},"annotations":{"description":"High invocation total on ","summary":"High invocation total on "},"startsAt":"2018-10-24T20:58:08.324404523Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://prometheus-6c7d64d46b-hl5nx:9090/graph?g0.expr=sum+by%28function_name%29+%28rate%28gateway_function_invocation_total%7Bcode%3D%22200%22%7D%5B10s%5D%29%29+%3E+5\u0026g0.tab=1"}],"groupLabels":{"alertname":"APIHighInvocationRate","service":"gateway"},"commonLabels":{"alertname":"APIHighInvocationRate","function_name":"myfunc","monitor":"faas-monitor","service":"gateway","severity":"major","value":"16.4"},"commonAnnotations":{"description":"High invocation total on ","summary":"High invocation total on "},"externalURL":"http://alertmanager-ccd8559-hvpdn:9093","version":"4","groupKey":"{}:{alertname=\"APIHighInvocationRate\", service=\"gateway\"}"}

2018/10/24 20:58:13 [Scale] function=myfunc 1 => 0.
2018/10/24 20:58:13 Forwarded [POST] to /function/myfunc - [200] - 0.058671 seconds
...
2018/10/24 20:58:13 Forwarded [POST] to /function/myfunc - [200] - 0.066349 seconds
2018/10/24 20:58:13 Forwarded [GET] to /system/function/myfunc - [200] - 0.002305 seconds
2018/10/24 20:58:13 Forwarded [POST] to /function/myfunc - [200] - 0.051610 seconds
...
2018/10/24 20:58:16 Forwarded [POST] to /function/myfunc - [200] - 0.053221 seconds
2018/10/24 20:58:16 Forwarded [GET] to /system/function/myfunc - [200] - 0.002099 seconds
2018/10/24 20:58:16 Forwarded [POST] to /function/myfunc - [200] - 0.048294 seconds
...
2018/10/24 20:58:18 Forwarded [POST] to /function/myfunc - [200] - 0.047385 seconds
2018/10/24 20:58:18 Forwarded [GET] to /system/function/myfunc - [200] - 0.001952 seconds
2018/10/24 20:58:18 Forwarded [GET] to /system/functions - [200] - 0.005494 seconds
2018/10/24 20:58:21 Forwarded [GET] to /healthz - [200] - 0.000476 seconds
2018/10/24 20:58:21 Forwarded [GET] to /system/function/myfunc - [200] - 0.002725 seconds
2018/10/24 20:58:21 Forwarded [GET] to /system/functions - [200] - 0.002115 seconds
2018/10/24 20:58:23 Forwarded [GET] to /system/function/myfunc - [200] - 0.002897 seconds
2018/10/24 20:58:25 Forwarded [GET] to /system/functions - [200] - 0.002473 seconds
2018/10/24 20:58:26 Forwarded [GET] to /system/function/myfunc - [200] - 0.002498 seconds
2018/10/24 20:58:28 Forwarded [GET] to /system/functions - [200] - 0.006393 seconds
2018/10/24 20:58:28 Forwarded [GET] to /system/function/myfunc - [200] - 0.006327 seconds
2018/10/24 20:58:31 Forwarded [GET] to /healthz - [200] - 0.000640 seconds
2018/10/24 20:58:31 Forwarded [GET] to /system/function/myfunc - [200] - 0.002785 seconds
2018/10/24 20:58:32 Forwarded [GET] to /system/functions - [200] - 0.003320 seconds
2018/10/24 20:58:33 Forwarded [GET] to /system/function/myfunc - [200] - 0.003518 seconds
2018/10/24 20:58:35 Forwarded [GET] to /system/functions - [200] - 0.003398 seconds
2018/10/24 20:58:35 Forwarded [GET] to /system/functions - [200] - 0.002135 seconds
2018/10/24 20:58:36 Forwarded [GET] to /system/function/myfunc - [200] - 0.002754 seconds
2018/10/24 20:58:38 Forwarded [GET] to /system/function/myfunc - [200] - 0.004848 seconds
2018/10/24 20:58:39 Forwarded [GET] to /system/functions - [200] - 0.009307 seconds
2018/10/24 20:58:41 Forwarded [GET] to /healthz - [200] - 0.000443 seconds
2018/10/24 20:58:41 Forwarded [GET] to /system/function/myfunc - [200] - 0.002679 seconds
2018/10/24 20:58:42 Forwarded [GET] to /system/functions - [200] - 0.003117 seconds
2018/10/24 20:58:43 Alert received.
2018/10/24 20:58:43 {"receiver":"scale-up","status":"resolved","alerts":[{"status":"resolved","labels":{"alertname":"APIHighInvocationRate","function_name":"myfunc","monitor":"faas-monitor","service":"gateway","severity":"major","value":"16.4"},"annotations":{"description":"High invocation total on ","summary":"High invocation total on "},"startsAt":"2018-10-24T20:58:08.324404523Z","endsAt":"2018-10-24T20:58:38.324404523Z","generatorURL":"http://prometheus-6c7d64d46b-hl5nx:9090/graph?g0.expr=sum+by%28function_name%29+%28rate%28gateway_function_invocation_total%7Bcode%3D%22200%22%7D%5B10s%5D%29%29+%3E+5\u0026g0.tab=1"}],"groupLabels":{"alertname":"APIHighInvocationRate","service":"gateway"},"commonLabels":{"alertname":"APIHighInvocationRate","function_name":"myfunc","monitor":"faas-monitor","service":"gateway","severity":"major","value":"16.4"},"commonAnnotations":{"description":"High invocation total on ","summary":"High invocation total on "},"externalURL":"http://alertmanager-ccd8559-hvpdn:9093","version":"4","groupKey":"{}:{alertname=\"APIHighInvocationRate\", service=\"gateway\"}"}

2018/10/24 20:58:43 [Scale] function=myfunc 0 => 1.
2018/10/24 20:58:43 Forwarded [GET] to /system/function/myfunc - [200] - 0.002687 seconds
2018/10/24 20:58:46 Forwarded [GET] to /system/functions - [200] - 0.003025 seconds
2018/10/24 20:58:46 Forwarded [GET] to /system/function/myfunc - [200] - 0.003477 seconds
2018/10/24 20:58:48 Forwarded [GET] to /system/function/myfunc - [200] - 0.002289 seconds
2018/10/24 20:58:49 Forwarded [GET] to /system/functions - [200] - 0.003028 seconds
2018/10/24 20:58:51 Forwarded [GET] to /healthz - [200] - 0.000638 seconds
2018/10/24 20:58:51 Forwarded [GET] to /system/function/myfunc - [200] - 0.002641 seconds
2018/10/24 20:58:53 Forwarded [GET] to /system/function/myfunc - [200] - 0.002131 seconds
2018/10/24 20:58:53 Forwarded [GET] to /system/functions - [200] - 0.002839 seconds
2018/10/24 20:58:56 Forwarded [GET] to /system/functions - [200] - 0.002223 seconds
2018/10/24 20:58:56 Forwarded [GET] to /system/function/myfunc - [200] - 0.002321 seconds
2018/10/24 20:58:58 Forwarded [GET] to /system/function/myfunc - [200] - 0.003331 seconds
2018/10/24 20:59:00 Forwarded [GET] to /system/functions - [200] - 0.005705 seconds
2018/10/24 20:59:01 Forwarded [GET] to /healthz - [200] - 0.000642 seconds
2018/10/24 20:59:01 Forwarded [GET] to /system/function/myfunc - [200] - 0.003134 seconds
2018/10/24 20:59:03 Forwarded [GET] to /system/functions - [200] - 0.003607 seconds
2018/10/24 20:59:03 Forwarded [GET] to /system/function/myfunc - [200] - 0.002101 seconds
2018/10/24 20:59:05 Forwarded [GET] to /system/functions - [200] - 0.002820 seconds
2018/10/24 20:59:06 Forwarded [GET] to /system/function/myfunc - [200] - 0.002785 seconds
2018/10/24 20:59:07 Forwarded [GET] to /system/functions - [200] - 0.003209 seconds
2018/10/24 20:59:08 Forwarded [GET] to /system/function/myfunc - [200] - 0.002848 seconds
2018/10/24 20:59:10 Forwarded [GET] to /system/functions - [200] - 0.002861 seconds
2018/10/24 20:59:11 Forwarded [GET] to /system/function/myfunc - [200] - 0.005427 seconds
2018/10/24 20:59:11 Forwarded [GET] to /healthz - [200] - 0.000305 seconds
2018/10/24 20:59:13 Forwarded [GET] to /system/function/myfunc - [200] - 0.002615 seconds
2018/10/24 20:59:14 Forwarded [GET] to /system/functions - [200] - 0.003211 seconds
2018/10/24 20:59:16 Forwarded [GET] to /system/function/myfunc - [200] - 0.002498 seconds
2018/10/24 20:59:17 Forwarded [GET] to /system/functions - [200] - 0.002666 seconds
2018/10/24 20:59:18 error with upstream request to: , Post http://myfunc.openfaas-fn.svc.cluster.local.:8080: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/10/24 20:59:18 Forwarded [POST] to /function/myfunc - [502] - 60.000223 seconds
2018/10/24 20:59:18 Forwarded [POST] to /function/myfunc - [200] - 0.050617 seconds
2018/10/24 20:59:18 Forwarded [POST] to /function/myfunc - [200] - 0.050752 seconds
2018/10/24 20:59:18 Forwarded [POST] to /function/myfunc - [200] - 0.048583 seconds
2018/10/24 20:59:18 Forwarded [POST] to /function/myfunc - [200] - 0.046774 seconds
2018/10/24 20:59:18 Forwarded [POST] to /function/myfunc - [200] - 0.052220 seconds
2018/10/24 20:59:18 Forwarded [GET] to /system/function/myfunc - [200] - 0.002421 seconds
2018/10/24 20:59:18 Forwarded [POST] to /function/myfunc - [200] - 0.050028 seconds
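The gateway log shows the scaling is driven by the Alertmanager webhook: the "firing" APIHighInvocationRate alert is followed by "[Scale] function=myfunc 1 => 0" and the "resolved" alert by "0 => 1". A minimal sketch of extracting the two fields that matter from that payload with encoding/json; alertPayload and parseAlert are hypothetical names, and the struct models only a subset of the fields visible in the log.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// alertPayload models only the webhook fields used here (hypothetical type).
type alertPayload struct {
	Status       string            `json:"status"`
	CommonLabels map[string]string `json:"commonLabels"`
}

// parseAlert returns the alert status ("firing" or "resolved") and the
// function_name label from an Alertmanager webhook body.
func parseAlert(body []byte) (status, function string, err error) {
	var p alertPayload
	if err = json.Unmarshal(body, &p); err != nil {
		return "", "", err
	}
	return p.Status, p.CommonLabels["function_name"], nil
}

func main() {
	// Trimmed-down version of the "resolved" payload from the gateway log.
	body := []byte(`{"receiver":"scale-up","status":"resolved",
		"commonLabels":{"alertname":"APIHighInvocationRate","function_name":"myfunc"}}`)

	status, fn, err := parseAlert(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(status, fn)
}
```

The status string is what ends up as the first argument to CalculateReplicas, so every firing alert at 1 replica hits the step == 0 branch.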

@Martind were you able to reproduce this?

I am on it

@szefoka thank you for this report. You can disable scaling by scaling Alertmanager to 0 replicas. In the meantime we're going to look into this as a priority.

Alex

Thanks @alexellis, this works around my problem for now.
I'm not sure it would do the trick, but I think modifying the CalculateReplicas function in /gateway/handlers/alerthandler.go from this:
...
if status == "firing" {
    if currentReplicas == 1 {
        newReplicas = step
    }
...

to something like this:

...
if status == "firing" {
    if currentReplicas == 1 {
        if step == 0 {
            newReplicas = 1
        } else {
            newReplicas = step
        }
    }
...
would solve this issue.
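Applied to the full function quoted earlier, the proposal might look like the sketch below. It is the change suggested in this comment, not necessarily the one that was eventually merged.

```go
package main

import "fmt"

// CalculateReplicas with the change proposed in this comment: when scaling up
// from a single replica and the step truncates to 0, keep 1 replica instead
// of setting the replica count to 0.
func CalculateReplicas(status string, currentReplicas uint64, maxReplicas uint64, minReplicas uint64, scalingFactor uint64) uint64 {
	newReplicas := currentReplicas
	step := uint64((float64(maxReplicas) / 100) * float64(scalingFactor))

	if status == "firing" {
		if currentReplicas == 1 {
			if step == 0 {
				newReplicas = 1
			} else {
				newReplicas = step
			}
		} else {
			if currentReplicas+step > maxReplicas {
				newReplicas = maxReplicas
			} else {
				newReplicas = currentReplicas + step
			}
		}
	} else { // Resolved event.
		newReplicas = minReplicas
	}

	return newReplicas
}

func main() {
	fmt.Println(CalculateReplicas("firing", 1, 1, 1, 20)) // no longer scaled to zero
	fmt.Println(CalculateReplicas("firing", 1, 4, 1, 20)) // step is still 0: stays at 1
}
```

Both calls print 1. The second shows a limitation of this approach: with a 20% factor and a maximum of 4, the step is still 0, so the function never scales past one replica.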

Thanks,
David

Yeah, I was able to replicate the issue with 2500+ requests. Thanks for sharing this with us @szefoka 🙂

Thanks for the quick fix! @alexellis :)

This was released in the 0.9.8 image available on the Docker Hub.

Derek close: fixed in this commit

@szefoka @Templum @ivanayov this introduced a regression into OpenFaaS Cloud, where we have a minimum replica count of 1 and a maximum of 4 with proportional scaling of 20%. I believe it worked before this patch was introduced.

Reproduce the error with:


package handlers

import "testing"

func TestScaling_1Min_4Max_Step1(t *testing.T) {
    minReplicas := uint64(1)
    maxReplicas := uint64(4)
    scalingFactor := uint64(50)
    current := uint64(1)

    want := uint64(2)

    newReplicas := CalculateReplicas("firing", current, maxReplicas, minReplicas, scalingFactor)
    if newReplicas != want {
        t.Errorf("Replicas - want: %d, got: %d", want, newReplicas)
    }
}

func TestScaling_1Min_4Max_Step2(t *testing.T) {
    minReplicas := uint64(1)
    maxReplicas := uint64(4)
    scalingFactor := uint64(50)
    current := uint64(2)

    want := uint64(4)

    newReplicas := CalculateReplicas("firing", current, maxReplicas, minReplicas, scalingFactor)
    if newReplicas != want {
        t.Errorf("Replicas - want: %d, got: %d", want, newReplicas)
    }
}

Do we believe it is correct that no autoscaling takes place in the range of 1-4 replicas? The step is 20% of the maximum of 4, i.e. 0.8, which is less than 1 and so rounds down to 0.
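One way to avoid both the scale-to-zero and the stalled 1-4 range would be to round the step up instead of truncating it, so any non-zero scaling factor gives a step of at least 1, and to always add the step to the current count, capped at the maximum. The sketch below is a hypothetical alternative (calculateReplicasCeil is an illustrative name using integer ceiling division), not the implementation OpenFaaS adopted.

```go
package main

import "fmt"

// calculateReplicasCeil is a hypothetical variant of CalculateReplicas.
// The step is maxReplicas*scalingFactor/100 rounded up via integer ceiling
// division, and a firing alert adds it to the current count, capped at
// maxReplicas. Any resolved (non-firing) alert returns minReplicas.
func calculateReplicasCeil(status string, currentReplicas, maxReplicas, minReplicas, scalingFactor uint64) uint64 {
	if status != "firing" {
		return minReplicas
	}
	step := (maxReplicas*scalingFactor + 99) / 100
	if currentReplicas+step > maxReplicas {
		return maxReplicas
	}
	return currentReplicas + step
}

func main() {
	// min=1, max=4, 20%: the step rounds up to 1, so each alert adds a replica.
	replicas := uint64(1)
	for i := 0; i < 4; i++ {
		replicas = calculateReplicasCeil("firing", replicas, 4, 1, 20)
		fmt.Println(replicas) // prints 2, then 3, 4, 4
	}
	// min=max=1: the step is 1, but the result is capped at 1, never 0.
	fmt.Println(calculateReplicasCeil("firing", 1, 1, 1, 20))
}
```

Note that this does not match the first test above: with a 50% factor the step is 2, so it goes 1 => 3 rather than 1 => 2. Whether that is acceptable depends on the intended semantics of the scaling factor.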
