Faas: Question: How does the auto-scaling work?

Created on 15 Nov 2017 · 10 Comments · Source: openfaas/faas


Using your pingurl sample, I set up 5 systems to call the function constantly, using the loop from your video on getting nodeinfo.

'while [ true ] ; do curl -4 http://domain.com/function/url-ping -d "http://www.google.com/" ; done'

The function call rate (from gateway_function_invocation_total), in functions per second [f/p/s], was at least 6 when scaling triggered, and never went higher than 8.2.

During this time, every 40s it increased the replicas in batches of 5 to the 40 you mentioned in your video as the current hard limit. It did not stop at 5 and wait for more f/p/s to increase to the next level. Is this the intended behavior?

I first stopped one system and the f/p/s dropped to just above 6, and upon stopping a 2nd server with only 3 servers doing the function loops, the f/p/s dropped under 5. Within a few seconds the replicas went from 40 straight to 1.

On the servers executing the function calls, I saw no difference between 1 replica handling the demand and the 40 at the end. It appears to be binary scaling: either 1 warmed-up replica handles the demand or, the moment you use the function at more than 5 f/p/s, it scales up to the maximum.

Shouldn't the auto-scaling only scale the replicas based on demand level? Or am I misunderstanding how the autoscaling works? Or is there something wrong with my setup? (Using k8s on my own hardware, deployed via juju: The Canonical Distribution Of Kubernetes.)

Expected Behaviour


The replica count should only increase based on demand.

Current Behaviour


The replica count increases to the maximum over several minutes from the moment APIHighInvocationRate is triggered.

Possible Solution


I don't know how the auto-scaling is meant to work, whether this is intended, or whether my setup is at fault, so I can't provide a definitive solution. What comes to mind is that, once the min/max is implemented, there could be some way to set how the auto-scaling should scale over time based on the functions per second.

Steps to Reproduce (for bugs)


  1. Run 'while [ true ] ; do curl -4 http://domain.com/function/url-ping -d "http://www.google.com/" ; done' on several servers to get f/p/s above the threshold of 5.
  2. Watch the replica count go up by 5 every 40s.
  3. Stop the server loops one by one to get under 5 f/p/s; the replica count drops to 1.

Context


I am evaluating this to scale my data processing (think website crawler), with ongoing processing over time. The aim is greater efficiency from my infrastructure by using the auto-scaling nature of this system and re-using the same clusters by scheduling when batch runs utilize the pods.

Your Environment

  • Docker version docker version (e.g. Docker 17.0.05 ):
    System info
    Machine ID:
    1e2ddb8b6dbfb24ac69b3be75a0b91e5
    System UUID:
    41827A73-9E11-42F5-9241-E86E6B7517DB
    Boot ID:
    bbbddd12-6c8b-4239-838f-643680cd7895
    Kernel Version:
    4.4.0-87-generic
    OS Image:
    Ubuntu 16.04.3 LTS
    Container Runtime Version:
    docker://1.13.1
    Kubelet Version:
    v1.8.2
    Kube-Proxy Version:
    v1.8.2
    Operating system:
    linux
    Architecture:
    amd64
  • Are you using Docker Swarm or Kubernetes (FaaS-netes)?
    Kubernetes (FaaS-netes)
  • Operating System and version (e.g. Linux, Windows, MacOS):
    Linux
  • Link to your project or a code example to reproduce issue:
question revisit

Most helpful comment

@stormwalkerec
You could try setting the alert policy to something like:

sum by(function_name) (rate(gateway_function_invocation_total[10s])) / sum by(function_name) (avg_over_time(gateway_service_count[10s])) > 5

as opposed to the default:

sum by(function_name) (rate(gateway_function_invocation_total{code="200"}[10s])) > 5

In English this means: more than 5 function calls per second per replica will trigger a scale up.
I believe this makes more sense.
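
To make the difference between the two alert expressions concrete, here is a minimal Python sketch that evaluates both conditions for a given call rate and replica count. The function names are hypothetical, and the sketch ignores the code="200" filter and the 10s rate window that the real PromQL expressions use; it only compares the arithmetic.

```python
THRESHOLD = 5.0  # calls per second, as in both alert expressions


def total_rate_alert(total_rate):
    """Default rule: compare the function's total call rate to the threshold."""
    return total_rate > THRESHOLD


def per_replica_alert(total_rate, replicas):
    """Suggested rule: divide the total call rate by the replica count first."""
    return total_rate / replicas > THRESHOLD


if __name__ == "__main__":
    rate = 8.2  # highest rate reported in the issue
    for replicas in (1, 5, 10):
        # Prints the replica count, then whether each rule would fire.
        print(replicas, total_rate_alert(rate), per_replica_alert(rate, replicas))
```

With the default rule the alert keeps firing no matter how many replicas exist, because the total rate does not change. With the per-replica rule the condition stops holding once enough replicas share the load.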

All 10 comments

@stefanprodan has spent lots of time with the auto-scaling. @johnmccabe did we document this anywhere other than in the blog posts?

I have run into this problem too. I traced the code and found that the flow may be controlled by configurmaps.yaml. But when I tried to send the same 'scale-up' event, autoscaling was not triggered the way it is by APIHighInvocationRate. So I use kubectl autoscaling instead of the original mechanism that OpenFaaS provides.

@alexellis I don't believe so, it probably warrants a guide with some examples.

Derek add label: question

@stormwalkerec I'm not clear what the question is.

You have described how the scaling works - in blocks of 5 replicas for every alert. If you need to customize the alerts they are open source and available in the repository. Check Prometheus' "alerts" page for more info.

Please try to summarize what kind of response you need or whether we can close this issue.

It is not scaling as I understand scaling. That is: at 6 f/p/s, going to 5 workers is OK. Then wait until there are more than 10 f/p/s before starting the next 5 workers.

The current system is: if there are more than 5 f/p/s, keep increasing the worker count every 40s until it reaches 40. In the end you have 40 workers processing only 5.1 f/p/s, i.e. HUGE wasted resources, and I fail to see how this is scaling (increasing based on demand).

In essence, the current behavior is binary scaling.

If f/p/s under 5, 1 worker
If f/p/s is sustained at 5.1, increase to 40 workers
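
The two rules above, combined with the step of 5 replicas every 40s reported in the issue, can be sketched as a small Python simulation. This is only a model of the behaviour described in this thread, not OpenFaaS source code. The constant and function names are hypothetical, the first step landing on 5 rather than 6 is an assumption, and the scale-down is modelled on the same 40s tick even though the issue reports it happening within a few seconds.

```python
# Sketch of the scaling behaviour reported in this thread; not OpenFaaS code.
STEP = 5            # replicas added per alert (from the thread)
MAX_REPLICAS = 40   # hard limit mentioned in the issue
MIN_REPLICAS = 1
THRESHOLD = 5.0     # calls per second that trigger the alert
INTERVAL_S = 40     # seconds between scale-up steps (observed in the issue)


def next_replicas(current, rate):
    """Return the replica count after one alert evaluation."""
    if rate <= THRESHOLD:
        # Alert resolved: drop straight back to the minimum.
        return MIN_REPLICAS
    if current == MIN_REPLICAS:
        # Assumption: the first step lands on 5 rather than 6.
        return STEP
    return min(current + STEP, MAX_REPLICAS)


def simulate(rates):
    """Yield (seconds, replicas) for a sequence of observed call rates."""
    replicas = MIN_REPLICAS
    for i, rate in enumerate(rates, start=1):
        replicas = next_replicas(replicas, rate)
        yield i * INTERVAL_S, replicas


if __name__ == "__main__":
    # A steady 5.1 calls/s for nine intervals, then a drop below the threshold.
    for seconds, replicas in simulate([5.1] * 9 + [4.9]):
        print(f"t={seconds:>3}s replicas={replicas}")
```

Because the rate of 5.1 never changes, nothing in this model stops the climb before the cap, which is the "binary" effect described above.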

addendum:

I see there is some work towards setting more parameters for a function (warm worker count, amount to scale past the current hard limit of 40, etc.). Maybe this is where we can set the scaling factor? After all, a simple function pinging a server is vastly different from a function that processes a whole web page.

@stormwalkerec if you want linear scaling with different scaling rules (CPU/RAM/IO/RPS/etc.) tailored to your functions, you should consider using Kubernetes HPA. But HPA will not get you far if you're dealing with massive load; you also need to scale your infrastructure. For this you could use cluster autoscaling, so your nodes scale up along with your pods. GKE has support for that, see /cluster-autoscaler.
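
For comparison, the proportional rule documented for the Kubernetes HPA is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). A minimal Python sketch of that rule follows. The function name, the clamp defaults (1 and 40, borrowed from this thread), and the use of calls per second per replica as the metric are assumptions for illustration, and the real HPA also applies a tolerance and other safeguards that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_per_replica, target_per_replica,
                     min_replicas=1, max_replicas=40):
    """Proportional rule: ceil(current_replicas * current / target), clamped."""
    raw = math.ceil(current_replicas * current_per_replica / target_per_replica)
    return max(min_replicas, min(raw, max_replicas))


if __name__ == "__main__":
    # 5 replicas each seeing 10 calls/s against a target of 5 calls/s per replica.
    print(desired_replicas(5, 10, 5))
    # 1 replica seeing 8.2 calls/s against the same target.
    print(desired_replicas(1, 8.2, 5))
```

Under this rule the replica count tracks the load instead of climbing by a fixed step for as long as an alert fires.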
