Faas: Question: How does the auto-scaling work?

Created on 15 Nov 2017 · 10 Comments · Source: openfaas/faas

Using your pingurl sample, I set up 5 systems to call the function continuously, using the same loop you used in your video to get nodeinfo:

'while [ true ] ; do curl -4 http://domain.com/function/url-ping -d "http://www.google.com/" ; done'

The rate derived from gateway_function_invocation_total, in function calls per second (f/p/s), was at least 6, which was enough to trigger scaling, but never higher than 8.2.
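For readers unfamiliar with how a calls-per-second figure comes out of a counter such as gateway_function_invocation_total, here is a minimal sketch. It assumes the simplest possible definition (difference between the last and first sample divided by the elapsed time). Prometheus' own rate() also handles counter resets and extrapolates to the window edges, which this sketch does not. The function name and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Approximate per-second rate from (timestamp_seconds, counter_value) samples.

    Simplification: uses only the first and last sample and ignores counter
    resets, unlike Prometheus' rate().
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v1 - v0) / (t1 - t0)


# Hypothetical samples: the counter grows by 60 invocations over 10 seconds.
window = [(0.0, 100), (5.0, 131), (10.0, 160)]
print(simple_rate(window))  # 6.0 calls per second
```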

During this time, every 40s it increased the replicas in batches of 5 to the 40 you mentioned in your video as the current hard limit. It did not stop at 5 and wait for more f/p/s to increase to the next level. Is this the intended behavior?

I first stopped one system and the f/p/s dropped to just above 6, and upon stopping a 2nd server with only 3 servers doing the function loops, the f/p/s dropped under 5. Within a few seconds the replicas went from 40 straight to 1.

I saw no difference on the servers executing the function calls between 1 replica handling the demand and the 40 at the end. It appears to be a binary scaling effort: either 1 replica is warmed up to handle demand, or, the moment you start using the function with more than 5 f/p/s, it scales up to the maximum.

Shouldn't the auto-scaling scale the replicas according to the level of demand? Am I misunderstanding how the autoscaling works, or is there something wrong with my setup? I am using k8s on my own hardware, deployed via juju (The Canonical Distribution Of Kubernetes).

Expected Behaviour

The replica count should increase only in line with demand.

Current Behaviour

Replicas increase to the maximum over several minutes from the moment the APIHighInvocationRate alert is triggered.

Possible Solution

As I don't know how the auto-scaling is meant to work, whether this behaviour is intended, or whether my setup is at fault, I can't provide a definitive solution. What comes to mind is that, once the min/max setting is implemented, there could be some way to set how the auto-scaling should scale over time based on the functions per second.

Steps to Reproduce (for bugs)

1. Run 'while [ true ] ; do curl -4 http://domain.com/function/url-ping -d "http://www.google.com/" ; done' on several servers to get f/p/s above the threshold of 5.
2. Watch the replica count go up by 5 every 40s.
3. Stop the server loops one by one to get under 5 f/p/s; the replica count drops to 1.

Context

I am evaluating this to scale my processing of data (think website crawler) with ongoing processing over time, and to achieve greater efficiency with my infrastructure by using the auto-scaling nature of this system and re-using the same clusters by scheduling when batch runs utilize the pods.

Your Environment

Docker version docker version (e.g. Docker 17.0.05 ):
System info:
Machine ID: 1e2ddb8b6dbfb24ac69b3be75a0b91e5
System UUID: 41827A73-9E11-42F5-9241-E86E6B7517DB
Boot ID: bbbddd12-6c8b-4239-838f-643680cd7895
Kernel Version: 4.4.0-87-generic
OS Image: Ubuntu 16.04.3 LTS
Container Runtime Version: docker://1.13.1
Kubelet Version: v1.8.2
Kube-Proxy Version: v1.8.2
Operating system: linux
Architecture: amd64
Are you using Docker Swarm or Kubernetes (FaaS-netes)?
Kubernetes (FaaS-netes)
Operating System and version (e.g. Linux, Windows, MacOS):
Linux
Link to your project or a code example to reproduce issue:

question revisit

Source

stormwalkerec

All 10 comments

@stefanprodan has spent lots of time with the auto-scaling. @johnmccabe did we document this anywhere other than in the blog posts?

alexellis on 15 Nov 2017

I ran into this problem too. I traced the code and found that the flow may be controlled by configurmaps.yaml. But when I tried to send the same 'scale-up' event myself, autoscaling was not triggered the way it is by APIHighInvocationRate. So I used kubectl auto scaling instead of the original mechanism that OpenFaaS provides.

rayhero on 16 Nov 2017

@alexellis I don't believe so, it probably warrants a guide with some examples.

johnmccabe on 16 Nov 2017

Derek add label: question

rgee0 on 18 Nov 2017

@stormwalkerec I'm not clear what the question is.

You have described how the scaling works - in blocks of 5 replicas for every alert. If you need to customize the alerts they are open source and available in the repository. Check Prometheus' "alerts" page for more info.

Please try to summarize what kind of response you need or whether we can close this issue.

alexellis on 25 Nov 2017

It is not scaling as I understand scaling, i.e. 6 f/p/s = 5 workers, OK; then wait until there are more than 10 f/p/s to initiate the next 5 workers.

The current system is: if there are more than 5 f/p/s, keep increasing the worker count by 5 every 40s until it reaches 40. At the end, you have 40 workers processing only 5.1 f/p/s, i.e. HUGE wasted resources, and I fail to see how this is scaling (increasing based on demand).

In essence, the current behavior is binary scaling.

If f/p/s under 5, 1 worker
If f/p/s is sustained at 5.1, increase to 40 workers
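The behaviour described above can be sketched as a small simulation. This is only a model of what is reported in this thread, not the gateway's actual code: it assumes the alert is evaluated every 40s, that each firing adds 5 replicas up to the 40 hard limit, and that the replica count resets to 1 as soon as the total rate is no longer above 5. All names and constants are hypothetical.

```python
MIN_REPLICAS = 1        # replica count when the alert is not firing
MAX_REPLICAS = 40       # hard limit mentioned in this thread
STEP = 5                # replicas added per alert firing (assumed exact)
ALERT_THRESHOLD = 5.0   # total calls per second across all replicas


def next_replicas(current, total_rate):
    """Replica count after one alert evaluation (assumed to happen every 40s)."""
    if total_rate > ALERT_THRESHOLD:
        return min(current + STEP, MAX_REPLICAS)
    return MIN_REPLICAS


def simulate(rates):
    """Replica count after each evaluation for a sequence of total rates."""
    replicas = MIN_REPLICAS
    history = []
    for rate in rates:
        replicas = next_replicas(replicas, rate)
        history.append(replicas)
    return history


# A steady 6 calls/s for ten evaluations, then a drop to 4.8 calls/s.
print(simulate([6.0] * 10 + [4.8]))
# [6, 11, 16, 21, 26, 31, 36, 40, 40, 40, 1]
```

Because the condition looks only at the total rate and never at the replica count, nothing in this model stops the climb before the limit, which matches the "binary" pattern reported here.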

stormwalkerec on 25 Nov 2017

addendum:

I see there is some work towards setting more parameters for a function (warm worker count, how far to scale past the current 40 hard limit, etc.). Maybe this is where we can set the scaling factor? A simple function pinging a server is vastly different from a function that processes a whole web page.

stormwalkerec on 26 Nov 2017

@stormwalkerec if you want to have linear scaling with different scaling rules (CPU/RAM/IO/RPS/etc) tailored to your functions, you should consider using Kubernetes HPA. But HPA will not get you far if you're dealing with massive load; you also need to scale your infrastructure. For this you could use cluster autoscaling so your nodes will scale up along with your pods. GKE has support for that, see /cluster-autoscaler.
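As a rough illustration of why HPA gives proportional scaling, the Kubernetes documentation describes its core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below implements only that formula plus clamping to a minimum and maximum. It leaves out the tolerance band, stabilization windows and readiness handling of the real controller, and the function and parameter names are hypothetical.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas=1, max_replicas=40):
    """Core HPA formula, clamped to [min_replicas, max_replicas].

    current_metric and target_metric are per-pod averages, for example
    requests per second per replica.
    """
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(desired, max_replicas))


# 5 replicas each seeing 10 calls/s against a target of 5 calls/s per replica.
print(hpa_desired_replicas(5, 10.0, 5.0))  # 10
# 4 replicas each seeing 2.5 calls/s: demand only justifies 2 replicas.
print(hpa_desired_replicas(4, 2.5, 5.0))   # 2
```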

stefanprodan on 27 Nov 2017

Please see - https://docs.openfaas.com/architecture/autoscaling/

alexellis on 18 Apr 2018

@stormwalkerec
You could try setting the alert policy to something like:

sum by(function_name) (rate(gateway_function_invocation_total[10s])) / sum by(function_name) (avg_over_time(gateway_service_count[10s])) > 5

as opposed to the default:

sum by(function_name) (rate(gateway_function_invocation_total{code="200"}[10s])) > 5

In English this means: more than 5 function calls per second per replica will trigger a scale up.
I believe this makes more sense.
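To make the difference between the two alert expressions above concrete, here is a minimal sketch in plain Python rather than PromQL. It assumes the total call rate and the replica count are already known as numbers, and it ignores the code="200" filter that the default expression applies. The function names are hypothetical.

```python
THRESHOLD = 5.0  # calls per second, as in both expressions


def default_alert(total_rate, replicas):
    """Default rule: fires on the total rate, regardless of replica count."""
    return total_rate > THRESHOLD


def per_replica_alert(total_rate, replicas):
    """Proposed rule: fires only when each replica averages more than 5 calls/s."""
    return total_rate / replicas > THRESHOLD


# 8.2 calls/s in total, the highest rate reported in this issue.
for replicas in (1, 5, 40):
    print(replicas, default_alert(8.2, replicas), per_replica_alert(8.2, replicas))
# 1 True True
# 5 True False
# 40 True False
```

Under the per-replica rule, the same load stops triggering further scale-ups once there are enough replicas to bring the average below the threshold.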

polvoazul on 7 Sep 2018
