Faas: Function invocation incorrect for multiple namespaces

Created on 21 Dec 2019  ·  4 Comments  ·  Source: openfaas/faas

My actions before raising this issue

Expected Behaviour

I have a wordcount function deployed in both the openfaas-fn and another-ns namespaces.

When I invoke the wordcount function multiple times in both namespaces from the web UI portal, the invocation count in the web UI should show the correct number of invocations for each namespace, and the CLI should show the same counts. When the function is invoked from the CLI, using either $ faas invoke wordcount or $ faas invoke wordcount -n <namespace>, the count should likewise be correct in both the CLI and the web UI portal.

Current Behaviour

I invoked the wordcount function multiple times in both namespaces from the web UI portal, but the invocation count in the web UI does not increase; it shows the same count in both namespaces. The CLI shows that same count too. Only when the function is invoked from the CLI using $ faas invoke wordcount does the count increase, in both the CLI and the web UI portal.

Steps to Reproduce (for bugs)

  1. Install OpenFaaS on Kubernetes with k3sup. This sets up openfaas-fn as the default namespace for functions. Log in to the gateway with the CLI.

  2. Deploy functions in openfaas-fn using this

$ faas deploy -f https://raw.githubusercontent.com/openfaas/faas/master/stack.yml
  3. Add another namespace in k8s
$ kubectl create ns another-ns
$ kubectl annotate namespace/another-ns openfaas="1"
$ # check namespaces list
$ faas namespaces
  4. Deploy the wordcount function in the new namespace using the YAML file below
# stack.yml
provider:
  name: openfaas
  gateway: http://127.0.0.1:8080  # can be a remote server

functions:
  wordcount:
    lang: dockerfile
    image: functions/alpine:latest
    fprocess: "wc"
    skip_build: true
    namespace: another-ns
$ faas deploy -f stack.yml
  5. Go to the web UI portal and invoke the function in the openfaas-fn namespace and in the another-ns namespace

You will notice that the count doesn't increase in the UI, and it doesn't increase in the CLI either, when running $ faas list, $ faas list -n openfaas-fn or $ faas list -n another-ns

  6. Invoke the function using the CLI
$ faas invoke wordcount
...
$ faas list
...

Also check the web UI portal. The count now shows up and increases, but it only reflects the number of CLI invocations

  7. Invoke the function using the CLI, but with the namespace flag. You will notice that the count doesn't increase and is wrong
$ faas invoke wordcount -n openfaas-fn
...
$ faas invoke wordcount -n another-ns
...
$ faas list
$ faas list -n openfaas-fn
$ faas list -n another-ns
...

Context

I was just trying out openfaas. Noticed that my invocation count shows up wrong

Your Environment

  • FaaS-CLI version ( Full output from: faas-cli version ):
CLI:
 commit:  73004c23e5a4d3fdb7352f953247473477477a64
 version: 0.11.3

Gateway
 uri:     http://127.0.0.1:8080
 version: 0.18.7
 sha:     59b7839236098820e73ed25301258b722c3d33e4
 commit:  Change how and when we fetch and parse namespace info


Provider
 name:          faas-netes
 orchestration: kubernetes
 version:       0.9.15
 sha:           41c33f9f7c29e8276bd01387f78d6f0cff847890
  • Docker version docker version (e.g. Docker 17.0.05 ):
# minikube vm docker
Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea838
 Built:             Wed Nov 13 07:22:05 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.12
  Git commit:       633a0ea838
  Built:            Wed Nov 13 07:28:45 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          commit: d736ef14f0288d6993a1845745d6756cfc9ddd5a
  GitCommit:
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
  • Are you using Docker Swarm or Kubernetes (FaaS-netes)?
    Kubernetes

  • Operating System and version (e.g. Linux, Windows, MacOS):
    MacOS Catalina. v 10.15.2

  • Code example or link to GitHub repo or gist to reproduce problem:
    Provided it all above 😄

  • Other diagnostic information / logs from troubleshooting guide

Initially I assumed that the invocation count comes from Prometheus, and that later turned out to be true. So I looked at the invocation count metrics in Grafana and Prometheus. This is what Prometheus shows when I try what I have described above:

prometheus metrics

You can see below how the CLI shows the number as 6 for wordcount function, in both namespaces. My default is openfaas-fn when nothing is provided

Screen Shot 2019-12-21 at 9 28 46 PM

In prometheus, the metric which has the value 6 is this:

gateway_function_invocation_total{app="gateway",code="200",function_name="wordcount",instance="172.17.0.17:8082",job="kubernetes-pods",kubernetes_namespace="openfaas",kubernetes_pod_name="gateway-6c94b87f84-xhqzb",pod_template_hash="6c94b87f84"}   6

If you notice the function_name label value, it's wordcount. This is the count for the number of invocations from the CLI with $ faas invoke wordcount

And there are two other metrics with different values, which relate to wordcount function

gateway_function_invocation_total{app="gateway",code="200",function_name="wordcount.another-ns",instance="172.17.0.17:8082",job="kubernetes-pods",kubernetes_namespace="openfaas",kubernetes_pod_name="gateway-6c94b87f84-xhqzb",pod_template_hash="6c94b87f84"}    11
gateway_function_invocation_total{app="gateway",code="200",function_name="wordcount.openfaas-fn",instance="172.17.0.17:8082",job="kubernetes-pods",kubernetes_namespace="openfaas",kubernetes_pod_name="gateway-6c94b87f84-xhqzb",pod_template_hash="6c94b87f84"}   22

Notice the function_name label values: they are wordcount.another-ns and wordcount.openfaas-fn. These are the counts of the invocations I made from the web UI.

But those values are not what gets displayed; the displayed count is 6

Screen Shot 2019-12-21 at 9 34 11 PM
Screen Shot 2019-12-21 at 9 34 00 PM

After reading some of the code for how the metrics are produced, and connecting the dots between the inputs and outputs I observed (so part of this is assumption and intuition), this is what I can say:

The key difference is how the request goes to the gateway. When I do a CLI invocation

$ faas invoke wordcount

gateway log is like

gateway-6c94b87f84-xhqzb gateway 2019/12/21 16:08:10 Forwarded [POST] to /function/wordcount - [200] - 0.019437 seconds

and for following invocations

$ faas invoke wordcount -n openfaas-fn

or invoking in web UI portal in openfaas-fn namespace

the gateway log is like

gateway-6c94b87f84-xhqzb gateway 2019/12/21 16:07:04 Forwarded [POST] to /function/wordcount.openfaas-fn - [200] - 0.033784 seconds

for another-ns

$ faas invoke wordcount -n another-ns

or invoking in web UI portal in another-ns namespace

the gateway log is like

gateway-6c94b87f84-xhqzb gateway 2019/12/21 16:10:22 Forwarded [POST] to /function/wordcount.another-ns - [200] - 0.017615 seconds

The invocation count that is displayed comes from the gateway's response to the list functions API. Checking the code, the gateway uses the following code to find the invocation count from the Prometheus metrics data

https://github.com/openfaas/faas/blob/03dc8824d2074d0852fe7123e41ac5baef5709a1/gateway/server.go#L155

https://github.com/openfaas/faas/blob/df97efafae36ce7093ad353e3e6acc0e93d6300e/gateway/metrics/add_metrics.go#L53-L55

https://github.com/openfaas/faas/blob/df97efafae36ce7093ad353e3e6acc0e93d6300e/gateway/metrics/add_metrics.go#L64

https://github.com/openfaas/faas/blob/df97efafae36ce7093ad353e3e6acc0e93d6300e/gateway/metrics/add_metrics.go#L91

You can see that the function name in the metric label and the function name from the provider (? not sure about the term 😅) are matched without considering the namespace. So, given the Prometheus metrics data above, the value 6 will naturally be returned, no matter which namespace the user is looking at in the CLI or web UI portal.

So, that's one issue, reading of the invocations count data. I think to fix it - just adding the namespace along with name like <name>.<namespace> should work. And a test for it too!
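A minimal sketch of the read-side idea, assuming the metric's function_name label always has the form <name>.<namespace>. The helper invocationCount and the counts map are hypothetical stand-ins for whatever the gateway parses out of the Prometheus response; they are not the gateway's actual code. The numbers are the ones quoted from Prometheus above.

```go
package main

import "fmt"

// invocationCount looks up the invocation total for a function using its
// namespace-qualified label, "<name>.<namespace>".
// counts is a hypothetical stand-in for values parsed from a Prometheus
// query result, keyed by the function_name label.
func invocationCount(counts map[string]float64, name, namespace string) float64 {
	return counts[name+"."+namespace]
}

func main() {
	// Values taken from the metrics quoted in this issue.
	counts := map[string]float64{
		"wordcount":             6,
		"wordcount.another-ns":  11,
		"wordcount.openfaas-fn": 22,
	}

	// Matching on name and namespace keeps the two namespaces apart.
	fmt.Println("openfaas-fn:", invocationCount(counts, "wordcount", "openfaas-fn"))
	fmt.Println("another-ns:", invocationCount(counts, "wordcount", "another-ns"))
}
```

With this kind of lookup, the unqualified wordcount series is simply never matched, which is why the write side also needs attention.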

Next issue is, how did the wrong data even get into Prometheus in the first place? There are three sets of invocation counts, but only two namespaces. The metric with label function_name=wordcount is not correct; the label should always include a namespace to be specific about which namespace the count refers to, even if there's just one. I guess I'm right about this: considering that every function must be in a namespace and that multiple namespaces are clearly supported, the namespace must be part of the label value. Do correct me if I'm missing something 😅

And checking the code, the gateway is what exposes the metrics at port 8082. And prometheus scrapes these metrics.

Looking at the metrics data, the metrics seem to be right for web UI portal invocations and for CLI invocations with namespace flag, except for the one with function_name=wordcount label, which got created from CLI invocations without namespace flag. This is how the gateway logs look for such a case

gateway-6c94b87f84-xhqzb gateway 2019/12/21 16:08:10 Forwarded [POST] to /function/wordcount - [200] - 0.019437 seconds

Now why does this log matter? What matters is the URL path in it, which is /function/wordcount

My guess based on the code - when requests are made, they go through these parts of the code

https://github.com/openfaas/faas/blob/03dc8824d2074d0852fe7123e41ac5baef5709a1/gateway/server.go#L204-L206

https://github.com/openfaas/faas/blob/03dc8824d2074d0852fe7123e41ac5baef5709a1/gateway/server.go#L110

functionNotifiers is here and has prometheus in it

https://github.com/openfaas/faas/blob/03dc8824d2074d0852fe7123e41ac5baef5709a1/gateway/server.go#L83

And the notify call is made here

https://github.com/openfaas/faas/blob/238ce1be23c327bcb0dc1c1b83e3c623d65850d2/gateway/handlers/forwarding_proxy.go#L69

And for prometheus, the implementation is here

https://github.com/openfaas/faas/blob/238ce1be23c327bcb0dc1c1b83e3c623d65850d2/gateway/handlers/notifiers.go#L49

and this is where the service name is obtained

https://github.com/openfaas/faas/blob/238ce1be23c327bcb0dc1c1b83e3c623d65850d2/gateway/handlers/notifiers.go#L51

https://github.com/openfaas/faas/blob/238ce1be23c327bcb0dc1c1b83e3c623d65850d2/gateway/handlers/notifiers.go#L76

And here is where the metric is created for prometheus to scrape

https://github.com/openfaas/faas/blob/238ce1be23c327bcb0dc1c1b83e3c623d65850d2/gateway/handlers/notifiers.go#L59-L61

This is all good, but I think the service name will be wordcount when url is /function/wordcount, but it will be wordcount.openfaas-fn when url is /function/wordcount.openfaas-fn, and so metric also will be wrong.
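To illustrate the guess above, here is a small sketch of how a label derived purely from the URL path ends up different for the two kinds of request. serviceNameFromPath is a hypothetical simplification (strip the /function/ prefix and keep the first path segment), not a copy of the code in notifiers.go.

```go
package main

import (
	"fmt"
	"strings"
)

// serviceNameFromPath is a hypothetical simplification of deriving the
// metric label from the request path: strip the "/function/" prefix and
// keep only the first remaining path segment.
func serviceNameFromPath(path string) string {
	name := strings.TrimPrefix(path, "/function/")
	if i := strings.Index(name, "/"); i >= 0 {
		name = name[:i]
	}
	return name
}

func main() {
	// The two paths seen in the gateway logs produce two different labels,
	// even though both reach the same function in the default namespace.
	for _, p := range []string{
		"/function/wordcount",
		"/function/wordcount.openfaas-fn",
	} {
		fmt.Printf("%s -> function_name=%q\n", p, serviceNameFromPath(p))
	}
}
```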

The following is speculation - Have to check CLI code for this.

To fix this, I think the CLI has to make calls with the namespace in the URL path, like /function/wordcount.openfaas-fn if openfaas-fn is the default namespace. I think this is not happening. It still works because, behind the scenes, even with /function/wordcount the function in the default namespace is somehow picked up. But to make the invocation count work, we might have to pull in the namespaces, use the first one as the default according to this idea, and then use that for the request

I'll check the CLI code next to understand better 😄 and also check the web UI portal code, and then post more here about my findings

Most helpful comment

Hi @karuppiah7890 I also noticed this the other day and was thinking about the correct fix. I think the most direct way to fix this can be found in the scaling handler https://github.com/openfaas/faas/blob/b9a3476bcaa741f9e60bc6e9c5c2561424daeb23/gateway/handlers/scaling.go#L33

We should pass the default namespace to the PrometheusNotifier struct https://github.com/openfaas/faas/blob/238ce1be23c327bcb0dc1c1b83e3c623d65850d2/gateway/server.go#L75-L77 so that it knows what the default namespace is. Similar to how we do it here https://github.com/openfaas/faas/blob/238ce1be23c327bcb0dc1c1b83e3c623d65850d2/gateway/server.go#L94

And then using the same methods as in the scaling handler, we can then set the metric correctly.

Additionally, the AddMetricsHandler will need to be updated, once we are consistently setting the namespace. We either need to add the namespace to the prometheus query https://github.com/openfaas/faas/blob/df97efafae36ce7093ad353e3e6acc0e93d6300e/gateway/metrics/add_metrics.go#L53-L55 or we need to pass the namespace to the mixin https://github.com/openfaas/faas/blob/df97efafae36ce7093ad353e3e6acc0e93d6300e/gateway/metrics/add_metrics.go#L78 so it can filter the metrics for the requested namespace.

The namespace would need to be selected from the GET parameters and looks like this https://github.com/openfaas/faas-netes/blob/7a645a75749a130da130fc8fc77884e712fbac5e/handlers/reader.go#L25-L31 This of course requires constructing the handler so that it knows the default namespace, passing it here https://github.com/openfaas/faas/blob/b9a3476bcaa741f9e60bc6e9c5c2561424daeb23/gateway/metrics/add_metrics.go#L17
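A rough sketch of selecting the namespace from the GET parameters with a default fallback. It assumes the query parameter is called namespace and that the list endpoint is /system/functions; namespaceFromRequest and namespaceForTarget are hypothetical helpers written for this example, not the code in reader.go or add_metrics.go.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// namespaceFromRequest returns the "namespace" query parameter of the
// request, falling back to defaultNamespace when it is absent or empty.
// Hypothetical helper, not the actual gateway or faas-netes code.
func namespaceFromRequest(r *http.Request, defaultNamespace string) string {
	if ns := r.URL.Query().Get("namespace"); ns != "" {
		return ns
	}
	return defaultNamespace
}

// namespaceForTarget builds a GET request for target and resolves its
// namespace. It exists only to keep the example easy to run.
func namespaceForTarget(target, defaultNamespace string) string {
	req := httptest.NewRequest(http.MethodGet, target, nil)
	return namespaceFromRequest(req, defaultNamespace)
}

func main() {
	for _, target := range []string{
		"/system/functions",
		"/system/functions?namespace=another-ns",
	} {
		fmt.Println(target, "->", namespaceForTarget(target, "openfaas-fn"))
	}
}
```

The handler would have to be constructed with the default namespace for the fallback to be available, which matches the point made above.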

The two changes _should_ fix the issue of saving and selecting the correct metric per namespace, including the default namespace. I think this is the shortest and easiest way to fix the behavior. Long term, we might also want to consider adding the namespace to the prometheus labels.

In summary:

  1. pass the default namespace to PrometheusNotifier
  2. pass the default namespace to AddMetricsHandler
  3. During notify, ensure that the function name is formatted as <function name>.<namespace>, inserting the default namespace as needed
  4. during the metrics mixin, make sure that we filter the metrics correctly by reading the requested namespace from the GET parameters and then filtering the query/response from prometheus
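Step 3 of the summary could look roughly like the sketch below. It assumes that function names themselves never contain a dot, so a dot in the name means a namespace is already present. The counter type and qualifiedFunctionName are hypothetical names for illustration; the real PrometheusNotifier is structured differently.

```go
package main

import (
	"fmt"
	"strings"
)

// qualifiedFunctionName returns name in "<function name>.<namespace>" form,
// appending defaultNamespace when no namespace suffix is present.
// Assumption: function names never contain a dot.
func qualifiedFunctionName(name, defaultNamespace string) string {
	if strings.Contains(name, ".") {
		return name
	}
	return name + "." + defaultNamespace
}

// counter is a hypothetical stand-in for a notifier that records
// invocations under a namespace-qualified key.
type counter struct {
	defaultNamespace string
	totals           map[string]int
}

// notify records one invocation for the given (possibly unqualified) name.
func (c *counter) notify(name string) {
	c.totals[qualifiedFunctionName(name, c.defaultNamespace)]++
}

func main() {
	c := &counter{defaultNamespace: "openfaas-fn", totals: map[string]int{}}

	c.notify("wordcount")             // like: faas invoke wordcount
	c.notify("wordcount.openfaas-fn") // like: faas invoke wordcount -n openfaas-fn
	c.notify("wordcount.another-ns")  // like: faas invoke wordcount -n another-ns

	// The first two invocations now land in the same series.
	fmt.Println(c.totals)
}
```

Normalising at notify time means only two series would exist for the two namespaces, instead of the three seen in the report.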

All 4 comments

@LucasRoesler @Waterdrips PTAL

/set title: Function invocation incorrect for multiple namespaces

A fix is being worked on by @viveksyngh in https://github.com/openfaas/faas/pull/1488 - I'm not sure if it follows the above 1:1, but we are reviewing it.
