K3s: Getting Real Client IP with k3s

Created on 18 Apr 2020 · 59 Comments · Source: k3s-io/k3s

Is your feature request related to a problem? Please describe.
I am unable to obtain Real Client IP when using k3s and Traefik v2.2. I always get the cluster IP.

Kernel version
4.4.0-174-generic
OS Image
Ubuntu 16.04.6 LTS
Container runtime version
containerd://1.3.0-k3s.4
kubelet version
v1.16.3-k3s.2
kube-proxy version
v1.16.3-k3s.2
Operating system
linux
Architecture
amd64


Images
traefik:2.2.0

Describe the solution you'd like
I would like to obtain the client IP.

Describe alternatives you've considered
I already set
externalTrafficPolicy: Local
in Traefik's Service.
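
For reference, the setting in question sits at the top level of the Service spec; a minimal sketch (metadata and port definitions elided):

```yaml
# Minimal sketch only: metadata and port definitions elided.
apiVersion: v1
kind: Service
metadata:
  name: traefik
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local  # preserve the client source IP instead of SNATing it
```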
Additional context
Issue can be reproduced by deploying containous/whoami image in cluster
Expected Response

Hostname: a19d325823bb
IP: 127.0.0.1
IP: 10.0.0.147
IP: 172.18.0.4
RemoteAddr: 10.0.0.144:56246
GET / HTTP/1.1
Host: whoami.civo.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Upgrade-Insecure-Requests: 1
X-Apache-Ip: 102.69.228.66
X-Forwarded-For: 102.69.228.66, 102.69.228.66, 172.18.0.1
X-Forwarded-Host: whoami.civo.com
X-Forwarded-Port: 443
X-Forwarded-Proto: https
X-Forwarded-Server: bc3b51f28353
X-Real-Ip: 102.69.228.66

Current Response

Hostname: whoami-76d6dfb846-jltlm
IP: 127.0.0.1
IP: ::1
IP: 192.168.0.33
IP: fe80::7863:88ff:fe45:2ad5
RemoteAddr: 192.168.1.4:36146
GET / HTTP/1.1
Host: who.civo.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Upgrade-Insecure-Requests: 1
X-Forwarded-For: 192.168.0.5
X-Forwarded-Host: who.civo.com
X-Forwarded-Port: 443
X-Forwarded-Proto: https
X-Forwarded-Server: traefik-8477c7d8f-fbhdg
X-Real-Ip: 192.168.0.5

Service LoadBalancer Logs

+ trap exit TERM INT
/usr/bin/entry: line 6: can't create /proc/sys/net/ipv4/ip_forward: Read-only file system
+ echo 1
+ true
+ cat /proc/sys/net/ipv4/ip_forward
+ '[' 1 '!=' 1 ]
+ iptables -t nat -I PREROUTING '!' -s 192.168.183.229/32 -p TCP --dport 8080 -j DNAT --to 192.168.183.229:8080
+ iptables -t nat -I POSTROUTING -d 192.168.183.229/32 -p TCP -j MASQUERADE
+ '[' '!' -e /pause ]
+ mkfifo /pause

Related
https://github.com/rancher/k3s/pull/955
Related Discussion
https://github.com/rancher/k3s/issues/679#issuecomment-516367437

@erikwilson @btashton

Most helpful comment

A DaemonSet or just using a regular Deployment in the end doesn't matter in terms of getting the real client IP. It just determines how many pods there will be and on which nodes they run. DaemonSet + NodeSelector can be a nice choice.

The key point is to remove the LoadBalancer service that uses Klipper* and instead use HostPort or HostNetwork to bind directly to the external IP of the node.

In the end I settled on a Deployment with HostPort for my Traefik instances and using external-dns to automatically update DNS A records for the domain pointing to the HTTP ingress points.

*: Because it masks all the real IPs due to its NAT and does not support the Proxy Protocol.
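
As a rough illustration of the HostPort approach described above, here is a sketch only; names, namespace, and the Traefik version are placeholders, and RBAC, args, and probes are elided:

```yaml
# Sketch: bind Traefik directly to node ports via hostPort, bypassing the
# Klipper service LB (and its NAT). RBAC, args, and probes are elided.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: traefik
spec:
  replicas: 1
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      containers:
      - name: traefik
        image: traefik:v2.2
        ports:
        - name: web
          containerPort: 80
          hostPort: 80
        - name: websecure
          containerPort: 443
          hostPort: 443
```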

All 59 comments

I can validate that this works properly for me with Traefik 2.2 in a 3 node Ubuntu cluster with externalTrafficPolicy: Local. Is there anything else unique about your configuration?

Hey @brandond I don't believe so. However, I'm using a managed cloud k3s service.
What is your environment?

Just 3 bare metal nodes running k3s on Ubuntu 19.10. Flannel is in host-gw mode, using Traefik 2.2 for ingress, and MetalLB in bgp mode for external services.

Thanks @brandond
Are you using type: LoadBalancer for the Traefik Service?
Could you share your k3s startup config and MetalLB ConfigMap?

Hey @brandond , any update on this?

@jawabuu I'm now using the Traefik 3.1 Helm chart with the KubernetesCRD provider

here's my traefik service:

---
apiVersion: v1
kind: Service
metadata:
  name: traefik
  namespace: traefik
  labels:
    app: traefik
    chart: "traefik-3.1.0"
    release: "traefik"
    heritage: "Helm"
spec:
  type: LoadBalancer
  selector:
    app: traefik
    release: traefik
  ports:
  - port: 80
    name: web
    targetPort: "web"
  - port: 443
    name: websecure
    targetPort: "websecure"
  - port: 9000
    name: traefik
    targetPort: "traefik"
  loadBalancerIP: 10.0.3.80
  loadBalancerSourceRanges:
  - "0.0.0.0/0"
  externalTrafficPolicy: Local

For metallb, I found that using bgp works best. I have a Ubiquiti USG as my router, and set it up to peer with all three of my nodes.

---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    peers:
    - peer-address: 10.0.1.1
      peer-asn: 64512
      my-asn: 64512
      hold-time: 120s
    address-pools:
    - name: manual
      auto-assign: false
      protocol: bgp
      addresses:
      - 10.0.3.10-10.0.3.99
    - name: default
      auto-assign: true
      protocol: bgp
      addresses:
      - 10.0.3.100-10.0.3.254

Lucky me, I also use a Ubiquiti USG as my router, so I am also interested in how you peered with all three of your nodes.

Here's what /unifi/data/sites/default/config.gateway.json on my unifi pod looks like. I'm using coredns k8s_external to expose services to LAN hosts; from outside the LAN I have ports forwarded to the Traefik ingress for stuff that I actually want to publish.

{
  "protocols": {
    "bgp": {
      "64512": {
        "neighbor": {
          "10.0.1.20": {
            "remote-as": "64512"
          },
          "10.0.1.21": {
            "remote-as": "64512"
          },
          "10.0.1.22": {
            "remote-as": "64512"
          }
        },
        "parameters": {
          "router-id": "10.0.1.1"
        }
      }
    }
  },
  "service": {
    "dns": {
      "forwarding": {
        "cache-size": "10000",
        "except-interface": [
          "eth0"
        ],
        "options": [
          "filterwin2k",
          "local-ttl=60",
          "host-record=usg.khaus,10.0.1.1",
          "host-record=sw-core.khaus,10.0.1.2",
          "host-record=ap-house.khaus,10.0.1.6",
          "host-record=ap-garage.khaus,10.0.1.7",
          "host-record=seago.khaus,10.0.1.20",
          "host-record=maersk.khaus,10.0.1.21",
          "host-record=sealand.khaus,10.0.1.22",
          "srv-host=_etcd-client._tcp,seago.khaus,2379",
          "srv-host=_etcd-client._tcp,maersk.khaus,2379",
          "srv-host=_etcd-client._tcp,sealand.khaus,2379",
          "srv-host=_etcd-server._tcp,seago.khaus,2380",
          "srv-host=_etcd-server._tcp,maersk.khaus,2380",
          "srv-host=_etcd-server._tcp,sealand.khaus,2380",
          "server=/k3s.khaus/10.0.3.53",
          "server=75.75.75.75",
          "server=75.75.76.76"
        ]
      }
    }
  }
}

Hey @brandond, I have narrowed down the issue - as best as I could :-) - to using flannel as the CNI.
Calico presents no such issues. Scouring other threads also turns up the following, which I can confirm.

https://github.com/kubernetes/kubernetes/issues/56934#issuecomment-588158416

https://github.com/rancher/k3s/issues/1175#issuecomment-605642062

https://github.com/rancher/k3s/issues/1175#issuecomment-617765307

https://github.com/jetstack/cert-manager/issues/2811

https://github.com/coreos/flannel/issues/1243

Calico works with Traefik 2.2 and MetalLB to resolve Real Client IP without any further configuration.

@jawabuu Those comments all seem to be about routing issues, not obtaining the original client IP? Either way, I'm using flannel in host-gw mode with traefik and metallb and getting the original address in the headers, so I know that it is doable. All you have to do is use externalTrafficPolicy: Local on the traefik service.

Hey @brandond, you are correct. The comments are just meant to provide context that the default installation of the latest k3s with flannel may have some issues.
I was also able to resolve the real IP using host-gw mode, but only on a single node; pods on different nodes could not reach each other.
I am also thinking of a more common scenario where people deploy k3s on a cloud server, or where the user does not have access to configure router rules.
I would propose providing documentation on how one can obtain the real client IP in k3s:

  1. Use externalTrafficPolicy: Local in the Traefik Service, or any other service (mandatory)
  2. Use either:
    a) host-gw mode when using flannel, with some extra configuration, or
    b) Calico as the CNI, which requires no further configuration

Ideally both options under point 2 should be tested to identify edge cases.

Thanks to both of you, @brandond and @jawabuu.
The real IP is so important for some web services, like Matomo.

Hey @brandond, upon further testing with flannel, I have found that it is not necessary to use host-gw mode - which breaks in cloud environments anyway - to obtain the source IP.
The appropriate flag is --ip-masq=false, but due to the embedded nature of k3s's flannel (similar to
https://github.com/rancher/k3s/issues/72)
one needs to deploy k3s with --flannel-backend=none, then manually deploy flannel and set the flag in the kube-flannel manifest.
Would you consider making --ip-masq=false the default, or adding a flag to set this easily?

I have the same issue using --flannel-backend=wireguard, and setting externalTrafficPolicy: Local doesn't change anything. I have tried both NodePort and LoadBalancer. Any ideas?

Hey @mschneider82, you will not be able to get the source IP using the embedded flannel in k3s.
You need to use --flannel-backend=none and deploy your own flannel manifest with --ip-masq=false in its config.

See below

      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.12.0-amd64
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq=false
        - --kube-subnet-mgr
        - --iface=${interface}

@jawabuu thanks for your answer, I can see it's statically set to true in the source:
https://github.com/rancher/k3s/blob/master/pkg/agent/flannel/flannel.go#L74

The function has an option to set it to false. I will try it with "false" and recompile k3s.

Let me know if you are able to resolve the source IP with that.

setting it to false results in:

# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.42.0.0/16
FLANNEL_SUBNET=10.42.0.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=false

but still not the correct real-ip :-(

Which deployment are you using to check source ip?

I am using whoami as a backend

X-Forwarded-For: 10.42.0.0
X-Forwarded-Host: xxx
X-Forwarded-Port: 443
X-Forwarded-Proto: https
X-Forwarded-Server: traefik-86886d7bf-szmjz
X-Real-Ip: 10.42.0.0

What are you using as Ingress?

I have tried LoadBalancer and also NodePort, also wiped/rebooted the whole cluster. No success

Hey @mschneider82, as above: you will not be able to get the source IP using the embedded flannel in k3s. You need --flannel-backend=none and your own flannel manifest with --ip-masq=false (see the containers snippet earlier).
Please try this then afterward compare with your compiled k3s.

Hey @jawabuu, it doesn't work for me with manually installed flannel / patched k3s binary:

  1. Install k3s with --flannel-backend=none --no-deploy=servicelb --disable=traefik
  2. Download https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml, edit it to set --ip-masq=false, and apply it
  3. Install traefik: helm install traefik traefik/traefik
  4. kubectl edit svc traefik -> added
  externalIPs: 
  - 123.123.123.123
  externalTrafficPolicy: Local
  5. Apply the traefik configs from https://gist.githubusercontent.com/mschneider82/71abaf55328627f2208e77ca9d802f9e/raw/0b1517a27daff82040b873727d7225d76f15ffd9/gistfile1.txt

Any idea what I am doing wrong?

Use the official/non-patched k3s binary.

Share this output
kubectl -n <traefik namespace> get svc/traefik -o yaml

Hey @mschneider82 I believe you are also going to need --disable servicelb

@jawabuu I am on a clean install: curl -sfL https://get.k3s.io | sh -s - server --flannel-backend=none --no-deploy=servicelb --disable=traefik --disable servicelb

installed flannel with ip-masq=false

output share:

apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: traefik
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2020-06-16T12:37:43Z"
  labels:
    app.kubernetes.io/instance: traefik
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: traefik
    helm.sh/chart: traefik-8.5.0
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:meta.helm.sh/release-name: {}
          f:meta.helm.sh/release-namespace: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/instance: {}
          f:app.kubernetes.io/managed-by: {}
          f:app.kubernetes.io/name: {}
          f:helm.sh/chart: {}
      f:spec:
        f:ports:
          .: {}
          k:{"port":80,"protocol":"TCP"}:
            .: {}
            f:name: {}
            f:port: {}
            f:protocol: {}
            f:targetPort: {}
          k:{"port":443,"protocol":"TCP"}:
            .: {}
            f:name: {}
            f:port: {}
            f:protocol: {}
            f:targetPort: {}
        f:selector:
          .: {}
          f:app.kubernetes.io/instance: {}
          f:app.kubernetes.io/name: {}
        f:sessionAffinity: {}
        f:type: {}
    manager: Go-http-client
    operation: Update
    time: "2020-06-16T12:37:43Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:externalIPs: {}
        f:externalTrafficPolicy: {}
    manager: kubectl
    operation: Update
    time: "2020-06-16T12:39:39Z"
  name: traefik
  namespace: default
  resourceVersion: "852"
  selfLink: /api/v1/namespaces/default/services/traefik
  uid: 3e9e738f-e971-4f6b-b725-02b99907225e
spec:
  clusterIP: 10.43.93.173
  externalIPs:
  - 94.1.11.9
  externalTrafficPolicy: Local
  healthCheckNodePort: 31139
  ports:
  - name: web
    nodePort: 30625
    port: 80
    protocol: TCP
    targetPort: web
  - name: websecure
    nodePort: 32363
    port: 443
    protocol: TCP
    targetPort: websecure
  selector:
    app.kubernetes.io/instance: traefik
    app.kubernetes.io/name: traefik
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer: {}

still getting X-Real-Ip: 10.42.0.1

@mschneider82
Try the following:

  1. On your host:
echo net.ipv4.ip_forward=1 >> /etc/sysctl.conf
sysctl -p
  2. Deploy MetalLB and use it to assign the host IP to the LoadBalancer Service.

Thanks, with metallb and using the host IP /32 as the address pool it works!

Hey @mschneider82 Glad to hear that.
@brandond Any idea why the k3s servicelb would prevent correct resolution of the source IP?

I also have it working with my patched k3s binary and metallb, without installing flannel manually

just with:
sh -s - server --disable=traefik

Excellent, so the issue in this case is the servicelb

OK, with the official binary it works too. It says FLANNEL_IPMASQ=true, but that doesn't matter; the solution is simply that metallb uses the host IP address. I can see the x-real-ip was set correctly.

I have installed curl -sfL https://get.k3s.io | sh -s - server --disable=traefik --flannel-backend=wireguard

this is still important:

 externalTrafficPolicy: Local

After editing the externalTrafficPolicy with kubectl edit svc traefik, the traefik pods need to be deleted to pick up the externalTrafficPolicy setting.

Can you make it work without changing the flannel backend, which is vxlan by default?

yes it worked with vxlan, too

So, as a step-by-step guide for other users, can you outline the minimal changes required?

curl -sfL https://get.k3s.io | sh -s - server --disable=traefik

Install metallb:

kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/metallb.yaml

On first install only:

kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"

Create config.yml for metallb:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 123.123.123.123/32 # <--- this is the external HOST IP

Apply that metallb config file:

kubectl apply -f config.yml

That's needed by helm:

export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
helm repo add traefik https://containous.github.io/traefik-helm-chart
helm install traefik traefik/traefik

kubectl edit svc traefik
-> change Cluster to Local:
externalTrafficPolicy: Local

Set some traefik config, for example from
https://gist.githubusercontent.com/mschneider82/71abaf55328627f2208e77ca9d802f9e/raw/0b1517a27daff82040b873727d7225d76f15ffd9/gistfile1.txt

@mschneider82 Thank you!! It seems servicelb without metallb still leaves you unable to resolve the source IP. It would be great if we could get it working with servicelb.

I'm not saying this is a good way to do it, but it seems to work: it binds traefik directly to the host's ports 443/80 (the host being the node running the traefik pod), and possibly 8080 (you can probably remove that). You will need to allow traffic to these ports in the input chain of your firewall.

I did this because I was running on a machine with only one IP, so metalLB probably wouldn't have worked.

1) disable traefik

2) Save the following as a yaml file and apply it with kubectl

Traefik v1.5 had hostNetwork set, but they removed it from the later docs; I managed to fumble my way through enough of the config to combine the DaemonSet config from v1.5 with the newer Traefik 2.2.

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressroutes.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRoute
    plural: ingressroutes
    singular: ingressroute
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: middlewares.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: Middleware
    plural: middlewares
    singular: middleware
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressroutetcps.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRouteTCP
    plural: ingressroutetcps
    singular: ingressroutetcp
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressrouteudps.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRouteUDP
    plural: ingressrouteudps
    singular: ingressrouteudp
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: tlsoptions.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: TLSOption
    plural: tlsoptions
    singular: tlsoption
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: tlsstores.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: TLSStore
    plural: tlsstores
    singular: tlsstore
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: traefikservices.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: TraefikService
    plural: traefikservices
    singular: traefikservice
  scope: Namespaced

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller

rules:
  - apiGroups:
      - ""
    resources:
      - services
      - endpoints
      - secrets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses/status
    verbs:
      - update
  - apiGroups:
      - traefik.containo.us
    resources:
      - middlewares
      - ingressroutes
      - traefikservices
      - ingressroutetcps
      - ingressrouteudps
      - tlsoptions
      - tlsstores
    verbs:
      - get
      - list
      - watch

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller

roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traefik-ingress-controller
subjects:
  - kind: ServiceAccount
    name: traefik-ingress-controller
    namespace: traefik

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: traefik-ingress-controller
  namespace: traefik
---
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: traefik-ingress-controller
  namespace: traefik
  labels:
    k8s-app: traefik-ingress-lb
spec:
  selector:
    matchLabels:
      k8s-app: traefik-ingress-lb
      name: traefik-ingress-lb
  template:
    metadata:
      labels:
        k8s-app: traefik-ingress-lb
        name: traefik-ingress-lb
    spec:
      serviceAccountName: traefik-ingress-controller
      terminationGracePeriodSeconds: 60
      hostNetwork: true
      containers:
      - image: traefik:v2.2
        name: traefik-ingress-lb
        ports:
        - name: http
          containerPort: 80
          hostPort: 80
        - name: https
          containerPort: 443
          hostPort: 443
        - name: admin
          containerPort: 8080
          hostPort: 8080
        securityContext:
          capabilities:
            drop:
            - ALL
            add:
            - NET_BIND_SERVICE
        args:
        - --api
        - --providers.kubernetescrd
        - --log.Level=DEBUG
        - --entryPoints.websecure.address=:443
        - --entryPoints.web.address=:80

Edit: Removed unneeded nodeport svc

I ran into the same problems... Sadly, I was unable to get @mschneider82's configuration to work.

However!! The configuration by @dragon2611 provided a critical clue: hostNetwork: true
With this _vital_ clue, I was able to get my kubernetes deployment running _on a single system_, with the Source IP visible.

I will include my setup, which allows traefik and nginx to work, but with the major caveat that you cannot run nginx through a load balancer setup. That is to say, each node in the network will have a public-facing ingress instance. You are still free to use DNS-based load balancing or other software/hardware stacks on top.

Note: I am unable to use servicelb with nginx because it creates extra containers called svclb-nginx-nginx-ingress-controller-... which steal the ports... though you can hack it to work by killing those pods. Below, we will be installing metallb which seems to play nicer with our setup. (NOTE: Traefik hostNetwork:true does not have the extra svclb containers, and might actually support servicelb, but I do not recall if this configuration worked. Try at your own risk)

Installing k3s and nginx or traefik

Alright, here goes... Note that you can replace nginx with traefik and the same patches should work. I was able to get traefik working.

curl -sfL https://raw.githubusercontent.com/rancher/k3s/master/install.sh | sh -s - server -o /root/.kube/config --no-deploy=servicelb --disable=traefik --disable=servicelb

sudo helm repo update
sudo helm install nginx stable/nginx-ingress --namespace kube-system

sudo kubectl patch svc/nginx-nginx-ingress-controller -n kube-system --patch '{"spec":{"externalTrafficPolicy":"Local"}}'
sudo kubectl patch deployments/nginx-nginx-ingress-controller --patch '{"spec":{"template":{"spec":{"hostNetwork":true}}}}' -n kube-system

sudo kubectl get replicasets -n kube-system
# Find the oldest nginx-nginx-ingress-controller such as ABCDEFG-fooba and delete with
sudo kubectl delete replicasets nginx-nginx-ingress-controller-ABCDEFG-fooba -n kube-system

The last bit seems to be a bug with nginx: a duplicate replicaset is created after the deployment is patched. With traefik, that step is not necessary.

Installing metallb

The http server will actually be working without this step. If you only intend to run HTTP facing services, you may be able to skip installing a load balancer. The LoadBalancer service will show external IP but should already be functional.

However, some pods may require a functioning load balancer, so this will install a 1-node metallb setup, which should get your k3s install fully operational.

Create metalconfig.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 345.456.567.678/32 # <--- this is the external HOST IP

Now run:

sudo kubectl create namespace metallb-system
sudo kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
sudo kubectl apply -f metalconfig.yaml
sudo kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/metallb.yaml

The version above (v0.9.3) may be outdated in the future. Consider replacing with main or the newest version found at https://github.com/metallb/metallb/tags

Anyway!! Hope this helps anyone who has stumbled across this issue and saves you the 2 solid days I spent trying things!! <3

This also occurs for me when using wormhole instead of flannel and ingress-nginx instead of traefik. It seems to me like an iptables issue where the source IP gets mangled, unrelated to the CNI and ingress controller, given that it happens no matter which CNI and ingress controller you use.

I ran a quick test using the following manifest:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: echo
  template:
    metadata:
      labels:
        app: echo
    spec:
      containers:
        - name: echo
          image: mendhak/http-https-echo
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: echo
spec:
  selector:
    app: echo
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
      name: http
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: echo
spec:
  rules:
    - host:
      http:
        paths:
          - path:
            backend:
              serviceName: echo
              servicePort: 80

Both tests running v1.18.6+k3s1.

Fresh install (Flannel+Traefik):

{
  "path": "/",
  "headers": {
    "host": "test.test.test",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.119 Safari/537.36",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "accept-encoding": "gzip, deflate",
    "accept-language": "en-US,en;q=0.9",
    "cache-control": "max-age=0",
    "if-none-match": "W/\"42e-schQ9K+CMtz0rs2iUmCFhulmyis\"",
    "upgrade-insecure-requests": "1",
    "x-forwarded-for": "10.42.0.1",
    "x-forwarded-host": "test.test.test",
    "x-forwarded-port": "80",
    "x-forwarded-proto": "http",
    "x-forwarded-server": "traefik-758cd5fc85-lmgq6",
    "x-real-ip": "10.42.0.1"
  },
  "method": "GET",
  "body": "",
  "fresh": false,
  "hostname": "test.test.test",
  "ip": "10.42.0.1",
  "ips": [
    "10.42.0.1"
  ],
  "protocol": "http",
  "query": {},
  "subdomains": [],
  "xhr": false,
  "os": {
    "hostname": "echo-b7dc887f-whm7m"
  },
  "connection": {}
}

Wormhole+nginx-ingress (no-traefik, no-flannel)

{
  "path": "/",
  "headers": {
    "host": "test.test.test",
    "x-request-id": "559dd3558e9028d1f2bfc688d6df968f",
    "x-real-ip": "10.42.0.1",
    "x-forwarded-for": "10.42.0.1",
    "x-forwarded-host": "test.test.test",
    "x-forwarded-port": "443",
    "x-forwarded-proto": "https",
    "x-scheme": "https",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.119 Safari/537.36",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "sec-fetch-site": "none",
    "sec-fetch-mode": "navigate",
    "sec-fetch-user": "?1",
    "sec-fetch-dest": "document",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-US,en;q=0.9",
    "if-none-match": "W/\"4e5-oFsNc1sm115PskYyy/wmrJhZNLE\""
  },
  "method": "GET",
  "body": "",
  "fresh": false,
  "hostname": "test.test.test",
  "ip": "10.42.0.1",
  "ips": [
    "10.42.0.1"
  ],
  "protocol": "https",
  "query": {},
  "subdomains": [],
  "xhr": false,
  "os": {
    "hostname": "echo-b7dc887f-tmb69"
  },
  "connection": {}
}

I hit the same issue. Here's how I solved it:

My Environment:

  • Proxmox VE, install K3s with default options curl -sfL https://get.k3s.io | sh - (no VM, just plain on Proxmox itself)
  • Single node "cluster"
  • I'm on v1.17.4 atm

My steps:

  1. I enabled accesslog in Traefik with (kubectl -n kube-system edit cm traefik), saw the IP 10.42.0.1 as Client
  2. Set externalTrafficPolicy=Local with kubectl -n kube-system edit svc traefik
  3. Delete the traefik pod kubectl -n kube-system scale deploy/traefik --replicas=0 && kubectl -n kube-system scale deploy/traefik --replicas=1

Although that is almost the same as what the initial comment said was done... Maybe it doesn't work in all cases?

@ccremer thank you for summing up, it worked.

To make these changes persistent (< 1.19) you can:

  1. Edit /var/lib/rancher/k3s/server/manifests/traefik.yaml
  2. Add externalTrafficPolicy: Local to valuesContent section

Helm will reload the configuration automatically.
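
For illustration, the shape of that manifest after the edit might look like this (a sketch; the chart URL and other fields are elided, and the exact values key can differ between chart versions):

```yaml
# /var/lib/rancher/k3s/server/manifests/traefik.yaml (k3s < 1.19), abbreviated
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: traefik
  namespace: kube-system
spec:
  # chart/repo fields elided
  valuesContent: |-
    externalTrafficPolicy: Local
```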

Note that your changes will get lost when you restart k3s...

The summary from @ccremer doesn't seem to work for me.

My Environment:

OS: k3OS v0.11.0 5.4.0-37-generic
Kubelet Version: v1.18.6+k3s1
Kube Proxy Version: v1.18.6+k3s1
Containerd Version: 1.3.3-k3s2

Is there a reason why the hostNetwork=true approach I've described earlier doesn't work?

@lcotonea what is your load balancer setup? Did you try scaling down and back up? got more details?

Rancher 2 on home hosting, using K3s.

  1. Traefik with:
  2. Edit /var/lib/rancher/k3s/server/manifests/traefik.yaml
  3. Add externalTrafficPolicy: Local to the valuesContent section
  4. No reboot, to keep the configuration
  5. Scale Traefik down and up to ensure the setting is applied
    => The remote IP is always seen as 10.42.0.1

  6. L7 configuration: not a good idea, because I want to expose a TCP port (2222, an SSH service)

  7. L4: not possible (home hosting)

  8. The basics: a HostPort. It works.

Same problem here
k3s
traefik
externalTrafficPolicy: Local, applied to Service, but nothing works

@lcotonea , any idea how to make this change permanent? your solution works, but only until reboot

I've also spent quite a few days trying to get this to work. I am using the Helm chart for Traefik v2. My goal was to expose Traefik directly to the internet, because I am not running inside a cloud environment that provides proprietary load balancers, and Traefik is a load balancer after all. I didn't want a load balancer (implemented via NAT) in front of the actual load balancer: for one, it's unnecessary, and the bigger reason is that it breaks things like getting the real client IP. In an AWS or GCE environment you can turn on the Proxy Protocol, and Traefik supports it, but as far as I can tell k3s does not, since it uses simple IP-routing-based load balancing; so the only way for Traefik to get the real client IP is to accept the incoming traffic directly, without intermediary steps.

I've found a solution that worked for me:

Using HostPort or hostNetwork: true. But this wasn't easy to get working at all, since k3s doesn't seem to propagate the NET_BIND_SERVICE capability even when it is specified, so the container can't bind to port 80 or 443 when running as non-root.
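For illustration, a hostPort binding in a pod spec looks roughly like this (a sketch; the container name and image are placeholders):

```yaml
# Sketch: bind the container's web ports directly on the node.
# With hostPort, traffic reaches the pod without going through svclb's NAT,
# so the client source IP is preserved.
spec:
  containers:
    - name: traefik        # placeholder name
      image: traefik:2.2   # placeholder image
      ports:
        - name: web
          containerPort: 80
          hostPort: 80
        - name: websecure
          containerPort: 443
          hostPort: 443
```

Alternatively, hostNetwork: true at the pod level puts the container on the node's network namespace entirely, with the same source-IP benefit.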

Even with a LoadBalancer service and externalTrafficPolicy: Local I was not getting the real IPs, because of the NAT. I am not sure why; I guess it's because the traffic still goes through the "svclb-traefik" service?

The solution has the problem that the container has to run as root for it to be able to bind to privileged ports. I had to add the following to the securityContext or the pod would not start:

  runAsNonRoot: false
  runAsUser: 0

It didn't help that half the how-tos out there are either outdated or simply don't work (e.g. the capabilities problem), and the fact that k3s comes with an outdated version of Traefik (1.7) adds to the confusion. Many places suggest turning on the Proxy Protocol, and indeed Traefik supports it, but the Klipper-based "LoadBalancer" service does not.

@arctica Try the config I posted on this issue a while ago,

I think the

securityContext:
  capabilities:
    drop:
      - ALL
    add:
      - NET_BIND_SERVICE

bit should do it.

@dragon2611 as I wrote in my post unfortunately the NET_BIND_SERVICE is not being propagated all the way when running as non-root from what I can tell. Have you tried this capability while running the pod as non-root? You might be running as root and then the capability is unnecessary anyways.

When I run the Traefik deployment as root:

/ # cat /proc/1/status|grep Cap
CapInh: 0000000000000400
CapPrm: 0000000000000400
CapEff: 0000000000000400
CapBnd: 0000000000000400
CapAmb: 0000000000000000
# capsh --decode=0000000000000400
0x0000000000000400=cap_net_bind_service

When running as non-root:

CapInh: 0000000000000400
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000000000000400
CapAmb: 0000000000000000

Note how the Effective and Permitted capabilities are cleared out (because Ambient is not set).
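To make the masks above easier to read without capsh: NET_BIND_SERVICE is capability number 10, so it corresponds to bit 10 (0x400) of the mask. A quick shell sketch to decode a given mask:

```shell
#!/bin/sh
# Decode whether cap_net_bind_service (capability 10, mask bit 0x400)
# is set in a capability bitmask as printed in /proc/<pid>/status.
mask=0x0000000000000400   # e.g. the CapEff value shown above
bit=10                    # CAP_NET_BIND_SERVICE
if [ $(( mask >> bit & 1 )) -eq 1 ]; then
  echo "cap_net_bind_service is set"
else
  echo "cap_net_bind_service is NOT set"
fi
```

Run it against the non-root CapEff value (0x0) and the check fails, matching the bind error seen in practice.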

Unfortunately I've found this bug report for k8s about the problematic capabilities handling that has been open for 3 years and seems no fix in sight: https://github.com/kubernetes/kubernetes/issues/56374

I've also tried to set

sysctls:
  - name: net.ipv4.ip_unprivileged_port_start
    value: "443"

as part of the securityContext of the deployment but get the following error:

# deployments.apps "traefik" was not valid:
# * : Invalid value: "The edited file failed validation": ValidationError(Deployment.spec.template.spec.containers[0].securityContext): unknown field "sysctls" in io.k8s.api.core.v1.SecurityContext
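For what it's worth, `sysctls` is a field of the pod-level securityContext (PodSecurityContext), not the container-level one, which is why the validation above rejects it. A sketch of where it would go; note that on Kubernetes versions of that era this sysctl was considered unsafe, so the kubelet must also allow it (e.g. via --allowed-unsafe-sysctls):

```yaml
# Sketch: sysctls belong in the pod-level securityContext,
# not inside the container's securityContext.
spec:
  template:
    spec:
      securityContext:
        sysctls:
          - name: net.ipv4.ip_unprivileged_port_start
            value: "443"   # the API expects a string here
      containers:
        - name: traefik    # placeholder
          image: traefik:2.2
```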

So really running as root + HostPort seems to be the only solution for me to get the real client IP.

I'm running klipper, calico and istio, and I'm always getting the IP address of the servicelb pod.
This definitely looks like an issue in https://github.com/k3s-io/klipper-lb

The solution using the DaemonSet suggested by @dragon2611 works quite well.

Here are some steps to install istio on your k3s cluster and expose the ports 80 and 443.
It will deploy the istio control plane and an istio ingressgateway.

Before you start, please make sure that you disable servicelb.


My k3s server config in /etc/rancher/k3s/config.yaml

datastore-endpoint: "postgres://..."
token: "some-random-server-token"
flannel-backend: 'none'
disable-network-policy: true
disable: [traefik, metrics-server, servicelb]
node-ip: "node ip"
node-external-ip: "public ip"
cluster-cidr: "192.168.0.0/16"
node-label:
  - "node.kubernetes.io/instance-type=c2"
  - "topology.kubernetes.io/region=fra1"
kube-apiserver-arg:
  - "feature-gates=LegacyNodeRoleBehavior=false"

# Install istioctl on your machine (for mac use brew)
brew install istioctl

# Generate the manifest for the istio installation
istioctl manifest generate -f ./istio-operator-profile.yaml > install-istio.yaml

# Verify the install-istio.yaml file and then install it
kubectl apply -f install-istio.yaml


Content of istio-operator-profile.yaml

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: default
  hub: docker.io/istio
  tag: 1.8.1
  components:
    ingressGateways:
    - name: istio-ingressgateway
      enabled: true
      k8s:
        overlays:
          - kind: Deployment
            name: istio-ingressgateway
            patches:
              - path: kind
                value: DaemonSet
              - path: spec.strategy
              - path: spec.updateStrategy
                value:
                  type: RollingUpdate
                  rollingUpdate:
                    maxUnavailable: 1
              - path: spec.template.spec.containers.[name:istio-proxy].ports.[containerPort:8080].hostPort
                value: 80
              - path: spec.template.spec.containers.[name:istio-proxy].ports.[containerPort:8443].hostPort
                value: 443

Result: (screenshot)

A DaemonSet or just using a regular Deployment in the end doesn't matter in terms of getting the real client IP. It just determines how many pods there will be and on which nodes they run. DaemonSet + NodeSelector can be a nice choice.

The key point is to remove the LoadBalancer service that uses Klipper* and to use HostPort or hostNetwork to bind directly to the external IP of the node.

In the end I settled on a Deployment with HostPort for my Traefik instances and using external-dns to automatically update DNS A records for the domain pointing to the HTTP ingress points.

*: Because it masks all the real IPs due to its NAT and does not support the Proxy Protocol.

Using IPVS with MetalLB solved the problem for me:

curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=latest sh -s - --disable=servicelb --kube-proxy-arg proxy-mode=ipvs

Then install MetalLB as its documentation describes.
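As a sketch of the MetalLB side (versions current at the time of this thread were configured via a ConfigMap; the address range below is a placeholder you'd replace with IPs routed to your nodes):

```yaml
# Sketch: Layer 2 address pool for MetalLB, ConfigMap-based configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
      - name: default
        protocol: layer2
        addresses:
          - 192.0.2.10-192.0.2.20   # placeholder range
```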

Wow, I think I got the IPs working together with servicelb and cilium as the CNI, using vxlan (it might work without vxlan too, but that way it works on DigitalOcean across nodes). We're also replacing kube-proxy altogether with cilium.

# Download all CNI plugins to `/opt/cni/bin`
mkdir -p /opt/cni/bin
curl -L -s https://github.com/containernetworking/plugins/releases/download/v0.9.0/cni-plugins-linux-amd64-v0.9.0.tgz | tar zxvf - -C /opt/cni/bin/

# Install k3s, replace node-ip and node-external-ip
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=latest sh -s - --flannel-backend=none --disable-kube-proxy --disable-network-policy --node-ip=10.135.0.1 --node-external-ip=42.140.10.10

# Then install cilium
helm install cilium cilium/cilium --version 1.9.1 \
--namespace=kube-system \
--set k8sServiceHost=localhost \
--set k8sServicePort=6443 \
--set ipam.mode=kubernetes \
--set kubeProxyReplacement=strict \
--set tunnel=vxlan \
--set containerRuntime.integration=containerd \
--set containerRuntime.socketPath=/var/run/k3s/containerd/containerd.sock

edit: maybe I was too early. Somehow only one node shows the correct IP; the other one doesn't. Does servicelb sometimes proxy to the pod IP directly instead of the service IP?
edit2: Using externalTrafficPolicy: Local makes that deterministic if a container of that service is running locally. So a DaemonSet is not strictly necessary, but to make it work all the time, one is desirable. The hostPort configs are definitely not needed anymore, though.
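For reference, the Service setting mentioned above looks like this (a minimal sketch; names and ports are placeholders):

```yaml
# Sketch: externalTrafficPolicy: Local preserves the client source IP,
# at the cost of only routing to pods on the node that received the traffic.
apiVersion: v1
kind: Service
metadata:
  name: traefik          # placeholder
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: traefik         # placeholder
  ports:
    - name: web
      port: 80
      targetPort: 80
```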

@marcbachmann cilium is an interesting project and quite powerful but it also uses quite a bit more resources than flannel (see https://itnext.io/benchmark-results-of-kubernetes-network-plugins-cni-over-10gbit-s-network-updated-august-2020-6e1b757b9e49)

A LoadBalancer with externalTrafficPolicy: Local should in theory do the trick when accessing a node that has the HTTP service running (it didn't for me, and I need to revisit this topic to check why). The problem I have with it is that it'll still run on all nodes (as a DaemonSet), and when the traffic is tunneled to another node the IP gets lost again; on top of that, external-dns only sees one external IP for it (as seen in kubectl get svc -o wide). The result is an artificial bottleneck of one node getting all the traffic, and during failover scenarios the client IPs get lost. Losing the client IP also brings security-related issues with it.

Cilium being able to operate on Layer 7 and understanding HTTP could maybe make it possible to have Cilium itself inject a X-Forwarded-For header. I have not looked into this. If one only needs very simple loadbalancing and access controls it might even be OK to forgo Traefik completely and just use what Cilium provides.

For my use case of an HTTP load balancer in a production setup that is supposed to scale, the common implementation of the LoadBalancer service seems a bit counterproductive unless it's used with a cloud provider that implements it well. Too many layers that provide no upside but come with all kinds of limitations; a limited load balancer in front of a real load balancer, so to speak. Hence I ripped it out. For me, hostPort + a proper software load balancer is superior to the LoadBalancer service that comes by default in k3s. Traefik is a load balancer after all.

@marcbachmann @arctica
If you don't mind, I'd like you to test out an implementation I built with k3s, primarily motivated by my desire to solve the real-client-IP LoadBalancer issues I experienced with k3s.
https://github.com/jawabuu/kloud-3s
In my tests, I zeroed in on 2 issues:

  1. Using svclb never returns the client IP
  2. Embedded flannel in k3s runs with --ip-masq=true