Ambassador: add support for envoy's TcpKeepalive

Created on 30 Oct 2019 · 7 Comments · Source: datawire/ambassador

Please describe your use case / problem.
Hi,
We are seeing some stale connections on our k8s cluster between Ambassador and upstream services. There are not many of them, but they affect our SLA. The root cause is probably that something in the middle (conntrack, ipvs) drops the connection, and when Envoy then tries to reuse that removed connection we get an RST.

Describe the solution you'd like
Enable TCP keepalive socket options in Envoy.

Describe alternatives you've considered
No alternatives.

stale

Most helpful comment

I just added global config support for keepalive :)

How to use keepalive as a global configuration:

---
apiVersion: v1
kind: Service
metadata:
  labels:
    service: ambassador
  name: ambassador
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v1
      kind: Module
      name: ambassador
      config:
        keepalive:
          time: 2
          interval: 2
          probes: 100
spec:
  type: ClusterIP
  ports:
    - port: 443
      name: ambassador-https
      targetPort: 8443
  selector:
    service: ambassador

Or on a per-service basis:

apiVersion: ambassador/v1
kind: Mapping
name: tour-backend_mapping
connect_timeout_ms: 3000
prefix: /backend/
service: tour:8080
labels:
  ambassador:
    - request_label:
      - backend
keepalive:
  time: 10
  interval: 1
  probes: 100
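
For anyone unsure what the three values mean: as far as I understand, they correspond to the standard TCP keepalive knobs that Envoy sets on its upstream sockets. Below is a minimal Python sketch of that mapping, assuming a Linux host. `apply_keepalive` and `dead_peer_timeout` are hypothetical helper names used only for illustration; they are not part of Ambassador or Envoy.

```python
import socket


def apply_keepalive(sock, time, interval, probes):
    """Enable TCP keepalive on sock using the same three knobs as the config above."""
    # Turn keepalive on; without this the per-socket timers below are ignored.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The TCP_KEEP* constants are the Linux names; other platforms may lack some.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # time: seconds the connection must be idle before the first probe.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, time)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # interval: seconds between consecutive probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # probes: unanswered probes tolerated before the connection is dropped.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


def dead_peer_timeout(time, interval, probes):
    """Approximate seconds of silence before an unresponsive peer is given up on."""
    return time + interval * probes
```

With the Mapping values above (time: 10, interval: 1, probes: 100), `dead_peer_timeout` gives roughly 110 seconds before the kernel gives up on a silent peer; the global example (2, 2, 100) gives roughly 202 seconds. Healthy connections are still probed after every idle period, which is what keeps the entry alive in conntrack or ipvs.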

I see the expected settings in the generated Envoy configuration, and the underlying TCP connections are sending keepalive ACKs to the upstream:

22:59:26.486121 IP tour.default.svc.cluster.local.8080 > ambassador-6f47699486-2wwtv.36200: Flags [.], ack 1, win 243, options [nop,nop,TS val 3058661490 ecr 707636676], length 0
22:59:28.278315 IP ambassador-6f47699486-2wwtv.36206 > tour.default.svc.cluster.local.8080: Flags [.], ack 1, win 237, options [nop,nop,TS val 707654864 ecr 3058661234], length 0
22:59:28.278411 IP tour.default.svc.cluster.local.8080 > ambassador-6f47699486-2wwtv.36206: Flags [.], ack 1, win 235, options [nop,nop,TS val 3058663282 ecr 707636452], length 0
22:59:28.534090 IP ambassador-6f47699486-2wwtv.36200 > tour.default.svc.cluster.local.8080: Flags [.], ack 1, win 245, options [nop,nop,TS val 707655119 ecr 3058661490], length 0
22:59:28.534206 IP tour.default.svc.cluster.local.8080 > ambassador-6f47699486-2wwtv.36200: Flags [.], ack 1, win 243, options [nop,nop,TS val 3058663538 ecr 707636676], length 0
22:59:28.534314 IP ambassador-6f47699486-2wwtv.36152 > tour.default.svc.cluster.local.8080: Flags [.], ack 1, win 237, options [nop,nop,TS val 707655120 ecr 3058661490], length 0
22:59:28.534395 IP tour.default.svc.cluster.local.8080 > ambassador-6f47699486-2wwtv.36152: Flags [.], ack 1, win 235, options [nop,nop,TS val 3058663538 ecr 707630585], length 0
22:59:30.326440 IP ambassador-6f47699486-2wwtv.36206 > tour.default.svc.cluster.local.8080: Flags [.], ack 1, win 237, options [nop,nop,TS val 707656912 ecr 3058663282], length 0
22:59:30.326548 IP tour.default.svc.cluster.local.8080 > ambassador-6f47699486-2wwtv.36206: Flags [.], ack 1, win 235, options [nop,nop,TS val 3058665330 ecr 707636452], length 0
22:59:30.582205 IP ambassador-6f47699486-2wwtv.36200 > tour.default.svc.cluster.local.8080: Flags [.], ack 1, win 245, options [nop,nop,TS val 707657167 ecr 3058663538], length 0
22:59:30.582207 IP ambassador-6f47699486-2wwtv.36152 > tour.default.svc.cluster.local.8080: Flags [.], ack 1, win 237, options [nop,nop,TS val 707657167 ecr 3058663538], length 0
22:59:30.582288 IP tour.default.svc.cluster.local.8080 > ambassador-6f47699486-2wwtv.36200: Flags [.], ack 1, win 243, options [nop,nop,TS val 3058665586 ecr 707636676], length 0
22:59:30.582288 IP tour.default.svc.cluster.local.8080 > ambassador-6f47699486-2wwtv.36152: Flags [.], ack 1, win 235, options [nop,nop,TS val 3058665586 ecr 707630585], length 0
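
To check the probe spacing without eyeballing timestamps, here is a small Python sketch that groups the Ambassador-to-upstream packets by source port and prints the gap between consecutive packets. `probe_gaps` is a hypothetical helper, and the capture lines are copied from the output above with everything after `ack 1` trimmed.

```python
from collections import defaultdict
from datetime import datetime

# Packets sent by the Ambassador pod, copied from the capture (options trimmed).
CAPTURE = """\
22:59:28.278315 IP ambassador-6f47699486-2wwtv.36206 > tour.default.svc.cluster.local.8080: Flags [.], ack 1
22:59:28.534090 IP ambassador-6f47699486-2wwtv.36200 > tour.default.svc.cluster.local.8080: Flags [.], ack 1
22:59:28.534314 IP ambassador-6f47699486-2wwtv.36152 > tour.default.svc.cluster.local.8080: Flags [.], ack 1
22:59:30.326440 IP ambassador-6f47699486-2wwtv.36206 > tour.default.svc.cluster.local.8080: Flags [.], ack 1
22:59:30.582205 IP ambassador-6f47699486-2wwtv.36200 > tour.default.svc.cluster.local.8080: Flags [.], ack 1
22:59:30.582207 IP ambassador-6f47699486-2wwtv.36152 > tour.default.svc.cluster.local.8080: Flags [.], ack 1
"""


def probe_gaps(capture, sender_prefix="ambassador-"):
    """Return {source_port: [seconds between consecutive packets]} for one sender."""
    seen = defaultdict(list)
    for line in capture.splitlines():
        fields = line.split()
        # fields[2] is "<host>.<port>" of the sender in tcpdump's default format.
        if len(fields) < 3 or not fields[2].startswith(sender_prefix):
            continue
        stamp = datetime.strptime(fields[0], "%H:%M:%S.%f")
        port = fields[2].rsplit(".", 1)[1]
        seen[port].append(stamp)
    return {
        port: [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
        for port, stamps in seen.items()
    }


gaps = probe_gaps(CAPTURE)
for port, deltas in sorted(gaps.items()):
    print(port, [round(d, 3) for d in deltas])
```

On this excerpt each of the three connections shows a gap of about 2.048 seconds, which looks consistent with a 2 second keepalive time. That reading is my interpretation of the capture, not something stated elsewhere in the thread.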

❤️ ❤️ ❤️ ❤️

All 7 comments

We also need TCP keepalive on upstream connections.

I've tested the PR and it works. What we would really prefer additionally is a means to enable keepalives without having to specify these values so kernel defaults will apply, and to be able to configure keepalives as a global default and not (necessarily) on each mapping individually.
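
To illustrate the "kernel defaults" case: at the socket level, enabling `SO_KEEPALIVE` without setting the three timers leaves them at whatever the host's sysctls say. The Python sketch below reads those effective values back; `kernel_keepalive_defaults` is a hypothetical name, and this only shows the socket behaviour, not how Ambassador would need to expose it.

```python
import socket


def kernel_keepalive_defaults():
    """Enable keepalive without overriding timers and report what the kernel will use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Only the on/off switch is set; time, interval and probes are left alone.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        values = {
            "enabled": sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
        }
        for label, name in (
            ("time", "TCP_KEEPIDLE"),
            ("interval", "TCP_KEEPINTVL"),
            ("probes", "TCP_KEEPCNT"),
        ):
            # Skip options this platform does not expose through the socket module.
            opt = getattr(socket, name, None)
            if opt is not None:
                values[label] = sock.getsockopt(socket.IPPROTO_TCP, opt)
        return values


print(kernel_keepalive_defaults())
```

On a stock Linux host the values are commonly 7200, 75 and 9 (from `net.ipv4.tcp_keepalive_time`, `tcp_keepalive_intvl` and `tcp_keepalive_probes`), but they depend on the node's sysctl settings. A two hour idle time may well be longer than a conntrack or ipvs timeout, so the defaults are worth checking before relying on them.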

Added documentation.

I would normally ask for a test, but I'm not coming up with any simple way to write that test, so I'm gonna go ahead and accept it. 😂

Thank you @kflynn, I promise that before the next PR I will prepare my local test env :)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
