Envoy: Rate limiting not working

Created on 15 May 2018 · 11 comments · Source: envoyproxy/envoy

Rate limit service

No request from Envoy ever reaches my rate limit service

Hello, I have been struggling with this issue for a long time and have had to give up. As my rate limit service I am using the reference implementation, lyft/ratelimit, running on localhost.

I reduced my config to small files so I can demonstrate the issue.
This is my rate limit service config:

domain: rate_per_ip
descriptors:
  - key: remote_address
    rate_limit:
      unit: minute
      requests_per_unit: 3

and this is my simple Envoy config.yaml:

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 127.0.0.1, port_value: 9901 }

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          use_remote_address: true 
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              rate_limits:
                - stage: 0
                  actions:
                    - remote_address: {}
              routes:
                - match: { prefix: "/" }
                  route:
                    host_rewrite: www.google.com
                    cluster: service_google

          http_filters:
          - name: envoy.rate_limit
            config:
              stage: 0
              domain: rate_per_ip
          - name: envoy.router

  clusters:
  - name: rate_limit_cluster
    type: STATIC
    connect_timeout: 0.25s
    lb_policy: ROUND_ROBIN
    hosts: [{ socket_address: { address: 127.0.0.1, port_value: 8081 }}]

  - name: service_google
    connect_timeout: 0.25s
    type: LOGICAL_DNS
    # Comment out the following line to test on v6 networks
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    hosts: [{ socket_address: { address: google.com, port_value: 443 }}]
    tls_context: { sni: www.google.com }


rate_limit_service:
  grpc_service:
    envoy_grpc:
      cluster_name: rate_limit_cluster
    timeout: 0.25s
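
As I understand the docs, the remote_address action in the route's rate_limits is what ties the two files together: for each request, Envoy should build a descriptor whose key is remote_address and whose value is the trusted client IP (hence use_remote_address: true), and send it to the service under the domain set in the http filter. Rendered as YAML purely for illustration (the client IP is made up), the ShouldRateLimit request should look like:

domain: rate_per_ip
descriptors:
  - entries:
      - key: remote_address
        value: 203.0.113.7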

I intend it to apply a simple rate limit per IP address and then proxy me on to google.com, but Envoy never even tries to contact my rate limit service.
I have spent literally hours in the docs, so sorry if I am missing some newbie thing.
Thanks for any response.

question

All 11 comments

@HappyStoic from your config, I don't see anything odd. You say "it never even tries to access my rate limit service"; what stats are you looking at to assert this? Can you start up your setup, send a request, and then print out a dump of the stats and cluster output?

@junr03 Yeah, I can run this setup, and my request to localhost:10000 really does get me to Google; however, I am not limited to 3 requests per minute. Also, the debug stats of the rate limit service at localhost:6070/stats say that my domain rate_per_ip has 0 hits.
I even tried pointing the Envoy rate limit cluster at a special port where I had socat forwarding to the real ratelimit port so I could monitor requests, and nothing from Envoy ever arrives there (I expected at least some unreadable data, since it is gRPC).

@HappyStoic sorry, what I meant by my last question ("Can you start up your setup, send a request, and then print out a dump of the stats and cluster output?") was: could you do that and post the output of your Envoy admin stats and cluster dump here?

Well, this led me to finding my point of failure. Since Envoy was running in a Docker container, I had to correct the addresses in the config to reach the host machine. Nevertheless, I am now stuck on the fact that Envoy and the Lyft rate limit service seem to default to different protocols.

See the command-line logs on the rate limit service:
2018/05/16 15:34:07 transport: http2Server.HandleStreams received bogus greeting from client: "POST /pb.lyft.ratelimit."

and on the Envoy Docker container:
[2018-05-16 15:34:07.097][13][info][client] source/common/http/codec_client.cc:117] [C145] protocol error: http/1.1 protocol error: HPE_INVALID_CONSTANT

/clusters

rate_limit_cluster::default_priority::max_connections::1024
rate_limit_cluster::default_priority::max_pending_requests::1024
rate_limit_cluster::default_priority::max_requests::1024
rate_limit_cluster::default_priority::max_retries::3
rate_limit_cluster::high_priority::max_connections::1024
rate_limit_cluster::high_priority::max_pending_requests::1024
rate_limit_cluster::high_priority::max_requests::1024
rate_limit_cluster::high_priority::max_retries::3
rate_limit_cluster::added_via_api::false
rate_limit_cluster::10.0.4.75:8081::cx_active::0
rate_limit_cluster::10.0.4.75:8081::cx_connect_fail::15
rate_limit_cluster::10.0.4.75:8081::cx_total::96
rate_limit_cluster::10.0.4.75:8081::rq_active::0
rate_limit_cluster::10.0.4.75:8081::rq_error::96
rate_limit_cluster::10.0.4.75:8081::rq_success::0
rate_limit_cluster::10.0.4.75:8081::rq_timeout::0
rate_limit_cluster::10.0.4.75:8081::rq_total::81
rate_limit_cluster::10.0.4.75:8081::health_flags::healthy
rate_limit_cluster::10.0.4.75:8081::weight::1
rate_limit_cluster::10.0.4.75:8081::region::
rate_limit_cluster::10.0.4.75:8081::zone::
rate_limit_cluster::10.0.4.75:8081::sub_zone::
rate_limit_cluster::10.0.4.75:8081::canary::false
rate_limit_cluster::10.0.4.75:8081::success_rate::-1

/stats

cluster.rate_limit_cluster.bind_errors: 0
cluster.rate_limit_cluster.internal.upstream_rq_503: 96
cluster.rate_limit_cluster.internal.upstream_rq_5xx: 96
cluster.rate_limit_cluster.lb_healthy_panic: 0
cluster.rate_limit_cluster.lb_local_cluster_not_ok: 0
cluster.rate_limit_cluster.lb_recalculate_zone_structures: 0
cluster.rate_limit_cluster.lb_subsets_active: 0
cluster.rate_limit_cluster.lb_subsets_created: 0
cluster.rate_limit_cluster.lb_subsets_fallback: 0
cluster.rate_limit_cluster.lb_subsets_removed: 0
cluster.rate_limit_cluster.lb_subsets_selected: 0
cluster.rate_limit_cluster.lb_zone_cluster_too_small: 0
cluster.rate_limit_cluster.lb_zone_no_capacity_left: 0
cluster.rate_limit_cluster.lb_zone_number_differs: 0
cluster.rate_limit_cluster.lb_zone_routing_all_directly: 0
cluster.rate_limit_cluster.lb_zone_routing_cross_zone: 0
cluster.rate_limit_cluster.lb_zone_routing_sampled: 0
cluster.rate_limit_cluster.max_host_weight: 0
cluster.rate_limit_cluster.membership_change: 1
cluster.rate_limit_cluster.membership_healthy: 1
cluster.rate_limit_cluster.membership_total: 1
cluster.rate_limit_cluster.retry_or_shadow_abandoned: 0
cluster.rate_limit_cluster.update_attempt: 0
cluster.rate_limit_cluster.update_empty: 0
cluster.rate_limit_cluster.update_failure: 0
cluster.rate_limit_cluster.update_no_rebuild: 0
cluster.rate_limit_cluster.update_success: 0
cluster.rate_limit_cluster.upstream_cx_active: 0
cluster.rate_limit_cluster.upstream_cx_close_notify: 0
cluster.rate_limit_cluster.upstream_cx_connect_attempts_exceeded: 0
cluster.rate_limit_cluster.upstream_cx_connect_fail: 15
cluster.rate_limit_cluster.upstream_cx_connect_timeout: 0
cluster.rate_limit_cluster.upstream_cx_destroy: 0
cluster.rate_limit_cluster.upstream_cx_destroy_local: 0
cluster.rate_limit_cluster.upstream_cx_destroy_local_with_active_rq: 81
cluster.rate_limit_cluster.upstream_cx_destroy_remote: 0
cluster.rate_limit_cluster.upstream_cx_destroy_remote_with_active_rq: 0
cluster.rate_limit_cluster.upstream_cx_destroy_with_active_rq: 81
cluster.rate_limit_cluster.upstream_cx_http1_total: 96
cluster.rate_limit_cluster.upstream_cx_http2_total: 0
cluster.rate_limit_cluster.upstream_cx_idle_timeout: 0
cluster.rate_limit_cluster.upstream_cx_max_requests: 0
cluster.rate_limit_cluster.upstream_cx_none_healthy: 0
cluster.rate_limit_cluster.upstream_cx_overflow: 0
cluster.rate_limit_cluster.upstream_cx_protocol_error: 81
cluster.rate_limit_cluster.upstream_cx_rx_bytes_buffered: 0
cluster.rate_limit_cluster.upstream_cx_rx_bytes_total: 1184
cluster.rate_limit_cluster.upstream_cx_total: 96
cluster.rate_limit_cluster.upstream_cx_tx_bytes_buffered: 0
cluster.rate_limit_cluster.upstream_cx_tx_bytes_total: 25839
cluster.rate_limit_cluster.upstream_flow_control_backed_up_total: 0
cluster.rate_limit_cluster.upstream_flow_control_drained_total: 0
cluster.rate_limit_cluster.upstream_flow_control_paused_reading_total: 0
cluster.rate_limit_cluster.upstream_flow_control_resumed_reading_total: 0
cluster.rate_limit_cluster.upstream_rq_503: 96
cluster.rate_limit_cluster.upstream_rq_5xx: 96
cluster.rate_limit_cluster.upstream_rq_active: 0
cluster.rate_limit_cluster.upstream_rq_cancelled: 0
cluster.rate_limit_cluster.upstream_rq_maintenance_mode: 0
cluster.rate_limit_cluster.upstream_rq_pending_active: 0
cluster.rate_limit_cluster.upstream_rq_pending_failure_eject: 15
cluster.rate_limit_cluster.upstream_rq_pending_overflow: 0
cluster.rate_limit_cluster.upstream_rq_pending_total: 96
cluster.rate_limit_cluster.upstream_rq_per_try_timeout: 0
cluster.rate_limit_cluster.upstream_rq_retry: 0
cluster.rate_limit_cluster.upstream_rq_retry_overflow: 0
cluster.rate_limit_cluster.upstream_rq_retry_success: 0
cluster.rate_limit_cluster.upstream_rq_rx_reset: 0
cluster.rate_limit_cluster.upstream_rq_timeout: 0
cluster.rate_limit_cluster.upstream_rq_total: 81
cluster.rate_limit_cluster.upstream_rq_tx_reset: 0
cluster.rate_limit_cluster.version: 0

I don't want to drag out the debugging here, but the Envoy docs do not say anything about specifying a protocol for the rate limit service; gRPC is the only one supported, right?

@HappyStoic ah! I actually just realized what the problem might be. Thanks for the data; it helped me see this. The ratelimit cluster is not specified as an HTTP/2 cluster. Sorry I missed it in my original read of the config.

You want to add http2_protocol_options: {} to your cluster definition. Like:

  - name: rate_limit_cluster
    type: STATIC
    connect_timeout: 0.25s
    lb_policy: ROUND_ROBIN
    hosts: [{ socket_address: { address: 127.0.0.1, port_value: 8081 }}]
    http2_protocol_options: {}
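
With that in place, the HPE_INVALID_CONSTANT protocol errors should stop, and you can confirm it took effect in your stats: cluster.rate_limit_cluster.upstream_cx_http2_total should start incrementing instead of upstream_cx_http1_total.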

@junr03 Yeah, that was the problem! Thank you very much.

Hi HappyStoic,

I am trying to achieve rate limiting functionality. I am a little confused about the config below. Where should the sample configuration you provided go?

domain: rate_per_ip
descriptors:
  - key: remote_address
    rate_limit:
      unit: minute
      requests_per_unit: 3

Can you please share your config?

Thanks,
Asisranjan Nayak

I have this exact problem. Is it solved yet? Where should the configuration go?

Hi, I'm having the same problems. I could not understand how to combine those configs.
Could someone please share an example of configuring the rate limiter? There is nothing helpful on Google regarding this.
Thanks!

@Asisranjan @ggalihpp @gleb-s
Hi guys, sorry for my late response.
The configuration you're mentioning is the configuration of your rate limit service (the service Envoy connects to), in this case the reference implementation by Lyft. You can see how to proceed with the configuration in their documentation.

Basically, if you start the rate limit service following the steps in the Building and testing section, the configuration is supposed to live in /home/user/src/runtime/data/ratelimit. I hope I'm not mistaken; it's been a long time and I don't have this setup available anymore.

I hope I helped at least a little. :)
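
If it helps, here is a minimal sketch of what that service-side file would look like, assuming the runtime paths above (the exact location depends on the RUNTIME_ROOT and RUNTIME_SUBDIRECTORY settings you start the service with, and the repo's examples keep the YAML under a config/ subdirectory, so treat the path in the comment as hypothetical):

# hypothetical path: /home/user/src/runtime/data/ratelimit/config/config.yaml
domain: rate_per_ip
descriptors:
  - key: remote_address
    rate_limit:
      unit: minute
      requests_per_unit: 3

Note that Envoy never reads this file itself; it only sends the domain and descriptors over gRPC, and the service answers with OK or OVER_LIMIT.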

I documented wiring up a minimal working example of this: https://medium.com/dm03514-tech-blog/sre-resiliency-bolt-on-sidecar-rate-limiting-with-envoy-sidecar-5381bd4a1137

This issue was crucial in getting the information necessary! Thank you!
