Loki: ResourceExhausted desc = grpc: received message larger than max

Created on 29 Jun 2020 · 8 Comments · Source: grafana/loki

Describe the bug
I am trying to push chunks of logs from Fluentd to Loki. Fluentd puts logs in a buffer, and when the buffer is flushed Loki refuses to receive the chunk.
Is there a chunk size limit in Loki? Is that limit adjustable?

To Reproduce
Send a chunk larger than 4194304 bytes.

Expected behavior
Loki receives all messages.

Environment:

Infrastructure: Kubernetes
Deployment tool: helm

Screenshots, Promtail config, or terminal output
Loki logs:

2020-06-27T07:04:34.242790819Z level=warn ts=2020-06-27T07:04:34.234793259Z caller=logging.go:49 traceID=64131507cd842576 msg="POST /loki/api/v1/push (500) 169.780175ms Response: \"rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5229930 vs. 4194304)\\n\" ws: false; Accept: */*; Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3; Content-Length: 5537209; Content-Type: application/json; User-Agent: Ruby; "
2020-06-27T07:04:36.914846213Z level=warn ts=2020-06-27T07:04:36.914619292Z caller=logging.go:49 traceID=7390203342d5661a msg="POST /loki/api/v1/push (500) 170.021947ms Response: \"rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5229930 vs. 4194304)\\n\" ws: false; Accept: */*; Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3; Content-Length: 5537209; Content-Type: application/json; User-Agent: Ruby; "
2020-06-27T07:04:39.907138126Z level=warn ts=2020-06-27T07:04:39.906926876Z caller=logging.go:49 traceID=2779ac80c54cbfa7 msg="POST /loki/api/v1/push (500) 249.833406ms Response: \"rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5229930 vs. 4194304)\\n\" ws: false; Accept: */*; Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3; Content-Length: 5537209; Content-Type: application/json; User-Agent: Ruby; "

Loki config

auth_enabled: false
chunk_store_config:                                                                                                                                                              
  max_look_back_period: 0s                                                                                                                                                       
ingester:                                                                                                                                                                        
  chunk_block_size: 262144                                                                                                                                                       
  chunk_idle_period: 3m                                                                                                                                                          
  chunk_retain_period: 1m                                                                                                                                                        
  lifecycler:                                                                                                                                                                    
    ring:                                                                                                                                                                        
      kvstore:                                                                                                                                                                   
        store: inmemory                                                                                                                                                          
      replication_factor: 1                                                                                                                                                      
  max_transfer_retries: 0                                                                                                                                                        
limits_config:                                                                                                                                                                   
  enforce_metric_name: false                                                                                                                                                     
  reject_old_samples: true                                                                                                                                                       
  reject_old_samples_max_age: 168h                                                                                                                                               
schema_config:                                                                                                                                                                   
  configs:                                                                                                                                                                       
  - from: "2018-04-15"                                                                                                                                                           
    index:                                                                                                                                                                       
      period: 168h                                                                                                                                                               
      prefix: index_                                                                                                                                                             
    object_store: filesystem                                                                                                                                                     
    schema: v9                                                                                                                                                                   
    store: boltdb                                                                                                                                                                
server:                                                                                                                                                                          
  http_listen_port: 3100                                                                                                                                                         
storage_config:                                                                                                                                                                  
  boltdb:                                                                                                                                                                        
    directory: /data/loki/index                                                                                                                                                  
  filesystem:                                                                                                                                                                    
    directory: /data/loki/chunks                                                                                                                                                 
table_manager:                                                                                                                                                                   
  retention_deletes_enabled: true                                                                                                                                                
  retention_period: 336h

Source

nomatterz

All 8 comments

You could try increasing the limits described here https://github.com/grafana/loki/tree/master/docs/configuration#grpc_client_config and here https://github.com/grafana/loki/tree/master/docs/configuration#server_config

cyriltovena on 29 Jun 2020

@cyriltovena Thank you!
Are there any recommendations for the grpc_server_max_recv_msg_size value, or can I set it to whatever I want? Could changing it affect other parts of Loki that would then need to be adjusted too?

nomatterz on 30 Jun 2020

You should be fine increasing this. Alternatively, you could control the buffer sizes that Fluentd uses before flushing and align them with this config. You'll want to make sure that the send/receive max sizes are aligned across the grpc_client_config and the server_config.

We inherit some of these from our upstream dependency Cortex, but it looks like they don't align automatically.
@cyriltovena @slim-bean WDYT, should we align these defaults or PR Cortex to do so?

owen-d on 30 Jun 2020

👍1
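As a rough illustration of what "aligned" could look like, here is a minimal sketch of a Loki config fragment in which the server and client gRPC limits use the same value. It assumes the client settings live under ingester_client.grpc_client_config, as described in the configuration docs linked earlier in the thread, and the 16777216 figure (16 MiB) is only an example value, not a recommendation from the maintainers.

```yaml
# Sketch only: 16777216 (16 MiB) is an illustrative value, not a recommended one.
server:
  http_listen_port: 3100
  # Largest gRPC message the server will accept / send, in bytes.
  grpc_server_max_recv_msg_size: 16777216
  grpc_server_max_send_msg_size: 16777216

# Assumption: the ingester client is the gRPC client that needs to match the server.
ingester_client:
  grpc_client_config:
    # Largest gRPC message the client will accept / send, in bytes.
    max_recv_msg_size: 16777216
    max_send_msg_size: 16777216
```

Whatever value is chosen, it has to exceed the largest payload the log shipper flushes; the rejected message in the logs above was 5229930 bytes.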

@owen-d
When Fluentd is the log shipper to Loki, as I understand it Fluentd is always the gRPC client and Loki is the server. The same holds for the Grafana-Loki chain (Loki is the server). Am I right?
If so, why do we need to adjust grpc_client_config and not just server_config?

Also, I've checked the defaults:

# The maximum size in bytes the client can receive
[max_recv_msg_size: <int> | default = 104857600]

# The maximum size in bytes the client can send
[max_send_msg_size: <int> | default = 16777216]
---
# Max gRPC message size that can be received
[grpc_server_max_recv_msg_size: <int> | default = 4194304]

# Max gRPC message size that can be sent
[grpc_server_max_send_msg_size: <int> | default = 4194304]

By aligning, do you mean making them equal? I didn't really get this...

Thank you!

nomatterz on 1 Jul 2020

We vendor another project (Cortex); that is what Owen was talking about.

Fluentd actually uses HTTP, but the gRPC server is also used internally between components running in the same process, so yes, I recommend you change them all.

cyriltovena on 1 Jul 2020

As long as Fluentd isn't sending payloads larger than the server's grpc_server_max_send_msg_size, it should be fine. If you still see similar errors, I'd make sure that the client/server sizes are equal (what I meant by aligned). This is because, under the hood, Loki's separate components talk to each other within the same process via gRPC (I doubt you'd see this problem though).

Good luck!

owen-d on 1 Jul 2020

Thank you guys!

Your explanations are much appreciated.

For now I've increased grpc_server_max_recv_msg_size and grpc_server_max_send_msg_size to 8 MB and these errors are gone. We'll see...

nomatterz on 1 Jul 2020
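The updated config was not posted in the thread, so the following is only a sketch of what the change described above might look like when applied to the server block from the original report. It assumes "8 MB" means 8 × 1024 × 1024 = 8388608 bytes; the exact number the reporter used is not known.

```yaml
# Sketch only: the reporter's actual values were not shared.
server:
  http_listen_port: 3100
  # Assumed reading of "8 MB": 8 * 1024 * 1024 = 8388608 bytes (default is 4194304).
  grpc_server_max_recv_msg_size: 8388608
  grpc_server_max_send_msg_size: 8388608
```

With this limit the 5229930-byte message from the logs would fit, but a larger Fluentd flush could still be rejected, so capping the shipper's buffer size below the limit is worth considering as well.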

@nomatterz Could you please share your updated Loki and Fluentd config files? I am facing the same issue. Thanks a lot.

sreyasvpariyath on 14 Jul 2020

👍3
