Loki: ResourceExhausted desc = grpc: received message larger than max

Created on 29 Jun 2020 · 8 Comments · Source: grafana/loki

Describe the bug
I'm trying to push chunks of logs from Fluentd to Loki. Fluentd buffers the logs, and when the buffer is flushed, Loki refuses to accept them.
Is there a chunk size limit in Loki? If so, is it adjustable?

To Reproduce
Send a chunk larger than 4194304 bytes (4 MiB).

Expected behavior
Loki receives all messages.

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: helm

Screenshots, Promtail config, or terminal output
Loki logs:

2020-06-27T07:04:34.242790819Z level=warn ts=2020-06-27T07:04:34.234793259Z caller=logging.go:49 traceID=64131507cd842576 msg="POST /loki/api/v1/push (500) 169.780175ms Response: \"rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5229930 vs. 4194304)\\n\" ws: false; Accept: */*; Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3; Content-Length: 5537209; Content-Type: application/json; User-Agent: Ruby; 
2020-06-27T07:04:36.914846213Z level=warn ts=2020-06-27T07:04:36.914619292Z caller=logging.go:49 traceID=7390203342d5661a msg="POST /loki/api/v1/push (500) 170.021947ms Response: \"rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5229930 vs. 4194304)\\n\" ws: false; Accept: */*; Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3; Content-Length: 5537209; Content-Type: application/json; User-Agent: Ruby; "
2020-06-27T07:04:39.907138126Z level=warn ts=2020-06-27T07:04:39.906926876Z caller=logging.go:49 traceID=2779ac80c54cbfa7 msg="POST /loki/api/v1/push (500) 249.833406ms Response: \"rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5229930 vs. 4194304)\\n\" ws: false; Accept: */*; Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3; Content-Length: 5537209; Content-Type: application/json; User-Agent: Ruby; "

Loki config

auth_enabled: false
chunk_store_config:
  max_look_back_period: 0s
ingester:
  chunk_block_size: 262144
  chunk_idle_period: 3m
  chunk_retain_period: 1m
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
  max_transfer_retries: 0
limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
schema_config:
  configs:
  - from: "2018-04-15"
    index:
      period: 168h
      prefix: index_
    object_store: filesystem
    schema: v9
    store: boltdb
server:
  http_listen_port: 3100
storage_config:
  boltdb:
    directory: /data/loki/index
  filesystem:
    directory: /data/loki/chunks
table_manager:
  retention_deletes_enabled: true
  retention_period: 336h


All 8 comments

@cyriltovena Thank you!
Are there any recommendations for the value of grpc_server_max_recv_msg_size, or can I set it to whatever I want? Could changing it affect other parts of Loki that would then need to be adjusted too?

You should be fine increasing this. Alternatively, you could control the buffer size Fluentd uses before flushing and align it with this limit. Either way, make sure the max send/receive sizes are aligned across grpc_client_config and server_config.
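Here is a minimal sketch of that alignment, applied to the config above. It assumes a Loki 1.x layout in which the gRPC client used to reach the ingesters is configured under ingester_client.grpc_client_config. The 16 MiB value is only an example, chosen to match the client's default max_send_msg_size quoted later in this thread. Check the configuration reference for your Loki version before copying it.

```yaml
server:
  http_listen_port: 3100
  # Raise the gRPC server limits from the 4 MiB default (4194304 bytes)
  # so the server accepts anything the internal client is allowed to send.
  grpc_server_max_recv_msg_size: 16777216  # 16 MiB
  grpc_server_max_send_msg_size: 16777216  # 16 MiB
ingester_client:
  grpc_client_config:
    # Written out explicitly for clarity; these are the defaults quoted in this thread.
    max_recv_msg_size: 104857600  # 100 MiB
    max_send_msg_size: 16777216   # 16 MiB, now equal to the server's receive limit
```

With this layout, the server's receive limit is no lower than the client's send limit. As a result, a message the internal client is willing to send cannot be rejected by the server for being too large.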

We inherit some of these from our upstream dependency Cortex, but it looks like they don't align automatically.
@cyriltovena @slim-bean WDYT, should we align these defaults or PR Cortex to do so?

@owen-d
When Fluentd is the log shipper for Loki, as I understand it, Fluentd is always the gRPC client and Loki is the server. The same goes for the Grafana-Loki chain (Loki is the server). Am I right?
So why do we need to adjust grpc_client_config and not just server_config?

Also, I've checked the defaults:

# The maximum size in bytes the client can receive
[max_recv_msg_size: <int> | default = 104857600]

# The maximum size in bytes the client can send
[max_send_msg_size: <int> | default = 16777216]
---
# Max gRPC message size that can be received
[grpc_server_max_recv_msg_size: <int> | default = 4194304]

# Max gRPC message size that can be sent
[grpc_server_max_send_msg_size: <int> | default = 4194304]

By "aligning", do you mean making them equal? I didn't quite follow.

Thank you!

We vendor another project (Cortex); that's what Owen was talking about.

Fluentd actually uses HTTP, but that server is also used internally between components running in the same process, so yes, I recommend changing them all.

As long as Fluentd isn't sending payloads larger than the server's grpc_server_max_recv_msg_size, it should be fine. If you still see similar errors, I'd make sure the client and server sizes are equal (that's what I meant by aligned). This is because, under the hood, Loki's separate components talk to each other within the same process via gRPC (I doubt you'd see this problem, though).

Good luck!

Thank you guys!

Your explanations are much appreciated.

For now, I've increased grpc_server_max_recv_msg_size and grpc_server_max_send_msg_size to 8MB, and these errors are gone. We'll see...
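Below is a sketch of what that change could look like in the server block of the config posted above. The thread doesn't show the final file, so reading "8MB" as 8 MiB (8388608 bytes) is an assumption. That limit is still comfortably above the 5229930-byte message reported in the error log.

```yaml
server:
  http_listen_port: 3100
  # 8 MiB = 8 * 1024 * 1024 bytes, up from the 4194304-byte (4 MiB) default
  grpc_server_max_recv_msg_size: 8388608
  grpc_server_max_send_msg_size: 8388608
```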

@nomatterz Could you please share your updated Loki and Fluentd config files? I am facing the same issue. Thanks a lot!

