Elasticsearch: Fail put-mapping requests sooner if they will exceed the field number limit

Created on 14 Nov 2018 · 7 Comments · Source: elastic/elasticsearch

6.4.1

We had a scenario in which an index had already hit the field number limit (index.mapping.total_fields.limit) and subsequent high-volume indexing requests attempted to add new fields to it. As a result, a large number of put_mapping tasks were generated. This caused the cluster state to be held in memory, where it could not be garbage collected until these mapping updates were eventually rejected (and the coordinating node ran out of memory).

This is an enhancement request to handle this situation better. Is this something the real memory circuit breaker in 7.0 will help with?

:DistributeCRUD

ppf2

Most helpful comment

> This is an enhancement request to handle this situation better. Is this something the real memory circuit breaker in 7.0 will help with?

I think we should try to address the root cause if possible. It'd be nice if we could check the mapping field limit before a put-mapping request is sent to the master. For instance, if the local node's cluster state contains over 1000 fields in the mapping (with the default limit being 1000), we know that even if the local cluster state is behind, the number of fields cannot decrease, so there is no need to send an update-mapping request to the master node. The request can be rejected locally without overloading any other node.

dakrone on 14 Nov 2018

👍3
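
The local pre-check suggested in the comment above could look roughly like the following. This is a minimal sketch, not Elasticsearch code: `FieldLimitPrecheck`, `canRejectLocally`, and the parameter names are hypothetical, and it assumes (as the comment does) that the number of mapped fields never decreases and that the limit itself is unchanged on the master. It also generalizes the comment slightly by counting the fields the request would add.

```java
// Hypothetical sketch: FieldLimitPrecheck and its members are illustrative
// names, not Elasticsearch classes or APIs.
final class FieldLimitPrecheck {

    // Default value of index.mapping.total_fields.limit, as stated in the thread.
    static final long DEFAULT_TOTAL_FIELDS_LIMIT = 1000;

    /**
     * Decides whether a put-mapping request can be rejected on the local node
     * without contacting the master.
     *
     * Assumption: the master's field count is at least the locally known
     * count, because fields are only ever added. If the locally known count
     * plus the new fields already exceeds the limit, the master would
     * exceed it as well.
     */
    static boolean canRejectLocally(long localFieldCount, int newFieldCount, long limit) {
        if (newFieldCount <= 0) {
            // The request adds nothing, so the limit cannot be the reason to reject it.
            return false;
        }
        return localFieldCount + newFieldCount > limit;
    }

    public static void main(String[] args) {
        // Index already at the limit, one more field requested: reject locally.
        System.out.println(canRejectLocally(1000, 1, DEFAULT_TOTAL_FIELDS_LIMIT)); // true
        // Two fields would land exactly on the limit: must still go to the master.
        System.out.println(canRejectLocally(998, 2, DEFAULT_TOTAL_FIELDS_LIMIT));  // false
    }
}
```

A `false` result only means the local node cannot decide; the master still has to validate the request against its own, possibly newer, cluster state.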

All 7 comments

Pinging @elastic/es-core-infra

elasticmachine on 14 Nov 2018

Hi @dakrone,
Based on my understanding, the TransportPutMappingAction is handled by the master, which only checks for blocks and then goes ahead and submits a cluster state update task. Do you think it makes sense to reject the request at the master, before submitting the cluster state update task, just as we check for blocks? I believe that since update tasks are serialized on the master and put-mapping has priority HIGH, processing by the PutMappingExecutor gets significantly delayed (especially when there are pending tasks with priority URGENT), allowing the heap to build up. This would help even in cases where the local cluster state was lagging and unaware of the field limit breach.

Bukhtawar on 18 Apr 2019
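
To illustrate the proposal in the comment above, here is a toy model of a serialized master task queue with a limit check performed before a task is enqueued. It is only a sketch under stated assumptions: `MasterQueueSketch` and all of its members are hypothetical, only the two priorities named in the thread (URGENT and HIGH) are modeled, and the real master service is considerably more involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical toy model, not Elasticsearch code.
final class MasterQueueSketch {

    // Declared from most to least urgent, so ordinal order is queue order.
    enum Priority { URGENT, HIGH }

    private static final class Task {
        final Priority priority;
        final long seq;   // insertion order, used as a tie-breaker
        final String name;

        Task(Priority priority, long seq, String name) {
            this.priority = priority;
            this.seq = seq;
            this.name = name;
        }
    }

    private final PriorityQueue<Task> pending = new PriorityQueue<>(
        Comparator.comparing((Task t) -> t.priority).thenComparingLong(t -> t.seq));
    private final long fieldLimit;
    private final long appliedFieldCount;
    private long nextSeq = 0;

    MasterQueueSketch(long fieldLimit, long appliedFieldCount) {
        this.fieldLimit = fieldLimit;
        this.appliedFieldCount = appliedFieldCount;
    }

    /**
     * Checks the field limit against the currently applied state before
     * enqueuing. Returns false if the request is rejected up front, in which
     * case nothing is retained in the queue.
     */
    boolean submitPutMapping(String name, int newFieldCount) {
        if (appliedFieldCount + newFieldCount > fieldLimit) {
            return false;
        }
        pending.add(new Task(Priority.HIGH, nextSeq++, name));
        return true;
    }

    void submitUrgent(String name) {
        pending.add(new Task(Priority.URGENT, nextSeq++, name));
    }

    int pendingCount() {
        return pending.size();
    }

    /** Removes all pending tasks and returns their names in execution order. */
    List<String> drainOrder() {
        List<String> order = new ArrayList<>();
        while (!pending.isEmpty()) {
            order.add(pending.poll().name);
        }
        return order;
    }
}
```

In this model a put-mapping task submitted before an URGENT task still runs after it, which is the delay the comment describes. A request that would breach the limit never enters the queue, so it holds no memory while waiting. Accepted tasks would still need to be validated again when they execute, since the state may have changed by then.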

Hey @dakrone, I'd be more than happy to work on a PR for this. Please share your thoughts.

Bukhtawar on 22 Apr 2019

@ppf2 @dakrone any thoughts on this?

Bukhtawar on 7 May 2019

I think that the coordinating node no longer runs out of memory due to failed put-mapping calls in versions ≥7.0, so I have updated the title of this issue to reflect the remaining work mentioned in this comment.