Currently the ES store does not use the bulk API and issues a separate request to ES for every span. This is not very efficient on its own, and in combination with transaction log flushing it leads to performance issues.
The default setting of index.translog.durability is request, which means that there is an fsync after every request. This leads to pretty high IOPS when not using the bulk API.
For example, it's the main factor limiting throughput on AWS instances with small EBS drives. That was the reason why I was experiencing 60+ seconds of delay in indexing and losing 50%-75% of spans because the internal queue was always full.
One (simple) solution would be to change index.translog.durability to async in the index settings so that the translog is flushed asynchronously. In case of a node crash, a few seconds of data would be lost, which might be acceptable in practice.
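To make the settings change concrete, here is a minimal sketch in Go that only builds the JSON body for an index settings update (the kind of body sent with PUT /<index>/_settings). The function name translogSettings is made up for illustration, and how the body is actually sent (client, index name or template) is left out on purpose.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// translogSettings is a hypothetical helper: it builds the JSON body of an
// index settings update that sets index.translog.durability to the given value.
func translogSettings(durability string) (string, error) {
	body := map[string]interface{}{
		"index": map[string]interface{}{
			"translog": map[string]interface{}{
				"durability": durability,
			},
		},
	}
	b, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := translogSettings("async")
	if err != nil {
		panic(err)
	}
	// The resulting string is the request body; sending it is not shown here.
	fmt.Println(body)
}
```

The same value could also go into an index template so that newly created indices pick it up, but that is a deployment choice and not something the sketch covers.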
Another option is to start using the bulk API. https://github.com/olivere/elastic/wiki/BulkProcessor can be utilized to avoid writing batching logic by hand.
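For reference, this is roughly what batching means at the wire level: a bulk request body is newline-delimited JSON, with one action line followed by one source line per document, so many spans go out in a single request. The sketch below uses only the standard library and is not the BulkProcessor from olivere/elastic; the span struct, the index name and buildBulkBody are hypothetical names for illustration. The action line omits a document type, which some ES versions may require.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// span is a hypothetical, trimmed-down span document used only for illustration.
type span struct {
	TraceID       string `json:"traceID"`
	OperationName string `json:"operationName"`
}

// buildBulkBody encodes all spans into one newline-delimited bulk request body:
// an "index" action line followed by the document source, for every span.
func buildBulkBody(index string, spans []span) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode writes a trailing "\n" after each value
	for _, s := range spans {
		action := map[string]interface{}{
			"index": map[string]interface{}{"_index": index},
		}
		if err := enc.Encode(action); err != nil {
			return nil, err
		}
		if err := enc.Encode(s); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

func main() {
	spans := []span{
		{TraceID: "a", OperationName: "get"},
		{TraceID: "b", OperationName: "put"},
	}
	body, err := buildBulkBody("spans", spans) // "spans" is a placeholder index name
	if err != nil {
		panic(err)
	}
	fmt.Print(string(body))
}
```

A real implementation would also need flush triggers (batch size, byte size, time interval) and error handling for partially failed batches, which is the part BulkProcessor is meant to take care of.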
I am looking into this.
FYI - I've been using index.translog.durability=async and it solved my IOPS problems on AWS. But indexing is still very heavy CPU-wise and bulk is also needed.
I have it almost ready. I ran some tests reporting 300k spans: with the bulk API it takes about 40s, and without it about 15min.
Sounds great!