Elasticsearch-net: Nest performance issue

Created on 23 May 2016 · 8 Comments · Source: elastic/elasticsearch-net

NEST/Elasticsearch.Net version: 2.3.1

Elasticsearch version: 2.3.1

Description of the problem including expected versus actual behavior:

I'm upgrading Elasticsearch from 1.6 to 2.3.1 (which includes upgrading NEST from 1.6.1 to 2.3.1).

I'm facing a severe performance degradation in aggregation requests. To verify it, I created a console application that compares the two versions.

With NEST 2.3.1 (Elasticsearch 2.3.1), the test takes about twice as long as with NEST 1.6.1 (Elasticsearch 1.6): 59 seconds versus 30 seconds.

I then ran the same query directly against Elasticsearch 1.6 and 2.3.1 (with Sense), and the performance was similar.

Therefore, as I see it, there is a severe performance issue in NEST (I have tested most of the 2.x NEST versions).

Steps to reproduce:

  1. Create a console application
  2. Create aggregation with many fields, you can see code example:

Provide DebugInformation (if relevant):
Interesting insight:
When I change the query from the lambda syntax to the string syntax, the performance is much better (1 second instead of 12 seconds).
That is, instead of Terms("hi", x => x.Field(f => f.ProductName))
I use Terms("hi", x => x.Field("ProductName")).

All 8 comments

Hi @tomsender thanks for opening this.

I've run tests comparing the following versions:

  • 2.3.1 (includes recent cache fix)
  • 2.3.0
  • 1.8.0 (uses Json.Net 8)
  • 1.6.1 (uses Json.Net 7)

and I'm not able to reproduce such a drastic difference -- the numbers are all similar.

Not discrediting your findings at all, but maybe we're not on the same page in terms of how we're testing.

A few things to consider:

  • The timings for the first request sent by NEST should be thrown away and not considered in the comparison. It always takes significantly longer than other requests because the type caches have to be built for the first time.
  • Elasticsearch will cache aggregations, so this might also need to be taken into consideration.

Here's a gist with the tests I am running. Would you mind taking a look and running these as well? It's easily portable between 2.x and 1.x: you just need to change DefaultIndex() to SetDefaultIndex() in 1.x.

"When I change query from the lambda syntax to string syntax the performance is much better (1 seconds instead of 12 seconds)."

This is somewhat expected, as the expressions need to be resolved to strings, but the results are also cached, so subsequent resolutions shouldn't have much impact on performance. The 12s vs 1s difference is probably down to the initial request, as described above.
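
To make the caching point concrete, here is a minimal sketch of resolving a lambda to a field name and caching the result. It is not NEST's actual implementation: FieldNameResolver, Product, the Misses counter, and the camel-casing rule are all hypothetical names and choices made for this example. The cache is deliberately held on the resolver instance, so only the first resolution on a given instance pays the cost, and a freshly created instance starts cold again.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;

// Hypothetical document type used only for this example.
public class Product
{
    public string ProductName { get; set; }
}

// Hypothetical resolver (not a NEST type): turns f => f.ProductName into "productName".
public class FieldNameResolver
{
    // Instance-level cache keyed by (document type, member name).
    private readonly ConcurrentDictionary<Tuple<Type, string>, string> _cache =
        new ConcurrentDictionary<Tuple<Type, string>, string>();

    // How many times a name had to be computed rather than read from the cache.
    public int Misses { get; private set; }

    public string Resolve<T>(Expression<Func<T, object>> expression)
    {
        // Value-typed members are boxed to object, which wraps them in a Convert node.
        var unary = expression.Body as UnaryExpression;
        var memberExpression = (unary != null ? unary.Operand : expression.Body) as MemberExpression;
        if (memberExpression == null)
            throw new ArgumentException("Expected a simple member access such as f => f.Name.");

        var key = Tuple.Create(typeof(T), memberExpression.Member.Name);
        return _cache.GetOrAdd(key, k =>
        {
            Misses++; // plain increment: fine for this single-threaded demo only
            // Assumed naming rule for the sketch: lower-case the first character.
            return char.ToLowerInvariant(k.Item2[0]) + k.Item2.Substring(1);
        });
    }
}

public static class Demo
{
    public static void Main()
    {
        var resolver = new FieldNameResolver();
        Console.WriteLine(resolver.Resolve<Product>(f => f.ProductName)); // computed: productName
        Console.WriteLine(resolver.Resolve<Product>(f => f.ProductName)); // cached: productName
        Console.WriteLine(resolver.Misses); // 1
    }
}
```

Passing the field as a plain string skips the expression handling entirely, which is consistent with the string syntax being faster on a cold start.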

Oops, seems like GitHub is having some issues and the link to the gist is broken. Pasting here instead:

using Nest;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace NestAggPerformance
{
    public class Test
    {
        public string Field1 { get; set; }
        public string Field2 { get; set; }
        public string Field3 { get; set; }
        public string Field4 { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
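            // With no URI given, ConnectionSettings targets the default node at http://localhost:9200.
            // The client is created once, outside the loop, and reused for every request.
            var client = new ElasticClient(new ConnectionSettings().DefaultIndex("aggperformance"));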

            var timings = new List<long>();
            var n = 100;

            for (var i = 0; i < n; i++)
            {
                var stopwatch = Stopwatch.StartNew();
                client.Search<Test>(s => s
                    .Aggregations(a => a
                        .Terms("field1", t => t
                            .Field(f => f.Field1)
                        )
                        .Terms("field2", t => t
                            .Field(f => f.Field2)
                        )
                        .Terms("field3", t => t
                            .Field(f => f.Field3)
                        )
                        .Terms("field4", t => t
                            .Field(f => f.Field4)
                        )
                    )
                );
                stopwatch.Stop();
                if (i > 0) // throw away the first request
                    timings.Add(stopwatch.ElapsedMilliseconds);
            }

            foreach (var timing in timings)
            {
                Console.WriteLine(timing.ToString());
            }
            Console.WriteLine();
            Console.WriteLine($"Total: {timings.Sum()} ms");
            // Average() avoids the integer truncation of Sum() / (n - 1).
            Console.WriteLine($"Average: {timings.Average():F1} ms");
            Console.ReadLine();
        }
    }
}

@tomsender are you still able to reproduce this?

Closing. Feel free to re-open @tomsender if this is still an issue.

Hey,

Sorry for the late response.
I have found the root cause for the performance degradation.
In NEST 1.6, the method that creates the Newtonsoft serializer is static; since NEST 2.0, it is no longer static.

In our codebase, we were creating the Elasticsearch client connection on every request (we have an abstraction over Elasticsearch).
As a result, the garbage collector cleaned up the Newtonsoft cache along with each discarded client, and serialization time increased significantly.

We fixed it, thanks!

Thanks! This helped fix my performance problem too. ElasticClient was being instantiated once per request which added around 150ms per request.
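
The fix reported in the two comments above (reusing one client instead of constructing one per request) could look roughly like the following. This is a sketch under assumptions rather than the code either commenter actually used: ElasticClientProvider is a hypothetical helper name, the node address is assumed to be a local default, and the index name is borrowed from the test program earlier in the thread.

```csharp
using System;
using Nest;

// Hypothetical helper (not part of NEST): exposes one shared client for the whole application.
public static class ElasticClientProvider
{
    // Lazy<T> is thread-safe by default, so the settings and client are built only once,
    // on first use, and the caches they hold survive across requests.
    private static readonly Lazy<ElasticClient> LazyClient = new Lazy<ElasticClient>(() =>
    {
        var settings = new ConnectionSettings(new Uri("http://localhost:9200")) // assumed node address
            .DefaultIndex("aggperformance"); // index name taken from the test program above
        return new ElasticClient(settings);
    });

    public static ElasticClient Client => LazyClient.Value;
}
```

Request-handling code would then call ElasticClientProvider.Client instead of new ElasticClient(...). In an application that uses dependency injection, registering the client with a singleton lifetime would achieve the same effect.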

So can I just make the ElasticClient a single instance to improve performance?
