Elasticsearch-net: Performance hit in NEST 6.x

Created on 4 May 2018  路  14Comments  路  Source: elastic/elasticsearch-net

NEST/Elasticsearch.Net version:
5.6.2
6.1.0

Elasticsearch version:
5.5.1
6.2.3

Description of the problem including expected versus actual behavior:
In the process of upgrading our Elasticsearch setup from version 5.x to 6.x we have noticed a performance impact that translates into slower response time in our service that uses Elasticsearch as data storage. The slower response is mostly visible when dealing with large documents with complex structures e.g. geojson.

The two Elasticsearch clusters performs equally when we are testing them directly via Apache ab so I guess the performance hit is related to the NEST library.

I have done some performance testing using BenchmarkDotNet (on a simplified implementation described further down) where NEST v. 6 has a noticeable higher response compared to v. 5.
I hope you can shed some light on why we are seeing this behaviour.
Thanks

Method | Mean | StdDev |
------- |---------:|---------:|
NEST 5 | 73.11 ms | 2.558 ms |
NEST 6 | 92.15 ms | 3.054 ms |

Steps to reproduce:

  1. Add an index to Elasticsearch with this mapping
{
  "properties": {
    "boundaries": {
      "properties": {
        "boundary": {
          "type": "geo_shape"
        },
        "type": {
          "type": "text",
          "index": false
        }
      }
    }
  }
}
  1. Insert document https://gist.github.com/kaaresylow/ece0d2dd88175653fb4fa44710297d14

3.

var connectionPool = new SingleNodeConnectionPool(new Uri("url"));
var settings = new ConnectionSettings(connectionPool).EnableHttpCompression();
var client = new ElasticClient(settings);
var response = client.Get<Document>(1, idx => idx
    .Index("documents")
    .Type("documentmodel"));
public class Document
    {
        public IEnumerable<PlaceBoundary> Boundaries { get; set; }
    }

    public class PlaceBoundary
    {
        public string Type { get; set; }

        public GeoJsonBoundary Boundary { get; set; }
    }

    public class GeoJsonBoundary
    {
        public string Type { get; set; }

        public object Coordinates { get; set; }

        public object Bbox { get; set; }

        public object Crs { get; set; }
    }
Performance

Most helpful comment

Just a quick heads up that we expect to switch to Utf8Json as the default internal serializer with 7.0.

One nice thing that @russcam found is that the hand off of the stream to the user defined serializer is now much much faster.

All 14 comments

Thanks for opening @kaaresylow. I'm keen to dive into this but I'm unlikely to get around to it until next week.

@kaaresylow I've put together a benchmark of NEST 5.6.2 against 6.1.0, using an InMemoryConnection that returns the response JSON that you posted in the gist. I've rewritten the NEST assemblies and their immediate dependencies to allow them both to be referenced in the same project.

The test is pretty simple

```c#
using System;
using System.Collections.Generic;
using System.IO;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Diagnosers;
using BenchmarkDotNet.Environments;
using BenchmarkDotNet.Horology;
using BenchmarkDotNet.Jobs;
using Elasticsearch.Net562;

namespace NestVersionBenchmark.Benchmarks
{
[Config(typeof(Config))]
public class GithubIssue3236
{
private class Config : ManualConfig
{
public Config()
{
Add(
Job.Default
.With(Platform.AnyCpu)
.With(Jit.Default));

            Add(
                Job.Default
                    .With(Platform.AnyCpu)
                    .With(Jit.RyuJit));

            Add(MemoryDiagnoser.Default);
        }
    }

    private readonly Nest562.ElasticClient _client562;
    private readonly Nest610.ElasticClient _client610;

    public GithubIssue3236()
    {
        var bytes = GetResponseBytes();
        var settings562 = new Nest562.ConnectionSettings(new SingleNodeConnectionPool(new Uri("http://localhost:9200")), new InMemoryConnection(bytes));          
        _client562 = new Nest562.ElasticClient(settings562);

        var settings610 = new Nest610.ConnectionSettings(new Elasticsearch.Net610.SingleNodeConnectionPool(new Uri("http://localhost:9200")), new Elasticsearch.Net610.InMemoryConnection(bytes));  
        _client610 = new Nest610.ElasticClient(settings610);
    }

    [Benchmark]
    public Nest562.IGetResponse<Document> Get562()
    {
        return _client562.Get<Document>(1, idx => idx
            .Index("documents")
            .Type("documentmodel"));
    }

    [Benchmark]
    public Nest610.IGetResponse<Document> Get610()
    {
        return _client610.Get<Document>(1, idx => idx
            .Index("documents")
            .Type("documentmodel"));
    }

    public class Document
    {
        public IEnumerable<PlaceBoundary> Boundaries { get; set; }
    }

    public class PlaceBoundary
    {
        public string Type { get; set; }

        public GeoJsonBoundary Boundary { get; set; }
    }

    public class GeoJsonBoundary
    {
        public string Type { get; set; }

        public object Coordinates { get; set; }

        public object Bbox { get; set; }

        public object Crs { get; set; }
    }

    private static byte[] GetResponseBytes()
    {
        using (var stream = typeof(GithubIssue3236).Assembly.GetManifestResourceStream("NestVersionBenchmark.Benchmarks.GithubIssue3236.json"))
        using (var memoryStream = new MemoryStream())
        {
            stream.CopyTo(memoryStream);
            return memoryStream.ToArray();
        }
    }
}

}

On the machine I'm currently on, the results are

``` ini

BenchmarkDotNet=v0.10.14.585-nightly, OS=Windows 10.0.16299.371 (1709/FallCreatorsUpdate/Redstone3)
Intel Core i7-2920XM CPU 2.50GHz (Sandy Bridge), 1 CPU, 8 logical and 4 physical cores
Frequency=2433513 Hz, Resolution=410.9286 ns, Timer=TSC
  [Host]     : .NET Framework 4.7.1 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2633.0
  Job-HASSSB : .NET Framework 4.7.1 (CLR 4.0.30319.42000), 64bit LegacyJIT/clrjit-v4.7.2633.0;compatjit-v4.7.2633.0
  Job-IYUISG : .NET Framework 4.7.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2633.0

Platform=AnyCpu  

| Method | Jit | Mean | Error | StdDev | Gen 0 | Allocated |
|------- |-------- |---------:|----------:|----------:|---------:|----------:|
| Get562 | Default | 6.109 ms | 0.0228 ms | 0.0190 ms | 187.5000 | 774.58 KB |
| Get610 | Default | 6.109 ms | 0.0399 ms | 0.0374 ms | 187.5000 | 773.83 KB |
| Get562 | RyuJit | 5.992 ms | 0.0330 ms | 0.0292 ms | 187.5000 | 774.58 KB |
| Get610 | RyuJit | 6.014 ms | 0.0396 ms | 0.0370 ms | 187.5000 | 773.83 KB |

5.6.2 and 6.1.0 benchmarks look _very similar_.

Would you be able to share further information about

  1. Runtime e.g. .NET Framework, .NET Core, and version
  2. Application TFM e.g. net46, netcoreapp2.0

@russcam thanks for looking into this.
I agree your results looks very similar. I have tried replicating your test using an InMemoryConnection and I still get significant different results. I have performed the test on both net462 and netcoreapp2.0 with the same outcome.

BenchmarkDotNet=v0.10.14, OS=Windows 10.0.16299.309 (1709/FallCreatorsUpdate/Redstone3)
Intel Core i5-4590 CPU 3.30GHz (Haswell), 1 CPU, 4 logical and 4 physical cores
Frequency=3215221 Hz, Resolution=311.0206 ns, Timer=TSC
.NET Core SDK=2.1.4
  [Host]     : .NET Core 2.0.5 (CoreCLR 4.6.26020.03, CoreFX 4.6.26018.01), 64bit RyuJIT
  Job-ZMNRCM : .NET Core 2.0.5 (CoreCLR 4.6.26020.03, CoreFX 4.6.26018.01), 64bit RyuJIT

Jit=RyuJit  Platform=AnyCpu

Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
------- |---------:|----------:|----------:|---------:|---------:|----------:|----------:|
Get562 | 15.51 ms | 0.0579 ms | 0.0542 ms | 531.2500 | 265.6250 | - | 3.2 MB |
Get610 | 38.46 ms | 0.2187 ms | 0.2046 ms | 1500.0000 | 687.5000 | 312.5000 | 8.39 MB |

The only difference between the two tests are the version of the NEST library
<PackageReference Include="NEST" Version="5.6.2" /> vs <PackageReference Include="NEST" Version="6.1.0" />
Can I provide you with more information?
Thanks again.

Have a chance to look at this again now @kaaresylow. Would you be able to re-run your benchmarks with NEST 6.2.0? There were several performance improvements made that show it to be faster than 5.x.

We recently upgraded to 6.3.1 from 2.x and took a significant performance hit on the clients. I have been profiling our system and my findings might be relevant to the issue above.
My initial findings points to the ReadJson method in the Nest.SourceConverter class:

public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
{
    var token = reader.ReadTokenWithDateParseHandlingNone();
    using (var ms = token.ToStream(serializer.GetConnectionSettings().MemoryStreamFactory))
        return sourceSerializer.Deserialize(objectType, ms);
}

Here is the part of the call tree from my profiling that covers the ReadJson method:

Function Name | Inclusive Samples
-----------------|--------------------
Nest.SourceConverter.ReadJson | 37.290
-> Nest.JsonNetSerializer.ConnectionSettingsAwareSerializerBase.Deserialize | 18.616
-> Nest.Json.Linq.JToken.ReadFrom | 14.061
-> Nest.JTokenExtensions.ToStream | 4.421
-> Elasticsearch.Net.RecyclableMemoryStream.ctor | 121
-> .... | ...

As I understand it, the Deserialize method does the real work, the remaining three are all about creating the stream to be parsed into Deserialize. When creating the stream the JSON is parsed and it is parsed again in the Deserialize method, so the JSON is actually parsed twice. If there was a way to parse the JsonReader directly into the Deserialize method we can cut it down to one parse and cut the cost of the ReadJson method by approx. 50%.

I have made an experiment in a fork of NEST that confirms this. To do the experiment I added a new interface to NEST and extended the ReadJson method like this:

namespace Nest {
    public interface IDeserializeFromJsonReader {
        object Deserialize(JsonReader reader, Type objectType);
    }
}
public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
{
    var sourceSerializer = serializer.GetConnectionSettings().SourceSerializer;
    if (sourceSerializer is IDeserializeFromJsonReader fromJsonReaderSerializer)
        return fromJsonReaderSerializer.Deserialize(reader, objectType);

    var token = reader.ReadTokenWithDateParseHandlingNone();
    using (var ms = token.ToStream(serializer.GetConnectionSettings().MemoryStreamFactory))
        return sourceSerializer.Deserialize(objectType, ms);
}

I then implemented the IDeserializeFromJsonReader interface in my SourceSerializer and ran a new profiling session. The ReadJson method then looked like this in the call tree:

Function Name | Inclusive Samples
-----------------|--------------------
Nest.SourceConverter.ReadJson | 21.608
-> Nest.JsonNetSerializer.MySerializerThatDeserializesFromJsonReader.Deserialize | 21.595
-> Nest.JsonExtensions.GetConnectionSettings | 7
-> ... | ...
(The profiled workload is not exactly the same as before so you can't compare the absolute numbers).

From a performance point of view there is a lot to win by adding an interface like IDeserializeFromJsonReader, but I'm unsure if it will cause other problems... I'm happy to create a PR with the above changes plus an implementation in the InternalSerializer, if you're interested.
/ Frits

@fritsduus Thank you for the detailed information.

There is a cost associated with using JsonNetSerializer at present, as there isn't a clean way to pass a Stream to JsonNetSerializer, to only deserialize objects that you control e.g. documents returned in _source fields. Can you see a clean way to pass the Stream? If so, we'd love to improve how this works! Note that the cost isn't paid when using the built-in serializer as the SourceSerializer.

One of the reasons for internalizing Json.NET in the client is so that it can be replaced with a faster serializer implementation going forward. As such, we wouldn't want to introduce APIs that expose Json.NET specific implementation details such as a IDeserializeFromJsonReader method.

We expect to start on the serializer work shortly, and have been looking at SpanJson and Utf8Json as potential candidates.

@russcam Thank you for your response.

I think your long term goal of replacing Json.NET with a faster serializer is very good. I was also looking at creating a SourceSerializer using Jil or Utf8Json, but then I realized the internal cost of the SourceConverter which couldn't really be removed by such a solution.

I don't see a simple way to pass the Stream at the moment. It looks like a larger task to replace the serializer. I was aiming for a quick win with the suggested solution as we are in need of a solution to our problem before Black Friday.

We're using Utf8Json as a serializer/deserializer for NEST client right now with great success (in our case, we saw up to 10 times improved performance in deserialization for our workload).

Just a quick heads up that we expect to switch to Utf8Json as the default internal serializer with 7.0.

One nice thing that @russcam found is that the hand off of the stream to the user defined serializer is now much much faster.

@DSilence can you please provide an example of how you had used Utf8Json as a serializer/deserializer for NEST?

@florin-nedelcu In our case, only a couple of heavy search requests required a more performant deserialization, so for them I've used a low level client.

var searchResult = await _elasticClient.LowLevel.SearchAsync<BytesResponse>(indexName, PostData.Serializable(searchDescriptor));
var result = Utf8Json.JsonSerializer.Deserialize<ElasticPointResponse>(searchResult.Body);

If there is a need for full replacement I suggest referring to https://www.elastic.co/guide/en/elasticsearch/client/net-api/master/custom-serialization.html. Keep in mind that some internal types will require customized formatters for UTF8JSON (e.g. geography).

@Mpdreamz when is 7.0 planned for release?
We have a similar perf problem with 6.x serializer.
Ty!

@SidHuntt As a policy, we don't give timelines for when releases will be out, I'm afraid.

There's an open PR to switch over to utf8json #3583 which we are looking to merge in along with other changes necessary for 7.0, so that we can publish a prerelease version of the client for 7.0, with the new serializer, to receive feedback on. We're hoping to do that shortly.

7.0 now uses Utf8Json as the serialiser, as detailed on the GA release blog post: https://www.elastic.co/blog/nest-and-elasticsearch-net-7-0-now-ga

Was this page helpful?
0 / 5 - 0 ratings