Elasticsearch-net: RequestData.CreatePathWithQueryStrings Performance Optimisation

Created on 24 Aug 2020  路  5Comments  路  Source: elastic/elasticsearch-net

Further to my PR #4952, I have been reviewing the calling code to look at potential further performance optimisations.

CreatePathWithQueryStrings is called within the RequestData ctor to populate the PathAndQuery property for every instance. The current implementation builds a string after populating NameValueCollection instances from both the global connection config and the request parameters. Eventually, it calls down to the optimised NameValueCollection extension method.

I've created a prototype of an optimised version of this method, which rents an oversized Span<char> from the ArrayPool and builds the path and query within that buffer. One key change this includes, is to avoid creating a copy of the existing IConnectionConfigurationValues.QueryStringParameters collection and converting the IRequestParameters.QueryString dictionary to a NameValueCollection. Instead, it performs the conversion of objects from the dictionary directly into the buffer from the pool.

The logic I've added so far is simplified for prototyping. It would require some tests to properly validate that it matches (or can match) the existing behaviour. Having benchmarked the prototype for one example request, the potential optimisation is fairly significant.

|                       Method |       Mean |     Error |      StdDev |     Median |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------------------------- |-----------:|----------:|------------:|-----------:|-------:|------:|------:|----------:|
|   CreatePathWithQueryStrings | 7,879.6 ns | 390.67 ns | 1,139.61 ns | 7,195.2 ns | 0.3357 |     - |     - |    1464 B |
| CreatePathWithQueryStringsV2 |   648.0 ns |  12.86 ns |    12.03 ns |   648.9 ns | 0.0305 |     - |     - |     128 B |

This equates to a 91.2% reduction in execution time and 91.3% fewer bytes allocated. The final bytes here is pretty much just the creation of the final string, with little or no additional overhead. Given that this occurs per request instance, it seems like a reasonable improvement to go after.

For reference: In the case of the above benchmarks, I used connection settings as follows:

```c#
_connectionSettings.GlobalQueryStringParameters(new NameValueCollection
{
{ "allow_no_indices", "false" },
{ "global_key", "thing" } // made up for testing
});

and a request:

```c#
new SearchRequestParameters { AllowNoIndices = true, DocValueFields = new[] { "item1" } };

Before investigating this further, I wanted to check if you agree that the risk vs. reward would be considered acceptable in this case. Given it's a pretty important method for building valid requests, so it's sensitive should any regressions occur. That said, with sufficient tests, the existing logic is not too complex to thoroughly test before making the change.

There is also a comment from @Mpdreamz in the code...

// TODO This feels like its in the wrong place

I'd consider adding a type (helper) capable of encapsulating the path and query building functionality. If that's something you'd consider at this point, given that it's a public API, would there be a preferred strategy for supporting a change. It's a very functional method, unlikely to be replaced during testing or with alternative runtime implementations.

Feature

All 5 comments

Before investigating this further, I wanted to check if you agree that the risk vs. reward would be considered acceptable in this case. Given it's a pretty important method for building valid requests, so it's sensitive should any regressions occur. That said, with sufficient tests, the existing logic is not too complex to thoroughly test before making the change.

It's sensitive to regressions but given the number of tests that have this codepath in their call stack I'm, perhaps paradoxically, not worried about committing to this change 馃槃

I'd consider adding a type (helper) capable of encapsulating the path and query building functionality. If that's something you'd consider at this point, given that it's a public API, would there be a preferred strategy for supporting a change. It's a very functional method, unlikely to be replaced during testing or with alternative runtime implementations.

There is a huge opportunity here but perhaps too big for this PR.

Elasticsearch.Net relies heavily on ElasticsearchUrlFormatter

NEST relies heavily on ApiUrls and ApiUrlsLookup which acts as a router to which Requests can resolve to.

It'd be awesome to revisit some of these design decisions and normalize the two and work out a more transparent common interface.

There is also a comment from @Mpdreamz in the code...

I never liked the placing of this logic on RequestData which really needs to be a DTO for the Transport and friends to switch on. I am happy to leave it there for now though and focus on the performance boost 馃コ 馃嵃.

Thanks @Mpdreamz! I'm away on holiday for the remainder of this week, but I'll continue to take a look at this upon my return. Once I have the core cases working reliably, I'll get a PR in to review from there.

Thanks for your effort and focus here, @stevejgordon!

On reflection, this is quite a complex change since the ideal implementation is to build within a single rented array of char but that requires either, knowing/calculating the max path and query length upfront, renting a much larger array than ever will be needed or allowing for the array to grow when needed. My prototype fixed the URL at 8,192 which is double the default max URL length for Elasticsearch. Of course, users may configure their cluster with a different maximum. Is there an absolute maximum we can rely upon?

I've prototyped a helper method to grow a rented array but it does add a fair degree of complexity to the code which requires some thought on the best way to structure it.

Will re-review in benchmark work

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ffMathy picture ffMathy  路  16Comments

meriturva picture meriturva  路  13Comments

markwalsh-liverpool picture markwalsh-liverpool  路  12Comments

alexandrepepin picture alexandrepepin  路  13Comments

jayhilden picture jayhilden  路  13Comments