Elasticsearch: RestClient improvements

Created on 24 Jun 2016  路  23Comments  路  Source: elastic/elasticsearch

This is a meta issue to track improvements that need to be made to the recently added RestClient (#18735).

  • [x] add async variant of performRequest method (#19301)
  • [x] add docs page on how to use it and configuration for common scenarios like ssl, proxy, basic auth etc. although we can refer to apache http client documentation (#19618)
  • [ ] higher level client that exposes api specific methods and helps with providing the right parameters. Bonus points if we can keep backwards compatibility with the current java api by ether implementing the Client interface or at least reusing the request and response objects that we already have, or something along those lines to ease migration.
:CorFeatureJava Low Level REST Client Meta

Most helpful comment

heya @brusic I think numbers outside of context can be misleading. Using the Java API has so many downsides at this point, like explained in the blogpost, that ~ 5% performance loss is reasonable. We will also work to make the REST client as performant as we can (and do more benchmarks like you are suggesting). That's the right trade-off. The main reason why we had the Java API in the first place wasn't performance, rather that it came for free :) If we started today from scratch we wouldn't expose the transport layer to users anymore, which is why we decided to go down this path.

All 23 comments

Bonus points if we can keep backwards compatibility with the current java api by implementing the Client interface and reusing the RequestBuilder objects that we already have, or something along those lines to ease migration.

I think that'd be super useful but if we're going to do it I'd want to do it in yet another project. So, like :client:high-level for a useful high level client and :client:transport-adapter or something. That way the high level client doesn't have to pull in all of Elasticsearch's dependencies and we don't have to figure out a way to slice the transport client's interface out of Elasticsearch.

I have configured an anonymous user for shield with limited monitoring privileges. Consequently, I get back a 403, not a 401, when making anonymous requests. This prevents the commons http client from responding with the correct credentials.

There should be a way to force pre-emptive authentication, either by setting credentials in the configuration manually (and letting the RestClient manage basic auth by itself), or by exposing the the HttpClientContext object used with the request. See: https://hc.apache.org/httpcomponents-client-ga/httpclient/examples/org/apache/http/examples/client/ClientPreemptiveBasicAuthentication.java

@mkw you can also set shield.authc.anonymous.authz_exception: false in your elasticsearch.yml to have the response be a 401 instead of the 403. This should allow you to use the client as is today.

@jaymode Thanks! That is very helpful and vasty superior to manually adding a default "Authentication" header.

I may have spoken too soon: While using the challenge/response mechanism is clearly more secure than pre-emptive authentication, it forces a double round-trip for every request because Apache HC will only cache authentication for the duration of an HttpClientContext. Simple searches are quick enough that the extra request really adds up.

Hi Guys, I'm a little late to the party, but started working on a high level rest client. My company has been stuck on 1.4 due to the size of our codebase and number of clients in production. This low level REST client sparked the idea... [https://github.com/brandonkearby/elasticsearch/tree/rest-client]
What I did was backport the low level rest client into the 1.4 branch and renamed it to InternalRestClient. Then started supporting the various api calls. Everything was going pretty smooth until I started working on Aggregations as they need to provide the type. Anyway I'm interested in working on the high level rest client.

I like the promise of the new 5.0 rest client. However, in it's current form, it isn't terribly useful. We would have to rewrite all of our query and index manipulation code to generate JSON. Having a Client implementation based on the rest-client would be ideal, but doesn't seem like a straightforward thing to build.

Quick update: we are not going to implement the Client interface in the upcoming high level REST client. We'd rather to start with a clean design and focus on a better API for the time being. We will reuse as much as possible from the current java API though, meaning requests, responses and builders. We are experimenting on this. We are considering to extract the current java api into a common library, or just make the high level client depend on elasticsearch core, at least in the beginning.

@javanna Not sure you took a peek, but I have a pretty much complete high level REST client already written for 1.4.1. It works with the existing Client and Admin interfaces

https://github.com/brandonkearby/elasticsearch/commits/rest-client-1.4.1

@javanna If at all possible I would not make a dependency on core if that would mean you also have to pull in all the Lucene jars. I personally really like the idea of having as little dependencies as possible.

I personally really like the idea of having as little dependencies as possible.

@jettro that makes sense and everybody would like to have no deps of course, but we need to focus first on having something to release rather sooner than later (for which the dependency on es core, hence lucene, is acceptable). After that we can focus on a long term solution, where we'll iterate and hopefully remove the dependency on es core. Other important point is trying to ease the migration from the Java API, for which reusing requests and responses is very important, I think more than removing the dependency on core at this point.

Yeah @javanna ease of migration was very important for us. It's the primary reason for adding it in 1.4.1 since that's the version we are on. That way we don't have to maintain to ways of doing things while in the transition state. This allows us to continue to use the transport client and once a customer's cluster has been reindex into a 5.0 cluster, we then switch them to the rest client. Same code using transport and rest in the same jvm.

hey @brandonkearby thanks for the heads up I had a look at your branch. It is similar to what we intend to do, but we will most likely add the parsing code as fromXContent methods to all the classes that implement ToXContent, one step at a time. Also, ease of migration doesn't mean 1:1 replacement in our plan. We don't plan on implementing the Client interface like you did. We have been unhappy with the Client interface for a long time, see #9201. We would like to take the chance and clean that up, have a sync method and an async method, rather than a mix of the two. The compatibility will be on the request, builders and response side of things. The call itself will need some slight updating, which should be acceptable.

@javanna, I started down the path of doing fromXContent, but there were places where the object graph didn't match the JSON. This made it difficult without a DOM approach. As far as the changes in #9201, looks a lot cleaner!

Has there been any thought to collaborating on improving the existing REST client that is out there, Jest?

https://github.com/searchbox-io/Jest

We just published a blogpost on the current state of the Java clients which should answer most of the questions around the REST client and its future: https://www.elastic.co/blog/state-of-the-official-elasticsearch-java-clients .

@dhay as explained in the blog we considered different approaches but we decided to start from our own Java API and take it from there.

From the blog: "Benchmarks, however, show that the performance of the HTTP client is close enough to that of the Transport client"

I do not read the results in the same way. :) Would love to see the benchmarks for multi-threaded indexing, not single-threaded. But this issue is regarding the RestClient, so I will rest my defense of the pure Java API for another time.

I do not read the results in the same way. :)

@brusic Can you please elaborate? I did the original benchmarks. Maybe something I did not explain something clearly enough in the blog post or I made a mistake (benchmarking is hard and even with great care something can go wrong :)).

Would love to see the benchmarks for multi-threaded indexing, not single-threaded.

That's one aspect that I thought about originally but I disregarded the idea because we needed to get an initial idea about the performance characteristics of REST vs. transport. Moreover, multi-threaded performance should be dominated by contention in the connection pool and not the client. Nevertheless, we could extend the benchmarks at a later time.

According to the original blog post "The HTTP client has between 4% and 7% smaller bulk indexing throughput than the transport client." In my world, those numbers are not close enough. :) I want to ask my boss for a 4%-7% raise. Hey, it's close enough!

The blog continues to say that those benchmarks are under lab conditions, but the underlying problem remains. Then again, without multi-threaded benchmarks, I am disregarding the results. :) I use the binary API for indexing and REST for searching since I need all the performance I can get from indexing. The only reason why I use Jest is because I have a dependency on an older version of Lucene in my code and therefore cannot use the elasticsearch jar.

heya @brusic I think numbers outside of context can be misleading. Using the Java API has so many downsides at this point, like explained in the blogpost, that ~ 5% performance loss is reasonable. We will also work to make the REST client as performant as we can (and do more benchmarks like you are suggesting). That's the right trade-off. The main reason why we had the Java API in the first place wasn't performance, rather that it came for free :) If we started today from scratch we wouldn't expose the transport layer to users anymore, which is why we decided to go down this path.

If the request / response objects are meant to be reused in the Java Rest Client, care should be taken to not impose java 8 on the client side, IMHO. That way many codebases that can't easly upgrade to Java 8 can still continue to use Elasticsearch 5 through the Java Rest Client.

hi @godiedelrio , we high level REST client will still depend on Elasticsearch at least in the beginning. That means it will also require java 8. People that can't upgrade to java 8 or don't want Elasticsearch as a dependency can use the low level REST client, and take care of writing requests and reading back responses in their own application.

Closing this in favour of #23331 that's more specific to the high level REST client.

Was this page helpful?
0 / 5 - 0 ratings