My company (Bazaarvoice) is making heavy use of elasticsearch in our applications, and we've had a lot of trouble with the Java client options.
In short: both the Node and Transport clients suffer from sensitivity to differences in serialization. It's easy enough to maintain Java and ES version consistency within the cluster, but it's not so easy to do so between the cluster and client applications, especially if you want to support operations like HA rolls. Using a JSON/REST client is much more tolerant of variation between client and server configurations.
On the other hand, the Java JSON client offerings have some pretty serious shortcomings of their own, mostly around awkward interfaces, but bugs are also a substantial challenge.
Ideally, we would write all our code against the ES Client interface and then simply wire in an HttpTransportClient, for example. This would give us the flexibility to experiment with the tradeoffs of a Node client vs. a Transport client vs. HTTP. From a usability perspective, too, it seems preferable to reuse the same interfaces that ES already provides, documents, and maintains.
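To make that concrete, here's a rough sketch of the wiring I have in mind. Only the Client interface and TransportClient exist today; HttpTransportClient is hypothetical, and ProductStore is just an example application class:

```java
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.client.Client;

// Application code written purely against the Client interface, so the
// concrete transport (node, transport, or a future HTTP implementation)
// can be swapped at wiring time.
public class ProductStore {
    private final Client client;

    public ProductStore(Client client) {
        this.client = client;
    }

    public String fetchProductJson(String id) {
        GetResponse response = client.prepareGet("products", "product", id)
                .execute()
                .actionGet();
        return response.getSourceAsString();
    }
}
```

Switching from a TransportClient to an HTTP-backed implementation would then be a one-line change at the point where the Client is constructed.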
I've looked into the code, and it doesn't look like it would be too hairy, but I'd be surprised if you all hadn't thought about this problem at least once before.
I'd like to hear your thoughts before I get too deep into an implementation.
Also, I'd like to know if you'd have any interest in maintaining the library; specifically if you'd want it in the ES codebase or not. I can develop this in a separate repo, but if you think you'd ultimately want it, I should fork ES and send you a PR.
Thanks,
-John
This is also a problem that we are currently experiencing while developing with Elasticsearch at our company. We are in the process of moving from a set of monolithic applications to a microservices-based architecture. The way the Java version ties client applications to Elasticsearch means that if we want to use Java 8 in one application that talks to Elasticsearch, we have to update them all at once, which isn't an easy task and goes against the decoupling that microservices give us.
We've looked a little at Jest (https://github.com/searchbox-io/Jest), but it seems to be far behind in terms of the functionality it provides (for example, aggregations). We're also nervous about tying ourselves to a third-party project that is always going to lag behind the libraries Elasticsearch maintains.
We've also started looking at implementing the Elasticsearch Client.java interface to provide a pure HTTP/REST-based solution ourselves, but it would be nice to know whether this is already being worked on somewhere before we go too far down this path.
Hey Adrian,
I didn't hear back on the original question, so I have to assume this thing will live on its own. I've started development in a private repo, and I'm pretty happy with the way it's going.
One thought I had was to stop short of implementing the entire API at first and just do index, get, and search, which is all most of our teams do with the client. That way, we can start exercising the client, and finish up the rest of the API calls later.
If the client is useful to you in this state, I can make it available once it is functional. Which API calls do you need?
Thanks,
-John
Hi @vvcephei and @adrian-mcmichael
You can currently use any 1.x client with any 1.y server version (where y >= x). I'm not a Java programmer, but if you switch to using HTTP, surely you'd be losing the ability to receive the response as proper Elasticsearch Java objects (without building a whole infrastructure that knows how to convert the JSON response)?
This is the part that makes me step back from a pure REST Java client. Perhaps there is another way to improve the situation (or perhaps I've misunderstood). What exactly are the pain points that you're trying to solve with this?
The first step towards a native HTTP client would be to add a FromXContent interface and let the response classes (SearchResponse, SearchHits, InternalAggregations, etc.) implement it. It should be similar to the ToXContent interface, but reading XContent instead of writing it.
This on its own would also be useful for #8150.
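Just to sketch the idea (this interface doesn't exist yet; XContentParser is the existing parser abstraction):

```java
import java.io.IOException;

import org.elasticsearch.common.xcontent.XContentParser;

// Sketch only: the read-side counterpart of ToXContent. A response object
// would populate itself from a parser positioned on its JSON/XContent form.
public interface FromXContent {
    void readFrom(XContentParser parser) throws IOException;
}
```

SearchResponse, SearchHits, InternalAggregations, and friends would then implement this alongside their existing toXContent methods.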
Either using "FromXContent" or the native serialization, a versioning aspect will need to be implemented. We already have versioning in the native serialization. The main benefit of using the Java client today is the strong typing it provides, cause by builders and deserialization support into Java objects. I think that adding the complexity of versioning with "xcontent" parsing will end up in the same problem space as versioning in the current native serialization, just more complicated (more code, ...).
There is another potential direction, and that's using HTTP as the transport layer while still using the native serialization format. That can be beneficial, and it is much simpler to implement.
I'd love to understand where the problems are when it comes to cross version (transport) client + server, and focus on resolving those.
Thanks, guys, for the response. To clarify, the goal is to get a well designed and supported Java JSON client, not to use HTTP as the transport per se.
Tl;dr:
I doubt there's any way to do the "right thing" with native serialization. The right way to go is "fromXContent". It has a nonzero cost to implement and maintain, but it will save your Java users a lot of grief in the long run.
Long-form thoughts:
There may be some benefits to using java serialization with HTTP instead of TCP, but it wouldn't address the main issue I'm confronting.
The "FromXContent" approach is what I had in mind. Native serialization is convenient to use, but it's just too sensitive to differences between the client's and cluster's JVMs. Particularly, I'm thinking of an issue in which the serialization of Date
changed between java builds (I don't remember which right now; it might have been 1.7.0_25 and _45). This resulted in some very tricky errors for us to debug. It was all the more infuriating in that case because the errors also failed to deserialize, so we couldn't even see what was wrong.
Native serialization may work in most situations, but I don't think there is any way to ensure your clients won't face this kind of issue except to require exactly the same version of Java in the client application and the ES cluster. That is a pretty tall order, since you'll have developer machines, application servers, and Elasticsearch hosts, potentially all maintained by different teams, maybe even different companies.
Aside: I don't think there's a good reason to change how ES chatters internally because it's easy to guarantee every node in the cluster is using the same JVM version. Internally, efficient and compact serialization should be the priority.
I agree with Shay that we're looking at strictly more code and more maintenance, but the advantage is that you can explicitly control the situations under which a client is or is not API-compatible with the server. I personally favor more permissive deserialization, so there wouldn't be a deserialization error unless it's impossible to construct a sane object from the JSON response, but I can see why you might want to be stricter with version checking. Either way, the typing is just as strong; it's just a question of semantics as to whether certain fields are permitted to be absent on response objects.
I have already started implementing the "fromXContent" method for some parts of the API. I'm structuring it right now as a standalone library, but I'm happy to put it right inside the response objects instead and send a PR. This would make the implementation a little easier because there are a few cases where I need access to protected members to perform the deserialization, so I've had to play distasteful games with the packages.
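To give a flavor of what this parsing code looks like, here's a simplified, hypothetical example (SimpleGetResult is made up; the real response objects have many more fields, which is where the need for protected-member access comes in):

```java
import java.io.IOException;

import org.elasticsearch.common.xcontent.XContentParser;

// Hypothetical, simplified example of "fromXContent"-style parsing:
// rebuilding a stripped-down get result from its REST JSON form.
public class SimpleGetResult {
    public String index;
    public String id;
    public boolean found;

    public static SimpleGetResult fromXContent(XContentParser parser) throws IOException {
        SimpleGetResult result = new SimpleGetResult();
        XContentParser.Token token = parser.nextToken(); // outer START_OBJECT
        String fieldName = null;
        while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) {
            if (token == XContentParser.Token.FIELD_NAME) {
                fieldName = parser.currentName();
            } else if (token == XContentParser.Token.START_OBJECT
                    || token == XContentParser.Token.START_ARRAY) {
                parser.skipChildren(); // ignore nested structures like _source in this sketch
            } else if (token.isValue()) {
                if ("_index".equals(fieldName)) {
                    result.index = parser.text();
                } else if ("_id".equals(fieldName)) {
                    result.id = parser.text();
                } else if ("found".equals(fieldName)) {
                    result.found = parser.booleanValue();
                }
            }
        }
        return result;
    }
}
```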
I should mention that I've been preoccupied with my main project at work for the last couple of months, so the implementation situation is not much changed from my last exchange with Adrian.
hi @vvcephei
Have you made any progress on your work? I also feel that the REST API support story for the JVM is not good. Currently we are using Jest, and we miss the aggregations support.
We are using ES via HTTP mainly due to authentication requirements.
Hey @jtjferreira
I had to put the project aside for longer than I wanted to focus on some other major ES projects at work, but I'm getting back into it for now. I'm updating the dependency to ES 1.4.4, and I'm going to start with just get() and index() to get a round-trip going. Once that is in place, I'll publicize my repo and post a link to it here so everyone can get a more concrete look at what I'm thinking.
I've already implemented these two functions on 0.90.7, so it shouldn't be too long before I get to this phase. I got fairly bogged down before working my way through search() because the facet response has so many different forms. Hence my desire to get _something_ out there.
Thanks for the interest. I won't leave you hanging too much longer.
-John
Ok, folks, I've sketched out what I've been talking about here: https://github.com/vvcephei/es-rest-client-java
Please let me know what you think. For the long-term plans, see the README. For stuff that's on the roadmap, see the issues.
At this point, these are my top-level questions:
Thanks, everyone,
-John
+1 for same Client API but with HTTP transport
The Amazon ElasticSearch Service currently only supports HTTP
+1
Hey @matt-blanchette and @davsclaus,
It's good to hear I'm not the only one who thinks this should happen. You might want to check out the repo I previously linked (https://github.com/vvcephei/es-rest-client-java). Right now, it is only built against 1.3 and 1.4, but updating it is usually pretty straightforward.
One of my coworkers did some pretty extensive functional testing of it and resolved a number of bugs. We didn't get a chance to do perf testing before we both got preoccupied with other concerns, though.
I haven't put much work into this project recently because:
If you (or anyone else) want to use it, please let me know, and I'll make it a priority to support you.
Thanks,
-John
FYI - we announced that we will build an official HTTP client with minimal dependencies. I will assign this issue to myself and update it with the plans once we have them ready for you. I am pretty sure we will provide a basic version in the next couple of months, but for now I just wanted to let everybody know we have it on the roadmap.
Oh, hey @s1monw! I missed that announcement. It's great to hear.
I don't know if it will help you, but feel free to mine my project if it helps you get started.
I'm also more than happy to volunteer if you want help on the official client.
Thanks,
-John
@s1monw this is still slated for 5.0.0?
this is still slated for 5.0.0?
@otisg yes that is the goal.
Is there a JIRA issue number we can follow for this?
When you say basic API, what do you mean? Is there anywhere I can see a spec of what to expect?
Reason I ask is that I have hit the same limitations with the existing APIs that the other guys have experienced.
I am also now going to take another look at Spring Data ElasticSearch, given that it was recently upgraded, just to see what they have done.
Is there a JIRA issue number we can follow for this?
We don't use JIRA, but this is the right GitHub issue to follow. When there is a pull request, it will reference this issue.
When you say basic API, what do you mean? Is there anywhere I can see a spec of what to expect?
Our REST specs are shared between all of our language clients and are here: https://github.com/elastic/elasticsearch/tree/master/rest-api-spec . The first version of the Java client will be quite low level, with a very simple performRequest method that allows sending HTTP requests to the cluster, doing round-robin across multiple nodes and handling fail-over. We will then have API-specific objects so that users can do things like indexClient.index(index, type, id, body), etc.
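To give a rough idea of the shape this could take (pure illustration; none of these names or signatures are final):

```java
import java.io.IOException;

// Illustration only: a low-level client exposing a generic performRequest that
// round-robins across the configured nodes and fails over on errors, plus a
// thin typed wrapper built on top of it. Names and signatures are not final.
public interface LowLevelRestClient {
    // Sends the request to one of the configured nodes and returns the raw JSON response.
    String performRequest(String method, String endpoint, String jsonBody) throws IOException;
}

class IndexClient {
    private final LowLevelRestClient restClient;

    IndexClient(LowLevelRestClient restClient) {
        this.restClient = restClient;
    }

    String index(String index, String type, String id, String body) throws IOException {
        return restClient.performRequest("PUT", "/" + index + "/" + type + "/" + id, body);
    }
}
```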
Reason I ask is that I have hit the same limitations with the existing APIs that the other guys have experienced.
I think the biggest limitation is the fact that you have to depend on the whole of Elasticsearch to connect to it through Java, plus backwards compatibility on the transport layer, and those are the main reasons why we are working on the Java REST client.
It would be fantastic if this client could implement the current Client interface (like @vvcephei's implementation) and be a drop-in replacement for the TransportClient or NodeClient.
Hi @gquintana, I agree it would be nice, but I don't think it is likely to happen. I don't expect migrating to be particularly hard, but changes will be required. Also keep in mind that the very first version may be much more low level than the Client interface, which has calls for each specific API, etc.
We will then have API-specific objects so that users can do things like indexClient.index(index, type, id, body), etc.
Is there another issue to follow the progress on this part?
Hi @robinst I created a meta issue for the planned improvements, which will be updated as we go: #19055 .