Elasticsearch: Java high-level REST client completeness

Created on 1 Nov 2017 · 72 Comments · Source: elastic/elasticsearch

This is a meta issue to track completeness of the Java REST high-level Client in terms of supported API. The following list includes all the REST API that Elasticsearch exposes to date, and that are also exposed by the Transport Client. The ones marked as done are already supported by the high-level REST client, while the others need to be added. Every group is sorted based on an estimation around how important the API is, from more important to less important. Each API is also assigned a rank (easy, medium, hard) that expresses how difficult adding support for it is expected to be.

The API listed as "Not Required" won't need to be supported before the transport client is removed from the master branch (next major version). Such API are mainly administrative API that are not likely to be used from a Java application. They generally return heavy responses and make it hard to reuse response objects from the transport client as they expose internal objects that in some cases cannot even be parsed back entirely based on the information returned at REST. We considered returning those as maps of maps, but that's also easy to achieve using the low-level REST client, hence we decided to not implement them for the time being.

Top-level APIs

  • [x] ping (easy)
  • [x] info (easy)
  • [x] index (medium)
  • [x] update (medium)
  • [x] delete (medium)
  • [x] bulk (hard)
  • [x] get (medium)
  • [x] exists (easy)
  • [x] multi get (medium) #27337
  • [x] search (very hard)
  • [x] search scroll (easy)
  • [x] clear scroll (easy)
  • [x] multi search (hard) #27274
  • [x] update by query (medium) @sohaibiftikhar
  • [x] delete by query (medium) @sohaibiftikhar
  • [x] reindex (medium) @sohaibiftikhar
  • [ ] reindex with wait_for_completion=false creates task @pgomulka
  • [x] rethrottle (reindex, update by query, delete by query) https://github.com/elastic/elasticsearch/pull/33951
  • [x] search template (medium) https://github.com/elastic/elasticsearch/pull/30473
  • [x] render search template (easy) (included in search template API) #30473
  • [x] multi search templates (medium) #30836
  • [x] term vectors (hard) #33447
  • [x] multi term vectors (hard) @mayya-sharipova #35266
  • [x] explain (medium) #31387
  • [x] field caps (easy) #29664
  • [x] put stored script (easy) #31323
  • [x] delete stored script (easy) #31355
  • [x] get stored script (medium) #31355

Indices API

  • [x] create index (easy)
  • [x] delete index (easy)
  • [x] indices exist (easy) #27384
  • [x] update alias (medium) #27876
  • [x] exists alias (easy) #28332
  • [x] get alias (medium) #28799
  • [ ] types exist (easy)
  • [x] put mapping (easy) #27869
  • [x] open index (easy)
  • [x] close index (easy)
  • [x] refresh (easy) #27799
  • [x] flush (easy) #28852
  • [x] update index settings (easy) #28892
  • [x] get index settings (easy) #29229
  • [x] clear cache (easy) #28866
  • [x] force merge (easy) #28896
  • [x] shrink (easy) #28425
  • [x] split (easy) #28425
  • [x] rollover (easy) #28698
  • [x] synced flush (medium) (exposes ShardRouting, hard to reconstruct the whole response from info returned via REST) #30650
  • [x] get index (medium) #31703
  • [x] get mappings (easy) #30889
  • [x] get field mappings (medium) #31423
  • [x] put index template (medium) #30400
  • [x] delete index template (easy) #36320
  • [x] get index templates (medium) #31161
  • [x] validate query (medium) #31077
  • [x] analyze (hard) #31577

Not required

  • shard stores (medium)
  • upgrade (easy) (to be removed?)
  • upgrade status (easy) (to be removed?)
  • segments (hard) (exposes ShardRouting)
  • recoveries (hard) (exposes ShardRouting, DiscoveryNode)
  • indices stats (hard) (exposes ShardRouting and a lot of other objects)

Snapshot API

  • [x] get repositories #30362
  • [x] create repository #30501
  • [x] verify repository #30934
  • [x] delete repository #30666
  • [x] create snapshot (medium) #31215
  • [x] snapshots status (medium) #31515
  • [x] get snapshots (medium) #31537
  • [x] delete snapshot (easy) #31393
  • [x] restore snapshot (medium) #32155

Ingest API

  • [x] put ingest pipeline (easy) #30793
  • [x] delete ingest pipeline (easy) #30865
  • [x] get ingest pipeline (easy) #30847
  • [x] simulate ingest pipeline (medium) #31158

Tasks API

  • [x] list tasks (medium) #29546
  • [x] get task (medium) #35166
  • [x] cancel task (easy) #30745

Cluster API

  • [x] cluster health (medium) #29331
  • [x] update cluster settings (easy) #28633
  • [x] get cluster settings (medium) #31706 (doesn't have its own Response object, exposed at REST only)

Not required

  • search shards (medium) (exposes ShardRouting, DiscoveryNode and requires parsing back QueryBuilder)
  • pending cluster tasks (easy)
  • allocation explain (hard) (exposes ShardRouting)
  • cluster state (hard) (exposes ClusterState)
  • reroute (easy if done after cluster state API, returns the entire cluster state)
  • nodes info (hard) (exposes DiscoveryNode and a lot of other objects)
  • nodes stats (hard) (exposes ShardRouting and a lot of other objects)
  • cluster stats (hard) (exposes DiscoveryNode, requires nodes info and nodes stats)
  • hot threads (easy) (exposes DiscoveryNode)
  • nodes usage (medium) (exposes DiscoveryNode)

REST only API

There are a number of API that are exposed via REST but not via the Transport Client. They don't necessarily have to be implemented if the goal is feature parity with the Transport Client, yet we should probably have a look at why they were not added to the Transport Client and whether it makes sense to add their support to the high-level REST Client or not. I don't think it makes sense to add support for cat API and ingest processor grok, hence I took them out already.

  • [ ] cluster remote info
  • [x] count #31868
  • [ ] get source
  • [x] source exists https://github.com/elastic/elasticsearch/pull/34519
  • [ ] delete alias
  • [x] indices template exist @andyb-elastic (#36132)
  • [ ] get upgrade
  • ingest processor grok
  • cat API: aliases, allocation, count, fielddata, health, help, indices, master, nodeattrs, nodes, pending tasks, plugins, recovery, repositories, segments, shards, snapshots, tasks, templates, threadpool

How to add support for a new API

Look at some of the already supported API and existing PRs that have been merged:

  • Add Index API to High Level Rest Client (#23040)
  • Add BulkRequest support to High Level Rest client (#23312)
  • Add delete API to the High Level Rest Client (#23187)
  • Add UpdateRequest support to High Level Rest client (#23266)
  • Added Delete Index support to high-level REST client (#27019)

The common tasks in each of the above PRs are:

  • add a fromXContent method to the existing response class currently used by the transport client, together with corresponding unit tests that make use of field shuffling as well as random field insertion (in order to test forward compatibility). That usually means adding a test for the response object that extends AbstractXContentTestCase, where supportsUnknownFields() returns true, as well as assertToXContentEquivalence. There are cases where we can't insert random fields everywhere, which then requires also overriding the getRandomFieldsExcludeFilter() method, which returns the paths that should be excluded when injecting random fields. Given the randomization applied, it makes sense to run this type of test locally with the -Dtests.iters=50 argument just to make sure that it is consistently green.
  • add a new method to the Request class which translates the input request into the internal REST request representation that holds method, url, endpoint, params etc., and add corresponding tests to RequestTests
  • add a new method to RestHighLevelClient, possibly also its async variant when it makes sense. We may not want to add async variants to every single method, so we decide case by case. The name of the new method must match what is defined in our REST spec, including the namespace.
  • add integration test that extends ESRestHighLevelClientTestCase that tests the new method end-to-end by sending REST requests to an external cluster.
  • add docs page. To check how docs are rendered and whether the links between docs pages and docs snippets work ok, run the following command from the root of your local checkout of the Elasticsearch repository: /path/to/elastic/docs/build_docs.pl --doc docs/java-rest/index.asciidoc --chunk 1 --out ~/temp/asciidoc --open . This requires also a local checkout of the docs repository, where the perl script is located.
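The first task above (field shuffling plus random field insertion) can be illustrated without the Elasticsearch test framework. What follows is only a rough sketch under loud assumptions: AckResponse, toMap, fromMap and ForwardCompatibilityCheck are hypothetical stand-ins invented for this example, not Elasticsearch classes, and a plain Map plays the role of the parsed REST body. The point it tries to show is that a lenient parser should return the same object even when unknown fields appear and the field order changes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in for a response class; not an Elasticsearch type.
final class AckResponse {
    final boolean acknowledged;

    AckResponse(boolean acknowledged) {
        this.acknowledged = acknowledged;
    }

    // Stand-in for toXContent: serialise to a generic map.
    Map<String, Object> toMap() {
        Map<String, Object> map = new LinkedHashMap<>();
        map.put("acknowledged", acknowledged);
        return map;
    }

    // Stand-in for fromXContent: read the known field, ignore everything else.
    static AckResponse fromMap(Map<String, Object> map) {
        Object value = map.get("acknowledged");
        if (!(value instanceof Boolean)) {
            throw new IllegalArgumentException("missing or invalid [acknowledged]");
        }
        return new AckResponse((Boolean) value);
    }
}

final class ForwardCompatibilityCheck {
    // Returns a copy with one to three unknown fields added and the key order shuffled.
    static Map<String, Object> shuffleAndInject(Map<String, Object> original, Random random) {
        Map<String, Object> withExtras = new LinkedHashMap<>(original);
        int extras = 1 + random.nextInt(3);
        for (int i = 0; i < extras; i++) {
            withExtras.put("unknown_" + i + "_" + random.nextInt(1000), random.nextInt());
        }
        List<String> keys = new ArrayList<>(withExtras.keySet());
        Collections.shuffle(keys, random);
        Map<String, Object> shuffled = new LinkedHashMap<>();
        for (String key : keys) {
            shuffled.put(key, withExtras.get(key));
        }
        return shuffled;
    }

    // Repeats the round trip several times, in the spirit of running a randomized test with many iterations.
    static boolean roundTripSurvives(int iterations, long seed) {
        Random random = new Random(seed);
        for (int i = 0; i < iterations; i++) {
            AckResponse expected = new AckResponse(random.nextBoolean());
            Map<String, Object> wire = shuffleAndInject(expected.toMap(), random);
            if (AckResponse.fromMap(wire).acknowledged != expected.acknowledged) {
                return false;
            }
        }
        return true;
    }
}
```

In the real code base the shuffling and injection are provided by the test base class mentioned above, so a contributor would not write this loop by hand; the sketch only shows what property those tests check.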

Relates #29827

:CorFeatureJava High Level REST Client Meta Pretty Bloody Important

All 72 comments

I updated the description of the issue by assigning each API a rank from 1 to 3 based on how difficult it should be to add support for it to the high-level REST client. The criteria were mainly how big the request is to serialize and how big the response is to parse back.

thanks @javanna - I've separated the APIs into "important" and "optional" lists, where optional APIs are ones that will seldom be used from applications other than monitoring applications or tests. If anybody disagrees with my selection, feel free to mention which APIs should be marked as important.

Not using the high-level REST client yet, but I would really have expected that multi-get was supported.

@javanna This might be a bloody stupid question, but: how does someone pick up an API and start working on it without risking that someone else is doing the same? :)

You add a comment here saying you are working on it.

Thanks @nik9000 !

I've picked up Create Index.

I have picked up "indices exist".

For questions related to code (how to run a test, which tests are (not) needed, do we need the async version of a method etc.) do I ask here, in a separate issue or the forums? Or something else? : )

hi @hariso it depends on the question :) Probably better to open a PR even though it is work in progress, so we can discuss your questions there. Would that work for you?

It definitely would. Thanks for the answer!

Hi, it would be very useful to have 'update by query' exposed in the new client.

Thank you!

analyze (hand)!

Hi,
Will we have an "indices listing" feature in the high-level REST client? I believe this is a useful and important API for a lot of ES cluster operation tasks.

Besides that, if I'm looking for an API to check the shard sizes of an index, is the "indices stats" API designed to cover this need?

Currently I'm working on a cluster management tool and looking for a "cross-version" Java REST client for ES. If there is no better choice, maybe I can help contribute these features to the high-level client :)

There is a major flaw in the current high level rest client in my opinion. All of the methods that perform requests are marked final. This creates a problem when trying to mock the RestHighLevelClient. Perhaps an interface could be defined that the RestHighLevelClient can implement?

@sowelie This might give you some more info: #27238. I'm using Mockito, and I had no problems mocking RestHighLevelClient, even though I did have a problem with the IndicesClient.

@hariso I ended up creating an interface for the methods I needed, and subclassing RestHighLevelClient. Though, I guess there is an extension to mockito that can be used to mock final classes / methods. The pull you referenced doesn't explain why the methods have to be final. Do you know why this is?
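The workaround described here (a small interface covering only the calls the application needs) could look roughly like the sketch below. All names in it (DocumentSearcher, RecordingSearcher, ProductCatalog) are hypothetical and invented for illustration; a production implementation of the interface would presumably delegate to RestHighLevelClient, which is deliberately left out so the example has no dependencies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical narrow interface exposing only what the application needs.
interface DocumentSearcher {
    List<String> searchIds(String index, String queryText);
}

// Hand-written test double; no mocking framework (and no final-method mocking) required.
final class RecordingSearcher implements DocumentSearcher {
    final List<String> seenQueries = new ArrayList<>();
    private final List<String> cannedIds;

    RecordingSearcher(List<String> cannedIds) {
        this.cannedIds = cannedIds;
    }

    @Override
    public List<String> searchIds(String index, String queryText) {
        // Record the call so a test can assert on it, then return the canned result.
        seenQueries.add(index + ":" + queryText);
        return cannedIds;
    }
}

// Application code depends on the interface, not on the concrete client.
final class ProductCatalog {
    private final DocumentSearcher searcher;

    ProductCatalog(DocumentSearcher searcher) {
        this.searcher = searcher;
    }

    int countMatches(String queryText) {
        return searcher.searchIds("products", queryText).size();
    }
}
```

The trade-off is one extra layer of indirection per API used, in exchange for tests that never touch the client class at all.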

@sowelie some methods were made final because they are not meant to be subclassed. RestHighLevelClient is non-final as it can be extended by adding new methods to it (think of adding support for plugins that add custom endpoints to Elasticsearch). Such custom methods can use the existing internal perform* protected methods which are the core of the class itself. We don't want the core of this class to be also potentially rewritten by subclasses. To me, a big flaw would have been the other way around, to have non-final methods for no clear reason.

I hear you though on the mocking issues and I would like to understand what you do differently compared to for instance what @hariso does. Could you elaborate, and one more thing, please can you open a new issue so we can properly discuss this problem? This is not the right place as it's a meta issue where we track the progress on adding support for all the missing APIs. Thanks!

search_after support would be great.
Apologies if a new Issue is the best place for requests.

@halfninja as far as I can see search_after is already supported as part of the search API. Am I missing something?

@javanna You're absolutely right, I've found it now and not sure how I missed it before. Thanks.

I would like to know whether the APIs that have their check box checked are already implemented.

@a-k-j "The ones marked as done are already supported by the high-level REST client, while the others need to be added."

Hi all,
just a small question: Isn't the UpdateSettingsRequest supposed to implement also ToXContentObject, besides the IndicesRequest.Replaceable?

@javanna The list of functionality to be supported by the HLRC contains these as well:

  • upgrade (easy)
  • upgrade status (easy)

However, the docs mention this:

The _upgrade API is no longer useful and will be removed. Instead, see Reindex before upgrading.

The question is: do we still want them, and are they up for grabs, or shall I look for work somewhere else? :D

good point @hariso , I will move those two API, I don't think we should implement them at the moment.

I've picked up Cluster Health. If somebody else is already working on it - please notify me.

I'm taking "update by query".

Update by query, reindex, and delete by query worry me because they are fairly "odd" in the way that they work. I wrote them that way because it made sense at the time but I really don't like it now. I'm fine with moving them over as they are, but I'd like to rework them so they are less confusing one day.

I have barely started with the work, so if you (and @javanna I guess) think we should postpone it, I can take something else.

Nah, I don't think it is worth delaying integrating it to rework it. I think it might be nice for it to already be integrated before reworking it to be honest. Just understand that it is a little weird. A lot weird.

I am taking Cluster API: list tasks and get task, as they use the same request object.

@javanna @nik9000 I am confused right now. Both ListTasksRequest and GetTaskRequest can request a specific task by ID, but at the REST endpoint ListTasksRequest can't be used for this purpose. Do we need to support backward compatibility with the transport client and allow ListTasksRequest to be used for a task by ID, or is it better to add validation that prevents it?
I can create an almost-zero-changes PR for this discussion if needed.

Do we need to support backward compatibility with the transport client and allow ListTasksRequest to be used for a task by ID, or is it better to add validation that prevents it?

@Van0SS I would not allow for this, otherwise in some cases we would end up having to send list requests to the get task endpoint, which is cumbersome. We could potentially move the taskId instance member from the base class to the relevant subclasses only but also the transport actions would need to be adapted for this, I would need to play with this and see how such change turns out.

Feedback from migrating a 2.x to 6.x client. The only thing that did not go smoothly is admin().indices() template things and one odd admin().validate().
Not sure if this is intended to vote, but if it is, I would vote for the template functions ;)

thanks for your feedback @CodingFabian

I'll pick up NodesInfo because it seems to be the last of the Important APIs that hasn't been claimed.

@tvernum I put my name on it (NodesInfo) yesterday and I have a decent amount of work started on it. It previously had my name on it, but I think you might not have refreshed the page, because you overwrote my claim.

@dnhatn @jtibshirani (and others)
Can you make sure you refresh the page before editing the description - it looks like @dakrone's claim on NodesInfo has been lost again (as has mine) and based on the edit history, it would appear that one of you two might have accidentally done so.

Heads up: I have updated the description of this issue and rearranged the list of API based on recent discussions.

I could not work out how to express a filter query with the high-level REST API. Can someone point me in the right direction?

@hth please ask your questions on our forums at https://discuss.elastic.co/

Hi, I'm working on explain api as it's not picked up yet, I post here just to avoid duplication.

thanks for letting us know @PnPie and most importantly for taking another API! looking forward to your PR.

Hello, It seems that put stored script has not been picked yet. May I have a try to work on this API?

sure @johnny94 go ahead!

Hi @javanna
is there agreement to add support for document _count? If so, I can work on it.

@mrdjen I think it would be good to have. Feel free to go ahead and take it. It would be a bit different compared to the other API that we have already added support for, as it doesn't have request and response objects in the server-side code, so we should add those to the client directly.

Hi @nik9000 @hariso I was reading the conversation about the "update by query" case from March, do you have any update about it? Is that addition still on the table or it has been discarded? I just realised that this is not part of the High Level client, what's your suggestion to find a workaround for this API? I think I'll have to use the TransportClient instead. Thanks!

I do! @sohaibiftikhar has started to look at implementing reindex, update by query, and delete by query. They kind of come as a bundle.

Personally I'd use the low level rest client to work around the high level one not supporting update by query for now. That'll continue to work for a long time. The transport client will die one day. Also, it is pretty complex to keep a transport client around just for one or two APIs and use REST for the others.

@sescotti @nik9000 Unfortunately, I didn't have time to look into update by query yet.: /

Hi @javanna,
While using this client I ran into a difficulty: I want the log of the ES server response (the log statement in the code below) to support MDC, but I cannot find a way to do it.

The code is in org.elasticsearch.client.RestClient:

    private void performRequestAsync(final long startTime, final HostTuple<Iterator<HttpHost>> hostTuple, final HttpRequestBase request,
                                     final Set<Integer> ignoreErrorCodes,
                                     final HttpAsyncResponseConsumerFactory httpAsyncResponseConsumerFactory,
                                     final FailureTrackingResponseListener listener) {
        final HttpHost host = hostTuple.hosts.next();
       ...
        client.execute(requestProducer, asyncResponseConsumer, context, new FutureCallback<HttpResponse>() {
            @Override
            public void completed(HttpResponse httpResponse) {
                try {
                    RequestLogger.logResponse(logger, request, host, httpResponse);//   i want the log of this  with MDC 
                 ........
                } catch(Exception e) {
                    listener.onDefinitiveFailure(e);
                }
            }

I use slf4j + logback, and I would like the code to look like the following so that it supports the slf4j MDC:

  String sessionId = MDC.get("SessionId");//get sessionID from current thread 
  client.execute(requestProducer, asyncResponseConsumer, context, new FutureCallback<HttpResponse>() {
            @Override
            public void completed(HttpResponse httpResponse) {
                try {
                  MDC.put("SessionId", sessionId);  // put the sessionId to the thread-pool thread so that logback logger can find it .
                    RequestLogger.logResponse(logger, request, host, httpResponse);//   i want the log of this  with MDC 
                 ........
                } catch(Exception e) {
                    listener.onDefinitiveFailure(e);
                }
            }
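The idea in the snippet above (read the context on the calling thread, then restore it inside the callback that runs on a pool thread) can be tried out in isolation. The following is only a sketch under stated assumptions: FakeMdc is a hypothetical ThreadLocal-based stand-in for org.slf4j.MDC so that the example needs no dependencies, and ContextPropagation is an invented helper, not part of the REST client. With real SLF4J the equivalent calls would be MDC.getCopyOfContextMap() and MDC.setContextMap(...).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical stand-in for org.slf4j.MDC: a per-thread map of diagnostic values.
final class FakeMdc {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CONTEXT.get().put(key, value); }
    static String get(String key) { return CONTEXT.get().get(key); }
    static Map<String, String> copy() { return new HashMap<>(CONTEXT.get()); }
    static void set(Map<String, String> values) { CONTEXT.set(new HashMap<>(values)); }
    static void clear() { CONTEXT.remove(); }
}

final class ContextPropagation {
    // Captures the caller's context now and restores it around the callback later.
    static <T> Supplier<T> withCallerContext(Supplier<T> callback) {
        Map<String, String> captured = FakeMdc.copy();
        return () -> {
            Map<String, String> previous = FakeMdc.copy();
            FakeMdc.set(captured);
            try {
                return callback.get();
            } finally {
                // Put back whatever the pool thread had before, so nothing leaks between tasks.
                FakeMdc.set(previous);
            }
        };
    }

    // Runs a callback on a pool thread and returns the SessionId it observed there.
    static String sessionSeenOnPoolThread(String sessionId, boolean propagate) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            FakeMdc.put("SessionId", sessionId);
            Supplier<String> callback = () -> FakeMdc.get("SessionId");
            Supplier<String> task = propagate ? withCallerContext(callback) : callback;
            Callable<String> call = task::get;
            return pool.submit(call).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            FakeMdc.clear();
            pool.shutdown();
        }
    }
}
```

Without the wrapper the pool thread sees no SessionId, which matches the behaviour described in this comment; with it, the value set on the calling thread is visible inside the callback. Whether the wrapper can be applied depends on having access to the callback, which is exactly what the comment says is missing in RestClient.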

This is how I use the client.

create:

    @Bean
    public RestHighLevelClient esOriginClient() {
        List<String> hostList = Arrays.asList(host.split(","));
        HttpHost[] httpHosts = new HttpHost[hostList.size()];
        for (int i = 0; i < hostList.size(); i++) {
            httpHosts[i] = new HttpHost(hostList.get(i), port, "http");
        }
        RestClientBuilder builder = RestClient.builder(httpHosts);
        return new RestHighLevelClient(builder);
    }


use:

 SearchResponse searchResponse = client.search(request);

This is just the method in org.elasticsearch.client.RestHighLevelClient:

  public final SearchResponse search(SearchRequest searchRequest, Header... headers) throws IOException {
        return performRequestAndParseEntity(searchRequest, Request::search, SearchResponse::fromXContent, emptySet(), headers);
    }

What I have tried so far to support MDC:

  1. Having myRestClient extend RestClient, but RestClient cannot be extended because its constructor is not public.
  2. Extending org.elasticsearch.client.RequestLogger, but it is final and its methods are static.

Is there any way you can help me? Sorry to bother you.

Hi @chenchuangc! I replied to the new issue that you filed.

hi @nik9000 @hariso, I finally sorted it using a scroll search and a bulk update (after trying to use update by query I discovered it's not possible to send partial updates which is my requirement here). Thank you anyway!

The high-level REST client cannot be used with BulkProcessor; please add this API.

@iamazy BulkProcessor is supported, see https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-document-bulk.html#java-rest-high-document-bulk-processor .

Will the bucket_script API be added?

@javanna I tried the high-level REST client with BulkProcessor but got some errors: the client is a TransportClient instance, not a high-level REST client instance. Is there another class named BulkProcessor?

@iamazy the docs I linked above are for the high-level REST client. The BulkProcessor class stays the same, but it accepts a function as argument that identifies the client bulk method to be called. I suspect you are using an older version if you don't see the same.

@javanna Do you know which release of Elasticsearch you are targeting for DeleteByQuery and UpdateByQuery? I see that these APIs are already implemented on a dev fork and that a review request is pending for merge. Do you have a timeline for when these APIs will be part of the official Elasticsearch high-level REST client, and in which version?

Any plans of porting over DeleteByQueryRequestBuilder?

I started looking at "get task" (did you get anywhere with this @Van0SS ?).
One point of uncertainty is the general HLRC policy for translating GET /foo 404s into Java-speak. The options are:
1) getFoo throws an exception
2) getFoo returns null
3) getFoo returns a FooResponse object that has an "isExists()" property to say if it is populated

Option 3 seems a little weird to Java developers yet I see that's what we do for docs with the GetResponse.java class.

@markharwood not much, you can take this task. @javanna could you please remove me from the assigned list? Sorry if I held anybody up from starting work on it.

One point of uncertainty is the general HLRC policy for translating GET /foo 404s into Java-speak.

Yes. I'm not sure what the right way is, but I don't think isExists() is it.

My 2 cents: As a client library user, I prefer not having to catch an exception for a situation like an entity not found. It's not really an exceptional situation, nothing extraordinary. Having to try-catch to handle such a situation makes the client code uglier than needed. On the other hand, many DB drivers simply return a null or an Optional when nothing is found.

I'm ++ for not throwing exceptions too. I did a random sampling of 4 things that throw 404s on the server if they are not found, and almost all of them have an "exists" method of some sort. The alias one is just a bit different due to its API. I'm sure there are cases where we throw an exception, but it seems that we should not be doing that.

There is a concept of a StatusToXContentObject which returns a RestStatus to the consumer. We currently have no standard for "what was the status of the call I made" in the codebase, so it might make sense to add one. I'm keen to add something to the responses for this. We have roughly 15 responses that are a StatusToXContentObject, bulk/index being some of those. These still rely on either 1) the status saved in the response, or 2) some boolean used to say whether it's OK or NOT_FOUND. The latter is how get pipelines does it. The former is what is stored in the get aliases response.

I'm not keen on the null response. I'd rather have someone do an if (status check) than an if (null check), but that could be because of my Scala days. I do think an Optional would also work if we want to get a little functional, as @hariso mentions :)

My .02 would be to either have a way to say "isFound", as translated from some internal status code, or just save the REST status code internally and let the user reason about it. The former ensures we can say "well, this non-200 status code is actually 'ok'", but I don't know if we have a reason for that. The latter gives the user the flexibility and the foot gun. I would also be fine with an Optional.
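For comparison, the shapes discussed so far (exception, null, a response object with an "exists" flag, and Optional) are sketched side by side below. This is a hypothetical illustration only: FooLookup and FooResponse are invented names, an in-memory map stands in for the server, and a missing key plays the role of a 404. None of this is the HLRC implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

// Hypothetical response wrapper in the style of option 3.
final class FooResponse {
    private final String body; // null when the entity was not found

    FooResponse(String body) {
        this.body = body;
    }

    boolean isExists() {
        return body != null;
    }

    String getBody() {
        return body;
    }
}

// Hypothetical in-memory stand-in for the server; a missing key plays the role of a 404.
final class FooLookup {
    private final Map<String, String> foos = new HashMap<>();

    void put(String id, String body) {
        foos.put(id, body);
    }

    // Option 1: "not found" surfaces as an exception the caller must catch.
    String getFooOrThrow(String id) {
        String body = foos.get(id);
        if (body == null) {
            throw new NoSuchElementException("foo [" + id + "] not found");
        }
        return body;
    }

    // Option 2: "not found" surfaces as null.
    String getFooOrNull(String id) {
        return foos.get(id);
    }

    // Option 3: always return a response object; the caller checks isExists().
    FooResponse getFooResponse(String id) {
        return new FooResponse(foos.get(id));
    }

    // Variant raised in the discussion: make absence explicit in the return type.
    Optional<String> findFoo(String id) {
        return Optional.ofNullable(foos.get(id));
    }
}
```

A caller of the Optional variant might write `lookup.findFoo("42").orElse("fallback")`, which keeps the not-found branch visible without a try-catch or a null check.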

The methods I checked:

Alias - getAlias has a getException() which is validated against if there are any exceptions in the call.
Pipelines - get pipeline response is a StatusToXContentObject
Get - 404 if index not found, isExists (which is a bool set on the GetResult nested in the response set by ShardService) if doc not found
Delete watch - isFound, which is set directly by the transport action

@hub-cap One more for the list - the explain API uses the "isExists" approach too.

This looks to be where the "isExists or exception" design choice is forced. The ignores parameter can be used to declare 404 status codes are to be expected but the logic in this helper method uses the same responseConvertor.apply method for parsing both healthy responses and any "ignored" error codes - the same type of response object is returned. This steers us towards using a "FooResponse" object with an "isExists" property of some sort.
The alternative use of this method is to call without listing 404s in the ignores parameter in which case a more generic exception is thrown (ElasticsearchStatusException with status =404).

Perhaps another general "java convention" to consider @hub-cap.

How do we map potentially long-running wait_for_completion=false style REST APIs to our notion of sync and async Java calls?

I hit this trying to find a long-running task that could be used in my getTask tests. It looks like HLRC's reindex has been written without any support for returning task IDs. This means reindex and getTask can't be practically used together in HLRC. Reindex needs to find a way to offer more of the async features.
In discussions with @pgomulka we came up with this candidate general convention for mapping async REST apis to Java:

  • Foo syncFoo(...) and void asyncFoo(..., listener) would map to REST calls _without_ wait_for_completion params (the majority of our existing APIs)
  • FooTask submitFooTask(...) would map to the REST equivalents with wait_for_completion set to false. It's a synchronous call to a REST api with async features.

Does this make sense? It probably applies to more than reindex
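To make the proposed naming convention concrete, here is a rough sketch of the three call shapes. Everything in it is an assumption made for illustration: Foo, FooTask, FooListener and FooClient are hypothetical types, the "work" is a trivial in-process computation rather than a REST call, and awaitTask is an invented helper that stands in for polling a task API.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical result of the operation.
final class Foo {
    final int processed;
    Foo(int processed) { this.processed = processed; }
}

// Hypothetical handle returned when the work is submitted as a task.
final class FooTask {
    final String taskId;
    FooTask(String taskId) { this.taskId = taskId; }
}

// Hypothetical success/failure callback pair.
interface FooListener<T> {
    void onResponse(T response);
    void onFailure(Exception e);
}

final class FooClient {
    private final Map<String, CompletableFuture<Foo>> tasks = new ConcurrentHashMap<>();
    private final AtomicLong taskCounter = new AtomicLong();

    // Blocking call: returns once the operation has completed.
    Foo syncFoo(int docs) {
        return new Foo(docs);
    }

    // Non-blocking call: the same operation, with the result delivered to a listener.
    void asyncFoo(int docs, FooListener<Foo> listener) {
        CompletableFuture.supplyAsync(() -> syncFoo(docs)).whenComplete((foo, error) -> {
            if (error == null) {
                listener.onResponse(foo);
            } else {
                listener.onFailure(error instanceof Exception ? (Exception) error : new RuntimeException(error));
            }
        });
    }

    // Blocking call that only registers the work and returns a task handle,
    // mirroring a request sent with wait_for_completion set to false.
    FooTask submitFooTask(int docs) {
        String taskId = "task-" + taskCounter.incrementAndGet();
        tasks.put(taskId, CompletableFuture.supplyAsync(() -> syncFoo(docs)));
        return new FooTask(taskId);
    }

    // Invented helper standing in for a "get task" lookup: waits for the task's result.
    Foo awaitTask(String taskId) {
        CompletableFuture<Foo> future = tasks.get(taskId);
        if (future == null) {
            throw new IllegalArgumentException("unknown task [" + taskId + "]");
        }
        return future.join();
    }
}
```

The distinction the sketch tries to capture is that submitFooTask is itself a synchronous Java call; the asynchrony lives on the server side and is observed later through the task ID.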

@markharwood As mentioned above, I don't think throwing exceptions is the way to go, so I don't think we should be removing it from ignores. Just to reiterate what I saw in your other review, #35166, I think the use of Optional works well here.

Also, I agree with @pgomulka and your assessment of sync/async/submit, :shipit:

Upgrading Elasticsearch from 2.x to 5.x requires replacing Shield security with X-Pack security, and the Java Transport client with the Java high-level or low-level REST client.
Where can I find information on how to use X-Pack with the Elasticsearch Java REST client for version 5.6.2? If no such information exists, how do I do it? Please help :)

@utkarsh4G Could you please ask this question on our discuss forum?
Elastic uses GitHub issues for tracking work that needs to be undertaken, such as bugs and feature requests, and we use the forums for questions such as yours.

Closing in favor of #47679

Just kidding, closing in favor of #47678
