Elasticsearch version: elasticsearch_5.0.0-rc1_all.deb
Plugins installed: none
JVM version:
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
OS version: Debian 8
Description of the problem including expected versus actual behavior: indexing performance dropped dramatically, by a factor of roughly 1000, after upgrading from beta1 to RC1. After downgrading, everything is back on track.
Steps to reproduce:
Can you provide more information on how you index? Our indexing performance benchmarks look just fine. I really wonder what you are doing; can you share some details? Otherwise I can't do anything here.
I'm using the ES JS lib and doing something like this from a remote host:
```js
client.index({
  index: 'myIndex',
  type: 'event',
  date: timestamp,
  body: {
    date: timestamp,
    customerEmail: user.email,
    customerId: +user.id,
    role: role ? role : null
  }
})
```
Note that I created the client using this:
```js
const client = new elasticsearch.Client({
  host: 'https://user:password@elastic.mydomain.com:443',
  apiVersion: 'master',
  maxSockets: 100,
  requestTimeout: 1800000
})
```
Do you have any logs? I just ran a simple benchmark and everything seems fine.
No output in the syslog while indexing, and no errors reported. Did I miss something?
Anyway, I just noticed that my docs.count contains only 181 records with RC1, whereas with beta1 all records were imported successfully (82917 records).
edit: I delete and recreate the index for each import, using exactly the same mapping every time. The JS code doesn't change; only switching from one ES version to the other leads to very different behavior.
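To check an import against the number of documents actually sent, rather than reading docs.count by hand, here is a minimal sketch. It assumes the legacy `elasticsearch` npm client created above, whose API methods return a promise when no callback is passed; `checkImportedCount` and `expectedCount` are hypothetical names, and `expectedCount` is whatever your import loop tallied.

```javascript
// Hypothetical helper: refresh the index, then compare stored vs. attempted
// document counts. `client` is assumed to be a legacy `elasticsearch` npm client.
async function checkImportedCount(client, index, expectedCount) {
  // Refresh so that documents indexed moments ago are visible to count.
  await client.indices.refresh({ index });
  // The legacy client resolves with the response body, e.g. { count: 181, ... }.
  const resp = await client.count({ index });
  return {
    expected: expectedCount,
    actual: resp.count,
    missing: expectedCount - resp.count,
  };
}
```

A non-zero `missing` value after an import points at rejected requests rather than slow ones.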
Can you try to reduce this to a self-contained test case without your JavaScript client? It's really hard to get to the bottom of this, since it seems you are also going through some kind of proxy.
Just Nginx as a reverse proxy, and it stays the same when switching between the two ES versions: ES is the only "moving" part. I'll check the Nginx logs to see if anything shows up.
I can't bulk-inject this data set without JS, but if you have an example bash script with any data I could use for testing, I can run it locally.
We use Rally for benchmarking.
Please ensure you have the prerequisites installed. You can then set it up with:
```sh
git clone https://github.com/elastic/rally.git
cd rally
./rally
```
After the initial setup, you can run:
```sh
./rally --pipeline=benchmark-only --target-hosts=elastic.mydomain.com:443 --client-options="use_ssl:true,verify_certs:true,basic_auth_user:'INSERT_USERNAME_HERE',basic_auth_password:'INSERT_YOUR_PASSWORD_HERE'"
```
It will index a document corpus of about 2 GB against the specified cluster and will report metrics afterwards.
Okay, will do and report when it's done.
I can't use Rally, because every time I get:
```
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, 'illegal_argument_exception', 'wait_for_relocating_shards has been removed, use wait_for_no_relocating_shards [true/false] instead')
elasticsearch WARNING GET /_cluster/health?timeout=3s&wait_for_status=green&wait_for_no_relocating_shards=True [status:408 request:3.003s]
2016-10-11 14:25:57,887 rally.driver ERROR Client error in health API. Using 'wait_for_no_relocating_shards'.
```
Is there an option I missed?
Does Rally abort the benchmark, or do you just see these in the log? Rally probes internally to find the right API call, so these warnings are OK.
Rally does, however, abort the benchmark if the cluster does not reach health status green. If you have existing indices on this cluster, this may be the case. If that is your problem, you can add `--cluster-health=yellow` or `--cluster-health=red` (depending on which cluster health you expect) to the command line options.
It aborts, but I'll try with yellow/red to see if it changes anything.
edit: it starts! (with yellow, I don't have a replica anywhere)
Great! I guess you already have some index in the cluster, because the indices that Rally creates for benchmarks have zero replicas to ensure a "green" cluster state. Just post here if you have any further issues (but I don't expect many problems now that the benchmark runs).
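If you would rather have your own cluster report green than pass `--cluster-health=yellow`, one option on a single-node cluster (Elasticsearch never allocates a replica on the same node as its primary) is to drop the replicas of your own index. A minimal sketch, assuming the legacy `elasticsearch` npm client; `dropReplicasAndWaitForGreen` is a hypothetical name.

```javascript
// Hypothetical helper; `client` is assumed to be a legacy `elasticsearch` npm client.
async function dropReplicasAndWaitForGreen(client, index) {
  // number_of_replicas is a dynamic setting, so it can be changed on a live index.
  await client.indices.putSettings({
    index,
    body: { index: { number_of_replicas: 0 } },
  });
  // Ask the cluster to wait up to 30s for green. If that times out,
  // Elasticsearch answers HTTP 408, which the client may report as an error.
  return client.cluster.health({ waitForStatus: 'green', timeout: '30s' });
}
```

This only helps if every index on the node is treated the same way; one index with replicas is enough to keep the cluster yellow.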
It takes ages, but it runs (I'm at the scroll step).
@olivierlambert I see `date: timestamp,` in your original JS snippet:
```js
client.index({
  index: 'myIndex',
  type: 'event',
  date: timestamp, // <----- Here
  body: {
    date: timestamp,
    customerEmail: user.email,
    customerId: +user.id,
    role: role ? role : null
  }
})
```
Elasticsearch doesn't accept "date" as a parameter to the indexing request. The recent changes to Strict URL Parsing (also in Release Notes) mean that parameter will now be rejected, the document won't be indexed, and you'll get an error.
If you're determining throughput by the number of documents that are indexed, that's probably why :)
_Caveat that I'm not super familiar with the JS client, but I don't think that's a special flag they've enabled_
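A sketch of the same call with the stray top-level `date` removed, so the timestamp only travels in the document body. It assumes the legacy `elasticsearch` npm client (methods return a promise when no callback is given); `buildIndexParams` and `indexEvent` are hypothetical helper names, and the index name is lowercased because Elasticsearch index names must be lowercase.

```javascript
// Build the request parameters; only documented request options at the top level.
function buildIndexParams(timestamp, user, role) {
  return {
    index: 'myindex', // index names must be lowercase
    type: 'event',
    // No top-level `date`: it is not an index-request parameter, and RC1's
    // strict URL parsing rejects requests carrying unknown parameters.
    body: {
      date: timestamp,
      customerEmail: user.email,
      customerId: +user.id,
      role: role ? role : null,
    },
  };
}

// Return the promise so the caller can await it and see rejections.
function indexEvent(client, timestamp, user, role) {
  return client.index(buildIndexParams(timestamp, user, role));
}
```

Usage: `indexEvent(client, timestamp, user, role).catch((err) => console.error(err))`, so a rejected document is reported instead of silently disappearing.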
Ah that makes sense. Let me try that!
Got 309 hits (better than 181 before) but still no cigar.
I'll dig into my JS request to make sure no error is returned by the callback/promise.
Also, do you have a benchmark I can use that is shorter than the default Rally one, which takes hours?
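One way to make such errors visible, sketched under the assumption that `client.index` returns a promise (as in the legacy `elasticsearch` npm client): send requests in chunks and tally which ones were rejected. `indexAll` and the chunk size of 50 are illustrative choices, not anything Elasticsearch or the client prescribes.

```javascript
// Hypothetical helper: index `docs` (an array of request-param objects such as
// { index, type, body }) in fixed-size chunks and record every failure.
async function indexAll(client, docs, chunkSize = 50) {
  let succeeded = 0;
  const failures = [];
  for (let i = 0; i < docs.length; i += chunkSize) {
    const chunk = docs.slice(i, i + chunkSize);
    // Wrapping in Promise.resolve().then turns synchronous throws into rejections;
    // allSettled waits for every request in the chunk, whether it succeeds or not.
    const results = await Promise.allSettled(
      chunk.map((params) => Promise.resolve().then(() => client.index(params)))
    );
    results.forEach((r, j) => {
      if (r.status === 'fulfilled') {
        succeeded += 1;
      } else {
        failures.push({ position: i + j, error: r.reason });
      }
    });
  }
  return { succeeded, failures };
}
```

Logging `failures[0].error` is usually enough to see which parameter Elasticsearch objects to.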
Okay, found something: calling the client without a callback is what triggered my issue.
I added the callback to the index request, and now it works normally. I suppose ES RC1 added some restrictions on the requests it accepts (as @polyfractal pointed out), and rejects a lot of them.
So to recap: JS client code that worked on beta1 appeared broken on RC1 because of non-compliant parameters and/or missing callbacks.
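For reference, the callback form looks like the sketch below, assuming the legacy `elasticsearch` npm client calls `callback(err, response, status)` once the request completes; `indexWithCallback` and `onDone` are hypothetical names.

```javascript
// Hypothetical helper; `client` is assumed to be a legacy `elasticsearch` npm client.
function indexWithCallback(client, params, onDone) {
  client.index(params, (err, response, status) => {
    if (err) {
      // Surface the rejection instead of silently dropping the document.
      console.error('index request failed, HTTP status', status, err.message);
      onDone(err);
      return;
    }
    onDone(null, response);
  });
}
```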
Renamed the issue. Thanks a lot @polyfractal, @danielmitterdorfer and @s1monw for your assistance!
Np, happy to help :) You should open a ticket at the ES-JS repo and see if it's an issue with the client, misconfiguration, etc.
Will do! (just after I find 5 minutes to do so ^^ )