Jaeger: query error after running es_indices_clean.sh

Created on 26 Dec 2017  ·  16 Comments  ·  Source: jaegertracing/jaeger

jaeger version: cba413ede300564e6f92a16f812604e4a829d0c7
storage: elasticsearch 6.1.1, Build: bd92e7f/2017-12-17T20:23:25.338Z, JVM: 9.0.1

It worked before the cleanup.

Steps I took:

⋊> ~/opt ./es_indices_clean.sh 0 localhost:9200                                                                                                                                                             
Installing python dependencies required for curator...
Requirement already satisfied: elasticsearch in /usr/local/lib/python2.7/dist-packages
Requirement already satisfied: elasticsearch-curator in /usr/local/lib/python2.7/dist-packages
Requirement already satisfied: urllib3<1.23,>=1.21.1 in /usr/local/lib/python2.7/dist-packages (from elasticsearch)
Requirement already satisfied: click>=6.7 in /usr/local/lib/python2.7/dist-packages (from elasticsearch-curator)
Requirement already satisfied: pyyaml>=3.10 in /usr/local/lib/python2.7/dist-packages (from elasticsearch-curator)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python2.7/dist-packages (from elasticsearch-curator)
Requirement already satisfied: voluptuous>=0.9.3 in /usr/local/lib/python2.7/dist-packages (from elasticsearch-curator)

Removing jaeger-service-2017-12-26
Removing jaeger-span-2017-12-26
⋊> ~/opt curl -XGET 'http://localhost:9200/_cat/indices'          # generate new data                                                                                                                                          
yellow open jaeger-service-2017-12-26 sMNJrlcYQGqMDp4AUrLlWA 5 1  2 0   9.9kb   9.9kb
yellow open jaeger-span-2017-12-26    wBc1qghHQ4uo0XobS7tKgA 5 1 27 0 263.7kb 263.7kb

Then I visited http://local:16686/search and got this error:

There was an error querying for traces:
HTTP Error: Search service failed: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]

Status | 500
Status text | Internal Server Error
URL | /api/services
Response body | {   "data": null,   "total": 0,   "limit": 0,   "offset": 0,   "errors": [     {       "code": 500,       "msg": "Search service failed: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]"     }   ] }

Most helpful comment

I have seen the same problem with bad mappings. You will need to either:

A. Shut down all Jaeger collectors, delete the bad indices, then restart the collectors and lose the data, or
B. Create new Elasticsearch index templates, trigger a reindex to a new index, and delete the bad indices after it completes. The index mappings for the template can be extracted from a working index.

All 16 comments

Today I tried release 1.2.0 and got the same error.

@black-adder could it be the timezone issue? The ticket was opened on Dec 26, 2017, so I assume the script was run on the same date, which makes it strange that it removed these indices:

Removing jaeger-service-2017-12-26
Removing jaeger-span-2017-12-26

I'll try to reproduce.

@meilihao You're deleting the currently-active index. This removes the index's mapping, which is created by the collectors on start-up. This issue explains it in a bit more detail: https://github.com/jaegertracing/jaeger/issues/374

This would be due to Elasticsearch trying to figure out the mappings on its own, since we aren't checking whether the index is still there after startup. Since we can't aggregate on text fields without enabling fielddata, this would break the query searches.
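
One way to confirm that an index has fallen back to dynamic mappings is to fetch its mapping (for example with GET /<index>/_mapping) and look for string fields typed as `text`. The sketch below is a hypothetical helper, not part of Jaeger. It assumes you pass in the `properties` object from that response, and both sample mappings are illustrative: the first is modeled on Elasticsearch's default dynamic mapping for strings, the second on an explicit keyword mapping.

```python
def text_fields(properties, prefix=""):
    """Return the sorted dotted names of fields mapped as `text`."""
    found = []
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("type") == "text":
            found.append(path)
        # Recurse into object fields that carry their own properties.
        if "properties" in spec:
            found.extend(text_fields(spec["properties"], path + "."))
    return sorted(found)


# Illustrative: what dynamic mapping typically produces for string fields.
dynamic_properties = {
    "serviceName": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
    "operationName": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
}

# Illustrative: an explicit mapping with aggregatable keyword fields.
explicit_properties = {
    "serviceName": {"type": "keyword"},
    "operationName": {"type": "keyword"},
}

print(text_fields(dynamic_properties))   # fields that would break aggregations
print(text_fields(explicit_properties))  # nothing to report
```

If the helper reports any fields for a jaeger-* index, that index was most likely created without the expected mapping.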

The only way to avoid this is by not deleting an index that's still being used. Since you're posting on Dec 26th, and you deleted two indices marked 2017-12-26, I assume that's the case. (I.e., never run ./es_indices_clean.sh with a parameter of 0; perhaps the script should warn about this.)
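
A warning of that kind could look roughly like the sketch below. It is only an illustration: the helper names are hypothetical, and it does not reproduce the script's actual logic (the script delegates the deletion to curator). It assumes index names end in YYYY-MM-DD and that `today` is computed in the same timezone the collectors use when naming indices.

```python
from datetime import date, datetime, timedelta


def index_date(name):
    """Parse the trailing YYYY-MM-DD of a name like jaeger-span-2017-12-26."""
    return datetime.strptime(name[-10:], "%Y-%m-%d").date()


def select_for_deletion(names, keep_days, today):
    """Pick indices whose date is at least keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    return [n for n in names if index_date(n) <= cutoff]


def active_indices(selected, today):
    """Return the selected indices that are still being written to today."""
    return [n for n in selected if index_date(n) == today]


names = [
    "jaeger-span-2017-12-25",
    "jaeger-span-2017-12-26",
    "jaeger-service-2017-12-26",
]
today = date(2017, 12, 26)

doomed = select_for_deletion(names, 0, today)
still_active = active_indices(doomed, today)
if still_active:
    print("warning: this would delete active indices:", ", ".join(still_active))
```

With a parameter of 1 instead of 0, only the 2017-12-25 index is selected and no warning is printed.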

I have the same issue:

/api/services
 service failed: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]"

I just updated ES from 6.2 to 6.3 and cleaned out the old data.

Hi, any progress here? -- I have the same issue.
What I did:
(running jaeger, version 1.4.1 from helm chart https://hub.kubeapps.com/charts/incubator/jaeger/0.7.0)

  1. Deleted the jaeger-* indices from ES
  2. Saw that Jaeger was not working (the error mentioned above)
  3. Ran helm delete --purge to remove all of Jaeger from k8s
  4. Deployed again
  5. Saw that Jaeger was still not working (same error)
  • No error logs:
pojo@local:~$ kubectl logs -f pod/jaeger-query-665984d7b9-m9rcj -n monitoring
{"level":"info","ts":1538725711.235911,"caller":"healthcheck/handler.go:99","msg":"Health Check server started","http-port":16687,"status":"unavailable"}
{"level":"info","ts":1538725711.3213375,"caller":"query/main.go:180","msg":"Archive storage not created","reason":"Archive storage not supported"}
{"level":"info","ts":1538725711.321707,"caller":"query/main.go:127","msg":"Registering metrics handler with HTTP server","route":"/metrics"}
{"level":"info","ts":1538725711.321767,"caller":"healthcheck/handler.go:133","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1538725711.3218033,"caller":"query/main.go:136","msg":"Starting jaeger-query HTTP server","port":16686}
^C
pojo@local:~$ kubectl logs -f pod/jaeger-collector-7889bd4c49-v2ng9 -n monitoring
{"level":"info","ts":1538725707.0003345,"caller":"healthcheck/handler.go:99","msg":"Health Check server started","http-port":14269,"status":"unavailable"}
{"level":"info","ts":1538725707.1209114,"caller":"static/strategy_store.go:76","msg":"No sampling strategies provided, using defaults"}
{"level":"info","ts":1538725707.121135,"caller":"collector/main.go:142","msg":"Registering metrics handler with HTTP server","route":"/metrics"}
{"level":"info","ts":1538725707.121202,"caller":"collector/main.go:150","msg":"Starting Jaeger Collector HTTP server","http-port":14268}
{"level":"info","ts":1538725707.1212277,"caller":"healthcheck/handler.go:133","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1538725707.1555748,"caller":"collector/main.go:207","msg":"Listening for Zipkin HTTP traffic","zipkin.http-port":9411}

indices in ES:

health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   jaeger-span-2018-10-05          C4zjEF33Sn6i3bXH5QkTow   5   1     259767            0     35.4mb         17.7mb
green  open   jaeger-service-2018-10-05       7tQAzJT7S2GoxZoc-1I_QA   5   1         53           14    241.7kb        119.7kb

It was not fixed after the date changed, as suggested by @dmitrygusev here: https://github.com/jaegertracing/jaeger/issues/374#issuecomment-362265880

I have seen the same problem with bad mappings. You will need to either:

A. Shut down all Jaeger collectors, delete the bad indices, then restart the collectors and lose the data, or
B. Create new Elasticsearch index templates, trigger a reindex to a new index, and delete the bad indices after it completes. The index mappings for the template can be extracted from a working index.
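
The order of operations in option B matters, so here is a rough sketch of it as a list of Elasticsearch REST calls. It only builds and prints the calls; it does not send them. The function name, the new index name, and the template body are placeholders, and the sketch assumes the template's index pattern matches the new index name so that the correct mappings are applied when the reindex creates it.

```python
import json


def option_b_plan(bad_index, new_index, template_name, template_body):
    """Return option B's REST calls, in order, as (method, path, body) tuples."""
    return [
        # 1. Install the template first so the new index gets correct mappings.
        ("PUT", "/_template/" + template_name, template_body),
        # 2. Copy documents from the badly mapped index into the new one.
        ("POST", "/_reindex",
         {"source": {"index": bad_index}, "dest": {"index": new_index}}),
        # 3. Only after the reindex completes, drop the bad index.
        ("DELETE", "/" + bad_index, None),
    ]


plan = option_b_plan(
    bad_index="jaeger-span-2018-10-05",
    new_index="jaeger-span-2018-10-05-reindexed",  # placeholder name
    template_name="span",
    template_body={"mappings": {}},  # placeholder: use mappings from a working index
)
for method, path, body in plan:
    print(method, path, json.dumps(body) if body is not None else "")
```

Whether Jaeger will read the new index under a different name is not covered here, so treat the naming as something to verify in your own setup.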

@mikelduke, @sta-szek, @oiooj are you able to provide a reproducer?

This issue is related to #374.

Steps to reproduce:

  1. Start Elasticsearch and Jaeger:

docker run -it --rm -e "ES_JAVA_OPTS=-Xms2g -Xmx2g" -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e "xpack.security.enabled=false" --name=elasticsearch  docker.elastic.co/elasticsearch/elasticsearch:5.6.10
SPAN_STORAGE_TYPE=elasticsearch go run -tags ui ./cmd/all-in-one/main.go

  2. Generate spans via Jaeger UI or hotrod
  3. Remove all indices: python plugin/storage/es/esCleaner.py 0 localhost:9200
  4. Refresh Jaeger UI - it should show HTTP Error: Search service failed: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception].

I have fixed this by creating index templates:

curl -ivX PUT -H "Content-Type: application/json" localhost:9200/_template/span  -d @./plugin/storage/es/mappings/jaeger-span.json
curl -ivX PUT -H "Content-Type: application/json" localhost:9200/_template/service  -d @./plugin/storage/es/mappings/jaeger-service.json

I think jaeger-collector should create the template on startup. Index creation would then just create the index, and the mapping would be derived from the template stored in ES. Maybe we could even omit creating the index, and it would be created automatically once data is inserted.

I fixed it for my use by creating a template from an exported copy of a working index.

It would be great to see either Jaeger create the template on startup, or allow for exporting the template from the binary using the command line. This would allow for template creation using different users or a separate ci process.

I think I will submit a PR where the collector creates a template at startup.

A command for generating the template is an interesting idea. The rollover script could also make use of it. We could talk about it in a separate issue.

Is creating a template idempotent? There are many collectors starting.

@pavolloffay
I'm running into this issue. I am on 1.14 or later, but I'm using jaeger-ingester to write to ES.
Does the fix also need to go in there?

Thanks @mikelduke! Your post helped me solve this weird error, which, at least for me, was likely caused by ES being recreated while the collectors weren't reset/restarted. I then spent a few hours debugging until I stumbled over your precious hint ❤️ and noticed the index templates were missing!
I restored them based on https://github.com/jaegertracing/jaeger/tree/master/plugin/storage/es/mappings
