Kibana version: master
Describe the bug:
The saved objects migrator tries to use the Elasticsearch client even after it has been closed.
Steps to reproduce:
Fatal error. Server log:
[22:30:24.057] [fatal][root] Error: Cluster client cannot be used after it has been closed.
at ClusterClient.assertIsNotClosed (/Users/sqren/elastic/kibana/src/core/server/elasticsearch/cluster_client.ts:225:13)
at assertIsNotClosed (/Users/sqren/elastic/kibana/src/core/server/elasticsearch/cluster_client.ts:145:10)
at apiCaller (/Users/sqren/elastic/kibana/src/core/server/elasticsearch/retry_call_cluster.ts:43:24)
at Observable._subscribe (/Users/sqren/elastic/kibana/node_modules/rxjs/src/internal/observable/defer.ts:59:15)
at Observable._trySubscribe (/Users/sqren/elastic/kibana/node_modules/rxjs/src/internal/Observable.ts:238:19)
at Observable.subscribe (/Users/sqren/elastic/kibana/node_modules/rxjs/src/internal/Observable.ts:219:14)
at RetryWhenSubscriber.notifyNext (/Users/sqren/elastic/kibana/node_modules/rxjs/src/internal/operators/retryWhen.ts:112:17)
at InnerSubscriber._next (/Users/sqren/elastic/kibana/node_modules/rxjs/src/internal/InnerSubscriber.ts:17:17)
at InnerSubscriber.Subscriber.next (/Users/sqren/elastic/kibana/node_modules/rxjs/src/internal/Subscriber.ts:99:12)
at MergeMapSubscriber.notifyNext (/Users/sqren/elastic/kibana/node_modules/rxjs/src/internal/operators/mergeMap.ts:162:22)
at InnerSubscriber._next (/Users/sqren/elastic/kibana/node_modules/rxjs/src/internal/InnerSubscriber.ts:17:17)
at InnerSubscriber.Subscriber.next (/Users/sqren/elastic/kibana/node_modules/rxjs/src/internal/Subscriber.ts:99:12)
at AsyncAction.dispatch (/Users/sqren/elastic/kibana/node_modules/rxjs/src/internal/observable/timer.ts:91:14)
at AsyncAction._execute (/Users/sqren/elastic/kibana/node_modules/rxjs/src/internal/scheduler/AsyncAction.ts:122:12)
at AsyncAction.execute (/Users/sqren/elastic/kibana/node_modules/rxjs/src/internal/scheduler/AsyncAction.ts:97:24)
at AsyncScheduler.flush (/Users/sqren/elastic/kibana/node_modules/rxjs/src/internal/scheduler/AsyncScheduler.ts:58:26)
Expected behavior:
The migrator cannot do much without a connection, but retryCallCluster should stop polling if the SavedObjects service has been stopped.
https://github.com/restrry/kibana/blob/41b49358e6626809d0ed479c53aa711bb430bbbe/src/core/server/saved_objects/saved_objects_service.ts#L108
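A rough sketch of what that could look like (my own illustration, not the linked code): the retry loop is an rxjs pipeline, so a hypothetical stop$ signal fired from the service's stop() could cut it short via takeUntil:

```ts
// Hedged sketch only -- not the actual Kibana implementation. It shows the idea:
// retry the Elasticsearch call with a delay, but abandon the retry loop as soon
// as a (hypothetical) stop$ signal fires, so we never call into a closed client.
import { defer, timer, Subject } from 'rxjs';
import { retryWhen, mergeMap, takeUntil } from 'rxjs/operators';

// Hypothetical: SavedObjectsService.stop() would call stop$.next().
const stop$ = new Subject<void>();

function retryCallCluster(apiCaller: () => Promise<unknown>): Promise<unknown> {
  return defer(() => apiCaller())
    .pipe(
      retryWhen((errors) =>
        // wait 2.5s between attempts; a real implementation would also inspect
        // the error type before deciding whether to retry
        errors.pipe(mergeMap(() => timer(2500)))
      ),
      // stop polling once the service is stopped; the promise then resolves
      // with undefined instead of retrying forever against a closed client
      takeUntil(stop$)
    )
    .toPromise();
}
```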
Pinging @elastic/kibana-platform (Team:Platform)
Are there any recovery or mitigation suggestions for this, please?
@jedd As far as we understand, this shouldn't cause any real problems, just some noise in your log messages.
This should only happen if Kibana is terminated before it has had the chance to set up and/or upgrade its index in Elasticsearch, i.e. if Kibana is never able to connect to Elasticsearch before it is shut down.
If you're seeing this error in another situation or this is causing other issues, please let us know.
Thanks @joshdover
I've just done an upgrade from 7.1 to 7.5 on a single-node ELK machine, keeping Logstash at 7.1 but upgrading Elasticsearch and Kibana, with 30GB (on disk) of Elasticsearch data. Kibana seemed to stall during the upgrade, so I stopped it - apparently that was my first mistake - and then saw this error repeatedly while trying to get Elasticsearch and Kibana back up.
Using http://localhost:9200/_flush/synced I saw it was choking on one index, which I deleted, but it was being quickly recreated by a Logstash pipeline. Kibana logs were showing the above error on each retry, so I stopped Logstash altogether.
Elasticsearch and Kibana have both now been running for several minutes, but 'Kibana server is not ready yet' persists.
_flush/synced isn't showing any errors, nor are the journalctl messages for Elasticsearch or Kibana.
The cluster (of one host) is green, all indices are green, and there's nothing untoward in gc.log or cluster-name.log.
Previously Kibana's UI would be responsive within two minutes of startup on this machine, but it's been 15 minutes now.
Update
Starting Kibana ... eventually shows an error like:
Dec 12 10:36:50 8117dip000es001 kibana[16719]: {"type":"log","@timestamp":"2019-12-11T23:36:50Z","tags":["fatal","root"],"pid":16719,"message":"Error: Cluster client cannot be used after it has been closed.\n at ClusterClient.assertIsNotClosed (/usr/share/kibana/src/core/server/elasticsearch/cluster_client.js:187:13)\n at ClusterClient._defineProperty (/usr/share/kibana/src/core/server/elasticsearch/cluster_client.js:108:12)\n at /usr/share/kibana/src/core/server/elasticsearch/retry_call_cluster.js:50:35\n at Observable._subscribe (/usr/share/kibana/node_modules/rxjs/internal/observable/defer.js:10:21)\n at Observable._trySubscribe (/usr/share/kibana/node_modules/rxjs/internal/Observable.js:43:25)\n at Observable.subscribe (/usr/share/kibana/node_modules/rxjs/internal/Observable.js:29:22)\n at RetryWhenOperator.call (/usr/share/kibana/node_modules/rxjs/internal/operators/retryWhen.js:28:23)\n at Observable.subscribe (/usr/share/kibana/node_modules/rxjs/internal/Observable.js:24:22)\n at /usr/share/kibana/node_modules/rxjs/internal/Observable.js:93:19\n at new Promise (<anonymous>)\n at Observable.toPromise (/usr/share/kibana/node_modules/rxjs/internal/Observable.js:91:16)\n at /usr/share/kibana/src/core/server/elasticsearch/retry_call_cluster.js:59:11\n at fetchInfo (/usr/share/kibana/src/core/server/saved_objects/migrations/core/elastic_index.js:52:24)\n at Object.migrationsUpToDate (/usr/share/kibana/src/core/server/saved_objects/migrations/core/elastic_index.js:173:29)\n at requiresMigration (/usr/share/kibana/src/core/server/saved_objects/migrations/core/index_migrator.js:84:37)\n at isMigrated (/usr/share/kibana/src/core/server/saved_objects/migrations/core/index_migrator.js:51:24)"}
Consulting _flush/synced I note that:
,".monitoring-es-7-2019.12.11":{"total":1,"successful":0,"failed":1,"failures":[{"shard":0,"reason":"pending operations","routing":{"state":"STARTED","primary":true,"node":"6KEwCkAvTT6m0d-2_1Kcog","relocating_node":null,"shard":0,"index":".monitoring-es-7-2019.12.11","allocation_id":{"id":"bROlTU1aTzOjqufVoBmTwQ"
(all other indices are successful:1, failed:0)
So it looks like it is failing on indices that are actively being updated. Weirdly, watching top on the box, both the Elasticsearch and Kibana processes are extremely quiescent after start -- no CPU, no I/O activity to speak of.
I subsequently ran a large remote ksql query talking back into this host without any problems - Elasticsearch certainly seems to be happy.
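For anyone repeating the _flush/synced check above and not wanting to eyeball the JSON, here is a minimal sketch of my own (assuming Elasticsearch on localhost:9200 with no authentication) that prints only the indices whose shards failed to sync-flush:

```ts
// Hedged sketch: POST to the _flush/synced endpoint mentioned above and print
// any index reporting failed shards (the same check done by hand in the JSON).
// Assumes Elasticsearch is reachable on localhost:9200 without authentication.
import * as http from 'http';

const req = http.request(
  { host: 'localhost', port: 9200, path: '/_flush/synced', method: 'POST' },
  (res) => {
    res.setEncoding('utf8');
    let body = '';
    res.on('data', (chunk) => (body += chunk));
    res.on('end', () => {
      const result = JSON.parse(body) as Record<
        string,
        { total: number; successful: number; failed: number }
      >;
      for (const [index, stats] of Object.entries(result)) {
        if (index === '_shards') continue; // skip the cluster-wide summary
        if (stats.failed > 0) {
          console.log(`${index}: ${stats.failed}/${stats.total} shards failed to sync-flush`);
        }
      }
    });
  }
);
req.end();
```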
Solved.
For anyone else who stumbles on this error:
If you've got:
logging.quiet: true
logging.verbose: false
then you will not see errors like the following out of Kibana after an interrupted auto-migration:
Dec 13 11:05:34 hostname kibana[28288]: {"type":"log","@timestamp":"2019-12-13T00:05:34Z","tags":["warning","migrations"],"pid":28288,"message":"Another Kibana instance appears to be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_3 and restarting Kibana."}
It's tagged as a warning, but I think it should be tagged as an error, since it prevents Kibana from starting.
As noted, you need to delete the identified .kibana_? index, and restart Kibana -- this worked for me.
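If you would rather script that workaround than delete the index by hand, here is a minimal sketch (my own, assuming the stuck index is .kibana_3 as in the warning above, and Elasticsearch on localhost:9200 without auth) -- restart Kibana afterwards:

```ts
// Hedged sketch: delete the half-finished migration target index so Kibana can
// redo the migration on its next start. Change '.kibana_3' to whatever index
// the warning message names in your logs.
import * as http from 'http';

http
  .request(
    { host: 'localhost', port: 9200, path: '/.kibana_3', method: 'DELETE' },
    (res) => {
      res.setEncoding('utf8');
      res.on('data', (chunk) => process.stdout.write(chunk));
    }
  )
  .end();
```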