Cloud-on-k8s: Could not update cluster license: failed to revert to basic

Created on 16 Dec 2019 · 5 Comments · Source: elastic/cloud-on-k8s

While debugging an E2E test, I noticed several events with the message "Could not update cluster license: failed to revert to basic":

# es-apm-sample-vctm
Could not update cluster license: failed to revert to basic: 
  Post https://es-apm-sample-vctm-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.250.109:9200: connect: connection refused
# test-failure-kill-a-data-node-htzl
Could not update cluster license: failed to revert to basic: 
  Post https://test-failure-kill-a-data-node-htzl-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.247.252:9200: connect: no route to host
# test-failure-kill-a-master-node-xkv6
Could not update cluster license: failed to revert to basic: 
  Post https://test-failure-kill-a-master-node-xkv6-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.246.127:9200: connect: no route to host
# test-failure-delete-services-rqgn
Could not update cluster license: failed to revert to basic: 
  Post https://test-failure-delete-services-rqgn-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.253.246:9200: connect: connection timed out
# force-upgrade-pending-sset-brjf
Could not update cluster license: failed to revert to basic: 
  503 Service Unavailable: 
# test-es-keystore-zcpq
Could not update cluster license: failed to revert to basic: 
  Post https://test-es-keystore-zcpq-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.249.91:9200: connect: connection refused
# test-es-keystore-zcpq
Could not update cluster license: failed to revert to basic: 
  503 Service Unavailable: 
# test-mutation-mdi-to-dedicated-ct9l
Could not update cluster license: failed to revert to basic: 
  Post https://test-mutation-mdi-to-dedicated-ct9l-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.243.174:9200: connect: connection refused
# test-mutation-less-nodes-jt9x
Could not update cluster license: failed to revert to basic: 
  Post https://test-mutation-less-nodes-jt9x-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.243.138:9200: connect: connection timed out
# test-mutation-resize-memory-up-xjq2
Could not update cluster license: failed to revert to basic: 
  Post https://test-mutation-resize-memory-up-xjq2-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.244.206:9200: connect: connection refused
# test-mutation-resize-memory-up-xjq2
Could not update cluster license: failed to revert to basic: 
  Post https://test-mutation-resize-memory-up-xjq2-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.244.206:9200: connect: no route to host
# test-mutation-resize-mem-down-chnv
Could not update cluster license: failed to revert to basic: 
  503 Service Unavailable: NodeNotConnectedException[[test-mutation-resize-mem-down-chnv-es-masterdata-2][10.84.0.32:9300] Node not connected]
# test-mutation-resize-mem-down-chnv
Could not update cluster license: failed to revert to basic: 
  503 Service Unavailable: 

This seems suspicious.
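
For triage, the events above fall into four groups: connection refused, no route to host, connection timed out, and 503 Service Unavailable. The following is a minimal Go sketch for counting such events by cause. The helper names `classify` and `tally` are hypothetical and not part of cloud-on-k8s; the matching is plain substring search on the event message, which is an assumption about how stable these messages are.

```go
package main

import (
	"fmt"
	"strings"
)

// classify maps an event message to a coarse failure cause.
// Hypothetical helper: it relies only on substrings seen in the dump above.
func classify(msg string) string {
	switch {
	case strings.Contains(msg, "connection refused"):
		return "connection refused"
	case strings.Contains(msg, "no route to host"):
		return "no route to host"
	case strings.Contains(msg, "connection timed out"):
		return "connection timed out"
	case strings.Contains(msg, "503 Service Unavailable"):
		return "503"
	default:
		return "other"
	}
}

// tally counts messages per failure cause.
func tally(msgs []string) map[string]int {
	counts := make(map[string]int)
	for _, m := range msgs {
		counts[classify(m)]++
	}
	return counts
}

func main() {
	// Two sample messages shortened from the dump above.
	msgs := []string{
		"failed to revert to basic: dial tcp 10.87.250.109:9200: connect: connection refused",
		"failed to revert to basic: 503 Service Unavailable: ",
	}
	fmt.Println(tally(msgs))
}
```

Grouping by cause makes it easier to separate network-level failures (the service or pod was not reachable) from cases where Elasticsearch answered but was not ready.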

>flaky_test

All 5 comments

Do you have more context? For example, which version of Elasticsearch were we testing here?
I am asking because:

  • this step should only run if we consider Elasticsearch reachable
  • on 6.x clusters we know that ES might return 503 during master election (which would be OK'ish)

Independently:

  • there is room for optimization as we are currently calling start basic on every reconciliation run if no other license is configured for a given cluster.
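
The three points above can be sketched as a small guard in Go. This is only an illustration under stated assumptions, not the operator's actual code: the `clusterState` type, its fields, and the functions `shouldStartBasic` and `isTransient` are hypothetical names, and skipping the call when the cluster already reports a basic license is one possible reading of the optimization mentioned above.

```go
package main

import (
	"fmt"
	"net/http"
)

// clusterState is a hypothetical snapshot of what a reconciliation knows
// about a cluster. None of these fields are taken from cloud-on-k8s.
type clusterState struct {
	reachable       bool   // whether Elasticsearch is currently considered reachable
	currentLicense  string // license type last reported by the cluster, "" if unknown
	otherLicenseSet bool   // true if another license is configured for this cluster
}

// shouldStartBasic decides whether this reconciliation should call start basic.
func shouldStartBasic(s clusterState) bool {
	if !s.reachable {
		// The step should only run if Elasticsearch is considered reachable.
		return false
	}
	if s.otherLicenseSet {
		// Another license is configured, so there is nothing to revert.
		return false
	}
	// Assumed optimization: skip the call if the cluster is already on basic.
	return s.currentLicense != "basic"
}

// isTransient reports whether a response status can be treated as a
// temporary condition, such as a 503 during master election.
func isTransient(statusCode int) bool {
	return statusCode == http.StatusServiceUnavailable
}

func main() {
	s := clusterState{reachable: true, currentLicense: "basic"}
	fmt.Println("call start basic:", shouldStartBasic(s))
	fmt.Println("503 is transient:", isTransient(http.StatusServiceUnavailable))
}
```

With a guard like this, an unreachable cluster or one already on a basic license would not produce a call (or an event) on every reconciliation run, and a 503 could be retried instead of being reported as a failure.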

These events come from the dump of a 7.1.1 stage of the cloud-on-k8s-stack job. So most clusters were on 7.1.1, some on 7.5.

I wonder if a lot of the connection errors will be resolved with https://github.com/elastic/cloud-on-k8s/pull/2360

This is still happening as recently as May 29, so keeping this open.

It didn't happen again for a while. Let's close and reopen if this happens again.
