While debugging an E2E test, I noticed several events with the message "Could not update cluster license: failed to revert to basic":
# es-apm-sample-vctm
Could not update cluster license: failed to revert to basic:
Post https://es-apm-sample-vctm-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.250.109:9200: connect: connection refused
# test-failure-kill-a-data-node-htzl
Could not update cluster license: failed to revert to basic:
Post https://test-failure-kill-a-data-node-htzl-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.247.252:9200: connect: no route to host
# test-failure-kill-a-master-node-xkv6
Could not update cluster license: failed to revert to basic:
Post https://test-failure-kill-a-master-node-xkv6-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.246.127:9200: connect: no route to host
# test-failure-delete-services-rqgn
Could not update cluster license: failed to revert to basic:
Post https://test-failure-delete-services-rqgn-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.253.246:9200: connect: connection timed out
# force-upgrade-pending-sset-brjf
Could not update cluster license: failed to revert to basic:
503 Service Unavailable:
# test-es-keystore-zcpq
Could not update cluster license: failed to revert to basic:
Post https://test-es-keystore-zcpq-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.249.91:9200: connect: connection refused
# test-es-keystore-zcpq
Could not update cluster license: failed to revert to basic:
503 Service Unavailable:
# test-mutation-mdi-to-dedicated-ct9l
Could not update cluster license: failed to revert to basic:
Post https://test-mutation-mdi-to-dedicated-ct9l-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.243.174:9200: connect: connection refused
# test-mutation-less-nodes-jt9x
Could not update cluster license: failed to revert to basic:
Post https://test-mutation-less-nodes-jt9x-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.243.138:9200: connect: connection timed out
# test-mutation-resize-memory-up-xjq2
Could not update cluster license: failed to revert to basic:
Post https://test-mutation-resize-memory-up-xjq2-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.244.206:9200: connect: connection refused
# test-mutation-resize-memory-up-xjq2
Could not update cluster license: failed to revert to basic:
Post https://test-mutation-resize-memory-up-xjq2-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.244.206:9200: connect: no route to host
# test-mutation-resize-mem-down-chnv
Could not update cluster license: failed to revert to basic:
503 Service Unavailable: NodeNotConnectedException[[test-mutation-resize-mem-down-chnv-es-masterdata-2][10.84.0.32:9300] Node not connected]
# test-mutation-resize-mem-down-chnv
Could not update cluster license: failed to revert to basic:
503 Service Unavailable:
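To get a quick overview of which failure causes dominate in a dump like the one above, a small script along these lines could tally them. This is only a sketch: the names `CAUSES`, `classify` and `tally` are made up for illustration, and it assumes every event keeps the two-line layout shown above (the generic message, then a detail line).

```python
from collections import Counter

# Failure causes seen in the events above, checked in this order.
CAUSES = [
    "connection refused",
    "no route to host",
    "connection timed out",
    "503 Service Unavailable",
]


def classify(detail: str) -> str:
    """Return the first known cause found in an event detail line."""
    for cause in CAUSES:
        if cause in detail:
            return cause
    return "other"


def tally(dump: str) -> Counter:
    """Count failure causes in a dump laid out like the listing above."""
    counts = Counter()
    lines = [ln.strip() for ln in dump.splitlines() if ln.strip()]
    for prev, line in zip(lines, lines[1:]):
        # Assumption: the detail line directly follows the generic message.
        if prev.startswith("Could not update cluster license"):
            counts[classify(line)] += 1
    return counts


if __name__ == "__main__":
    # Two events copied from the listing above.
    sample = """\
# es-apm-sample-vctm
Could not update cluster license: failed to revert to basic:
Post https://es-apm-sample-vctm-es-http.e2e-h5jfv-mercury.svc:9200/_license/start_basic?acknowledge=true: dial tcp 10.87.250.109:9200: connect: connection refused
# force-upgrade-pending-sset-brjf
Could not update cluster license: failed to revert to basic:
503 Service Unavailable:
"""
    for cause, count in tally(sample).most_common():
        print(f"{count:3d}  {cause}")
```

Counted by hand over the full listing above, the 13 events split into 4 connection refused, 3 no route to host, 2 connection timed out and 4 503 Service Unavailable.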
This seems suspicious.
Do you have more context? For example, which version of Elasticsearch were we testing here?
I am asking because:
Independently:
These events come from the dump of a 7.1.1 stage of the cloud-on-k8s-stack job, so most clusters were running 7.1.1 and some 7.5.
I wonder if a lot of the connection errors will be resolved with https://github.com/elastic/cloud-on-k8s/pull/2360
This is still happening as recently as May 29, so keeping this open.
It didn't happen again for a while. Let's close and reopen if this happens again.