Cloud-on-k8s: Beat might not be able to start after update

Created on 20 Jul 2020 · 6 Comments · Source: elastic/cloud-on-k8s

When:

Beat is already running, and
Beat is deployed as a Deployment, and
Beat is updated, and
the new Pod lands on the same Node as the old Pod, and
the timing is right, then

the new Beat will keep crashing with the error below.

ERROR   instance/beat.go:958    Exiting: data path already locked by another beat. Please make sure that multiple beats are not sharing the same data path (path.data).

(We want the new Pod to use the same path to preserve Beat identity.)

What _could_ be a solution is to allow modifying maxUnavailable for deployment update strategy, but we don't expose that currently. Right now, the paths to unblock would be:

remove the ReplicaSet that contains the old Pods (requires manual intervention), or
use emptyDir for the beat-data volume (Beat won't preserve its identity):

volumes:
- name: beat-data
  emptyDir: {}
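
For context, here is a minimal sketch of where that volume override could sit in a full Beat resource, assuming the Beat runs as a Deployment and the override goes through the podTemplate. The resource name, Beat type, and version are illustrative assumptions, not values taken from this issue.

```yaml
# Sketch only: name, type, and version are assumed for illustration.
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: example-beat
spec:
  type: filebeat
  version: 7.8.0
  deployment:
    podTemplate:
      spec:
        volumes:
        # Overrides the beat-data volume by name, so the data path
        # is no longer shared between the old and the new Pod.
        - name: beat-data
          emptyDir: {}
```

With this override each Pod starts with an empty data directory, which avoids the lock conflict at the cost of the Beat identity mentioned above.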

>bug v1.3.0

david-kow

All 6 comments

> What could be a solution is to allow modifying maxUnavailable for deployment update strategy, but we don't expose that currently

I think I'd be +1 for adding a Strategy appsv1.DeploymentStrategy field in the CRD: https://github.com/elastic/cloud-on-k8s/blob/096607c6ffa2d672e6849355f6a2d2f7a0482e5a/pkg/apis/beat/v1beta1/beat_types.go#L78-L81
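
As a rough illustration of the proposal, the sketch below shows how such a field might look from the user's side if it were added under `spec.deployment`. The `strategy` field is hypothetical at the time of this comment, and the resource name, type, and version are assumptions; the nested keys simply mirror the standard Kubernetes `DeploymentStrategy` type.

```yaml
# Hypothetical: spec.deployment.strategy is the field proposed here,
# not something the CRD exposes at the time of this comment.
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: example-beat        # assumed name
spec:
  type: filebeat            # assumed Beat type
  version: 7.8.0            # assumed version
  deployment:
    strategy:
      type: RollingUpdate
      rollingUpdate:
        maxSurge: 0         # never create an extra Pod during the update
        maxUnavailable: 1   # allow one old Pod to be taken down first
```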

sebgl on 20 Jul 2020

👍1

> What _could_ be a solution is to allow modifying maxUnavailable for deployment update strategy, but we don't expose that currently.

Would you not rather want to change the type of the DeploymentStrategy to Recreate than changing maxUnavailable?

pebrc on 14 Aug 2020

👍1

> > What _could_ be a solution is to allow modifying maxUnavailable for deployment update strategy, but we don't expose that currently.
>
> Would you not rather want to change the type of the DeploymentStrategy to Recreate than changing maxUnavailable?

Wouldn't that cause all Pods to be deleted before any new ones appear? It seems we lose a lot of rollout safety with that approach - if the config is bad, user is left with no Beats until correct config is supplied (vs the current behavior where only a single Beat is affected). But when I think about it, this seems to be the only way to guarantee avoiding the issue - even with maxUnavailable the next Pod might hit the same issue.
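
To make the two options concrete, here is a sketch of how they look on a plain `apps/v1` Deployment. The Deployment name, labels, and image are assumptions for illustration and are not what the operator generates.

```yaml
# Illustrative Deployment; name, labels, and image are assumed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-beat
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-beat
  strategy:
    # Option A: all old Pods are stopped before any new Pod is created.
    type: Recreate
    # Option B (instead of A): keep rolling updates but never surge.
    # type: RollingUpdate
    # rollingUpdate:
    #   maxSurge: 0
    #   maxUnavailable: 1
  template:
    metadata:
      labels:
        app: example-beat
    spec:
      containers:
        - name: filebeat
          image: docker.elastic.co/beats/filebeat:7.8.0
```

Option A trades rollout safety for the guarantee that the old and new Pods never hold the data path at the same time. Option B keeps one-at-a-time replacement, but a replacement Pod may start while the old one is still shutting down, so the lock conflict remains possible.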

david-kow on 17 Aug 2020

Any chance we can also have an option to remove/override the host mount? Deployments are not bound to nodes, so storing state in a host mount doesn't always make sense. What if multiple replicas are scheduled on the same node?

anders-cognite on 20 Aug 2020

@anders-cognite you already have full control over the podTemplate and are able to change mounts to your liking.

pebrc on 20 Aug 2020

I tried overriding volumeMounts but that gave duplication errors. I see now that you do de-duplication of the volumes, so that works. Thanks!

I guess it would be nice to be able to also override the volumeMount options too, but that's not as important.

anders-cognite on 20 Aug 2020
