Cloud-on-k8s: Beat might not be able to start after update

Created on 20 Jul 2020 · 6 comments · Source: elastic/cloud-on-k8s

When:

  • Beat is already running, and
  • Beat is deployed as Deployment, and
  • Beat is updated, and
  • New Pod lands on the same Node as old Pod, and
  • the timing is right, then

the new Beat keeps crashing with the following error:

ERROR   instance/beat.go:958    Exiting: data path already locked by another beat. Please make sure that multiple beats are not sharing the same data path (path.data).

(We want the new Pod to use the same path to preserve Beat identity.)

What _could_ be a solution is to allow modifying maxUnavailable for deployment update strategy, but we don't expose that currently. Right now, the paths to unblock would be:

  • remove ReplicaSet that contains old Pods (requires manual intervention), or
  • use emptyDir for beat-data volume (Beat won't preserve its identity):
volumes:
- name: beat-data
  emptyDir: {}
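
For context, here is a minimal sketch of where that `volumes` override would sit in a full Beat manifest. The resource name, Beat type, version, and `elasticsearchRef` name are placeholders chosen for illustration, not values from this issue; only the `beat-data` volume override comes from the workaround above.

```yaml
# Sketch only: metadata.name, type, version and elasticsearchRef are assumed values.
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: heartbeat-example        # hypothetical name
spec:
  type: heartbeat                # any Beat run as a Deployment
  version: 7.8.0                 # assumed version
  elasticsearchRef:
    name: elasticsearch-example  # hypothetical Elasticsearch resource
  deployment:
    podTemplate:
      spec:
        volumes:
        # Overrides the default beat-data volume by name. With emptyDir,
        # each new Pod starts with a fresh data path, so the lock conflict
        # goes away, but the Beat does not keep its identity across Pods.
        - name: beat-data
          emptyDir: {}
```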

Labels: >bug, v1.3.0

All 6 comments

> What could be a solution is to allow modifying maxUnavailable for deployment update strategy, but we don't expose that currently

I think I'd be +1 for adding a Strategy appsv1.DeploymentStrategy field in the CRD: https://github.com/elastic/cloud-on-k8s/blob/096607c6ffa2d672e6849355f6a2d2f7a0482e5a/pkg/apis/beat/v1beta1/beat_types.go#L78-L81

> What _could_ be a solution is to allow modifying maxUnavailable for deployment update strategy, but we don't expose that currently.

Would you not rather want to change the type of the DeploymentStrategy to Recreate than changing maxUnavailable?

> What _could_ be a solution is to allow modifying maxUnavailable for deployment update strategy, but we don't expose that currently.
>
> Would you not rather want to change the type of the DeploymentStrategy to Recreate than changing maxUnavailable?

Wouldn't that cause all Pods to be deleted before any new ones appear? It seems we lose a lot of rollout safety with that approach: if the config is bad, the user is left with no Beats until a correct config is supplied (versus the current behavior, where only a single Beat is affected). But when I think about it, this seems to be the only way to guarantee avoiding the issue. Even with maxUnavailable, the next Pod might hit the same issue.
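
To make the trade-off concrete, here is a sketch of the two options as they appear on a plain Kubernetes `apps/v1` Deployment. These are standard Deployment fields, not Beat CRD fields; at the time of this thread the Beat resource did not expose them, so how they would surface in the CRD is an open assumption.

```yaml
# Option A: rolling update that takes an old Pod down before adding a new one.
# A replacement may still start while the old Pod is terminating, so the
# data path lock conflict is made less likely but not ruled out.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
---
# Option B: all old Pods are removed before any new Pod is created.
# This avoids two Beats sharing a data path, at the cost of having no
# Beats running if the new configuration is bad.
spec:
  strategy:
    type: Recreate
```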

Any chance we can also have an option to remove/override the host mount? Deployments are not bound to nodes, so storing state in a host mount doesn't always make sense. What if multiple replicas are scheduled on the same node?

@anders-cognite you already have full control over the podTemplate and are able to change mounts to your liking.

I tried overriding volumeMounts but that gave duplication errors. I see now that you do de-duplication of the volumes, so that works. Thanks!

I guess it would be nice to be able to also override the volumeMount options too, but that's not as important.
