Cloud-on-k8s: Should we get rid of most finalizers?

Created on 18 Oct 2019 · 15Comments · Source: elastic/cloud-on-k8s

Forking the discussion from https://github.com/elastic/cloud-on-k8s/pull/2000#discussion_r336373978.

Finalizers are a great way to ensure we properly garbage-collect some things related a particular resource. For example, by setting our keystore finalizer to the Elasticsearch resource, we ensure the operator removes all dynamic watches on user-provided keystore secrets, before the apiserver can safely delete the Elasticsearch resource.

I think this is especially useful when finalizers trigger an operation on an external system. For example, when dealing with filesystem volumes or external databases.

The major drawback of the finalizer approach is that you cannot delete a resource when the operator is not running. For example, running kubectl delete elasticsearch es-sample will hang forever until the operator runs again. Which can be confusing for many users (especially if they are uninstalling ECK). An additional minor concern is that it requires updating the corresponding resource with a field that must be backward-compatible with next ECK versions (one more thing for which we need to maintain backward-compatibility).

In our situation, I think all finalizers we have so far are used for cleaning some in-memory state. For example:

don't keep an observer goroutine running if the cluster does not exist anymore
don't continue watching the user secret if the cluster does not exist anymore

I think (it's hard to be 100% sure?) the apiserver/client-go implementation we use does guarantee that we do not miss a deletion event (an interesting comment in the controller-runtime project). Of course, if the operator is not running, then the deletion event is missed.

So either:

the operator is running. If an ES resource is deleted, we will get a deletion event and reconcile that deleted resource. At which point we notice the resource does not exist, so we can cleanup our in-memory state. Note that any cleanup function must not return any error here, because we will never reconcile the ES resource again.
the operator is not running. If a cluster is deleted, we won't notice. But it's fine, since there is no in-memory state to garbage collect anyway.

>enhancement

Source

sebgl

👍1

All 15 comments

I asked on #kubebuilder (Slack channel), it seems like we can indeed not worry about missing deletion events: https://kubernetes.slack.com/archives/CAR30FCJZ/p1571410760147500

sebgl on 23 Oct 2019

🎉1

Would be nice to look deeper into how the watches are handled and what kind of guarantee we have that we do not miss an event.
Worst case scenario if we miss the deletion event: the operator uses more memory than it should, which is not super bad.

We discussed this issue with the team: we should do it.

sebgl on 31 Oct 2019

Finalizers are used for 2 main purposes in ECK:

To remove some watches or stop the observation of a cluster
To delete the Secrets used by the APM Server or Kibana to setup users in Elasticsearch.

This last case is a little bit tricky to manage because the Secret is created in the Elasticsearch namespace and thus is owned by Elasticsearch, not by the APM Server or Kibana. The reason of that choice was the following limitation:

Namespace-scoped dependents can only specify owners in the same namespace (ref.)

As a consequence if we remove the Finalizer the Secrets will not be deleted anymore.

If we want to handle the case where a Delete event is received but is not correctly handled by the operator _(e.g. in case of a bug, or if the operator is stopped right after the event is received)_ the only solution I see is to have our own GC. This could just be a go routine, regularly listing the user secrets, watching for any orphaned secret.

Thoughts ?

barkbay on 27 Nov 2019

Could we garbage collect those Secrets as part of the "normal" reconciliation done by the association controllers?

The Kibana-ES association controller responsible for creating the secret is triggered automatically on any change in Kibana/ES, and also when the operator (re)starts. As such, it could inspect and garbage collect any useless association secret without requiring a separate concurrent goroutine.

_Edit_: this would require watching all ES resources, not only the ones having an association set up.

_Edit_ (again): actually this cannot work since we're enqueuing reconciliations based on the Kibana NamespacedName, which we won't get from the unassociated Elasticsearch resource. Sorry for the noise, I'm still leaving this comment here just in case.

sebgl on 27 Nov 2019

I agree with @barkbay's comments. We have enough metadata on the user secret created by the association controllers to do a fairly generic 'garbage collection' in the association controller itself

Fetching all user secrets that have been created by the association controller
Check if the Kibana/APM server referenced still exists
If not, delete the user secret

pebrc on 27 Nov 2019

Just to illustrate a case where the association controller will not delete a remaining secret:

Inital state:
- Elasticsearch in namespace default
- Kibana in namespace ns1
- Operator is stopped
User deletes Kibana:
- All the resources in namespace ns1 are removed by the K8S api/controller
- secret/ns1-kb-sample-kibana-user remains in the Elasticsearch namespace
Operator is restarted
- association controller does not reconcile because there's no resource to watch
- secret/ns1-kb-sample-kibana-user still remains in the Elasticsearch namespace

I think we have 2 solutions here:

Association controller watches all ES resources as suggested by @sebgl, it would trigger a reconciliation at step 3 and we would need to fetch all user secrets and check if they are orphaned.
We have an external go routine started by the association controller

I think I'm +1 with the first one, @pebrc sorry, I think I didn't get your preference

barkbay on 28 Nov 2019

Association controller watches all ES resources as suggested by @sebgl, it would trigger a reconciliation at step 3 and we would need to fetch all user secrets and check if they are orphaned.

The reconciliation is currently triggered on a Kibana NamespacedName: what would it be in this situation? I guess we'd need to trigger a different reconciliation with no particular NamespacedName that would go through all secrets.

If we consider we cannot miss a deletion event, maybe we could do the following:

when the association controller starts, garbage collect all association secrets
in the "normal" process, garbage collect secrets of the corresponding Kibana instance if it doesn't exist anymore <- make sure we requeue in case of any error here

sebgl on 28 Nov 2019

I don’t think the first one (watching all es resources ) works. Because the association controller watches Kibanas. So you would have to write what amounts to a new controller (GC controller if you will) that watches Elasticsearch and checks for orphaned secrets, which is basically what I suggested the only difference being that the watch on secrets is probably less efficient than the watch on Elasticsearch so +1 on that.

pebrc on 28 Nov 2019

I guess we'd need to trigger a different reconciliation with no particular NamespacedName that would go through all secrets.

That sounds a bit hacky to me. Could we not implement a separate controller/routine that just does just GC and is keyed on the type we are interested in?

Or we could consider doing it in the elasticsearch_controller? That would be conceptually wrong because we are maintaining some separation of concerns here, but on the other hand we could implement a GC that handles all present and future user secrets created by different association controllers at once (APM/Kibana/$FUTURE_APP_WE_WILL_SUPPORT) as long as the secrets follow a certain metadata scheme and if there is a way in the k8s API to do a generic existence check e.g. by UID (not sure there is actually)?

pebrc on 28 Nov 2019

That sounds a bit hacky to me

Agreed, that's why I proposed something different with a simple GC function when the controller starts (and then, gc as part of regular reconciliations):

on startup: list all secrets (using a non-cached k8s client?), and garbage collect useless ones
on normal reconciliations: if the Kibana resource does not exist (deletion event), garbage collect useless secrets

I think we achieve the same with your proposal of the separate gc goroutine/controller, but it may be slightly more complicated due to setting watches etc.?

sebgl on 28 Nov 2019

I think we achieve the same with your proposal of the separate gc goroutine/controller, but it may be slightly more complicated due to setting watches etc.?

Yes, agreed, I am not even sure if it's possible to do with the k8s client in the way I had in mind. 👍 on doing it the way you suggested. My only worry would be that we are slowing down the startup of the controller, but I guess we could use a label selector to reduce the number of secrets we actually have to list. Also I assume when you wrote 'all secrets' you meant all secrets created by the association controller. Finally we have to build this in such a way that we can do at least code sharing between the different association controllers as each would have to do its own user secret GC.

pebrc on 28 Nov 2019

👍1

My only worry would be that we are slowing down the startup of the controller

Yes. I guess we could run it in a separate short-lived goroutine (and still move on with the normal startup process)? It may conflict with gc done by the regular reconciliation but that would just be a "normal" conflict.

but I guess we could use a label selector to reduce the number of secrets we actually have to list

We currently set these labels:

"labels": {
    "kibanaassociation.k8s.elastic.co/name": "kibana-sample",
    "kibanaassociation.k8s.elastic.co/namespace": "default"
  },

I was hoping we would have a more "generic" one, because the List API with label filtering requires key + value. We know the key we're looking for here but not the value. Should we add another label in order to do that?

sebgl on 28 Nov 2019

Yes. I guess we could run it in a separate short-lived goroutine (and still move on with the normal startup process)?

I think we may have a race condition here:

associated resource has been deleted while ECK was stopped
ECK is restarted, orphaned secret is detected
user recreate the associated resource
the "no more orphaned" secret is deleted

barkbay on 28 Nov 2019

I think we may have a race condition here

Yes, but it will be fixed at next reconciliation, right? Eventual consistency to the rescue!
Anyway, I think we don't have to solve slow startup issues we don't have. And may never have unless we manage 5k Kibanas, at which point we may also have other issues. I think that's fine doing things sequentially for now.

sebgl on 28 Nov 2019

👍1

Yes, but it will be fixed at next reconciliation, right? Eventual consistency to the rescue!

I don't think we have any guarantee about that because we only watch secrets controlled by the association controller:

https://github.com/elastic/cloud-on-k8s/blob/master/pkg/controller/kibanaassociation/watch.go#L35-L41

We also need to improve the watches if we want to do that.

barkbay on 28 Nov 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Using self-signed certificates for filebeat

SebastianCaceresUltra · 3Comments

Kibana does not support rolling upgrades

pebrc · 3Comments

Write Elasticsearch Spec documentation

sebgl · 5Comments

Ensure ECK is compatible with ipv6 (only) environments

sebgl · 5Comments

Unsupported Distribution in Elastic Search 7.11 ECK 1.3.0

deepaksinghcs14 · 4Comments