Beats: [Ingest Manager] Improve agent unenrollment

Created on 28 May 2020 · 13 Comments · Source: elastic/beats

Description

Related to https://github.com/elastic/kibana/issues/67409

We need to improve the way we unenroll agents.
Currently it's done by revoking all API keys used by the agent, and the agent guesses that it should unenroll when it starts receiving 401 responses.
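
As a rough illustration of the current behavior described above, the agent side can be thought of as mapping a check-in response status to a decision. This is only a sketch: the function name `interpretCheckinStatus` and the outcome labels are hypothetical and not taken from the Elastic Agent code base.

```typescript
// Hypothetical outcome labels, for illustration only.
type CheckinOutcome = "continue" | "unenroll" | "retry";

// Sketch of the implicit rule: a 401 is treated as "my keys were revoked,
// so I must have been unenrolled". Any other failure is simply retried.
function interpretCheckinStatus(httpStatus: number): CheckinOutcome {
  if (httpStatus === 401) {
    return "unenroll";
  }
  if (httpStatus >= 200 && httpStatus < 300) {
    return "continue";
  }
  return "retry";
}
```

The weakness is visible in the sketch: by the time the agent sees the 401 its keys are already invalid, so it has no chance to send anything back before stopping.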

We want to improve this to support:

  • a graceful unenrollment that gives the agent time to send back all the information we want to both Kibana and ES
  • the agent needs to uninstall endpoint
  • we still need to support a way to instantly unenroll an agent, where we revoke all API keys
beta1 Ingest Management

All 13 comments

Pinging @elastic/ingest-management (Team:Ingest Management)

I'm posting this here and in https://github.com/elastic/kibana/issues/67704 -

  • One concern I have, which I don't see commentary on, is making sure we record and return in the API call the time that the agent was unenrolled (from the UI).

from api/ingest_manager/fleet/agents?page=1&perPage=20&showInactive=true

  • You can see the enrolled_at timestamp field; we should capture and add to the metadata an unenrolled_at timestamp. That field can be empty if we want / need to populate it in the first place, and when empty it need not have any impact on the UI.

Also, do we ever intend to show users the list of inactive Agents in the UI? It may certainly be useful, not sure that I saw tickets for that for Beta 1 or beyond, but I assume we have to... @mostlyjason yes?

Same concern for when we revoke an enrollment API key - I don't think that was tracked either, it may be an even bigger concern for that API + use case.

Sorry I missed this one earlier @EricDavisX. Yes, I think we show the inactive ones in the UI. They are hidden by default but there is a filter to show them.

"Same concern for when we revoke an enrollment API key - I don't think that was tracked either, it may be an even bigger concern for that API + use case."

When a user clicks on unenroll, we save the property unenrollment_started_at on the agent; when the agent confirms the unenrollment, we save unenrolled_at. These fields are exposed in the API, not in the UI.
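
To make the two timestamps concrete, here is a minimal sketch of how a client could derive an agent's unenrollment phase from them. Only the field names `enrolled_at`, `unenrollment_started_at` and `unenrolled_at` come from this thread; the `AgentTimestamps` interface, the `UnenrollPhase` labels and the `unenrollPhase` function are assumptions made for illustration.

```typescript
// Hypothetical shape: only the three field names are taken from the thread.
interface AgentTimestamps {
  enrolled_at: string;
  unenrollment_started_at?: string; // set when the user clicks unenroll
  unenrolled_at?: string; // set when the agent confirms the unenrollment
}

// Hypothetical phase labels.
type UnenrollPhase = "enrolled" | "unenrolling" | "unenrolled";

function unenrollPhase(agent: AgentTimestamps): UnenrollPhase {
  if (agent.unenrolled_at) {
    return "unenrolled";
  }
  if (agent.unenrollment_started_at) {
    return "unenrolling";
  }
  return "enrolled";
}
```

An agent that never confirms stays in the "unenrolling" phase indefinitely under this model, which is the situation the rest of the thread discusses.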

@nchaulet If the Agent unenrolls but never confirms, do we still invalidate the key?

No. If the agent never confirms, you will need to manually force unenroll it (currently only available with an API call, but we can expose this in the UI for agents that are currently unenrolling; see the implementation PR https://github.com/elastic/kibana/pull/70031)
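
The difference between the two paths can be sketched as follows. This is not the code from the PR above: the `EnrolledAgent` interface, the `access_api_key_id` field name, and the functions `startUnenroll` and `forceUnenroll` are hypothetical, and key invalidation is passed in as a callback so the example stays self-contained.

```typescript
// Hypothetical agent record; access_api_key_id is an assumed field name.
interface EnrolledAgent {
  id: string;
  access_api_key_id: string;
  unenrollment_started_at?: string;
  unenrolled_at?: string;
}

// Graceful path: only record that unenrollment was requested.
// Keys stay valid so the agent can still report back and confirm.
function startUnenroll(agent: EnrolledAgent, now: Date): EnrolledAgent {
  return { ...agent, unenrollment_started_at: now.toISOString() };
}

// Force path: invalidate the key immediately and mark the agent as
// unenrolled without waiting for any confirmation from the agent.
function forceUnenroll(
  agent: EnrolledAgent,
  now: Date,
  invalidateKey: (keyId: string) => void
): EnrolledAgent {
  invalidateKey(agent.access_api_key_id);
  return {
    ...agent,
    unenrollment_started_at: agent.unenrollment_started_at || now.toISOString(),
    unenrolled_at: now.toISOString(),
  };
}
```

In this sketch the force path works on an agent that is already unenrolling as well as on one that was never asked to unenroll, which matches the request for an instant option.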

I think we should expose something like a Force Unenroll in the UI, because I keep getting into a situation where I have agents that will never be removed from the list.

For example, I enroll the agent, then I enroll the same agent again. Now I have 2 in the list, 1 online and 1 offline. If I unenroll the first enrollment, it will sit in Unenrolling forever, because that agent has been completely replaced with a new enrollment and will never ack that unenroll.

I think this will also be the case in the real world, not just in development. In the Windows world, users may just reboot the machine, press F5, boot from Windows Deployment Services and deploy a fresh clean image. If that fresh clean image has a post-install step of enrolling the Elastic Agent, it will show as a new machine in Fleet. But the previous installation of the machine will remain Offline forever and will never be able to complete a true unenrollment.

@hbharding what do you think of having a force unenroll action when an agent is unenrolling?
[Screenshot: Screen Shot 2020-07-17 at 10 36 20 AM]

@nchaulet I think we need to solve it in two steps:

  • Having a force unenroll action.
  • Having a timeout when force unenroll is executed.

We need to make sure the API keys are _invalidated_ when unenroll is executed.

I discussed this with @ruflin and we agreed that the timeout is more complex to implement, as it needs a background task. As a first step for beta1, the timeout could be a manual action via an API call, or what I propose in the UI.

I think the case where the agent does not unenroll correctly by itself is an error, and I think it's OK to require an action to correct it (force unenroll).
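
For the timeout idea, the core check a background task would need is small. The sketch below assumes a hypothetical `UnenrollingAgent` record and a hypothetical `findTimedOutUnenrollments` helper; the timeout value and how the task is scheduled are left open, since the thread does not specify them.

```typescript
// Hypothetical record: only the two timestamp field names come from the thread.
interface UnenrollingAgent {
  id: string;
  unenrollment_started_at?: string;
  unenrolled_at?: string;
}

// Returns the ids of agents whose unenrollment was requested at least
// timeoutMs ago and that have still not confirmed it.
function findTimedOutUnenrollments(
  agents: UnenrollingAgent[],
  now: Date,
  timeoutMs: number
): string[] {
  return agents
    .filter(
      (a) =>
        a.unenrollment_started_at !== undefined &&
        a.unenrolled_at === undefined &&
        now.getTime() - Date.parse(a.unenrollment_started_at) >= timeoutMs
    )
    .map((a) => a.id);
}
```

Each id returned here would be a candidate for the same force unenroll action that a user can trigger manually.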

@nchaulet OK, let's do the following:

  1. If we can add a force unenroll without too much effort, let's do it.
  2. Create an issue for having the timeout in 7.10 and add it to the tracking issue?

I think we can close this issue after 1.
