Beats: [Meta][Ingest Manager] Allow the Elastic Agent to upgrade itself and its artifacts

Created on 23 Jul 2020 · 19 Comments · Source: elastic/beats

Overall

  • [ ] Define upgrade rules and behavior. (Limit to upgrades, what happens on failure) @blakerouse
  • [ ] Define and implement how Fleet starts an upgrade. @michalpristas /@neptunian

Design

Ingest Manager

Elastic Agent

  • [x] Define the Elastic Agent's on-disk structure to support upgrades and rollbacks. https://github.com/elastic/beats/issues/20048 @michalpristas
  • [x] Allow the Elastic Agent to reexec itself. https://github.com/elastic/beats/pull/20111 @blakerouse
  • [x] Make upgrade work with packaging. #21019 @blakerouse
  • [x] Add ability to communicate and control the running agent daemon. https://github.com/elastic/beats/issues/20142 @blakerouse
  • [x] Allow triggering an update locally @blakerouse
  • [x] Elastic agent can receive an upgrade action and start the upgrade process. @michalpristas
  • [x] Allow the Elastic Agent to upgrade its artifacts @blakerouse @michalpristas

    • [x] Allow upgrading to a nightly snapshot.

    • [x] Allow downgrading to a specific version.

  • [ ] Elastic Agent can roll back on failure to the previous working version. @blakerouse
  • [x] When the upgrade is good we remove the previously installed version. @blake @michalpristas
  • [ ] Add new statuses to the Elastic Agent and report them to Fleet (rollback, upgrading, etc.). @michalpristas, @blakerouse
  • [x] Allow Elastic Agent to be configured as non-upgradable; it could be a flag at enrollment time. https://github.com/elastic/beats/issues/21001
  • [x] Report to Fleet when Elastic Agent can be upgraded https://github.com/elastic/beats/issues/21318

Documentation

  • [ ] Document where logs, data and files used by the elastic agent are installed with the different artifacts. @blakerouse
  • [ ] Document that this is released as an experimental feature, and we reserve the right to change it

Endpoint

  • [x] Rollback support for the endpoint. @ferullo

Other feature

  • [ ] We could have rules at the Agent policy level to only allow an upgradable agent to be enrolled.
Ingest Management v7.10.0 v7.11.0

All 19 comments

@ruflin @michalpristas @blakerouse Meta issue for upgrade, can you create specific issues for the items and link them to this one?

@EricDavisX Upcoming test cases to add for 7.10 ^

@ph I would like to reassign "Define upgrade rules and behavior. (Limit to upgrades, what happens on failure)" from @ruflin to @blakerouse or @michalpristas, if you don't mind?

@blakerouse I have assigned you as the owner of this feature.

@ph Sounds good. I will ensure it works!

Added a point to document where the paths/files are installed.

@michalpristas @blakerouse let's keep this issue updated.

Hi Eric

We have observed that the changes for the availability of the upgrade agent option are not reflected on the latest 7.10.0-SNAPSHOT cloud environment with the following commit: 2ace108bc8aab580c0db8b788e03f647fa4927bc.

7.10.0 agent: https://snapshots.elastic.co/7.10.0-bbfba61c/downloads/beats/elastic-agent/elastic-agent-7.10.0-SNAPSHOT-windows-x86_64.zip file

Hence, we have reported bug #79644 for the same.

We will test this ticket once the above bug is fixed.

Hi Eric

We have observed that changes for this ticket are not reflected on latest 8.10.0-SNAPSHOT cloud environment with commit 6f983728d7f8c2cf065a6d5099157a5cfdc3cd08 and 7.10.0-SNAPSHOT, 8.10.0-SNAPSHOT agents.

Created 09 testcases and failed 09 testcases under the TestRun https://elastic.testrail.io/index.php?/runs/view/723&group_by=cases:section_id&group_order=asc

We will test this ticket once the changes are reflected on latest Kibana cloud environment.

@rahulgupta-qasource this is available for test, but as far as I know it can only be assessed via the API. It's not hard to do that, but we are confirming bugs and expectations for now and will post back details on testing.

For now, here is a list of tasks / questions from Engineering Productivity side:

Agent Upgrade

  • no way to test it via the UI in 7.10?
  • Will we be able to test in the UI in 7.11 BC / GA?
  • Until then, do we have enough api + unit tests for this?
  • UI allows attempting an upgrade on a snapshot build, which will never work (Sandra was considering a Kibana side fix)
  • Via API, it didn't work on 7.11 stack with 7.10 Agent (error with .asc files)
  • Eric saw the UI hang for a minute while it was processing something. needs review.
  • worked better with 8.0 Kibana and 7.10 Agent tho as tested by Michal
  • still had errors in 8.0 & 7.10 scenario, Michal reports Agent shows wrong in UI because upgrade re-execs into same process. only shows and executes correctly after re-starting machine, Agent side will review.
  • finish e2e-test that Eric spec'ed out earlier
  • kibana.endpoint demo env doesn't show Agent as upgradeable, not sure why this is...

Hi Eric

Thank you for sharing the feedback.

Please find below the upgrade agent testcases location in Testrail https://elastic.testrail.io/index.php?/suites/view/27&group_by=cases:section_id&group_order=asc&group_id=8423

As per discussion in ingest weekly call, we will validate this ticket once agent is upgradable through UI.

just posting a summary of what I have learned and relating info:

  • feature is being tested via the API and we're waiting on a fix from Infra to be able to test end to end
  • Windows side state of 'upgradeable' field was not stable, fix is PR'ed on Agent side
  • UI testing is waiting on a separate issue on Agent side to support 'detecting' what the latest build is and to use that as the upgrade target, for -SNAPSHOT and BC/GA builds

pre-requisites:
  • the Agent must be installed with the Agent 'install' subcommand (not by just 'running' it)
  • the currently deployed Agent version must be greater than or equal to 7.10 (first support in Agent)
  • the Kibana version must be greater than the installed Agent version (so... test with a 7.11 Kibana and 7.10 Agent)
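
The version pre-requisites above can be sketched as a small check. This is only an illustration: `parse_version`, `can_test_upgrade` and `MIN_AGENT_VERSION` are hypothetical names, not part of Agent or Kibana, and the comparison assumes plain `major.minor.patch` versions with any suffix such as `-SNAPSHOT` ignored.

```python
import re

# First Agent release with upgrade support, per the pre-requisites above.
MIN_AGENT_VERSION = (7, 10, 0)


def parse_version(version: str) -> tuple:
    """Turn '7.11.0-SNAPSHOT' into (7, 11, 0); any suffix is ignored."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.groups())


def can_test_upgrade(agent_version: str, kibana_version: str,
                     installed_via_subcommand: bool) -> bool:
    """Return True only if all three pre-requisites hold."""
    agent = parse_version(agent_version)
    kibana = parse_version(kibana_version)
    return (
        installed_via_subcommand          # installed with 'install', not just run
        and agent >= MIN_AGENT_VERSION    # Agent is 7.10 or newer
        and kibana > agent                # Kibana is strictly newer than the Agent
    )
```

For example, `can_test_upgrade("7.10.0", "7.11.0-SNAPSHOT", True)` satisfies all three conditions, while a 7.10 Agent against a 7.10 Kibana does not.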

With those pre-reqs in place:
Via the UI, you should see that the option to upgrade is available if the Kibana version is greater than the current Agent version. Then when you click that option for that agent, it should automatically download the new Agent and install it. #easy

Via the API, e.g. in Postman (or curl), you can pass in a specific version and even a specific webserver / download location (this won't be necessary in coming builds, after the above work is finished). The call would be a POST to: https://ericserver:9243/api/fleet/agents/adsf00a7-a185-4dfb-b654-18e5a71d71a7/upgrade
with a body of: {"source_uri":"https://snapshots.elastic.co/7.11.0-8a4554fc/downloads/","version":"7.11.0-SNAPSHOT"}

  • the exact path above is dynamic and must be determined by parsing the output of the build artifacts search API, until such time as the Agent does it for us (which is the intent if the source_uri is left out of the API call)
  • the inclusion of '-SNAPSHOT' in the POST body for version indicates to the Agent whether it should look in the nightly elastic build locations relating to the cited build version for the 'latest' Agent, or in the shipped GA agent storage buckets for the 'latest' agent of the cited build line.
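
As a rough sketch of the API call described above, the request can be assembled like this. The endpoint path and body fields are taken from the comment; `build_upgrade_request` is a hypothetical helper, the `kbn-xsrf` header is an assumption based on Kibana's usual requirement for non-GET API requests, and authentication is deliberately left out.

```python
import json
import urllib.request
from typing import Optional


def build_upgrade_request(kibana_url: str, agent_id: str, version: str,
                          source_uri: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST that asks Fleet to upgrade one agent."""
    body = {"version": version}
    if source_uri is not None:
        # Optional: explicit download location, as in the example above.
        body["source_uri"] = source_uri
    return urllib.request.Request(
        url=f"{kibana_url.rstrip('/')}/api/fleet/agents/{agent_id}/upgrade",
        data=json.dumps(body).encode("utf-8"),
        # kbn-xsrf is an assumption here; credentials would also be needed.
        headers={"Content-Type": "application/json", "kbn-xsrf": "true"},
        method="POST",
    )


request = build_upgrade_request(
    "https://ericserver:9243",
    "adsf00a7-a185-4dfb-b654-18e5a71d71a7",
    "7.11.0-SNAPSHOT",
    source_uri="https://snapshots.elastic.co/7.11.0-8a4554fc/downloads/",
)
# Sending it would be: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending makes it easy to inspect the URL and body in a test before pointing it at a real Kibana instance.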

the fixes are starting to roll in for this, and we will be re-testing. I don't think all fixes will be in 7.10 BC3 so we can keep pushing and will test around it as possible (it needs 7.11 or 8.0 anyhow) and will chat with Michal about automation nuances for it in the e2e-testing repo (it's an interesting one to automate)

I did another quick end to end test and found we need one more fix - tagged here:
https://github.com/elastic/beats/issues/21971

the feature was tested and was seen working end to end, with macOS. The 7.10 BC4 build has a signing problem on Windows which prevents the feature from working, fyi.

Hi @EricDavisX

Thank you for sharing the feedback.

We have validated the ticket for linux tar, deb and rpm under the https://elastic.testrail.io/index.php?/plans/view/792 Testplan with the latest 7.11-snapshot Kibana build (with commit 06278dcdec4e3d34df6b0b617b70e961aa630f15) and the 7.10 BC4 agent:

BC4 agents download location:
Linux rpm: https://staging.elastic.co/7.10.0-c650b297/downloads/beats/elastic-agent/elastic-agent-7.10.0-x86_64.rpm
Linux deb: https://staging.elastic.co/7.10.0-c650b297/downloads/beats/elastic-agent/elastic-agent-7.10.0-amd64.deb
Linux tar: https://staging.elastic.co/7.10.0-c650b297/summary-7.10.0.html/elastic-agent-7.10.0-linux-x86_64.tar.gz

Observations:

  1. We are blocked on linux rpm execution due to bug https://github.com/elastic/beats/issues/22296
  2. Linux deb agent is not upgradeable and we have reported bug https://github.com/elastic/beats/issues/22304 for the same.
  3. Error is received in activity logs with Linux tar package and reported bug https://github.com/elastic/beats/issues/22306 for the same.

Query:
Could you please look into our following query:

After we have successfully upgraded the agent from the 7.10.0 BC4 agent to the 7.11.0-SNAPSHOT agent on the latest 7.11.0-SNAPSHOT Kibana build, how can we find the hash of the upgraded 7.11.0-SNAPSHOT agent, to verify that it matches the latest 7.11.0-SNAPSHOT agent hash currently available on the 7.11.0-SNAPSHOT artifact link page?

@rahulgupta-qasource As posted elsewhere... let us re-do all tests on the next 7.10 snapshot Agent, updating to the next 7.11 (or 8.0) snapshot build. This will get feedback faster than waiting on the BC.

From your observations,

22296 - should not relate further to the upgrade testing (not urgently at least), but it seems a bug indeed! we need more info, too, please.

22304 - not a bug, but a good negative test case we can formalize.

22306 - a real bug! That we are surprised by but have put a work-around fix into the Agent.

To your query:
How to know the hash of upgraded 7.11.0-SNAPSHOT agent?
Answer: I think we can traverse Agent logs to look for the url it is downloading from and match that to the current one. We can do that manually one time, now to check it - and then in the test content we can assume that if the version increments as seen in the UI to the next version (from 7.10.0 to 7.11.0 for example), that this is enough validation for the UI based suites.
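
A minimal sketch of the URL-matching step in that answer, assuming the download URL has already been pulled out of the Agent logs by hand (the log format is not described in this thread). It also assumes the build appears as a `<version>-<8 hex chars>` path segment, as in the snapshot and staging URLs quoted in this issue; `build_from_url` and `same_build` are hypothetical helper names.

```python
import re
from typing import Optional, Tuple

# Matches a path segment such as '/7.11.0-8a4554fc/' (assumed URL layout).
BUILD_SEGMENT = re.compile(r"/(\d+\.\d+\.\d+)-([0-9a-f]{8})/")


def build_from_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (version, build hash) from a download URL, or None if absent."""
    match = BUILD_SEGMENT.search(url)
    return (match.group(1), match.group(2)) if match else None


def same_build(logged_url: str, expected_url: str) -> bool:
    """True if the URL seen in the Agent logs points at the expected build."""
    logged = build_from_url(logged_url)
    return logged is not None and logged == build_from_url(expected_url)
```

Here `logged_url` would be the URL found in the Agent logs and `expected_url` the one listed on the current artifact page.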

Hi Eric

Thank you for sharing the feedback.

We have re-executed the above tests for Windows x64 zip, Mac tar, Linux tar, Linux deb and rpm under TestPlan https://elastic.testrail.io/index.php?/plans/view/798 with the latest 7.10 agent snapshot and the 7.11.0-SNAPSHOT Kibana cloud environment.

We have also reported bug https://github.com/elastic/kibana/issues/82259 for the same.

Please let us know if anything is missing from our end.

I put a PR in to our private elastic/siem-team repo in support of testing this in our demo environment, fyi
