When installing the control plane, there is a collection of things that can cause problems. Many of these can be validated beforehand.
Once the control plane has been installed, there is a collection of things that can go wrong and make it difficult to operate linkerd2. Many of these can be validated after installation.
check is an awesome tool. It would be nice to extend this to have filters for the checks (pre and post install checks). Then, a new user could run check to make sure their cluster is ready to go (or fix any problems if it isn't).
Some example options and output. Note: this isn't meant to be all the checks, see further down for a list.
linkerd check --pre (success) - Limit the number of checks to only look at the ones relevant before installation.
==== Preflight checks ====
Cluster Config ✓
Cluster Connection ✓
Cluster Version ✓
Cluster Permissions ✓
linkerd check --pre (failure) - On failure, provide links to documentation that explains how to fix the problem.
==== Preflight checks ====
Cluster Config ✓
Cluster Connection ✓
Cluster Version ✓
Cluster Permissions X
-- It appears that the currently active configuration does not have the correct RBAC to install linkerd. Take a look at https://linkerd.io/2/rbac/ for more details.
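The failure output above could be produced along these lines. This is a minimal Python sketch for illustration, not how linkerd implements it: `CheckResult`, `render`, and `RBAC_HINT` are hypothetical names, and the hint text is copied from the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckResult:
    """Outcome of one check (hypothetical structure, for illustration only)."""
    name: str
    ok: bool
    hint: Optional[str] = None  # remediation text with a docs link, shown only on failure


def render(title: str, results: List[CheckResult]) -> str:
    """Format results in the style of the example output above."""
    lines = [f"==== {title} ===="]
    for result in results:
        mark = "✓" if result.ok else "X"
        lines.append(f"{result.name} {mark}")
        # Only failed checks print their hint, prefixed with "-- ".
        if not result.ok and result.hint:
            lines.append(f"-- {result.hint}")
    return "\n".join(lines)


RBAC_HINT = (
    "It appears that the currently active configuration does not have the "
    "correct RBAC to install linkerd. Take a look at "
    "https://linkerd.io/2/rbac/ for more details."
)

report = render(
    "Preflight checks",
    [
        CheckResult("Cluster Config", True),
        CheckResult("Cluster Connection", True),
        CheckResult("Cluster Version", True),
        CheckResult("Cluster Permissions", False, hint=RBAC_HINT),
    ],
)
print(report)
```

Keeping the hint next to the check that produces it means a failing check always tells the user where to look next.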
linkerd check --post --wait - Limit the number of checks to only look at ones relevant after installation and wait until they all pass (with a timeout).
==== Postflight checks ====
Cluster Service ✓
Service Health ✓
Service Version ✓
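The "wait until they all pass (with a timeout)" behaviour proposed for --wait amounts to a polling loop. A rough Python sketch follows, under assumptions: `wait_for_checks`, its parameters, and the default timeout and interval are all hypothetical and not taken from linkerd.

```python
import time
from typing import Callable, Dict


def wait_for_checks(
    checks: Dict[str, Callable[[], bool]],
    timeout: float = 30.0,   # assumed default, not from the proposal
    interval: float = 1.0,   # assumed pause between rounds
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bool]:
    """Re-run every check each round until all pass or the timeout expires.

    Returns the status from the last round, so the caller can report
    which checks were still failing when time ran out.
    """
    deadline = clock() + timeout
    while True:
        status = {name: check() for name, check in checks.items()}
        if all(status.values()) or clock() >= deadline:
            return status
        sleep(interval)
```

Every check is re-run on each round, not only the failing ones, so a check that passed early and then regressed is still caught before the command reports success.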
Instead of piping install into kubectl
I like the idea of automatically running post-install checks. However, I also think there's value in having install just output yaml rather than actually changing the cluster. The current approach allows the install output to be manually inspected (and changed) before it's applied, or saved to a file so it can be applied multiple times (which I've found useful for testing). I think these advantages of the current approach are worth maintaining.
Perhaps there should be two commands, one that outputs yaml as we do currently, and one that does the entire install process plus checks?
@hawkw 100% agreed. Personally, I'd go for a flag on install (conduit install -o yaml). Either that or a separate command is a requirement.
Instead of piping install into kubectl
Myself, I would never let conduit install modify my (production) cluster itself; I would always go through the process of running conduit install > conduit-install.yml, manually reviewing conduit-install.yml, and then running kubectl apply -f conduit-install.yml. One advantage of conduit install | kubectl apply -f - is that it hints, succinctly, that such a careful installation process is possible.
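For anyone scripting that careful process, here is a small Python sketch of the two steps with the manual review left in between. The function names and the injectable `run` parameter are hypothetical; the commands themselves are the ones quoted above.

```python
import subprocess
from typing import Callable


def render_manifest(path: str, run: Callable = subprocess.run) -> None:
    """Save the output of `conduit install` to a file; the cluster is untouched."""
    with open(path, "w") as out:
        run(["conduit", "install"], stdout=out, check=True)


def apply_manifest(path: str, run: Callable = subprocess.run) -> None:
    """Apply a manifest that has already been reviewed by hand."""
    run(["kubectl", "apply", "-f", path], check=True)


# Intended use (not executed here, since it needs both CLIs and a cluster):
#   render_manifest("conduit-install.yml")
#   ... review conduit-install.yml by hand ...
#   apply_manifest("conduit-install.yml")
```

Splitting the steps into two functions keeps the property described above: nothing changes the cluster until kubectl is invoked explicitly.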
In particular, we originally decided on the current approach because we wanted people to be able to see how we are changing the cluster by letting them inspect the yaml, because we thought that people wouldn't just trust conduit install to do something reasonable. Also at the time we thought people might version control the output of conduit install in order to keep a history of changes to the configurations.
We can change the approach so that the piping into kubectl is avoided, and provide an "output the yaml" option to let people inspect the configuration. However, this raises the question "Does the yaml completely describe everything that is done?", i.e. it creates some confusion about what happens during install. Also, currently the conduit tool is always safe in the sense that it is read-only, i.e. it never changes anything (IIRC). Finally, you can save the conduit install output once and replay it to multiple clusters (e.g. in minikube on my laptop, and then in GKE after testing it) and you know that it is the same configuration. With conduit install doing the kubectl apply itself, it wouldn't be clear that it would install the same configuration each time it is given the same inputs.
Note that all of this applies to conduit inject too.
Anyway, don't interpret these comments as being -1 (or +1) to such changes. I just want to provide context regarding the original (current) design.
Regarding the overall idea, I think it's important that we have a very good pre-install check mechanism (that can be done separately from the install), and a very good post-install check mechanism (that can be run separately from the install). Increasing the thoroughness of those checks is the most important improvement we can make, IMO. Finding problems (e.g. RBAC isn't enabled) and automatically generating a script that can fix those problems is the second most important thing we can do.
Having a "one step" install that does everything is harder for me to see the value in, because I don't think a one-step install is what one would do in a production deployment, realistically. A one-step install does look good in a demo because it makes conduit look easy to install, but it seems like pure demo-ware. We'd have to document the careful approach in addition to the one-step approach so it won't actually simplify the documentation to have a one-step option (default or otherwise).
That's a great point, I had conflated everything a little bit too much (though I still like the one shot install). Maybe everything could be scoped to the check command?
conduit check --pre
conduit check --post
though I still like the one shot install
I think I'm against this idea. Install is intentionally zero-magic. Typing conduit can't feel risky. Having to type kubectl is an explicit acceptance of responsibility for whatever happens next. At the very least, I would not want the install command to take any action by default.
conduit check --pre
conduit check --post --wait
LGTM. Maybe --post should be the default? I slightly prefer conduit pre-install and conduit check but if you don't love those names then no use bike-shedding.
I like conduit check running through everything by default, with --pre and --post just being filters (maybe conduit check --groups=pre,post?).
Having a separate command for running the pre-flight checks makes sense, especially so that first-time users can find the functionality without docs. It just feels like a little duplication, maybe?
Good point about conduit check checking both pre and post, though perhaps the post-check should always check the pre- stuff anyway? I.e. "post" should always be a superset of "pre"?
I dunno why not. I do like the idea of having groups in the future, something like:
# conduit check
==== Pre-Install ====
==== Post-Install ====
==== Ready for Upgrade ====
==== Upgrade Success ====
(Feels like we'd want some kind of pre/post-flight and upgrade automation as well)
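To make the grouping idea concrete, here is a Python sketch of how a --groups=pre,post style filter might pick sections. Everything in it is an assumption: the group keys, `SECTIONS`, `select_sections`, and `outline` are hypothetical, the check names are borrowed from the example output at the top of the issue, and the upgrade sections are empty because nothing in this thread defines checks for them.

```python
from typing import Dict, List, Optional, Tuple

# Group key -> (section title, check names). Titles follow the mock-up above.
SECTIONS: Dict[str, Tuple[str, List[str]]] = {
    "pre": ("Pre-Install", ["Cluster Config", "Cluster Connection",
                            "Cluster Version", "Cluster Permissions"]),
    "post": ("Post-Install", ["Cluster Service", "Service Health",
                              "Service Version"]),
    "upgrade-ready": ("Ready for Upgrade", []),  # no checks defined yet
    "upgrade-done": ("Upgrade Success", []),     # no checks defined yet
}


def select_sections(groups_flag: Optional[str] = None) -> List[str]:
    """Return the group keys to run, in display order.

    None means no filter was given, so every group runs.
    """
    if groups_flag is None:
        return list(SECTIONS)
    wanted = [g.strip() for g in groups_flag.split(",") if g.strip()]
    unknown = [g for g in wanted if g not in SECTIONS]
    if unknown:
        raise ValueError(f"unknown check group(s): {', '.join(unknown)}")
    # Keep the canonical order regardless of how the flag was written.
    return [key for key in SECTIONS if key in wanted]


def outline(groups_flag: Optional[str] = None) -> str:
    """Render section headers followed by the names of their checks."""
    lines: List[str] = []
    for key in select_sections(groups_flag):
        title, checks = SECTIONS[key]
        lines.append(f"==== {title} ====")
        lines.extend(checks)
    return "\n".join(lines)


print(outline("pre,post"))
```

With no flag, every section runs, which matches the "run through everything by default, with filters on top" preference voiced earlier in the thread.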
Potential additions to conduit check --pre:
A potential check that might be helpful:
See #1421 for another thing to check (NetworkPolicy).
Sorry for chiming in late here. I really like the way that the spec for these checks has evolved. I just wanted to add:
Maybe we don't need to support a --post flag? We can certainly divide the checks up into pre-install and post-install (and only run the pre-install checks when the --pre flag is set), but I don't think it will ever make sense to run the post-install checks in isolation. The post-install checks will fail in weird ways if the pre-install checks fail, so we should validate that the pre-install checks pass as part of running the post-install checks.
In other words:
linkerd check --pre
# runs only pre-install checks
# intended to be run before linkerd is installed
linkerd check
# runs both pre- and post-install checks
# intended to be run after linkerd is installed
That seems easier to grok from a user's perspective, too.
Note that there are also a few pre-flight checks described above that we should only run when the --pre flag is present, and we should skip when running pre- and post-flight checks together. Namely:
Maybe we don't need to support a --post flag?
Makes sense to me. Some part of me would like to have multiple sections of checks that you can run individually. I don't have a good use case for that though, so it is likely over-engineering at this point.
Ok, sounds good. And as part of #1417 we should consider at least grouping the check output into multiple sections. Down the road we could add support for running individual sections if that makes sense.
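The scheme settled on in this exchange (a --pre flag, no --post flag, and a few pre-flight checks that only run under --pre) can be sketched in Python as follows. The `Check` type, the `select` function, and the `pre_only` field are hypothetical, and the entry named "placeholder-pre-only-check" is a stand-in, not one of the real pre-only checks.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Check:
    name: str
    stage: str              # "pre" or "post"
    pre_only: bool = False  # run only under --pre; skipped in a combined run


CHECKS = [
    Check("Cluster Config", "pre"),
    Check("Cluster Permissions", "pre"),
    Check("placeholder-pre-only-check", "pre", pre_only=True),  # hypothetical
    Check("Service Health", "post"),
]


def select(checks: List[Check], pre: bool) -> List[Check]:
    """Choose which checks to run.

    pre=True  models `linkerd check --pre`: every pre-install check.
    pre=False models plain `linkerd check`: pre- and post-install checks
    together, minus the ones that only make sense before installation.
    """
    if pre:
        return [c for c in checks if c.stage == "pre"]
    return [c for c in checks if not c.pre_only]


print([c.name for c in select(CHECKS, pre=True)])
print([c.name for c in select(CHECKS, pre=False)])
```

Because the combined run always includes the shared pre-install checks, the post-install checks never run against a cluster that would have failed pre-flight.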
I think there may be another scenario to think about; correct me if we do not need to worry about it. Should we also consider having an optional flag for linkerd install that runs these pre- and post-flight checks as well? It sounds like a possible use case for this feature would be: run linkerd check --pre, then run linkerd install, then run linkerd check. It feels like there is some value in having this feature baked into the install process as an option.
@dadjeibaah the original suggestion was to have linkerd install just take care of it all for you (and wait until everything was healthy). The more I think about it, having install do nothing more than output some YAML is the best behavior we can have.
Since most folks will do an install by following the getting started documentation, we can have them use the pre/post checks as part of the process and still have the ability for folks to see exactly what is happening to their cluster and potentially change it for their specific use cases.
I'm going to close this, since pre- and post-install checks are now available via the linkerd check command, and fleshing out the set of checks that we run pre- and post-install is ticketed separately (e.g. #1474, #1475, #1732, #1741).