Linkerd2: RFC: Break up linkerd install into stages

Created on 28 Jan 2019 · 24 Comments · Source: linkerd/linkerd2

This RFC proposes a modification to the linkerd install flow, dividing Linkerd
installation into stages grouped by permission level.

Current State

Installation of the full suite of components requires three commands, with
varying combinations of flags:

`linkerd install-cni`

ClusterRole
- linkerd-cni
ClusterRoleBinding
- linkerd-cni
Namespace
- linkerd
ServiceAccount
- linkerd-cni
DaemonSet
- linkerd-cni
ConfigMap
- linkerd-cni-config

`linkerd install`

ClusterRole
- linkerd-linkerd-controller
- linkerd-linkerd-prometheus
ClusterRoleBinding
- linkerd-linkerd-controller
- linkerd-linkerd-prometheus
Namespace
- linkerd
CustomResourceDefinition
- serviceprofiles.linkerd.io
ServiceAccount
- linkerd-controller
- linkerd-grafana
- linkerd-prometheus
- linkerd-web
ConfigMap
- linkerd-grafana-config
- linkerd-prometheus-config
Service
- linkerd-controller-api
- linkerd-grafana
- linkerd-prometheus
- linkerd-proxy-api
- linkerd-proxy-injector
- linkerd-web
Deployment
- linkerd-controller
- linkerd-grafana
- linkerd-prometheus
- linkerd-web

`linkerd install --tls optional`

ClusterRole
- linkerd-linkerd-ca
ClusterRoleBinding
- linkerd-linkerd-ca
ServiceAccount
- linkerd-ca
ConfigMap
- linkerd-ca-bundle
Deployment
- linkerd-ca

`linkerd install --tls optional --proxy-auto-inject`

ClusterRole
- linkerd-linkerd-proxy-injector
ClusterRoleBinding
- linkerd-linkerd-proxy-injector
ConfigMap
- linkerd-proxy-injector-sidecar-config
Service
- linkerd-proxy-injector
Deployment
- linkerd-proxy-injector

`linkerd install-sp`

ServiceProfile
- linkerd-controller-api
- linkerd-proxy-api
- linkerd-prometheus
- linkerd-grafana

Proposal: Multi-stage installation

All components listed above are grouped by privilege level under two flags, --admin and --user:

$ linkerd install --help
Output Kubernetes configs to install Linkerd.

Usage:
  linkerd install [flags]

Flags:
      --admin  Install components requiring cluster-wide privileges.
      --user   Install components requiring namespace-wide privileges.

The linkerd check command can be modified to mirror the linkerd install flow:

linkerd check --admin
linkerd check --user

1. Cluster Admin

Default usage:

linkerd install --admin

Possible modifier:

linkerd install --admin --cni

ClusterRole
- linkerd-cni
- linkerd-linkerd-ca
- linkerd-linkerd-controller
- linkerd-linkerd-prometheus
- linkerd-linkerd-proxy-injector
ClusterRoleBinding
- linkerd-cni
- linkerd-linkerd-ca
- linkerd-linkerd-controller
- linkerd-linkerd-prometheus
- linkerd-linkerd-proxy-injector
Namespace
- linkerd
ServiceAccount
- linkerd-cni
ConfigMap
- linkerd-cni-config
DaemonSet
- linkerd-cni
CustomResourceDefinition
- serviceprofiles.linkerd.io

2. Cluster User

linkerd install --user

ServiceAccount
- linkerd-ca
- linkerd-controller
- linkerd-grafana
- linkerd-prometheus
- linkerd-web
ConfigMap
- linkerd-ca-bundle
- linkerd-grafana-config
- linkerd-prometheus-config
- linkerd-proxy-injector-sidecar-config
Deployment
- linkerd-ca
- linkerd-controller
- linkerd-grafana
- linkerd-prometheus
- linkerd-proxy-injector
- linkerd-web
Service
- linkerd-controller-api
- linkerd-grafana
- linkerd-prometheus
- linkerd-proxy-api
- linkerd-proxy-injector
- linkerd-web
ServiceProfile
- linkerd-controller-api
- linkerd-proxy-api
- linkerd-prometheus
- linkerd-grafana
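
The split between the two stages above can be read as a simple rule. The following is a minimal sketch in Python, for illustration only: the function name `install_stage` and the name-prefix test for CNI resources are assumptions made here, not part of the linkerd CLI.

```python
# Kinds that appear only in the "Cluster Admin" list of this proposal.
CLUSTER_SCOPED_KINDS = {
    "ClusterRole",
    "ClusterRoleBinding",
    "Namespace",
    "CustomResourceDefinition",
}


def install_stage(kind: str, name: str) -> str:
    """Return "admin" or "user" for a resource, following the lists above.

    Hypothetical helper. Assumption: CNI resources are recognized by the
    "linkerd-cni" name prefix, which matches the names listed in this RFC.
    """
    if kind in CLUSTER_SCOPED_KINDS or name.startswith("linkerd-cni"):
        return "admin"
    return "user"
```

For example, `install_stage("DaemonSet", "linkerd-cni")` falls in the admin stage, while `install_stage("Deployment", "linkerd-web")` falls in the user stage.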

Developer installation

linkerd install shall continue to work as an atomic installation process,
assuming the user has cluster access sufficient to install all Linkerd
components.

To address the timing issue between installing CRD/serviceprofiles and the
control-plane ServiceProfiles, the ServiceProfiles shall be installed via a
job configured in the linkerd install output.

area/cli area/install priority/P0 rfc

Source

siggy

All 24 comments

@siggy downside of having ServiceProfiles as a job post-fact is that we'll need to have extra RBAC (which is probably okay, just need to be aware).

It is feeling like CNI will need to be opt-in.
Should there be a PSP included when you're not using CNI?

grampelberg on 28 Jan 2019

@grampelberg Re: ServiceProfiles I could go either way, adding more RBAC isn't great, but a 3rd command isn't great either. I'd rather it work one way rather than both.

Re: CNI, I've made an edit to call out linkerd install --admin --cni as a possible opt-in modifier. Unless there is a compelling use case for installation without CNI, I'd prefer to limit the configuration surface area by always including it.
Re: PSP when not using CNI, is the intent to provide sidecar proxy-init containers sufficient permission to modify IP tables? This may be another argument for the previous point, where we can avoid supporting different installation configs by limiting the surface area.

Meta point: In this issue I'd like us to nail down the user-facing ergonomics of installation. The underlying implementation is important, and may influence the UX, but deeper discussions around install/inject changes should probably happen elsewhere, #2095 being a good example.

siggy on 28 Jan 2019

😻

When neither --user nor --admin is specified, is the result just the concatenation of the --admin and --user outputs?

I see service profiles listed under install --user. Should those be installed by a job instead?

I wonder what we can do with the naming of these commands to make it more clear that these are sequential steps rather than alternatives. I still like the idea of having the --admin flag be called something like --prepare, --setup, --init, etc. Or even a separate subcommand linkerd prepare, linkerd setup, linkerd init, etc.

Building off the previous point, it may be possible for linkerd install to be context sensitive. It can inspect the state of the cluster (linkerd check basically) and determine if the admin setup has already been completed or if the user has admin permissions:

Cluster is already set up: output regular user-level install output
Cluster is not set up, but user has admin permissions: output admin and user yaml together (atomic install)
Cluster is not set up and user does not have admin permissions: fail with error message describing that an admin must run the setup command first.
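
The three cases above amount to a small decision function. This is a minimal sketch in Python, for illustration only: `plan_install` and its return values are hypothetical names, and the two booleans stand in for whatever linkerd check would report.

```python
def plan_install(cluster_is_set_up: bool, user_is_admin: bool) -> str:
    """Pick the output a context-sensitive `linkerd install` would emit.

    Hypothetical helper: names and return values are illustrative only.
    """
    if cluster_is_set_up:
        # Admin setup already done: regular user-level install output.
        return "user"
    if user_is_admin:
        # Not set up, but the caller can do both stages: atomic install.
        return "admin+user"
    # Not set up and no admin permissions: refuse with an explanation.
    raise PermissionError(
        "cluster is not set up: an admin must run the setup command first"
    )
```

The sketch also makes the trade-off visible: the same command yields different output depending on cluster state.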

adleong on 28 Jan 2019

Re: neither --user nor --admin being specified, and service profiles getting installed by a job, see the Developer installation section. ;)

I'm open to naming changes, the intent of --admin and --user was to convey the role of the person doing the install, in addition to privilege level. I'm also open to having them be separate commands. The ergonomics around a full-suite linkerd install being a concatenation of linkerd install --admin and linkerd install --user felt intuitive to me.

Context sensitive commands are interesting. I think there's a lot we could leverage in our check infrastructure when performing installs around providing feedback to the user. I'm a bit hesitant to make the output of cli commands context-dependent, as it could make scripts/docs/instructions less deterministic.

siggy on 28 Jan 2019

That's a good point, predictability is really important.

adleong on 28 Jan 2019

FYI: Everything related to TLS in this proposal will change between 2.2 and 2.3

olix0r on 28 Jan 2019

Product goals, in priority order:

Each step should be clear, understandable, and fit with Kubernetes concepts and idioms.
Things that require different privilege levels should be delineated between separate steps, so that they can be run by separate teams/people.
Linkerd should still be usable in a locked down environment without elevated permissions, even if that mode has fewer features.
The install flow should be "easy", i.e. the common case should not require a lot of commands, a lot of flags, or navigating a big decision tree.

In other words: clarity first, easiness last.

wmorgan on 28 Jan 2019

👍1

What those priorities suggest to me is that it's more important to have the service profile explicitly in the command output (instead of created by a job) than it is to have a single atomic install command for developers.

This suggests putting the service profiles in the --user output and scrapping the atomic install mode.

adleong on 28 Jan 2019

It feels like we need to think through the check implications here at the same time.

I'm a little torn, from a workflow perspective, it could be:

$ linkerd check setup
$ linkerd install setup # --admin

# -----------
# Optional
# -----------

$ linkerd check cni
$ linkerd install cni

# -----------

$ linkerd check components
$ linkerd install components # --user

setup and cni are the wrong words, linkerd check components isn't right either. I prefer sub-commands to flags as I suspect we'll have flags that are specific to each and confusing when all mashed together.

This suggests that there's the possibility of:

$ linkerd check install
$ linkerd install

To make this work, we'd need to change the install flow from linkerd install | kubectl apply to doing it all within install and having significantly reduced configuration options. I still think the concept of outputting YAML is solid, but maybe this is a good middle ground? You can get all the YAML if you follow the advanced procedure and just have it work for the simple one. It would be possible to prompt the user and explain what's happening to their cluster in the linkerd install case as well in a more succinct version than we're able to with YAML.

Going this route, we could use some checks to suggest install modes (we noticed you don't have RBAC for CRDs, you might want to install in unprivileged mode).

Also, it feels like there's a whole other sub-command for what is --single-namespace now:

$ linkerd check unprivileged
$ linkerd install unprivileged

While it'd be possible to split this across setup and components using a flag, feels like the whole thing is overly complicated.

grampelberg on 29 Jan 2019

Just my 2c:

I think a linkerd install CLI wizard which performs multiple steps sequentially is a really nice user experience, but it also is in tension with the project goals of being transparent and non-magical. Having two different install methods (applying yaml and install wizard) means more surface area to support and maintain. My vote would be to not have this.
How important is it that we support an unprivileged mode? I don't think it's unreasonable to require that having a cluster admin run a setup command is a hard prerequisite for Linkerd. This lets us remove unprivileged/single-namespace entirely which greatly simplifies the code and reduces the number of configuration permutations.
We should standardize on whether linkerd check X is intended to run before or after linkerd install X. I think running check after makes sense, which means we'd also need a linkerd check pre which can be run before any linkerd install command.
I also like the idea of linkerd check without any further sub-command being something that runs ALL the checks in sequence and basically tells you which command to run next. i.e. "Your cluster is correctly set up, next run linkerd install control-plane" or whatever.

All together I think the workflow would be:

linkerd check pre # All good, cluster is ready for Linkerd setup

linkerd install setup # Must be run by admin
linkerd check setup # All good
linkerd check # Linkerd setup is complete, next run `linkerd install control-plane`

linkerd install control-plane # Can be run by user
linkerd check control-plane # All good
linkerd check # Linkerd control-plane installed, next inject some services

linkerd inject 
linkerd check data-plane # All good
linkerd check # All good
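
The "tells you which command to run next" behavior of a bare linkerd check could look roughly like this. It is a minimal sketch in Python, for illustration only: `next_command` is a hypothetical name, and the stage names and commands are the ones proposed in the workflow above, not existing subcommands.

```python
from typing import Optional, Set

# Ordered stages from the workflow above, each paired with the command
# that would complete it. These subcommands are proposals, not a real API.
STEPS = [
    ("setup", "linkerd install setup"),
    ("control-plane", "linkerd install control-plane"),
    ("data-plane", "linkerd inject"),
]


def next_command(passing_stages: Set[str]) -> Optional[str]:
    """Return the command for the first stage whose check is not passing.

    Returns None when every stage passes ("All good").
    """
    for stage, command in STEPS:
        if stage not in passing_stages:
            return command
    return None
```

With only the setup stage passing, `next_command({"setup"})` returns `"linkerd install control-plane"`.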

adleong on 30 Jan 2019

👍1

How important is it that we support an unprivileged mode? I don't think it's unreasonable to require that having a cluster admin run a setup command is a hard prerequisite for Linkerd. This lets us remove unprivileged/single-namespace entirely which greatly simplifies the code and reduces the number of configuration permutations.

It is pretty important that someone other than cluster admins operate the control plane. If there was some YAML that a cluster admin could run to setup cluster level things (like the CRDs) and still have the control plane restricted to a single namespace, that might be a valid workaround. I have some concerns around upgrade instructions, in particular whether it would be possible at all to do #1903.

We should standardize on whether linkerd check X is intended to run before or after linkerd install X. I think running check after makes sense, which means we'd also need a linkerd check pre which can be run before any linkerd install command.

In a perfect world, I'd actually prefer to just run check pre as part of each install step. It makes the whole thing easier for users. That way, check becomes a post-run command and install becomes a little bit more predictable (with a flag to opt out of course).

I also like the idea of linkerd check without any further sub-command being something that runs ALL the checks in sequence and basically tells you which command to run next. i.e. "Your cluster is correctly set up, next run linkerd install control-plane" or whatever.

I dunno, I'm not a big fan of this. You'll need to be able to run the distinct steps anyways. Feels a little magical.

grampelberg on 30 Jan 2019

If there was some YAML that a cluster admin could run to setup cluster level things (like the CRDs) and still have the control plane restricted to a single namespace, that might be a valid workaround.

Yes, I think that is how it would work. A cluster admin would run linkerd install setup | kubectl apply -f - (which creates CRDs, etc) and create a namespace for a regular user to use: kubectl create ns alex. Then a regular user would be able to linkerd install -l alex control-plane | kubectl apply -f -. This would install the control plane into the alex namespace without requiring any cluster level permissions.

I think this gets us everything we want without any special --single-namespace mode.

adleong on 30 Jan 2019

👍1

Just need to think through the ClusterRole and Role implications (which is the biggest differentiator right now).

grampelberg on 30 Jan 2019

👍1

Related to --single-namespace and down in the weeds, the control-plane components currently take a --single-namespace flag, and behave differently based on this setting.

That said, I don't love the fact that we pass this flag around, and would much prefer that the control-plane components detect and degrade gracefully when confined to a single namespace. I just wanted to call this change out to help scope.

siggy on 30 Jan 2019

ClusterRole vs Role is an interesting wrinkle I hadn't thought about until now. Just as a starting point for discussion, what about something like this?

linkerd install setup takes a flag which specifies which namespaces Linkerd can access (default: all) and creates a ClusterRole with permissions on those namespaces. It also takes a flag which specifies which namespaces the control-plane can be installed into (default: linkerd) and creates ClusterRoleBindings for the linkerd service accounts in those namespaces.

Then, linkerd install control-plane creates a service account, a Role that grants access within that namespace, and a RoleBinding that connects them.

This unlocks the following use-cases:

Developer with full cluster access

# Creates a ClusterRole that can read all namespaces and a ClusterRoleBinding for the linkerd
# service account in the Linkerd namespace.
linkerd install setup | kubectl apply -f -
# The control plane runs under the linkerd service account and can therefore read from all
# namespaces.
linkerd install control-plane | kubectl apply -f -

Cluster admin locks Linkerd to certain allowed namespaces

# Creates a ClusterRole which can read from the test1, test2, and test3 namespaces.  Also creates
# a ClusterRoleBinding for the linkerd service account in the alex namespace.  This command is
# run by a cluster admin.
linkerd install setup --data-plane-namespaces=test1,test2,test3 --control-plane-namespaces=alex | kubectl apply -f -

# This command is run by someone running the linkerd control plane in the alex namespace.  Since
# it runs under the linkerd service account in the alex namespace, it has permissions to read from
# namespaces test1, test2, and test3.  It also creates a Role and RoleBinding so that it can read from
# its own namespace.
linkerd install control-plane -l alex | kubectl apply -f -

Total lockdown

# No ClusterRole or ClusterRoleBindings are created.
linkerd install setup --data-plane-namespaces="" --control-plane-namespaces=""

# Creates the linkerd control plane in the alex namespace.  It also creates a Role and RoleBinding
# so that it can read from its own namespace.  This is effectively single-namespace mode.  linkerd
# does not have permissions to read from any other namespace.
linkerd install control-plane -l alex | kubectl apply -f -

adleong on 30 Jan 2019

As far as I can tell, the --single-namespace flag is used to control two things in the controller code:

Which namespaces to watch
To disable service profiles

adleong on 30 Jan 2019

Re: ClusterRole vs Role, I think it's an interesting idea to have linkerd install control-plane create a Role for itself.

Re --data-plane-namespaces and --control-plane-namespaces, do we need both? This seems to provide customization beyond what --single-namespace does today. What if instead we just did:

Developer with full cluster access

linkerd install setup
linkerd install control-plane

Total lockdown

linkerd install setup -l foo
linkerd install control-plane -l foo

siggy on 30 Jan 2019

What about just removing the ClusterRoleBindings entirely from linkerd install setup with a flag?

linkerd install setup --no-rbac

You can't make a ClusterRoleBinding that is namespace specific. This feels like it'd be nicer for cluster admins as well to audit.

I'd always looked at ClusterRoleBindings and RoleBindings as an either/or kind of thing, but I like the idea of always just doing a Role/RoleBinding on the namespace level. Makes permissions a little more clear.

grampelberg on 30 Jan 2019

👍1

You can't make a ClusterRoleBinding that is namespace specific. This feels like it'd be nicer for cluster admins as well to audit.

I believe you can have a ClusterRoleBinding that has a service account (from a specific namespace) as the subject:

subjects:
- kind: ServiceAccount
  name: linkerd-controller
  namespace: alex

adleong on 30 Jan 2019

Yup, that's because service accounts are tied to namespaces. The ClusterRole would still give you access to everything in every namespace.
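
To make the distinction concrete, here is a minimal sketch in Python that builds both kinds of binding as plain dictionaries. The binding names and the test1 namespace are illustrative assumptions. The structure follows the Kubernetes rbac.authorization.k8s.io/v1 API, in which a RoleBinding may reference a ClusterRole and then grants its rules only inside the RoleBinding's own namespace.

```python
import json

# The subject is namespaced (a service account always lives in a namespace)...
SUBJECT = {"kind": "ServiceAccount", "name": "linkerd-controller", "namespace": "alex"}
ROLE_REF = {
    "apiGroup": "rbac.authorization.k8s.io",
    "kind": "ClusterRole",
    "name": "linkerd-linkerd-controller",
}

# ...but a ClusterRoleBinding has no namespace of its own, so the rules
# in the ClusterRole apply in every namespace.
cluster_wide = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRoleBinding",
    "metadata": {"name": "linkerd-linkerd-controller"},  # illustrative name
    "roleRef": ROLE_REF,
    "subjects": [SUBJECT],
}

# A RoleBinding to the same ClusterRole grants those rules only within
# the RoleBinding's namespace (test1 here, chosen for illustration).
namespace_only = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"name": "linkerd-controller", "namespace": "test1"},
    "roleRef": ROLE_REF,
    "subjects": [SUBJECT],
}


def to_manifest(obj: dict) -> str:
    """Serialize a manifest as JSON, which kubectl apply also accepts."""
    return json.dumps(obj, indent=2)
```

Restricting Linkerd to specific namespaces this way would mean one RoleBinding per allowed namespace rather than a single ClusterRoleBinding.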

grampelberg on 30 Jan 2019

What is the summary of "things in the admin install" versus "things in the user install"? I.e. when we add new resources, what's the question that we use to determine which phase the resource should be installed within?

olix0r on 20 Feb 2019

Is it safe to summarize this issue as the following:

Introduce an install --admin that emits CRDs, ClusterRoles, ClusterRoleBindings, and the linkerd-cni config
install --user emits all other resources (including those previously installed by install-sp).

Is that it?

olix0r on 20 Feb 2019

@olix0r That's basically the gist.

Another way to think about it:

linkerd install --admin is for things that require privileges outside of a predetermined set of namespaces
linkerd install --user is for things that only require privileges within a predetermined set of namespaces

There is still some discussion around supporting something like a --single-namespace mode, and how that would relate to a multistage install.

siggy on 20 Feb 2019

Completed as part of #2719. Tracking subsequent work via #2337.

siggy on 24 Apr 2019
