Velero: design for multi-tenancy support

Created on 9 Aug 2017 · 14 Comments · Source: vmware-tanzu/velero

Allow users other than cluster-admin to use Ark. This has security implications:

  • need to ensure a user can only back up/restore PVs/snapshots they have access to
  • need to ensure a user can only back up kube resources they have access to
  • need to find a way to have the restore "run as" the user, so there's no privilege escalation
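
The second requirement can be illustrated with a minimal sketch, assuming a heavily simplified model of RBAC rules. The `Rule` type and the `allowed` and `filter_backup_items` helpers are hypothetical and are not Ark code. In a real cluster the check would more likely be delegated to the API server (for example through a `SubjectAccessReview`) than re-implemented like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Simplified, hypothetical stand-in for an RBAC rule bound to a user."""
    namespace: str        # namespace the rule applies to ("*" means all)
    resources: frozenset  # resource names, e.g. {"pods", "secrets"}
    verbs: frozenset      # verbs, e.g. {"get", "list"}


def allowed(rules, namespace, resource, verb):
    """Return True if any rule grants `verb` on `resource` in `namespace`."""
    return any(
        r.namespace in ("*", namespace)
        and ("*" in r.resources or resource in r.resources)
        and ("*" in r.verbs or verb in r.verbs)
        for r in rules
    )


def filter_backup_items(rules, items):
    """Keep only the (namespace, resource, name) items the user may read."""
    return [item for item in items if allowed(rules, item[0], item[1], "get")]


# Example: a user who may read pods and configmaps in "team-a" only.
rules = [Rule("team-a", frozenset({"pods", "configmaps"}), frozenset({"get", "list"}))]
items = [
    ("team-a", "pods", "web-0"),
    ("team-a", "secrets", "db-password"),
    ("team-b", "pods", "api-0"),
]
backup_items = filter_backup_items(rules, items)
```

Here only the pod in `team-a` survives the filter. Whether a backup should silently skip inaccessible items or fail outright is a design choice this sketch does not settle.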
Labels: Breaking change, Enhancement/User, multi-tenancy, Needs Product, P2 - Long-term important, Reviewed Q2 2021

All 14 comments

@yastij is also interested in helping out here

+1

+1. I'm willing to help too. I will start by getting familiar with the codebase and then come up with a proposal.

Guys, do you know in which upcoming release this feature will be available?

@abessifi we haven't slated it for a release yet. We are still at the stage of needing to do some R&D/design work to figure out how this would work. We'll update here as it progresses!

If you have any input on this item - please feel free to add comments on this issue about your use case, that would be helpful for the team.

Moving this into v1.3 for tracking purposes, as we have some community members who are interested in helping to design/implement.

I am interested in helping with this feature. Maybe we could bring this up at the Velero community meeting and discuss it with the other interested community members?

@nzoueidi please attend our next community meeting and we can discuss your use case. Karl from the Cruise team was spearheading this effort if you want to sync with them as well.

Our team has been exploring multi-tenancy in our project, which uses Velero for backup and restore. Our biggest concern is that once a non-admin user is granted permissions that allow them to either (1) access the Velero namespace or (2) create Velero resources, the security model is broken. A user who can exec into the Restic pod has access to PV data and to the privileged Velero service account, and can therefore perform any operation that service account can.

If Velero were run as a user (that is, not as the privileged service account), then I think things would work for the most part. If the account Velero runs as doesn't have access to a resource, that resource can't be included in the backup and the backup will fail. If Velero can't create specific resources or namespaces, the restore will fail. Ideally, Velero would take into account the originating user who created the backup/restore resource and perform all backup/restore operations as that user.
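
One way to "perform operations as the originating user" is Kubernetes user impersonation, where a client with the `impersonate` permission sends `Impersonate-User` and `Impersonate-Group` headers with its API requests. The sketch below only builds those headers. The `CREATED_BY_ANNOTATION` key and the idea that something (for example an admission webhook) records the creator on the Backup object are assumptions made for illustration, not existing Velero behaviour.

```python
# Hypothetical annotation recording who created the Backup/Restore object.
# Velero does not set this; it would have to be written by a trusted component.
CREATED_BY_ANNOTATION = "example.com/created-by"


def originating_user(obj):
    """Return the recorded creator of a Backup/Restore dict, or None."""
    return obj.get("metadata", {}).get("annotations", {}).get(CREATED_BY_ANNOTATION)


def impersonation_headers(user, groups=()):
    """Build the Kubernetes impersonation headers for one API request.

    Returned as a list of pairs because Impersonate-Group may repeat.
    """
    headers = [("Impersonate-User", user)]
    headers.extend(("Impersonate-Group", group) for group in groups)
    return headers


backup = {
    "kind": "Backup",
    "metadata": {
        "name": "nightly",
        "annotations": {CREATED_BY_ANNOTATION: "alice"},
    },
}
user = originating_user(backup)
headers = impersonation_headers(user, groups=["team-a"]) if user else []
```

The annotation must not be writable by the tenant, otherwise a user could name someone else as the creator and escalate privileges through the backup.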

The big concern on our end is Restic. How can Velero be redesigned so that a user can create backup/restore resources but can neither exec into the Restic containers nor run as the privileged service account? It seems like Restic would need to run in a separate namespace, but I haven't thought this through fully. I would like to hear what others have been thinking about this.

"The biggest concern for us is that once a non-admin user is granted permissions which allow them to either 1. access the Velero namespace, or 2. create Velero resources, then we have broken the security model."

I expect users to interact with Velero only through CRDs. This is the best way to abstract access to Velero and its underlying infrastructure (Restic in this case). I think the challenge here is more about redesigning Velero to support multi-tenancy out of the box, which may require a lot of code refactoring.

Can we imagine a model that fits more directly with RBAC principles?

  • velero server can run in its own namespace
  • velero resources can exist in any namespace
  • access to velero Backup/Snapshot objects is managed via ClusterRole, Role, ClusterRoleBinding and RoleBinding
  • velero backups are scheduled as individual pods (Jobs) in the same namespace as the Backup/Restore object (thus, visible and manageable by tenants)
  • velero service accounts used in the jobs are bound to namespaces and also subject to RBAC rules
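
As a rough sketch of what the RBAC side of this model could look like, the snippet below builds a namespaced Role and RoleBinding that let one tenant manage Backup and Restore objects in their own namespace. It assumes Velero would honour `velero.io` objects created outside its own namespace, which is what the list above proposes and not something the thread confirms. The `velero-tenant` role name and the helper functions are hypothetical.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"
ROLE_NAME = "velero-tenant"  # hypothetical name, chosen for this example


def tenant_role(namespace):
    """Role allowing management of Backup/Restore objects in one namespace."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": ROLE_NAME, "namespace": namespace},
        "rules": [
            {
                "apiGroups": ["velero.io"],
                "resources": ["backups", "restores"],
                "verbs": ["create", "get", "list", "watch", "delete"],
            }
        ],
    }


def tenant_role_binding(namespace, user):
    """RoleBinding granting the tenant Role to a single user."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{ROLE_NAME}-{user}", "namespace": namespace},
        "subjects": [{"kind": "User", "name": user, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": ROLE_NAME, "apiGroup": RBAC_GROUP},
    }


role = tenant_role("team-a")
binding = tenant_role_binding("team-a", "alice")
manifest = json.dumps([role, binding], indent=2)  # JSON manifests for review
```

Because both objects are namespaced, the grant cannot reach other tenants' namespaces, which is also why the multi-namespace backup case below does not fit this model cleanly.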

One thing that doesn't fit well with this model is a multi-namespace backup, which seems like a typical pattern for Velero. Should we sacrifice this for the sake of multi-tenancy? Probably not...

@bgagnon yeah, I have been thinking about something like that as well. Another challenge is how cluster-scoped resources are dealt with - Persistent Volumes being the first one that comes to mind. If you have a constrained service account per namespace, then it needs to be able to GET these cluster-scoped resources in order to back them up.

PersistentVolume objects are going to be difficult, for sure. In our multi-tenant platform we've allowed anyone to list/get them, but only cluster admins can create/update/delete them. Tenants can only manipulate PersistentVolumeClaims. We didn't find this transparency to be a concern (our tenants all know and trust each other).

Even then, the PV binding references need to be managed carefully. For one thing, PVs (and/or their corresponding cloud volumes) need to be permanently annotated with the name of the tenant that owns them. This information can later be used to restrict what is and is not allowed as a PVC source.
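
A minimal sketch of that ownership check, assuming the tenant name is stored in a PV annotation. The `example.com/tenant` key and the `may_bind` helper are made up for illustration. Where such a check would run (an admission webhook, or the privileged component handling PV mutations) is left open.

```python
# Hypothetical annotation key holding the owning tenant's name.
TENANT_ANNOTATION = "example.com/tenant"


def pv_owner(pv):
    """Return the tenant recorded on a PersistentVolume dict, or None."""
    return pv.get("metadata", {}).get("annotations", {}).get(TENANT_ANNOTATION)


def may_bind(pv, tenant):
    """Allow a tenant's PVC to use this PV only if the tenant owns it.

    Unannotated PVs are refused, so the check fails closed.
    """
    owner = pv_owner(pv)
    return owner is not None and owner == tenant


pv = {
    "kind": "PersistentVolume",
    "metadata": {"name": "pv-123", "annotations": {TENANT_ANNOTATION: "team-a"}},
}
own_claim = may_bind(pv, "team-a")    # owner restoring their own volume
other_claim = may_bind(pv, "team-b")  # another tenant trying to claim it
```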

Overall, I think some sort of coordination with another instance of Velero running with elevated privileges is going to be necessary for the pieces that involve mutating PersistentVolume objects.

Hello, any progress on introducing multi-tenancy in Velero?
