Cockroach: sql,kv: tenants can write before create_tenant() and after destroy_tenant()

Created on 4 Sep 2020 · 5 comments · Source: cockroachdb/cockroach

The following is possible today:

  1. launch a tenant (say id=5)
  2. the tenant SQL process performs some KV write (which won't work properly, since the tenant keyspace has not been seeded)
    This write is possible because KV does not check against the tenants table; it authenticates using only the cert
  3. select crdb_internal.create_tenant(5) then fails with, essentially, "tenant exists in keyspace but not in system.tenants"
  4. tenant can neither be deleted nor created

This should not be possible. Options:

  1. crdb_internal.create_tenant clears the keyspace if necessary, or
  2. we allow destroy_tenant to wipe the keyspace even if the tenant is not in the table, or
  3. we try to make sure that the SQL tenant will not perform any writes until it knows that it's running against a properly bootstrapped keyspace, or
  4. we augment KV authentication to also check for the existence of the tenant in the gossiped table (which can, however, fail legitimate auth attempts due to the async nature of gossip).

cc @nvanbenschoten @asubiotto @chrisseto

A-multitenancy C-bug

All 5 comments

we try to make sure that the SQL tenant will not perform any writes until it knows that it's running against a properly bootstrapped keyspace

Is this hard? It would be my preference.

I suspect not; the KV endpoint backing the connector could perhaps check gossip. That wouldn't cause flakes, since the connector retries until it succeeds anyway.

Note that even if we did this, it wouldn't 100% solve the problem: if a pod got compromised before the entry was added to the system.tenants table, it could in principle write into its designated keyspace, and we'd be back to square one. Pretty sure that will never happen, though (and if it does, it's not the end of the world).

Would it be possible to remove the call to crdb_internal.create_tenant altogether?

It sounds like we'll have to do _some_ additional work when a SQL pod establishes a connection to check whether it is in fact allowed to write. Could that initial check also do the work of create_tenant if the tenant needs to be bootstrapped?

If not possible or desirable, I agree with @asubiotto.

Hmm, seems possible. We would be digging the "KV has to run a system tenant" grave a foot deeper, but it's already too deep anyway, might as well embrace it.

However, it's also lacking. In the future, tenants will have metadata such as custom rate limits, and this strategy robs us of an opportunity to set them. I think it is more sensible for creating tenants to be a separate step, and for tenants to be unable to connect until they "exist".

I think this will fall out of #53477. We will simply say that if there is no TenantInfo in gossip yet for that tenant, then the tenant has a zero rate limit, and a zero limit means requests are rejected. So we ensure that if KV serves a tenant, it is in the table. This is modulo theoretical races where you add a tenant, remove it, and re-add it (but it's really a new tenant); I think that is OK (and even that may go away once we do distributed rate limits).
