Cert-manager: Allow private key rotation when renewing certificates

Created on 27 Nov 2019  路  6Comments  路  Source: jetstack/cert-manager

Is your feature request related to a problem? Please describe.

Currently, when a certificate is renewed the existing private key is re-used in order to make rolling out of new TLS keypairs easier.

The reason we do this today is that we have to store the generated private key somewhere whilst the certificate is being issued, and we don't have a way to securely store a new private key without overwriting the existing private key that is being used to serve traffic (and thus, causing a brief period where the stored private key does not match the existing certificate whilst the renewal is in progress).

We should support a mode of operation where we atomically update both the private key AND the certificate once it has been signed. This will allow applications that are using the old key to continue to serve traffic until a newly minted keypair is available, at which point they can switch to using the new key/certificate pair.

Describe the solution you'd like

In order to do this in a 'level based way', we should adjust the issuance/renewal workflow to:

1) Given a Certificate resource that names the secretName as example-com-tls, already containing a signed and valid keypair
2) Generate a new private key and store it in example-com-tls-temp (name TBC)
3) Create a CertificateRequest with a CSR signed by this 'temporary' private key
4) Once the certificate has been obtained, update the example-com-tls keypair with both the new private key AND the new signed x509 certificate

Describe alternatives you've considered

Alternatively, we could not store the new private key in a Secret resource at all, and instead store it in-memory in cert-manager. This would work, however upon a restart of the cert-manager-controller we would no longer have access to the private key used to sign the CSR on any in-progress CertificateRequest resources. This would mean we'd need to 'restart' the entire process, potentially costing the user money, and at the very least using up API quotas.

Additional context

Without this functionality, using short certificate durations is effectively pointless, as the private component of the keypair is not changing.

/kind feature
/priority important-longterm

areapi kindesign kinfeature prioritimportant-longterm

Most helpful comment

I would like to note that I think that this feature should be opt-in, or opt-out, as it would break public key pinning.

All 6 comments

This will require a design document to clearly communicate and consider how we are going to implement it.

Moving forwards this is an important feature. Especially as security teams try to enforce shorter validity periods. This is realistically only achievable by using sophisticated key/certificate lifecycle tools.

I would like to note that I think that this feature should be opt-in, or opt-out, as it would break public key pinning.

Another issue that would benefit from a solution here would be related to Helm Charts that create Certificate objects.

For example, if I have a Certificate in a Helm Chart and a pod that depends on the underlying secret created by the Certificate I have seen cases where cert-manager creates the underlying TLS secret but it has no data. This causes the pod that depends on that secret to start and then fail if the data has yet to be populated in that secret. This also breaks the hook-weight annotations in Helm because the order of the created objects cannot be controllled.

I would like to note that I think that this feature should be opt-in, or opt-out, as it would break public key pinning.

We would implement this feature as opt-in I think, at least until we hit a v1 apiVersion, at which point we could consider flipping it (as secure by default really should be the goal).

I think the next step for this is to think about the API design. A few thoughts:

...
kind: Certificate
...
spec:
  secretName: some-secret-tls
  # issuancePolicy controls how the private key and corresponding x509 certificate
  # is issued into the `secretName` Secret resource. If 'Atomic', both the private key
  # and signed certificate will be updated in the Secret in a single atomic action.
  # If `privateKeyRenewalPolicy` is `Rotate`, it is strongly recommended to set this
  # field to `Atomic` to avoid cert-manager removing the `tls.crt` field from the
  # Secret resource during a re-issuance.
  # Can be one of `NonAtomic` or `Atomic`.
  # Defaults to 'Atomic'.
  issuancePolicy: Atomic 
  # privateKeyRenewalPolicy controls whether the private key of a Certificate should
  # be rotated or re-used upon renewal. If set to `Rotate`, you should also ensure
  # `issuancePolicy` is set to `Atomic` to avoid periods of time where signed x509
  # certificates are not available to applications.
  # Can be one of `Rotate` or `Reuse`.
  # Defaults to `Reuse`.
  privateKeyRenewalPolicy: Rotate

By separating out issuancePolicy from privateKeyRenewalPolicy, we enable users like @McShauno above to set issuancePolicy to Atomic so that they can make use of 'atomic secret updates' without necessarily enabling private key rotation.

Open questions:

1) I'm not sure whether there is much value in using an enum for issuancePolicy or if we are better to call it atomicSecretUpdates and make it a *bool. As an aside, the key issuancePolicy at the very least should change as it is not clear from reading the name what it controls (there are a lot of options that could be described as an 'issuance policy')

2) I'm not certain privateKeyRenewalPolicy is a good name/key either 馃槄

3) The above proposal suggests that issuancePolicy will default to Atomic, which is a behavioural change (as we'll no longer store tls.key in the target Secret resource by default prior to the certificate being issued). I doubt this is an issue as I cannot imagine anyone is making use of this behaviour right now, but I could be wrong. If it _is_ an issue, we may want to consider defaulting it to NonAtomic in v1alpha2 and Atomic in v1alpha3.

4) Are Rotate and Reuse exhaustive enough? @JamesGuthrie mentions the use of public key pinning - if someone is pinning their public key, they presumably want a very explicit way to signal that a private key should never be regenerated. How would this be denoted? Would this field be used? Further to that, what if no tls.key exists in the target Secret, but 'public key pinning' is 'enabled'? Would cert-manager generate a private key once and _then_ never rotate/regenerate it again?

After some discussion with @liggitt, an alternative idea would be to introduce a privateKeySecretName field to the Certificate resource.

This field would act as a reference to the private key input used to sign the CSR (which effectively decouples our 'input' and 'output' data, as right now we treat the secretName as an input as well as an output).

This has a few benefits:

  • We can easily handle @McShauno's case above, as we can store the private key data separately whilst the issuance is in progress
  • For rotation, we can allow users to annotate the Secret resource with some options to control how rotation should occur, e.g. cert-manager.io/rotation-policy which could be OnEveryCSR, Never and even Periodic (with an optional additional annotation for a duration/time period in which to rotate)

And introduces a few other things for consideration:

  • If two Certificate resources have the same privateKeySecretName, how should we handle rotation OnEveryCSR? Upon each Certificate coming to renewal time, there may be a race where they each continue trying to rotate the private key and never manage to get a CSR fully signed due to losing their respective private keys before issuance is completed.

We could address the above in a couple of ways:

1) After a CSR has been created, store a copy of the private key in-memory as well as in the Secret (mapped to the CertificateRequest resource). If the privateKeySecretName data changes, the in-memory copy will be the version issued into the secretName resource, thus forming a valid keypair. If cert-manager restarts during this time, it will attempt to read the private key for the referenced privateKeySecretName and validate it against the public key used to sign the CSR. If there is a mismatch, the pending CSR will be thrown away. If there is not, it will be read into memory as before. This situation will only occur if a user is sharing a single privateKeySecretName between Certificate resources, has enabled 'rotate on every CSR', _and_ restarts cert-manager whilst both of those Certificates are undergoing issuance.

2) We could explore storing multiple different private keys in that privateKeySecretName Secret, with keys like tls.key-{timestamp}, then referencing which one is used from the Certificate resource. This seems like it may be more complex to manage, and we'd need to introduce a mechanism to 'clean up' old private keys once they are no longer used.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jbartus picture jbartus  路  4Comments

kragniz picture kragniz  路  4Comments

Stono picture Stono  路  3Comments

howardjohn picture howardjohn  路  3Comments

jakubknejzlik picture jakubknejzlik  路  3Comments