Aws-cdk: [aws-eks] Upgrading to v1.20.0

Created on 24 Dec 2019 · 4 comments · Source: aws/aws-cdk

As described in #5540, version 1.20.0 of the experimental @aws-cdk/aws-eks module includes a new implementation of the resource providers behind Cluster and KubernetesResource in order to address several stability issues.

This change requires replacing your existing EKS clusters, and since this module is experimental, we decided to introduce these breaking changes without backwards compatibility. To alleviate the pain, we will publish the previous version of this module under @aws-cdk/aws-eks-legacy until March 1st, 2020. The legacy module can be used as a drop-in replacement if you wish to plan this migration at your own pace.

We are aware that this can be disruptive, especially if your EKS cluster runs production workloads, but since the EKS module is still experimental, we are unable to invest the resources needed to offer a clean migration process in such cases. We are committed not to introduce breaking changes to stable modules.

If you try to update a stack that contains an existing EKS cluster to this new version, you will get an error stating that the service token of a custom resource cannot be changed.

Unfortunately, this means that you will have to destroy and recreate your cluster in order to use the new aws-eks library. We understand that in production systems this requires intentional planning.

To allow you to migrate on your own schedule, we have published the old version under @aws-cdk/aws-eks-legacy. If you replace @aws-cdk/aws-eks with @aws-cdk/aws-eks-legacy, your stacks, as well as your cluster, will stay unchanged.
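Concretely, the swap is just an import change (plus the matching dependency in package.json); a minimal TypeScript sketch, with everything else in the app left untouched:

```ts
// Before: the new module, which would force replacement of the cluster.
// import * as eks from '@aws-cdk/aws-eks';

// After: the legacy drop-in. It keeps the pre-1.20.0 resource providers,
// so the synthesized template, and therefore the existing cluster, does
// not change. Remember to swap the dependency in package.json as well.
import * as eks from '@aws-cdk/aws-eks-legacy';
```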

When you are ready to recreate your cluster, the safest option is to follow these steps:

  1. Delete the code that defines the EKS cluster from your CDK app
  2. Deploy an update, and wait for your cluster to be destroyed
  3. Take a dependency on @aws-cdk/aws-eks@1.20.0 (or above)
  4. Re-add your cluster definition to your CDK app (see the sketch after this list)
  5. Deploy.
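For step 4, a minimal sketch of the re-added cluster under the new module (the stack name and cluster props here are illustrative, not taken from the original post):

```ts
import * as cdk from '@aws-cdk/core';
import * as eks from '@aws-cdk/aws-eks'; // 1.20.0 or above

export class EksStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Re-added cluster definition; deploying this creates a brand-new
    // cluster backed by the new resource providers.
    new eks.Cluster(this, 'Cluster', {
      defaultCapacity: 2,
    });
  }
}
```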

Alternatively, you can modify the logical ID of your cluster resource so that CloudFormation treats it as a new cluster and deletes the old one. Bear in mind that this technique cannot be used if your cluster uses an explicit physical name.
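Assuming the logical ID is derived from the construct path (the CDK default), the simplest way to do that is to rename the construct id; a hedged sketch, inside the same stack constructor as above ('ClusterV2' is an arbitrary illustrative name):

```ts
// Changing the construct id changes the generated logical ID, so
// CloudFormation creates a new cluster and then deletes the old one as
// part of the same stack update.
new eks.Cluster(this, 'ClusterV2', { // was: 'Cluster'
  defaultCapacity: 2,
});
```

As noted above, this only works when CloudFormation generates the cluster name; with an explicit physical name, the create-before-delete flow would collide on that name.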

@aws-cdk/aws-eks · management/tracking

All 4 comments

Hi @eladb. I understand the reasoning, and I understand that this package was marked experimental from the start. The problem is that this kind of upgrade path is really not ideal; we have invested a lot in CDK, and now we fear that this kind of solution could be repeated in the future.

Can the CDK team guarantee, or at least commit to trying, that this kind of solution will not become the norm for your packages? This is also a clear violation of semantic versioning: a minor version upgrade should not introduce breaking changes, especially one as large as this. I suggest that you either change the way packages are versioned to reflect their changes, or respect the major/minor semantics. Otherwise, as customers, we really cannot trust this project and will have to migrate away to avoid potentially losing money and time rebuilding our infrastructure each time there is a change in tooling.

I'm sorry if I sound harsh or angry; I'm really not, but this update has scared us a lot, and management is starting to question our technical choices, which, as you can imagine, puts me in a really difficult position.

Thank you for your understanding

@MatteoJoliveau thanks for your feedback.

We absolutely commit that modules marked "stable" will not be broken in minor versions and that such migrations will not be required, but unfortunately we cannot make this commitment for "experimental" modules like EKS.

Since the entire framework uses a single version line (for a myriad of reasons), we are unable to conform to semantic versioning for modules that are still unstable. This is actually not an uncommon practice in this space: Node.js takes the same approach, where experimental modules in the Node.js API are not bound to semantic versioning.

I believe this type of breakage is not going to be common, and we tried hard to make it possible for you to avoid it by using the @aws-cdk/aws-eks-legacy module until you are ready to make the switch. Had this module already been marked "stable", this would not have been our approach, and we would have needed to provide better tools for you to migrate from your existing cluster setup.

It's a nasty tradeoff between progress and stability that I am sure you are familiar with from your own work. For example, if the EKS module had already been marked "stable", it would have been much harder to implement a robust fix for the issues this change addresses without breaking existing clusters.

We understand this could be very painful and apologize if this caused grief with your team.

Thank you @eladb for your reply. I understand it is not an easy task to maintain such a large and complex ecosystem of packages. We'll chart a plan to upgrade our clusters somehow, and will be more cautious with experimental packages in the future. Being reassured that upgrades to stable packages are handled more carefully is more than enough for us.

Have you considered moving "experimental" constructs out of the main library and into a separate package? That would serve three purposes:

  1. Make it absolutely clear that these are not final versions.
  2. You could then have independent upgrade paths.
  3. You can utilize semantic versioning on those experimental features.

As for #1, just because there is a label in the documentation does not mean that people expect large breaking changes on a point release. We tend to think about the entire library as being either GA or beta, not a little bit of both. Having a separate library makes it crystal clear.

Although it may make it more complicated for you to keep track of dependencies, #2 would benefit customers by allowing us to take advantage of improvements to the core library without having to deal with a possible breaking change in an experimental library. This could also work the other way around, where an experimental library can iterate faster than the stable core.

I think the advantage of #3 is obvious, and would result in happy consumers of your API. Most importantly, it would give us more confidence and trust in you as providers of a core technology.

All of this is meant as constructive advice to help you build a better project. I love the product and I just want it to be as good as it can be.
