Cloudformation-coverage-roadmap: AWS::RDS::DBCluster - In-place upgrade of EngineVersion

Created on 15 Oct 2019 · 42 Comments · Source: aws-cloudformation/cloudformation-coverage-roadmap

1. Title

AWS::RDS::DBCluster - In-place upgrade of EngineVersion

2. Scope of request

e) other coverage-related issue with the resource/attribute/option

3. Expected behavior

This proposal allows you to do an in-place upgrade of EngineVersion in the DBCluster. Changing EngineVersion in AWS::RDS::DBCluster currently requires replacement of RDS cluster with downtime.

We can do an in-place upgrade of EngineVersion via the ModifyDBCluster API, but not via CloudFormation, which instead replaces the cluster.
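
For comparison, the direct API path can be sketched roughly as follows. This is a minimal sketch, assuming the boto3 SDK is installed and configured; the helper names `build_upgrade_request` and `upgrade_in_place` and the cluster identifier are hypothetical, while `DBClusterIdentifier`, `EngineVersion`, and `ApplyImmediately` are ModifyDBCluster request parameters.

```python
def build_upgrade_request(cluster_id: str, target_version: str,
                          apply_immediately: bool = True) -> dict:
    """Build keyword arguments for the RDS ModifyDBCluster call.

    Hypothetical helper: it only assembles the request, so it can be
    inspected without AWS credentials.
    """
    return {
        "DBClusterIdentifier": cluster_id,
        "EngineVersion": target_version,
        "ApplyImmediately": apply_immediately,
    }


def upgrade_in_place(cluster_id: str, target_version: str) -> dict:
    """Send the request with boto3 (assumed installed and configured)."""
    import boto3  # imported lazily so the builder works without the SDK

    rds = boto3.client("rds")
    return rds.modify_db_cluster(
        **build_upgrade_request(cluster_id, target_version)
    )


# "my-aurora-cluster" is a placeholder identifier, not one from this issue.
request = build_upgrade_request("my-aurora-cluster", "10.11")
print(request)
```

Separating the request builder from the call keeps the sketch inspectable offline; only `upgrade_in_place` touches AWS.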

4. Suggest specific test cases

A safe upgrade of EngineVersion succeeds with no replacement, no downtime, and no data loss.
An invalid upgrade, such as downgrading EngineVersion or passing an incorrect combination of parameters, fails.
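
The upgrade/downgrade distinction in these test cases could be checked along the following lines. This is a sketch only, assuming plain dotted numeric versions such as 9.6.11 or 10.11; the function names are hypothetical, and engine version strings that contain non-numeric parts are not handled.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted numeric version such as '9.6.11' into (9, 6, 11)."""
    return tuple(int(part) for part in version.split("."))


def classify_change(current: str, target: str) -> str:
    """Return 'same', 'upgrade', or 'downgrade' for a version change."""
    cur, new = parse_version(current), parse_version(target)
    if new == cur:
        return "same"
    return "upgrade" if new > cur else "downgrade"


for current, target in [("10.7", "10.11"), ("10.11", "10.7"), ("9.6.11", "9.6.11")]:
    print(current, "->", target, ":", classify_change(current, target))
```

Comparing integer tuples rather than raw strings matters here: as strings, "10.7" sorts after "10.11", which would misclassify that upgrade as a downgrade.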

5. Helpful Links to speed up research and evaluation

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-rds-dbcluster.html
https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_ModifyDBCluster.html

6. Category (required) - Will help with tagging and be easier to find by other users to +1

DB (RDS)

7. Any additional context (optional)

database

ueokande

👍133 🚀8

All 42 comments

I second this proposal. It's very confusing that you can do an in place upgrade via both the console and the CLI, but when done with Cloudformation it requires replacement.

dan-lind on 15 Oct 2019

👍3

we would appreciate this very much as well. I see it is coming, thank you!

pedrini77 on 24 Dec 2019

We discovered a backwards incompatibility here, so it is back to "working on it". The kanban will reflect this change. We are keeping tabs on the progress here.

luiseduardocolon on 18 Feb 2020

We really look forward to this fix being available. This is stopping us from managing RDS Aurora version upgrades through CFN, so we have to fall back to the console/API to perform minor version upgrades. If we do either, our current CFN stack will have drifted.

SQLDudeSudarshan on 30 Apr 2020

👍13

This issue has had status "We're working on it" since February. Would it be possible to get an update on the situation from AWS? This is highly anticipated support needed from Cloudformation.

rsuomela on 8 Jun 2020

👍3

Do you have any rough estimations of when this could be implemented? We're currently deciding between removing Auroras from CF and waiting for this to be implemented. Thanks!

martinoravsky on 25 Jun 2020

I’m closely following this. I’m hoping to get a better estimate in 2 weeks. Ping me here if you don’t hear from me by mid July.

luiseduardocolon on 30 Jun 2020

👍10 👀2

Ping @luiseduardocolon . Any news if this is actively being worked on?

SQLDudeSudarshan on 17 Jul 2020

👍16

@luiseduardocolon Could the delay in implementing this request result from internal discussions on how to change the behavior without causing unintended side effects?

Perhaps the assumption is that many users already rely on the replacement behavior? Maybe this can be overcome by keeping the default behavior but adding a new parameter that opts into in-place updates?

Maybe the resource UpdateReplacePolicy should/could be extended with a new option UpdateInPlace?

mbj on 3 Sep 2020

👍1

@luiseduardocolon ... this fix/feature would be really nice to have since this announcement is going to cause a lot of drift in our Aurora PostgreSQL clusters. Any updates on this?

One thing that would be helpful, is that if the database was upgraded outside of cloudformation, we could update the version in the template and it would check to see if it's the same version of the current cluster before it checks to see if it needs to replace it. At least we could get our template back in sync, and we could make other changes.

Starting September 1, 2020, the following Aurora PostgreSQL versions will no longer be available to order via the AWS Management Console, though they will still be available via the AWS CLI and AWS CloudFormation until November 19, 2020.

Deprecated PostgreSQL Minor Version(s) - 9.6.8, 9.6.9, 9.6.11, 9.6.12, Recommended Minimum Required version - 9.6.16
Deprecated PostgreSQL Minor Version(s) - 10.4, 10.5, 10.6, 10.7, Recommended Minimum Required version - 10.11
Deprecated PostgreSQL Minor Version - 11.4, Recommended Minimum Required version - 11.6

Starting at 12:00 PM Pacific Time on November 19, 2020, if you are on one of the deprecated minor versions and have not applied the minimum required version to your Amazon Aurora PostgreSQL cluster, we will schedule the relevant recommended minor version to be automatically applied during your next maintenance window. Changes will apply to your cluster during your next maintenance window even if auto minor version upgrade is disabled. Further, after November 19, 2020 you will not be able to create instances on the deprecated minor versions using the AWS CLI or AWS CloudFormation.

brianmjones on 4 Sep 2020

👍12

Any update on this? The functional difference between CloudFormation and the console means we can't rely on the former as the source of our Infrastructure-as-Code in this case.

srussellextensis on 16 Sep 2020

👀5 👍2

Any update?

craighurley on 22 Sep 2020

👀5

Bump. 👍

We have a managed template that all of our product teams use. We currently can't let them upgrade their engine version. We would also like to be able to expose a little more control so that we can further explore Serverless Aurora (it requires specific engine versions).

Thanks!

spockNinja on 30 Sep 2020

👍8

Waiting for a long time :(

bijohnvincent on 5 Oct 2020

@bijohnvincent @spockNinja @brianmjones @srussellextensis @craighurley trying to get an update now.

luiseduardocolon on 7 Oct 2020

This has been biting us since the beginning of this year. Over a year since reporting of this and still no news!! :(

monemihir on 8 Oct 2020

m17kea on 8 Oct 2020

me too please

liveFor10 on 8 Oct 2020

@armitagemderivitec @liveFor10

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
("+1" or "me too" comments generate extra noise for issue followers and do not help prioritize the request)

benbridts on 9 Oct 2020

👍9

we would still very much appreciate this update. Thank you for pushing the fix

pedrini77 on 9 Oct 2020

👍4

FYI - As a workaround we were advised by AWS Support to remove all DB resources from the cloudformation template, upgrade manually then import resources back into the template. I managed to get it working but it's easier said than done. For example, if your template includes AWS::RDS::EventSubscription resources these need removing from the template as they need to reference a DBInstance, but cloudformation doesn't support importing EventSubscription resources.

oliver-nettleton-emis on 14 Oct 2020

@luiseduardocolon great to see the issue has been moved to "coming soon". Do you know if this will be shipped before some older versions of Aurora PostgreSQL get deprecated on 19th November?

oliver-nettleton-emis on 15 Oct 2020

Definitely +1

I have the same question as _oliver-nettleton-emis_ - is there any indication that this will be shipped before the 19th Nov?

craigpickles-modis on 20 Oct 2020

Not the smoothest path but still happy I could sync my code with what's deployed without having to recreate the databases.
I was able to upgrade both RDS and Aurora from 9.6 to 10.* by following these steps; they used to trigger resource replacement, not anymore:

1. upgrade the instance or cluster via the console
2. update the CF template to use the new engine version and parameter groups compatible with it
3. update the CF stacks

zhelyan on 24 Oct 2020

@zhelyan I tried to follow your steps but got an error

"The specified new engine version is same as current version: 9.6.11 (Service: AmazonRDS; Status Code: 400; Error Code: InvalidParameterCombination; Request ID: 77e56f25-0327-49b9-83cd-c224d459242a; Proxy: null)"

However I did try updating the EngineVersion to a new version in my Cloudformation Template, when I executed the changeset it upgraded the database engine immediately which is what this github issue was for. I'm far happier now we've got this functionality. We just need AWS to confirm if it's fixed and provide documentation on the recommended steps.

oliver-nettleton-emis on 26 Oct 2020

@oliver-nettleton-emis yes, forgot to mention this. Oddly this workaround was only needed for Aurora.

zhelyan on 26 Oct 2020

@oliver-nettleton-emis and @zhelyan

Some time ago (about 2 months) I had to deal with this kind of upgrade as well (I created a new snapshot and restored a new cluster from it). I took 'the long way' because I was pretty sure the CloudFormation documentation for EngineVersion said Update requires: Replacement, though I could be mistaken.

Now apparently it states: Update requires: No interruption

I think this has changed, but I'm not 100% sure (maybe someone else can confirm?). Also I'm not sure if the engine version will be updated if it was not already (usually they push out the version updates during maintenance windows if auto minor version upgrade is enabled). Considering the error message from @zhelyan it seems some validation is in place, so that might work as well?

dveijck on 2 Nov 2020

👍1

I would really like it if this issue also handled the "The specified new engine version is same as current version: xx.xx" error.

Our pipeline did not define this property when it was created at which point the default version was v10. Now the default version is v11 which means any new deploys done get created with v11. We would really like to stick to v10 but now can't do this due to the error above.

monemihir on 3 Nov 2020

> I would really like if this issue also handled the The specified new engine version is same as current version: xx.xx error.
>
> Our pipeline did not define this property when it was created at which point the default version was v10. Now the default version is v11 which means any new deploys done get created with v11. We would really like to stick to v10 but now can't do this due to the error above.

@monemihir a workaround for this issue is to specify the engine version in the CloudFormation template when creating the stack, e.g. 10.7. Then on subsequent updates, use the CLI to get the current cluster properties. If there is no change to the engine version, the cluster and instance versions can be overridden with "!Ref 'AWS::NoValue'" in the template, sidestepping the "The specified new engine version is same as current version: xx.xx" issue. Bumping up the version during an update, e.g. from 10.7 to 10.11, does not override the version and performs an in-place upgrade of the cluster.

We have a mixture of clusters in lower environments that have been auto-updated from 10.7 to 10.11, with CF drift, and a production cluster that is yet to be upgraded. The original issue of _Update requires: Replacement_ blocked our pipeline, but the change to _Update requires: No interruption_, combined with the above workaround, appears to be working. The caveat is that it hasn't been deployed in all environments yet.
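
The AWS::NoValue workaround described above could be sketched as follows. This is an illustrative sketch, not a recommended implementation: the function names are hypothetical, the template is rendered as JSON for simplicity, the `aurora-postgresql` engine value is an assumption, and the current version is assumed to have been looked up beforehand (for example with `aws rds describe-db-clusters`).

```python
import json


def engine_version_property(current: str, desired: str):
    """Value to place under EngineVersion in the template.

    If the cluster already runs the desired version, emit the
    AWS::NoValue pseudo parameter so CloudFormation omits the property;
    otherwise emit the desired version to request an in-place upgrade.
    """
    if current == desired:
        return {"Ref": "AWS::NoValue"}
    return desired


def cluster_resource(current: str, desired: str) -> dict:
    """Minimal AWS::RDS::DBCluster fragment (other properties omitted)."""
    return {
        "Type": "AWS::RDS::DBCluster",
        "Properties": {
            "Engine": "aurora-postgresql",  # assumed engine for this sketch
            "EngineVersion": engine_version_property(current, desired),
        },
    }


# Cluster was already auto-updated to 10.11, so the property is dropped.
print(json.dumps(cluster_resource("10.11", "10.11"), indent=2))
# Cluster is still on 10.7, so the template asks for 10.11.
print(json.dumps(cluster_resource("10.7", "10.11"), indent=2))
```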

craigpickles-modis on 3 Nov 2020

👍1

@craigpickles-modis you mean to say put EngineVersion: "10.7" in CF code and trigger a CF update even though the cluster is already at 10.11? Wouldn't that trigger a downgrade?

All our clusters have auto-updated to 10.11 and none of our CF code currently specifies the EngineVersion property.

monemihir on 4 Nov 2020

@oliver-nettleton-emis and @zhelyan

> Now apparently it states: Update requires: No interruption

I noticed this change in the docs today -- tested this with a sandbox 11.6 Aurora Postgres cluster upgraded to 11.8 purely in CFN. CFN upgraded the cluster, in-place, without any downtime. This is a very welcome change.

smcdonald248 on 4 Nov 2020

👍5

@monemihir I tried the suggestion from @craigpickles-modis today and updated the CloudFormation template to EngineVersion: !Ref 'AWS::NoValue'. This did get around the error “The specified new engine version is same as current version: xx.xx” and allowed me to run the CloudFormation stack update successfully after the manual upgrade.

beckyboardman on 4 Nov 2020

👍3

Well, this update is long overdue, but it took some time to bring all the info together. Thanks for your patience.
First, to @oliver-nettleton-emis @zhelyan - I can confirm that we went through several cycles of releasing then pulling back for additional changes, but as of around 2 weeks ago, this functionality is now available (hence the docs reference for not requiring replacement).

However, we also know about the issue brought up by @oliver-nettleton-emis , and we validated that the AWS::NoValue workaround is valid (much simpler than reimporting the cluster, by the way).

Now, just so you know, we are actively testing a fix for this error that will not require said workaround for clusters, and we are looking for a January target release, since we want to be extra careful about testing it. Also, because this is released now and it works in most cases (per your experiments above), you should be able to use it with PostgreSQL. However, if one of you wants to validate/test, please let me know about any issues and I'll report those to the team while we are fixing and testing this other bug.

I am keeping this issue open until the fix mentioned above is released. So, feel free to continue to provide comments, feedback, etc. Thanks again for your patience.

luiseduardocolon on 10 Nov 2020

👍9 ❤4 😕1

This seems like great news; however, I have a couple of doubts. Maybe someone who has already experimented with this can help.
In the documentation it says: _EngineVersion Required: No,_ which makes sense as some people have been using this as a workaround for one of the errors.
However, the documentation doesn't seem to be very clear on what we should expect from not specifying this parameter. Is it expected to always bring the latest version (and trigger even minor version upgrades by just running a CloudFormation update)? Is it expected to just ignore whatever version there is now, so we can let the DB be upgraded during the maintenance windows?

Thanks!

mquidi86 on 13 Nov 2020

Thank you for the update. I found this improvement is available (at least in my AWS account), but it fails to upgrade the major version of an Aurora cluster. I suspect the cause is DBInstanceParameterGroupName.

In my environment, CloudFormation attempts to upgrade the cluster in place and then fails with this error message: "The current DB instance parameter group ....(my parameter group id).... is custom. You must explicitly specify a new DB instance parameter group, either default or custom, for the engine version upgrade. (Service: AmazonRDS; Status Code: 400; Error Code: InvalidParameterCombination; Request ID: ..........; Proxy: null)"

I think CloudFormation calls the ModifyDBCluster API behind the scenes. The API takes a DBInstanceParameterGroupName parameter that is essential for major upgrades.

Could you provide a way to set DBInstanceParameterGroupName parameter in the AWS::RDS::DBCluster resource?
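
To illustrate the gap, here is a rough sketch of what a direct ModifyDBCluster request for a major upgrade might need to carry. The helper names and the parameter group name are hypothetical. `AllowMajorVersionUpgrade` is a ModifyDBCluster parameter that is not mentioned in this thread, and the PostgreSQL-style major version rule is an assumption. The sketch is also stricter than RDS: per the error above, RDS insists on an explicit group when the current one is custom, whereas this helper always requires it.

```python
from typing import Optional


def major_version(version: str) -> str:
    """Major version under PostgreSQL-style numbering (assumption):
    '9.6.11' -> '9.6' before version 10, '11.6' -> '11' from 10 on."""
    parts = version.split(".")
    return ".".join(parts[:2]) if int(parts[0]) < 10 else parts[0]


def build_modify_request(cluster_id: str, current: str, target: str,
                         instance_parameter_group: Optional[str] = None) -> dict:
    """Assemble ModifyDBCluster keyword arguments for an engine upgrade."""
    request = {
        "DBClusterIdentifier": cluster_id,
        "EngineVersion": target,
        "ApplyImmediately": True,
    }
    if major_version(current) != major_version(target):
        if instance_parameter_group is None:
            # Simplification: always demand an explicit group for major upgrades.
            raise ValueError(
                "a major version upgrade needs an explicit DB instance parameter group"
            )
        request["AllowMajorVersionUpgrade"] = True
        request["DBInstanceParameterGroupName"] = instance_parameter_group
    return request


# "my-cluster" and "custom-pg12" are placeholder names.
print(build_modify_request("my-cluster", "11.6", "12.4", "custom-pg12"))
print(build_modify_request("my-cluster", "11.6", "11.8"))
```

The resulting dictionary could be passed to boto3's `modify_db_cluster` as keyword arguments; at the time of this comment, the AWS::RDS::DBCluster resource had no way to supply the instance parameter group.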

saiya on 5 Jan 2021

👀4 👍1

I was trying this out today with my Aurora Postgres instance with CDK. When I changed engineVersion from 10.7 to 10.14 in the template, I saw this cloudformation diff:

Stack Database
Resources
[~] AWS::RDS::DBCluster db db<redacted>
 └─ [~] EngineVersion
     ├─ [-] 10.7
     └─ [+] 10.14
[~] AWS::RDS::DBInstance db/Instance1 dbInstance<redacted> may be replaced
 └─ [~] EngineVersion (may cause replacement)
     ├─ [-] 10.7
     └─ [+] 10.14
[~] AWS::RDS::DBInstance db/Instance2 dbInstance<redacted> may be replaced
 └─ [~] EngineVersion (may cause replacement)
     ├─ [-] 10.7
     └─ [+] 10.14

It's worth noting that the cluster is actually running 10.12, not 10.7, as we've done manual no-downtime upgrades via the UI before this CFN feature was shipped.

I'm not sure what the behavior would be if I continued. It looks slightly better than before (it used to say "requires replacement" on the cluster), but I'm not sure what the ramifications of replacing the instances are. Is it any different than the no-downtime upgrade via the UI?

blimmer on 22 Jan 2021

Yes, you will have downtime, even if you have multiple instances. But you will no longer lose the DB content.

jonny-rimek on 22 Jan 2021

@saiya I ran into the same problem performing a major upgrade from 11.x to 12.4 -- I've opened a support case.

As a stop gap, I was able to perform the major upgrade outside of CFN using console or api, then updated the template with reality to sync up the stack and resources.

smcdonald248 on 3 Feb 2021

👍1

So, I upgraded my PG Aurora from 11.6 to 11.7 via CloudFormation, and it took about 15 minutes for a 5 GB database. Then I did the same thing, from 11.7 to 11.8, via the console, then updated the CloudFormation template and uploaded it. It took roughly the same amount of time.
But since it's a minor version upgrade, why is RDS taking so much time to upgrade?
If this were a real Postgres, a plain pg_stop, upgrade binaries, pg_start would be way, way faster.
I'm just interested in the internals: what is RDS actually doing with the data during these 15 minutes, given that I was able to connect to the instance(s) most of the time (I think I had maybe less than one minute of downtime)?

msplival on 22 Feb 2021

Hm, it seems that changing/replacing the ParameterGroup takes time. As I have one writer and one reader... Oh well :)
Neat! :) Thank you for implementing this, it was a breeze upgrading the minor version through CloudFormation.

msplival on 23 Feb 2021

@smcdonald248

> As a stop gap, I was able to perform the major upgrade outside of CFN using console or api, then updated the template with reality to sync up the stack and resources.

Based on what you saw, does this mean that if Cloudformation sees an engine version change from X->Y, but the cluster is already at version Y, it does nothing (which would be good and would make me happy)? Rather than seeing it as a change, and so feeling the need to replace the cluster with an identical cluster, causing downtime (which would be stupid and would make me sad)?

Edit: though does the fact this issue is closed mean this issue has actually been fixed? Certainly https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-rds-dbcluster.html#cfn-rds-dbcluster-engineversion doesn't contain anything to make me confident.

plumdog on 3 Mar 2021

@plumdog
My observation was that even though the stack update changed the engine version, the cluster was not replaced because the target version was already in use. This is not consistent behavior with CFN as any change in the template can cause replacement under the right (or wrong) circumstances.

Huge disclaimer: As with all changes that could potentially cause downtime or destroy data, run the tests yourself and see what happens before you try this with anything important!

smcdonald248 on 3 Mar 2021
