Terraform-provider-azurerm: AKS: Incorrect vm_size default for version >= v2.14.0

Created on 14 Aug 2020  ยท  8Comments  ยท  Source: terraform-providers/terraform-provider-azurerm

Starting with release v2.14.0 creating an AKS cluster without specifying a vm_size for the default_node_pool defaults to a sku that does not meet the min. requirements of more than 2 cores and 4GB memory.

Error: creating Managed Kubernetes Cluster "kbstacctest-ops-westeurope" (Resource Group "terraform-kubestack-testing"): containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="SystemPoolSkuTooLow" Message="System node pool must use VM sku with more than 2 cores and 4GB memory. Nodepool name: default."

Either the vm_size attribute should be mandatory and not have a default. Or the default should work. Weirdly enough, it seems no change has been made to this part of the schema.

question servickubernetes-cluster

Most helpful comment

๐Ÿ‘‹ I came here from the linked issue and was a bit concerned AKS would be making behaviorally breaking changes as described. Looking at the commits, @pst seems correct -- changing to 2020-03-01 means you started using system and user pools, which enforced extra validation.

I don't see that in the AKS changelog anywhere, although it's doc'd on the relevant site: https://docs.microsoft.com/en-us/azure/aks/use-system-pools#system-and-user-node-pools Could probably call that out as a behavior change in the release notes, though..

All 8 comments

hi @pst

Thanks for opening this issue.

The vm_size field within the default_node_pool block of the azurerm_kubernetes_cluster resource is Required rather than Optional & Defaulted - so users are required to specify a value here (meaning that Terraform doesn't have a default VM Sku assigned here). As such unfortunately whilst this is a breaking behavioural change within the AKS Service, since this field is Required (and thus has to be specified by users) - you'd need to update your Terraform Configuration to use a larger value here.

Since this should be fixed by updating the Terraform Configuration being used here - I'm going to close this issue for the moment - but please let us know if updating the vm_size doesn't work for you and we'll take another look.

Thanks!

cc @JeffreyRichter here's another breaking behavioural change in an existing API

I understand the schema defines it as required. But it does so for both v2.13.0 and v2.14.0. Yet with identical TF configuration v2.13.0 creates a cluster successfully while v2.14.0 fails with above error during apply.

If the attribute is required, Terraform should error during validate or plan already. It does not, it starts to apply and then fails mid-way.

Do you see my point now @tombuildsstuff?

@pst unfortunately this is a breaking behavioural change in the AKS API - not something specific to Terraform. The difference in Provider behaviour here is likely due to the version of the AKS API being used changing between versions 2.13 and 2.14 of the Azure Provider.

Where the AKS API has a history of shipping breaking behavioural changes to existing API versions - we're only able to apply best-effort validation for this resource rather than more granular as we'd like. The API itself doesn't expose this information via an API endpoint - and so from our side we're forced to lean on the AKS API's validation here - rather than validating against a dictionary of all of the valid SKU's for each VM type at plan time (which could change at any time behind the scenes, as is the case in this breaking behavioural change to the Azure API).

As such whilst this breaking behavioural change is unfortunate - by the nature of this being an API change there's not much that Terraform could do ahead of time to prevent this.

Thanks!

I've deleted my last comment. I made a mistake, I did provide a value through a lookup() somewhere else in my config.

But that a breaking change like this between two minor versions is just immediately accepted like this in one of the first class Terraform providers is still disappointing at best.

@pst this isn't a breaking change in Terraform - this is a breaking change by Microsoft in the Azure/AKS API - I'd suggest filing an issue with Microsoft on the AKS repository - but unfortunately we're at the mercy of the API here.

This TF provider changed the API version it is using under the hood between the two releases. If the API you migrated to has breaking changes, you have breaking changes too now. This should have been reflected in the semver.

Mistakes happen and this is nothing personal, but I strongly disagree with your view on this.

If that Frank Turner concert will ever happen I owe you a beer. I'll let this rest now.

๐Ÿ‘‹ I came here from the linked issue and was a bit concerned AKS would be making behaviorally breaking changes as described. Looking at the commits, @pst seems correct -- changing to 2020-03-01 means you started using system and user pools, which enforced extra validation.

I don't see that in the AKS changelog anywhere, although it's doc'd on the relevant site: https://docs.microsoft.com/en-us/azure/aks/use-system-pools#system-and-user-node-pools Could probably call that out as a behavior change in the release notes, though..

I'm going to lock this issue because it has been closed for _30 days_ โณ. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error ๐Ÿค– ๐Ÿ™‰ , please reach out to my human friends ๐Ÿ‘‰ [email protected]. Thanks!

Was this page helpful?
0 / 5 - 0 ratings