azurerm 2.1.0, Terraform 0.12.21, azurerm_kubernetes_cluster with a default_node_pool block:
default_node_pool {
  ...
  node_count          = 1
  enable_auto_scaling = false
  ...
}
...
No error should be occurring for the default_node_pool, yet I get:
Error: Error updating Default Node Pool "lhg-weur-kfkadevmdw-aks" (Resource Group "lhg-weur-kfkadevmdwaks-rg"): containerservice.AgentPoolsClient#CreateOrUpdate: Invalid input: autorest/validation: validation failed: parameter=parameters.ManagedClusterAgentPoolProfileProperties.Count constraint=Null value=(*int32)(nil) details: value can not be null; required parameter
I explicitly do NOT set min_count and max_count at all in that case (autoscaling = false). But when I check the state, they are set (again), even after I remove the fields from it manually - maybe that's not relevant and is the default behavior anyway:
default_node_pool {
availability_zones = [
"1",
]
enable_auto_scaling = false
enable_node_public_ip = false
max_count = 0
max_pods = 250
min_count = 0
name = "default"
node_count = 1
node_labels = {}
node_taints = []
os_disk_size_gb = 128
tags = {}
type = "VirtualMachineScaleSets"
vm_size = "Standard_D2s_v3"
vnet_subnet_id = "/subscriptions/xxx/resourceGroups/yyy/providers/Microsoft.Network/virtualNetworks/zzz/subnets/aaa"
}
Then if I change max_count and min_count to match node_count (in my example it's 1), I get:
Error: Error expanding `default_node_pool`: `max_count` and `min_count` must be set to `0` when enable_auto_scaling is set to `false`
The issue might be that the allowed range is 1...100 rather than 0...100.
In #6020 something similar was done for the additional node pools, but apparently not for the default_node_pool.
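If the 1...100 range is indeed the blocker, the behavior can be illustrated with a small standalone sketch (this assumes the schema uses terraform-plugin-sdk's validation.IntBetween; it is illustrative, not the provider's actual schema code):

package main

import (
	"fmt"

	"github.com/hashicorp/terraform-plugin-sdk/helper/validation"
)

func main() {
	// A 1-100 range rejects 0, which is exactly the value the provider asks
	// for on `min_count`/`max_count` when enable_auto_scaling is false.
	oneToHundred := validation.IntBetween(1, 100)
	_, errs := oneToHundred(0, "min_count")
	fmt.Println(len(errs) > 0) // true: 0 is rejected

	// Widening the lower bound to 0 (similar to what was reportedly done for
	// additional node pools in #6020) would accept the value.
	zeroToHundred := validation.IntBetween(0, 100)
	_, errs = zeroToHundred(0, "min_count")
	fmt.Println(len(errs)) // 0
}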
@wagnst thanks for opening this issue... I started looking at this and noticed that, if you look at the error message more closely:
Error: Error updating Default Node Pool "lhg-weur-kfkadevmdw-aks" (Resource Group "lhg-weur-kfkadevmdwaks-rg"): containerservice.AgentPoolsClient#CreateOrUpdate: Invalid input: autorest/validation: validation failed: parameter=parameters.ManagedClusterAgentPoolProfileProperties.Count constraint=Null value=(*int32)(nil) details: value can not be null; required parameter
the error is not complaining about MinCount or MaxCount; it is complaining about Count, which happens to be nil. Here is the structure the code is trying to build and pass to Azure:
ManagedClusterAgentPoolProfileProperties: &containerservice.ManagedClusterAgentPoolProfileProperties{
	Count:                  defaultCluster.Count,
	VMSize:                 defaultCluster.VMSize,
	OsDiskSizeGB:           defaultCluster.OsDiskSizeGB,
	VnetSubnetID:           defaultCluster.VnetSubnetID,
	MaxPods:                defaultCluster.MaxPods,
	OsType:                 defaultCluster.OsType,
	MaxCount:               defaultCluster.MaxCount,
	MinCount:               defaultCluster.MinCount,
	EnableAutoScaling:      defaultCluster.EnableAutoScaling,
	Type:                   defaultCluster.Type,
	OrchestratorVersion:    defaultCluster.OrchestratorVersion,
	AvailabilityZones:      defaultCluster.AvailabilityZones,
	EnableNodePublicIP:     defaultCluster.EnableNodePublicIP,
	ScaleSetPriority:       defaultCluster.ScaleSetPriority,
	ScaleSetEvictionPolicy: defaultCluster.ScaleSetEvictionPolicy,
	NodeLabels:             defaultCluster.NodeLabels,
	NodeTaints:             defaultCluster.NodeTaints,
	Tags:                   defaultCluster.Tags,
},
Now, if you look at the defaultCluster object that gets created in ConvertDefaultNodePoolToAgentPool, there are quite a few attributes that do not get values, the first one being Count, which is the error you are seeing returned from the Update call (see the deserialized object below):
Name:0xc000fc0ca0
Count:<nil>
VMSize:Standard_D2_v2
OsDiskSizeGB:0xc000e6203c
VnetSubnetID:<nil>
MaxPods:0xc000e62038
OsType:Linux
MaxCount:0xc000e62040
MinCount:0xc000e62044
EnableAutoScaling:0xc000e62035
Type:VirtualMachineScaleSets
OrchestratorVersion:<nil>
ProvisioningState:<nil>
AvailabilityZones:<nil>
EnableNodePublicIP:0xc000e62036
ScaleSetPriority:
ScaleSetEvictionPolicy:
Tags:map[]
NodeLabels:map[]
NodeTaints:0xc000ee9640
I'm not an expert when it comes to Kubernetes, but I will poke around a bit and see if I can figure out why the node_count isn't getting placed into the Count parameter.
@WodansSon actually yesterday evening I had exactly the same idea: that it is not related to my code at all, but rather to what comes back from the Azure API or what is validated by the autorest functionality.
I manually enabled autoscaling on that cluster and set minCount to 1 and maxCount to 1. Edited the state afterwards and then obviously there was nothing to apply anymore (and therefore no error).
But the interesting fact is that no matter which value of the default node pool I change afterwards, it leads to the above Count nil error, even if it's just the tags field, which is not related to that at all. So I suspect you are right that the Count field (which must be filled from node_count) does not get filled. Maybe it's because in the latest provider version it is basically optional when autoscaling is enabled (as I read on the Terraform documentation page).
I'm actually getting the same issue but with a different scenario.
I try to update the max and min counts with enable_auto_scaling already enabled:
enable_auto_scaling = true
~ max_count = 3 -> 6
~ min_count = 1 -> 3
And I get:
Error: Error updating Default Node Pool "blah-blah-rg" (Resource Group "blah-blah-rg"): containerservice.AgentPoolsClient#CreateOrUpdate: Invalid input: autorest/validation: validation failed: parameter=parameters.ManagedClusterAgentPoolProfileProperties.Count constraint=Null value=(*int32)(nil) details: value can not be null; required parameter
I think the bug originates from the ExpandDefaultNodePool function, more specifically:
// Count must be set for the initial creation when using AutoScaling but cannot be updated
autoScaledCluster := enableAutoScaling && d.IsNewResource()
// however it must always be sent for manually scaled clusters
manuallyScaledCluster := !enableAutoScaling && (d.IsNewResource() || d.HasChange("default_node_pool.0.node_count"))
if autoScaledCluster || manuallyScaledCluster {
	// users creating an auto-scaled cluster may not set the `node_count` field - if so use `min_count`
	if count == 0 && autoScaledCluster {
		count = minCount
	}
	profile.Count = utils.Int32(int32(count))
}
So what is happening is that if autoScaledCluster and manuallyScaledCluster are both false, profile.Count is never set and will always be nil.
In @mguirao's case, the way this plays out is that he has enable_auto_scaling enabled, but since this is not a new resource the second part of the condition (d.IsNewResource()) fails, which means autoScaledCluster will be false. So when the condition if autoScaledCluster || manuallyScaledCluster { is evaluated, both sides are false, profile.Count is never set and keeps the value nil, and the update call returns the error you see.
In @wagnst's case, he has enable_auto_scaling set to false, so when the line manuallyScaledCluster := !enableAutoScaling && (d.IsNewResource() || d.HasChange("default_node_pool.0.node_count")) is evaluated, much like in @mguirao's situation, neither of the && conditions is met (it is not a new resource, nor has the node_count attribute changed), so manuallyScaledCluster is set to false as well. Just like above, this results in profile.Count never being set and keeping the value nil, producing the same error message.
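To make the two failure paths concrete, here is a minimal, self-contained sketch of that boolean logic (expandCount and its parameters are illustrative stand-ins for the provider code and its d.IsNewResource()/d.HasChange() calls, not the actual function):

package main

import "fmt"

// expandCount mimics, in simplified form, the Count decision in ExpandDefaultNodePool.
func expandCount(enableAutoScaling, isNewResource, nodeCountChanged bool, count, minCount int32) *int32 {
	// Count must be set for the initial creation when using AutoScaling but cannot be updated
	autoScaledCluster := enableAutoScaling && isNewResource
	// however it must always be sent for manually scaled clusters
	manuallyScaledCluster := !enableAutoScaling && (isNewResource || nodeCountChanged)

	if autoScaledCluster || manuallyScaledCluster {
		if count == 0 && autoScaledCluster {
			count = minCount
		}
		return &count
	}
	// Neither branch applies on an update where node_count did not change,
	// so Count stays nil and the CreateOrUpdate call rejects the request.
	return nil
}

func main() {
	// @mguirao's case: auto scaling enabled, existing cluster, only min/max changed.
	fmt.Println(expandCount(true, false, false, 1, 3)) // <nil>

	// @wagnst's case: auto scaling disabled, existing cluster, node_count unchanged.
	fmt.Println(expandCount(false, false, false, 1, 0)) // <nil>

	// A new manually scaled cluster: Count is populated as expected.
	fmt.Println(*expandCount(false, true, false, 1, 0)) // 1
}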
I built a private build and changed the code in the ExpandDefaultNodePool function from the existing code:
if autoScaledCluster || manuallyScaledCluster {
	// users creating an auto-scaled cluster may not set the `node_count` field - if so use `min_count`
	if count == 0 && autoScaledCluster {
		count = minCount
	}
	profile.Count = utils.Int32(int32(count))
}
to this, by adding an else branch to the if statement that guarantees profile.Count is always assigned a value:
if autoScaledCluster || manuallyScaledCluster {
	// users creating an auto-scaled cluster may not set the `node_count` field - if so use `min_count`
	if count == 0 && autoScaledCluster {
		count = minCount
	}
	profile.Count = utils.Int32(int32(count))
} else {
	profile.Count = utils.Int32(int32(count))
}
As I said before, I am not a Kubernetes expert, so I don't know if this is the correct fix or not, but adding the else statement fixed both of the issues described here.
@wagnst, you may want to add the above else statement to your PR #6095, because without it, your change of allowing max_count and min_count to be set to 0 will still always hit this error unless you remove them from your configuration file entirely:
Error: Error expanding `default_node_pool`: `max_count` and `min_count` must be set to `0` when enable_auto_scaling is set to `false`
Once you have made the change to your PR, let @tombuildsstuff have a look, as I think he is the Kubernetes expert... it has all sorts of rules that I may not be privy to.
@WodansSon I added a change. Instead of an else, I just pulled the assignment out of the if statement. Therefore, in the worst case (count is not set at all), it will be 0.
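A rough sketch of the resulting logic, reusing the variables from the snippets above (this is a paraphrase of the change, not the exact diff):

// count comes from `node_count` and defaults to 0 when it is not set at all
if count == 0 && autoScaledCluster {
	// users creating an auto-scaled cluster may not set `node_count` - fall back to `min_count`
	count = minCount
}
// always send Count so the API never receives a nil value
profile.Count = utils.Int32(int32(count))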
I downgraded azurerm from 2.1.0 to 2.0.0 and it worked. :)
@rjdkolb, that would make sense, since in v2.1.0 the API version was upgraded from 2019-10-01 to 2019-11-01, which would also account for the new behavior of node_count now being a required attribute for the CreateOrUpdate request.
Hello,
I had a similar problem, but with enable_auto_scaling enabled on the default_node_pool of an azurerm_kubernetes_cluster.
~ default_node_pool {
availability_zones = []
enable_auto_scaling = true
enable_node_public_ip = false
~ max_count = 6 -> 7
max_pods = 30
min_count = 3
name = "default"
node_count = 5
node_labels = {
"XXXX" = "YYYYY"
"WWWWW" = "ZZZZZ"
}
node_taints = []
os_disk_size_gb = 30
tags = {}
type = "VirtualMachineScaleSets"
vm_size = "Standard_D2s_v3"
vnet_subnet_id = "/subscriptions/XXX-XXX-XXXX/resourceGroups/k8s-XXXXXX/providers/Microsoft.Network/virtualNetworks/k8s-vnet-XXXXX/subnets/k8s-XXXXXXX"
}
It leads to the following error:
Error: Error updating Default Node Pool "k8s-XXXXXX" (Resource Group "k8s-XXXXXX"): containerservice.AgentPoolsClient#CreateOrUpdate: Invalid input: autorest/validation: validation failed: parameter=parameters.ManagedClusterAgentPoolProfileProperties.Count constraint=Null value=(*int32)(nil) details: value can not be null; required parameter
I tested changing max_count on an azurerm_kubernetes_cluster_node_pool for the same cluster, and this time it works like a charm.
~ resource "azurerm_kubernetes_cluster_node_pool" "aks_k8s_cluster_additional_node_pool" {
availability_zones = []
enable_auto_scaling = true
enable_node_public_ip = false
id = "/subscriptions/XXX-XXX-XXXX/resourcegroups/k8s-XXXXXX/providers/Microsoft.ContainerService/managedClusters/k8s-XXXXX/agentPools/highmem"
kubernetes_cluster_id = "/subscriptions/XXX-XXX-XXX/resourcegroups/k8s-XXXXXX/providers/Microsoft.ContainerService/managedClusters/k8s-XXXXXX"
~ max_count = 2 -> 4
max_pods = 30
min_count = 1
name = "highmem"
node_count = 1
node_labels = {
"XXXX" = "YYY"
"WWWWW" = "ZZZZZZ"
}
node_taints = [
"....XXX....",
"....XXX....",
]
os_disk_size_gb = 50
os_type = "Linux"
tags = {}
vm_size = "Standard_E4s_v3"
vnet_subnet_id = "/subscriptions/XXX-XXX-XXX/resourceGroups/k8s-XXXXX/providers/Microsoft.Network/virtualNetworks/k8s-XXXXX/subnets/k8s-XXXXX"
}
For this issue we could not use any of the workarounds except downgrading to 2.0.0, but then we hit issues that are fixed in 2.1.0 and beyond :/
Additional remark: the Count problem on the default_node_pool of the azurerm_kubernetes_cluster happened whatever change we tried to make, and not only to the min or max number of nodes.
This is very annoying because we cannot change anything for the default_node_pool...
If someone could point me to where to look (which files, folders...), I may try to help with a correction and a PR.
This solves the issue:
https://github.com/terraform-providers/terraform-provider-azurerm/pull/6349
This has been released in version 2.5.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:
provider "azurerm" {
version = "~> 2.5.0"
}
# ... other configuration ...
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!