azurerm 2.1.0, Terraform 0.12.21, azurerm_kubernetes_cluster with a default_node_pool block:
default_node_pool {
  ...
  node_count          = 1
  enable_auto_scaling = false
  ...
}
...
No error should be occurring for the default_node_pool, yet I get:
Error: Error updating Default Node Pool "lhg-weur-kfkadevmdw-aks" (Resource Group "lhg-weur-kfkadevmdwaks-rg"): containerservice.AgentPoolsClient#CreateOrUpdate: Invalid input: autorest/validation: validation failed: parameter=parameters.ManagedClusterAgentPoolProfileProperties.Count constraint=Null value=(*int32)(nil) details: value can not be null; required parameter
I explicitly do NOT set min_count and max_count at all in that case (autoscaling = false). But when I check the state, they are set (again), even after I remove the fields from it manually - maybe that's not relevant and is the default behavior anyway:
default_node_pool {
availability_zones = [
"1",
]
enable_auto_scaling = false
enable_node_public_ip = false
max_count = 0
max_pods = 250
min_count = 0
name = "default"
node_count = 1
node_labels = {}
node_taints = []
os_disk_size_gb = 128
tags = {}
type = "VirtualMachineScaleSets"
vm_size = "Standard_D2s_v3"
vnet_subnet_id = "/subscriptions/xxx/resourceGroups/yyy/providers/Microsoft.Network/virtualNetworks/zzz/subnets/aaa"
}
Then if I change max_count and min_count to match node_count (in my example it's 1), I get:
Error: Error expanding `default_node_pool`: `max_count` and `min_count` must be set to `0` when enable_auto_scaling is set to `false`
The issue might be that the allowed range is 1...100 rather than 0...100.
In #6020 something similar was done for the additional node pools, but apparently not for the default_node_pool.
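If the 1...100 range is indeed the blocker, the behavior can be illustrated with a small standalone sketch (this assumes the schema uses terraform-plugin-sdk's validation.IntBetween; it is illustrative, not the provider's actual schema code):

package main

import (
	"fmt"

	"github.com/hashicorp/terraform-plugin-sdk/helper/validation"
)

func main() {
	// A 1-100 range rejects 0, which is exactly the value the provider asks
	// for on `min_count`/`max_count` when enable_auto_scaling is false.
	oneToHundred := validation.IntBetween(1, 100)
	_, errs := oneToHundred(0, "min_count")
	fmt.Println(len(errs) > 0) // true: 0 is rejected

	// Widening the lower bound to 0 (similar to what was reportedly done for
	// additional node pools in #6020) would accept the value.
	zeroToHundred := validation.IntBetween(0, 100)
	_, errs = zeroToHundred(0, "min_count")
	fmt.Println(len(errs)) // 0
}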
@wagnst thanks for opening this issue... I started looking at this and noticed that, if you look at the error message more closely:
Error: Error updating Default Node Pool "lhg-weur-kfkadevmdw-aks" (Resource Group "lhg-weur-kfkadevmdwaks-rg"): containerservice.AgentPoolsClient#CreateOrUpdate: Invalid input: autorest/validation: validation failed: parameter=parameters.ManagedClusterAgentPoolProfileProperties.Count constraint=Null value=(*int32)(nil) details: value can not be null; required parameter
the error is not complaining about MinCount or MaxCount; it is complaining about Count, which happens to be nil. Here is the structure the code is trying to build and pass to Azure:
ManagedClusterAgentPoolProfileProperties: &containerservice.ManagedClusterAgentPoolProfileProperties{
	Count:                  defaultCluster.Count,
	VMSize:                 defaultCluster.VMSize,
	OsDiskSizeGB:           defaultCluster.OsDiskSizeGB,
	VnetSubnetID:           defaultCluster.VnetSubnetID,
	MaxPods:                defaultCluster.MaxPods,
	OsType:                 defaultCluster.OsType,
	MaxCount:               defaultCluster.MaxCount,
	MinCount:               defaultCluster.MinCount,
	EnableAutoScaling:      defaultCluster.EnableAutoScaling,
	Type:                   defaultCluster.Type,
	OrchestratorVersion:    defaultCluster.OrchestratorVersion,
	AvailabilityZones:      defaultCluster.AvailabilityZones,
	EnableNodePublicIP:     defaultCluster.EnableNodePublicIP,
	ScaleSetPriority:       defaultCluster.ScaleSetPriority,
	ScaleSetEvictionPolicy: defaultCluster.ScaleSetEvictionPolicy,
	NodeLabels:             defaultCluster.NodeLabels,
	NodeTaints:             defaultCluster.NodeTaints,
	Tags:                   defaultCluster.Tags,
},
Now, if you look at the defaultCluster object that gets created in ConvertDefaultNodePoolToAgentPool, there are quite a few attributes that do not get values, the first one being Count, which is the error you are seeing returned from the Update call (see the deserialized object below):
Name:0xc000fc0ca0
Count:<nil>
VMSize:Standard_D2_v2
OsDiskSizeGB:0xc000e6203c
VnetSubnetID:<nil>
MaxPods:0xc000e62038
OsType:Linux
MaxCount:0xc000e62040
MinCount:0xc000e62044
EnableAutoScaling:0xc000e62035
Type:VirtualMachineScaleSets
OrchestratorVersion:<nil>
ProvisioningState:<nil>
AvailabilityZones:<nil>
EnableNodePublicIP:0xc000e62036
ScaleSetPriority:
ScaleSetEvictionPolicy:
Tags:map[]
NodeLabels:map[]
NodeTaints:0xc000ee9640
I'm not an expert when it comes to Kubernetes, but I will poke around a bit and see if I can figure out why the node_count isn't getting placed into the Count parameter.
@WodansSon actually yesterday evening I had exactly the same idea: that it is not related to my code at all, but rather to what comes back from the Azure API or what is validated by the autorest functionality.
I manually enabled autoscaling on that cluster and set minCount to 1 and maxCount to 1. Edited the state afterwards and then obviously there was nothing to apply anymore (and therefore no error).
But the interesting fact is that no matter which value of the default node pool I change afterwards, it leads to the above Count nil error, even if it's just the tags field, which is not related to that at all. So I suspect you are right that the Count field (which must be filled from node_count) does not get filled. Maybe it's because in the latest provider version it is basically optional when autoscaling is enabled (as I read on the Terraform documentation page).
I'm actually getting the same issue but with a different scenario.
I try to update the max and min counts with enable_auto_scaling already enabled:
enable_auto_scaling = true
~ max_count = 3 -> 6
~ min_count = 1 -> 3
And I get:
Error: Error updating Default Node Pool "blah-blah-rg" (Resource Group "blah-blah-rg"): containerservice.AgentPoolsClient#CreateOrUpdate: Invalid input: autorest/validation: validation failed: parameter=parameters.ManagedClusterAgentPoolProfileProperties.Count constraint=Null value=(*int32)(nil) details: value can not be null; required parameter
I think the bug originates from the ExpandDefaultNodePool function, more specifically:
// Count must be set for the initial creation when using AutoScaling but cannot be updated
autoScaledCluster := enableAutoScaling && d.IsNewResource()
// however it must always be sent for manually scaled clusters
manuallyScaledCluster := !enableAutoScaling && (d.IsNewResource() || d.HasChange("default_node_pool.0.node_count"))
if autoScaledCluster || manuallyScaledCluster {
	// users creating an auto-scaled cluster may not set the `node_count` field - if so use `min_count`
	if count == 0 && autoScaledCluster {
		count = minCount
	}
	profile.Count = utils.Int32(int32(count))
}
So what is happening is that if autoScaledCluster and manuallyScaledCluster are both false, profile.Count is never set and will always be nil.
In @mguirao's case, the way this plays out is that he has enable_auto_scaling enabled, but since this is not a new resource the second part of the condition (d.IsNewResource()) fails, which means autoScaledCluster will be false. So when the condition if autoScaledCluster || manuallyScaledCluster { is evaluated, both sides are false, profile.Count is never set and keeps the value nil, and the update call returns the error you see.
In @wagnst's case, he has enable_auto_scaling set to false, so when the line manuallyScaledCluster := !enableAutoScaling && (d.IsNewResource() || d.HasChange("default_node_pool.0.node_count")) is evaluated, much like in @mguirao's situation, neither of the && conditions is met (it is not a new resource, nor has the node_count attribute changed), so manuallyScaledCluster is set to false as well. Just like above, this results in profile.Count never being set and keeping the value nil, producing the same error message.
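To make the two failure paths concrete, here is a minimal, self-contained sketch of that boolean logic (expandCount and its parameters are illustrative stand-ins for the provider code and its d.IsNewResource()/d.HasChange() calls, not the actual function):

package main

import "fmt"

// expandCount mimics, in simplified form, the Count decision in ExpandDefaultNodePool.
func expandCount(enableAutoScaling, isNewResource, nodeCountChanged bool, count, minCount int32) *int32 {
	// Count must be set for the initial creation when using AutoScaling but cannot be updated
	autoScaledCluster := enableAutoScaling && isNewResource
	// however it must always be sent for manually scaled clusters
	manuallyScaledCluster := !enableAutoScaling && (isNewResource || nodeCountChanged)

	if autoScaledCluster || manuallyScaledCluster {
		if count == 0 && autoScaledCluster {
			count = minCount
		}
		return &count
	}
	// Neither branch applies on an update where node_count did not change,
	// so Count stays nil and the CreateOrUpdate call rejects the request.
	return nil
}

func main() {
	// @mguirao's case: auto scaling enabled, existing cluster, only min/max changed.
	fmt.Println(expandCount(true, false, false, 1, 3)) // <nil>

	// @wagnst's case: auto scaling disabled, existing cluster, node_count unchanged.
	fmt.Println(expandCount(false, false, false, 1, 0)) // <nil>

	// A new manually scaled cluster: Count is populated as expected.
	fmt.Println(*expandCount(false, true, false, 1, 0)) // 1
}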
I built a private build and changed the code in the ExpandDefaultNodePool function from the existing code:
if autoScaledCluster || manuallyScaledCluster {
	// users creating an auto-scaled cluster may not set the `node_count` field - if so use `min_count`
	if count == 0 && autoScaledCluster {
		count = minCount
	}
	profile.Count = utils.Int32(int32(count))
}
to this, by adding an else branch to the if statement that guarantees profile.Count is always assigned a value:
if autoScaledCluster || manuallyScaledCluster {
	// users creating an auto-scaled cluster may not set the `node_count` field - if so use `min_count`
	if count == 0 && autoScaledCluster {
		count = minCount
	}
	profile.Count = utils.Int32(int32(count))
} else {
	profile.Count = utils.Int32(int32(count))
}
As I said before, I am not a Kubernetes expert, so I don't know if this is the correct fix or not, but adding the else statement fixed both of the issues described here.
@wagnst, you may want to add the above else statement to your PR #6095, because without it, your change of allowing max_count and min_count to be set to 0 will still always hit this error unless you remove them from your configuration file entirely:
Error: Error expanding `default_node_pool`: `max_count` and `min_count` must be set to `0` when enable_auto_scaling is set to `false`
Once you have made the change to your PR, let @tombuildsstuff have a look, as I think he is the Kubernetes expert... it has all sorts of rules that I may not be privy to.
@WodansSon I added a change. Instead of an else, I just pulled the assignment out of the if statement. Therefore, in the worst case (count is not set at all), it will be 0.
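A rough sketch of the resulting logic, reusing the variables from the snippets above (this is a paraphrase of the change, not the exact diff):

// count comes from `node_count` and defaults to 0 when it is not set at all
if count == 0 && autoScaledCluster {
	// users creating an auto-scaled cluster may not set `node_count` - fall back to `min_count`
	count = minCount
}
// always send Count so the API never receives a nil value
profile.Count = utils.Int32(int32(count))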
I downgraded azurerm from 2.1.0 to 2.0.0 and it worked. :)
@rjdkolb, that would make sense, since in v2.1.0 the API version was upgraded from 2019-10-01 to 2019-11-01, which would also account for the new behavior of node_count now being a required attribute for the CreateOrUpdate request.
Hello,
I had a similar problem, but with enable_auto_scaling enabled on the default_node_pool of an azurerm_kubernetes_cluster.
~ default_node_pool {
availability_zones = []
enable_auto_scaling = true
enable_node_public_ip = false
~ max_count = 6 -> 7
max_pods = 30
min_count = 3
name = "default"
node_count = 5
node_labels = {
"XXXX" = "YYYYY"
"WWWWW" = "ZZZZZ"
}
node_taints = []
os_disk_size_gb = 30
tags = {}
type = "VirtualMachineScaleSets"
vm_size = "Standard_D2s_v3"
vnet_subnet_id = "/subscriptions/XXX-XXX-XXXX/resourceGroups/k8s-XXXXXX/providers/Microsoft.Network/virtualNetworks/k8s-vnet-XXXXX/subnets/k8s-XXXXXXX"
}
It leads to the following error:
Error: Error updating Default Node Pool "k8s-XXXXXX" (Resource Group "k8s-XXXXXX"): containerservice.AgentPoolsClient#CreateOrUpdate: Invalid input: autorest/validation: validation failed: parameter=parameters.ManagedClusterAgentPoolProfileProperties.Count constraint=Null value=(*int32)(nil) details: value can not be null; required parameter
I tested changing max_count on an azurerm_kubernetes_cluster_node_pool for the same cluster, and this time it works like a charm.
~ resource "azurerm_kubernetes_cluster_node_pool" "aks_k8s_cluster_additional_node_pool" {
availability_zones = []
enable_auto_scaling = true
enable_node_public_ip = false
id = "/subscriptions/XXX-XXX-XXXX/resourcegroups/k8s-XXXXXX/providers/Microsoft.ContainerService/managedClusters/k8s-XXXXX/agentPools/highmem"
kubernetes_cluster_id = "/subscriptions/XXX-XXX-XXX/resourcegroups/k8s-XXXXXX/providers/Microsoft.ContainerService/managedClusters/k8s-XXXXXX"
~ max_count = 2 -> 4
max_pods = 30
min_count = 1
name = "highmem"
node_count = 1
node_labels = {
"XXXX" = "YYY"
"WWWWW" = "ZZZZZZ"
}
node_taints = [
"....XXX....",
"....XXX....",
]
os_disk_size_gb = 50
os_type = "Linux"
tags = {}
vm_size = "Standard_E4s_v3"
vnet_subnet_id = "/subscriptions/XXX-XXX-XXX/resourceGroups/k8s-XXXXX/providers/Microsoft.Network/virtualNetworks/k8s-XXXXX/subnets/k8s-XXXXX"
}
For this issue we could not use any of the workarounds except downgrading to 2.0.0, but then we hit issues that are fixed in 2.1.0 and beyond :/
Additional remark: the Count problem on the default_node_pool of the azurerm_kubernetes_cluster happened whatever change we tried to make, and not only to the min or max number of nodes.
This is very annoying because we cannot change anything for the default_node_pool...
If someone could point me to where to look (which files, folders...), I may try to help with a correction and a PR.
This solves the issue:
https://github.com/terraform-providers/terraform-provider-azurerm/pull/6349
This has been released in version 2.5.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:
provider "azurerm" {
version = "~> 2.5.0"
}
# ... other configuration ...
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!