Support mode = "System" in azurerm_kubernetes_cluster's default_node_pool.
Currently, azurerm_kubernetes_cluster_node_pool supports mode=System|User to select the type of node pool you want in your AKS cluster. However, the default node pool does not support that attribute.
    resource "azurerm_kubernetes_cluster" "aks" {
      default_node_pool {
        name = "system"
        mode = "System"
      }
      ...
    }
hi @palmerabollo
Thanks for opening this issue.
When creating an AKS cluster in Azure - the default node pool has to have a mode of "system" and cannot be changed to a user mode/pool (else cluster creation will fail) - meaning that we set the mode to "system" here and don't expose that field (since there's no other value it can be).
Whilst there's some longer-term questions around cycling the default node pool itself (e.g. should we spin up a temporary node pool, how do we handle when all external node pools are deleted [at least one has to exist]) - at this time we don't support doing so, meaning that this field can't be usably set to another value anyway unfortunately.
Since the AKS Service doesn't allow this - and the closest thing we can do is to support cycling the default node pool, I'm going to close this in favour of #7093 which is tracking cycling the default node pool - would you mind subscribing to that for updates?
Thanks!
@tombuildsstuff I was coming to open the same request, but I think you are missing one thing: you can convert the default node pool into a user node pool after you add a new node pool using the System mode.
So using the CLI today, we can create an AKS cluster with a default system node pool and one extra node pool.
In a second operation, we add a new system node pool as the third node pool, and in a third operation, we change the default node pool's mode from system to user.
I would expect that we should be able to do the same here. The API supports it, so I'm not sure why Terraform couldn't support it.
https://docs.microsoft.com/en-us/azure/aks/use-system-pools#add-a-dedicated-system-node-pool-to-an-existing-aks-cluster
You can do the following operations with node pools:
Change a system node pool to be a user node pool, provided you have another system node pool to take its place in the AKS cluster.
For those reasons, could we reopen this request?
I believe there's a hint left directly in the code about that:
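For context, the first half of that sequence (adding a dedicated system node pool alongside the default one) can already be expressed with the separate node pool resource, since azurerm_kubernetes_cluster_node_pool accepts mode. The following is only a minimal sketch: the resource label, pool name, VM size, and node count are hypothetical, and it assumes the cluster is declared as azurerm_kubernetes_cluster.aks as in the original report. The last step, switching the default node pool to User, is the part this issue asks for and is not shown.

```hcl
# Hypothetical sketch: an additional System-mode pool next to the default node pool.
# Assumes a cluster declared elsewhere as azurerm_kubernetes_cluster.aks.
resource "azurerm_kubernetes_cluster_node_pool" "system2" {
  name                  = "system2"                         # hypothetical pool name
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
  vm_size               = "Standard_DS2_v2"                 # hypothetical VM size
  node_count            = 1                                 # hypothetical count
  mode                  = "System"                          # System or User
}
```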
https://github.com/terraform-providers/terraform-provider-azurerm/blob/ae075ddf82d8ac40f945742d3ca3ae449709ac81/azurerm/internal/services/containers/kubernetes_nodepool.go#L197
Being able to change the default pool's mode would be really useful to handle more advanced AKS setups (especially if you need later on to increase the size of the system pool, which is currently only the default pool and you can't change it, AFAIK...).
@djsly ultimately that's enabled by cycling the default node pool, which is tracked in #7093 - however the main question for the default node pool remains:
Whilst there's some longer-term questions around cycling the default node pool itself (e.g. should we spin up a temporary node pool, how do we handle when all external node pools are deleted [at least one has to exist]) - at this time we don't support doing so, meaning that this field can't be usably set to another value anyway unfortunately.
We had a conversation with the AKS Team a while back around creating a cluster with no default node pool and instead only using external node pools, but unfortunately they didn't believe that was workable due to the way AKS works.
Since the AKS API requires that at least one system node pool is present - we require one at creation time (and also when all external node pools are removed) - whilst we could enable the removal of the default node pool (or cycling it where the instance count changes) - the question remains:
It's technically possible to do all of these, the question is more how - ultimately these are all gated by #7093 either way, so for the moment I'm going to leave this closed in favour of that issue, would you mind subscribing to that one? Once we support cycling the default node pool (which is dependent on the questions above) then adding support for a user mode is possible - but if I'm honest this feels like a design limitation of AKS requiring the default node pool to exist (compared to say, Amazon EKS which allows connecting them later) - so it might be worth opening an issue on the AKS repository about this since that removes this problem altogether?
@tombuildsstuff thanks for clarifying the situation. Those all look like valid questions/concerns. However, I'm not sure why allowing users to override only the mode property of the default node pool would create major issues. I think the plan would display correctly, and the apply phase would then simply fail with the aforementioned error (unless I'm missing something).
So it would be entirely up to the user to be aware of the AKS dynamics and to make sure the pools are created accordingly. If they didn't, the AKS API would surface the issue and prevent the change.
I understand this is not the best approach, as "cycling the default pool" would be much more comprehensive, but it would match the behavior we have observed in many other plan != apply situations, where TF cannot know in advance whether something can actually be created/changed and fails with the API error. #7093 looks far from being available for use, and in the meantime this option sounds like a much quicker and more viable intermediate feature that would still allow managing such a cluster via TF.
Otherwise in the short term I only see two options (deleting the cluster is not one unfortunately):
PS: Sorry to insist, I'm just trying to find a viable (not necessarily perfect) course of action to handle such situation that can be applied sooner than later
The AKS upstream issue sounds like a really good idea, but it would be very hard to get implemented, and most likely not in the short term.
@lorenzo-biava unfortunately exposing the field will cause hard-to-diagnose issues for other users (and isn't possible until #7093 is done) - ultimately this isn't possible without having an answer to the questions above (or AKS removing this limitation) and so is blocked by that.
As mentioned above, I'd suggest filing an issue on the AKS Repository since this is an AKS limitation requiring the default node pool.
Not sure I follow the ask. But trying to answer the questions above:
Following up on 7093
I'm going to lock this issue because it has been closed for _30 days_. This helps our maintainers find and focus on the active issues.
If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error, please reach out to my human friends [email protected]. Thanks!