I recently moved my subscription to another tenant and removed all resources so I could recreate them from scratch. The recreation failed, however. It seems that the az aks create command is using an incorrect tenant ID.
The create command first creates a new service principal in the correct tenant (it shows up there in the portal afterwards), but then looks for that service principal in the incorrect (old) tenant when creating the cluster.
az account show
{
"environmentName": "AzureCloud",
"id": "dd8e960a-3b91-4c33-ab40-***********",
"isDefault": true,
"name": "Visual Studio Professional",
"state": "Enabled",
"tenantId": "07287daf-e2f6-4594-90a6-***********",
"user": {
"name": "bjorn****************",
"type": "user"
}
}
az aks create --resource-group TestCluster --name TestCluster --node-count 1 --node-vm-size Standard_A2_v2 --kubernetes-version 1.8.1 --ssh-key-value .\clusterkey.pub
AAD role propagation done[############################################] 100.0000%
Operation failed with status: 'Bad Request'.
Details: Service principal clientID: 1b2e67f2-6ccd-4038-b76d-*********** not found
in Active Directory tenant f56e47b2-0545-44e0-ae76-***********,
Please see https://aka.ms/acs-sp-help for more details.
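The mismatch can be checked mechanically by comparing the tenantId reported by az account show with the tenant named in the error text. The sketch below is only illustrative: the helper names are hypothetical, and the regular expression assumes the error wording shown above.

```python
import json
import re

# Assumes the error wording shown above:
# "... not found in Active Directory tenant <id>, Please see ..."
TENANT_IN_ERROR = re.compile(r"Active Directory tenant\s+([^,\s]+)")


def tenant_from_error(message):
    """Return the tenant id named in the error text, or None if absent."""
    match = TENANT_IN_ERROR.search(message)
    return match.group(1) if match else None


def tenant_from_account(account_json):
    """Return the tenantId field from `az account show` JSON output."""
    return json.loads(account_json)["tenantId"]


def tenants_match(account_json, error_message):
    """True when the error names the same tenant the CLI is logged in to."""
    return tenant_from_account(account_json) == tenant_from_error(error_message)
```

With the output above, the two values differ (07287daf-... from the account versus f56e47b2-... in the error), which is the symptom being reported.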
MSI / azure-cli (2.0.23) / Windows 10 build 1709 (16299.192) / Powershell
azure-cli (2.0.23)
acr (2.0.17)
acs (2.0.22)
advisor (0.1.0)
appservice (0.1.22)
backup (1.0.3)
batch (3.1.7)
batchai (0.1.3)
billing (0.1.6)
cdn (0.0.10)
cloud (2.0.10)
cognitiveservices (0.1.9)
command-modules-nspkg (2.0.1)
configure (2.0.12)
consumption (0.2.0)
container (0.1.15)
core (2.0.23)
cosmosdb (0.1.15)
dla (0.0.15)
dls (0.0.18)
eventgrid (0.1.5)
extension (0.0.6)
feedback (2.0.6)
find (0.2.7)
interactive (0.3.11)
iot (0.1.15)
keyvault (2.0.15)
lab (0.0.13)
monitor (0.0.13)
network (2.0.19)
nspkg (3.0.1)
profile (2.0.16)
rdbms (0.0.9)
redis (0.2.10)
reservations (0.1.0)
resource (2.0.19)
role (2.0.15)
servicefabric (0.0.7)
sql (2.0.17)
storage (2.0.21)
vm (2.0.20)
Python location 'C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\python.exe'
Extensions directory 'C:\Users\Bjorn\.azure\cliextensions'
Python (Windows) 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 17:54:52) [MSC v.1900 32 bit (Intel)]
I have the same or a similar problem. I have access to two Azure subscriptions (one from the company and a Visual Studio subscription), and I am trying to create an AKS cluster in the VS subscription. At first I couldn't create the service principal because I don't have global admin access for the company Azure subscription. So I created an Azure AD in the Visual Studio subscription, but it still failed. Then I figured out that I needed to change the directory used for the VS subscription to the new Azure AD directory. Now I can create service principals, both from the CLI and the portal. Creating the AKS cluster still fails.
Here is an excerpt from the debug log for the az aks create command. I've slightly modified it to hide various identifiers. It appears that the Azure Management API uses the wrong Azure AD tenant, but I don't know how to change that. The Azure AD tenant ID only shows up in the error message from the Management API.
urllib3.connectionpool : Starting new HTTPS connection (1): management.azure.com
urllib3.connectionpool : https://management.azure.com:443 "PUT /subscriptions/(my-vs-sub-id)/resourceGroups/rg1/providers/Microsoft.ContainerService/managedClusters/cluster1?api-version=2017-08-31 HTTP/1.1" 400 253
msrest.http_logger : Request URL: 'https://management.azure.com/subscriptions/(my-vs-sub-id)/resourceGroups/rg1/providers/Microsoft.ContainerService/managedClusters/cluster1?api-version=2017-08-31'
msrest.http_logger : Request method: 'PUT'
msrest.http_logger : Request headers:
msrest.http_logger : 'User-Agent': 'python/3.6.1 (Windows-10-10.0.16299-SP0) requests/2.18.4 msrest/0.4.21 msrest_azure/0.4.19 azure-mgmt-containerservice/3.0.0 Azure-SDK-For-Python AZURECLI/2.0.23'
msrest.http_logger : 'Accept-Encoding': 'gzip, deflate'
msrest.http_logger : 'Accept': 'application/json'
msrest.http_logger : 'Connection': 'keep-alive'
msrest.http_logger : 'Authorization': '*'
msrest.http_logger : 'x-ms-client-request-id': '875412fe-f538-11e7-8835-fc3fdb8712c2'
msrest.http_logger : 'CommandName': 'aks create'
msrest.http_logger : 'Content-Type': 'application/json; charset=utf-8'
msrest.http_logger : 'accept-language': 'en-US'
msrest.http_logger : 'Content-Length': '913'
msrest.http_logger : Request body:
msrest.http_logger : {"location": "westeurope", "properties": {"dnsPrefix": "rg1-cluster1-e0654c", "kubernetesVersion": "1.7.7", "agentPoolProfiles": [{"name": "nodepool1", "count": 3, "vmSize": "Standard_D1_v2", "dnsPrefix": "rg1-cluster1-e0654c", "storageProfile": "ManagedDisks", "osType": "Linux"}], "linuxProfile": {"adminUsername": "azureuser", "ssh": {"publicKeys": [{"keyData": "ssh-rsa blahblahblah\n"}]}}, "servicePrincipalProfile": {"clientId": "my-service-principal", "secret": "secret-id"}}}
msrest.http_logger : Response status: 400
msrest.http_logger : Response headers:
msrest.http_logger : 'Cache-Control': 'no-cache'
msrest.http_logger : 'Pragma': 'no-cache'
msrest.http_logger : 'Content-Length': '253'
msrest.http_logger : 'Content-Type': 'application/json'
msrest.http_logger : 'Expires': '-1'
msrest.http_logger : 'x-ms-correlation-request-id': '4fe1ae1a-4bae-429b-80d2-7f2c2308f9bb'
msrest.http_logger : 'x-ms-request-id': 'f19cd32e-ab0d-4b35-9a27-62e82a3ca88d'
msrest.http_logger : 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains'
msrest.http_logger : 'Server': 'nginx'
msrest.http_logger : 'x-ms-ratelimit-remaining-subscription-writes': '1199'
msrest.http_logger : 'x-ms-routing-request-id': 'WESTEUROPE:20180109T122655Z:4fe1ae1a-4bae-429b-80d2-7f2c2308f9bb'
msrest.http_logger : 'Date': 'Tue, 09 Jan 2018 12:26:55 GMT'
msrest.http_logger : Response content:
msrest.http_logger : b'{\n "code": "ServicePrincipalNotFound",\n "message": "Service principal clientID: (my-service-principal-id) not found in Active Directory tenant (company-ad-tenant-id), Please see https://aka.ms/acs-sp-help for more details."\n }'
msrest.exceptions : Operation failed with status: 'Bad Request'. Details: Service principal clientID: (my-service-principal-id) not found in Active Directory tenant (company-ad-tenant-id), Please see https://aka.ms/acs-sp-help for more details.
msrest.http_logger : b'{\n "code": "ServicePrincipalNotFound",\n "message": "Service principal clientID: (my-service-principal-id) not found in Active Directory tenant (company-ad-tenant-id), Please see https://aka.ms/acs-sp-help for more details."\n }'
@rjtsdl, to me this looks more like #5190
@yugangw-msft It seems very similar to #5190; however, for me it does not create the cluster no matter how often I try. I've tried from the local CLI (v2.0.23), Azure Cloud Shell and the portal (including re-deploying the failed deployment that was created).
Since it fails across different clients (CLI/portal), the error appears to be that some part of the management API uses the incorrect Azure AD tenant ID to look up the service principal that was created.
I've seen #5190 as well, but there the problem seems to be the timing between the two steps.
In the first step you create a service principal, and in the second step you use it for the aks create.
The SP isn't yet available for use in the aks create, but when you wait a minute and retry, it works.
The problem @ahaeber is reporting seems to be the same as mine. No matter how long you wait, the aks create won't find the SP because it is looking in the old tenant's AAD instead of the new one.
This is strange because the SP creation works and the SP is created in the new tenant's AAD.
@hardcorehead87, can you check whether your ~/.azure/acsServicePrincipal.json file exists? If so, can you remove it and retry?
The logic is that we cache the SPN in that file, using the subscription ID as the key. My guess is that after you moved your subscription to the new tenant, the previously created SPN no longer works. Removing the file and retrying will create a new one for you.
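As a rough illustration of the cache described above, the sketch below removes only one subscription's entry instead of deleting the whole file. It assumes the file is a JSON object keyed by subscription ID, which is taken from the description here and not verified against the CLI source; the function name is hypothetical.

```python
import json
import os


def drop_cached_sp(cache_path, subscription_id):
    """Remove one subscription's cached service principal entry.

    Assumes the cache is a JSON object keyed by subscription id, as
    described above; the real file layout may differ.
    Returns True if an entry was removed, False otherwise.
    """
    if not os.path.exists(cache_path):
        return False
    with open(cache_path, "r", encoding="utf-8") as handle:
        cache = json.load(handle)
    if not isinstance(cache, dict) or subscription_id not in cache:
        return False
    del cache[subscription_id]
    with open(cache_path, "w", encoding="utf-8") as handle:
        json.dump(cache, handle)
    return True
```

It would be called with something like drop_cached_sp(os.path.expanduser("~/.azure/acsServicePrincipal.json"), "<subscription-id>"). Deleting the file outright, as suggested above, has the same effect for every subscription at once.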
I just tried deleting ~/.azure/acsServicePrincipal.json and executing the command again. It takes longer, apparently because it creates a new service principal, but it fails at the end with the same error. The management API still looks for the service principal in the wrong Azure Active Directory. Is there any option to tell az which Azure AD tenant to use for the request?
I've already tried this and get the same result as @ahaeber. The error message still says the aks create can't find the SPN in the old tenant.
I'm using a user account that's available in both the old and the new tenant. I also tried creating a user that exists only in the new tenant, thinking that it could be something user related, but no dice. When executing the aks create command as that new-tenant-only user (after removing accesstoken.json and acsServicePrincipal.json to be sure), the error message still shows that it's looking for the SPN in the wrong tenant.
@ahaeber @hardcorehead87 ,
So both of you have the scenario of moving a subscription from one tenant to a different one?
@rjtsdl Correct. I actually have the same use case as @ahaeber. The Azure subscription credit that comes with my MSDN (Visual Studio) subscription was created under the company tenant, and I moved it to its own tenant.
@hardcorehead87 thx
Yesterday, I merged #5208. The PR basically removes some trick on the SPN for AKS. I don't know if it will fix the issue you are facing :(
Do you want to try the latest azure-cli (the latest nightly build), which contains the fix? Btw, there is an azure-cli release cut-off today, @yugangw-msft. If so, you will probably get the latest from the official channel very soon.
@rjtsdl A Docker image with your recent change can be used:
docker run -v ${HOME}:/root -it azuresdk/azure-cli-python:latest
I've tried with the latest docker image (b4d588470811 age: 5 days ago) and I get the same result.
Operation failed with status: 'Bad Request'. Details: Service principal clientID: f01508c9-8bc5-4299-9899-56f7f38a22e5 not found in Active Directory tenant f56e47b2-0545-44e0-ae76-***
Cleared the json config files to force-recreate the SPN.
The SPN gets created in the new tenant but the bad request still states the previous one.
Actually, :latest was built 5 days ago, but the Docker tag for :dev is from last night. Looks like it has the changes from #5208, so it's worth trying to see if that helps with this issue:
docker run -v ${HOME}:/root -it azuresdk/azure-cli-python:dev
Tried azuresdk/azure-cli-python:dev (a4cf731726d9 age: About an hour ago) with the same result.
@hardcorehead87 thx for trying it out.
I found a bug in our AKS/ACS rp code. It is a simple fix. I will request to get it deployed ASAP.
I will update it with ETA later.
@hardcorehead87
We have finished the rollout. Try it out :)
Unfortunately I get another error with the latest dev image:
Finished service principal creation[##################################] 100.0000%
Operation failed with status: 'Bad Request'. Details: Changing property 'servicePrincipalProfile.clientId' is not allowed.
@hardcorehead87 are you trying to update existing cluster?
This afternoon I quickly tried it at work and after a long time eventually it failed. I assumed the creation of the cluster failed completely but apparently the AKS cluster object was created. There was however no extra ResourceGroup that contained the VMs for the cluster. This explains and answers your 'update existing cluster' question.
I've removed the AKS object and retried. It's running at the moment and taking a suspiciously long time to create the cluster. Will report back.
I've been able to create an AKS cluster and everything was assigned to the correct cluster.
Thank you for the fix and help with this issue.
@hardcorehead87 good to know. Glad it works!
There was one small issue: the service principal didn't give the AKS cluster rights to access the registry in the same subscription. I wasn't owner of the SP, so I made myself owner and added the SP as contributor on the subscription. Doing this fixed the issue where the cluster wasn't able to retrieve the images. I don't know whether one or the other, or even both, are required for it to work correctly.
@hardcorehead87, well, ACR has a different kind of authentication. You need to manage it in the ACR policy. (Just add the SP with read permission.)
Yeah, by default it doesn't and shouldn't have access to ACR, even though they may be in the same subscription.
Btw, ACR is a private registry, while Docker Hub and quay.io can be public.
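Adding the SP with read permission on the registry could look roughly like the sketch below. It only builds the argument list for az role assignment create; the Reader role name, the helper names and the placeholder values are assumptions for illustration and are not confirmed anywhere in this thread.

```python
import subprocess


def acr_read_grant_args(sp_app_id, acr_resource_id, role="Reader"):
    """Build az CLI arguments that give a service principal read access
    to a registry.

    The role name is an assumption; use whichever read role your
    registry setup expects.
    """
    return [
        "az", "role", "assignment", "create",
        "--assignee", sp_app_id,
        "--role", role,
        "--scope", acr_resource_id,
    ]


def grant_acr_read(sp_app_id, acr_resource_id):
    """Run the role assignment; raises CalledProcessError if az fails.

    On Windows the az entry point is a .cmd file, so this call may need
    shell=True or an explicit path to work.
    """
    subprocess.run(acr_read_grant_args(sp_app_id, acr_resource_id), check=True)
```

Scoping the assignment to the registry's resource ID, and not to the whole subscription, keeps the SP's rights limited to pulling images.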
@rjtsdl With the AKS clusters created previously (and all ACS clusters as well, apparently; I just checked), the accompanying SP gets contributor rights on the containing subscription. On the ACR policy this results in inherited contributor rights. This happened out of the box.
So is no longer doing this, and requiring it to be done manually, a conscious choice to deviate from the old way and not an oversight? I'm not complaining about the choice. Being aware of which applications have which rights is a good thing. I just wonder if this is documented anywhere. Had I not spent so much time on the SP stuff, I wouldn't have known to look at this.
@hardcorehead87 I think it will do if you/SP are a contributor of the sub.
I'm the owner of the sub yet the created SP didn't get any rights. I'll try again tomorrow in a different test case. See if I can reproduce this behavior.
@hardcorehead87, be careful: az aks caches the SPN based on subscription ID. So make sure you delete the ~/.azure/aksServicePrincipals.json file and try again. That will force it to create a new SPN. I suspect you are still using the cached SPN, which was created before you became owner/contributor.
Seeing as the original ticket also involves SPN issues, I specifically removed the aksServicePrincipals.json file with every attempt I've made. But thanks for the warning. I'll report back.
@hardcorehead87, thanks. If it still has the issue, please create another issue. :)
Hi all,
This thread is a bit old, so I'm not sure if anyone will see this. We were experiencing this exact issue and tracked it down with Microsoft's assistance.
There is an issue (and it varies from region to region) with Azure AD sync/replication of service principals, i.e. you create a service principal and it takes a short while (in our case, in Australia East, more than 5 seconds) to become available in AD.
When you run the command az aks create, the first thing it does is try to create and then use a service principal. Our solution was to do the following:
create the service principal
az ad sp create-for-rbac --name <
poll aad until the service principal was created
az ad sp list --spn <
create the aks cluster with the service principal
az aks create --resource-group <
problem solved
Hope this helps someone
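The create, poll, then create-cluster sequence above can be sketched as follows. This is a minimal illustration, not the commenter's actual script: the function names, attempt count and delay are hypothetical, and sp_visible assumes the az CLI is on PATH and logged in to the right tenant.

```python
import json
import subprocess
import time


def wait_until(check, attempts=10, delay=5.0):
    """Call check() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        # Sleep between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


def sp_visible(app_id):
    """True when `az ad sp list --spn` returns at least one entry.

    Assumption: an empty JSON list means the service principal has not
    replicated yet.
    """
    result = subprocess.run(
        ["az", "ad", "sp", "list", "--spn", app_id, "--output", "json"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and bool(json.loads(result.stdout or "[]"))
```

A caller would run wait_until(lambda: sp_visible(app_id)) after creating the service principal, and only then invoke az aks create with that service principal passed explicitly (for example via --service-principal and --client-secret).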