I am getting a "ClientAuthenticationError: IMDS endpoint unavailable" when the IMDS endpoint is available (in an ACI container). I checked that the IMDS endpoint was available and able to return an access token by sending a request via curl.
I am getting this when attempting to create an azure.keyvault.secrets.SecretClient with azure.identity.ManagedIdentityCredential().
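For anyone who wants to repeat the curl check from Python, here is a minimal sketch. It assumes the standard IMDS token URL (the same one that appears in the logs later in this thread), the `2018-02-01` API version used in the workaround below, and the `Metadata: true` header. The function names are hypothetical and not part of azure-identity.

```python
import json
import urllib.parse
import urllib.request

# IMDS token endpoint, as seen in the request logs in this thread.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_imds_request(resource, client_id=None):
    """Build (but do not send) a token request for the IMDS endpoint."""
    params = {"api-version": "2018-02-01", "resource": resource}
    if client_id:
        # Only needed for a user-assigned identity.
        params["client_id"] = client_id
    url = IMDS_TOKEN_URL + "?" + urllib.parse.urlencode(params)
    # IMDS expects the "Metadata: true" header on every request.
    return urllib.request.Request(url, headers={"Metadata": "true"}, method="GET")


def fetch_imds_token(resource, client_id=None, timeout=5):
    """Send the request and return the parsed JSON body (needs a reachable IMDS)."""
    request = build_imds_request(resource, client_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_imds_token("https://vault.azure.net")` inside the container should behave like the curl request; outside Azure it will simply time out.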
I traced the issue to the ImdsCredential class, where the following call raises a DecodeError exception:
https://github.com/Azure/azure-sdk-for-python/blob/a2bf32a647da4ca63f85f51f800bd3a5886fb733/sdk/identity/azure-identity/azure/identity/_credentials/managed_identity.py#L138
That exception in turn originates from this call in AuthnClient:
https://github.com/Azure/azure-sdk-for-python/blob/a2bf32a647da4ca63f85f51f800bd3a5886fb733/sdk/identity/azure-identity/azure/identity/_authn_client.py#L263
Ultimately leading to:
https://github.com/Azure/azure-sdk-for-python/blob/a2bf32a647da4ca63f85f51f800bd3a5886fb733/sdk/core/azure-core/azure/core/pipeline/policies/_universal.py#L382
And this is the error I ultimately get: "azure.core.exceptions.DecodeError: Cannot deserialize content-type: text/plain"
I'm assuming this has to do with the request that is sent to check if the endpoint exists.
I'm able to work around this by calling request_token() explicitly via a wrapper around the credential:
class MyCredentialWrapper:
    def __init__(self, credential):
        self._client = credential._client
        self._client_id = credential._client_id

    def get_token(self, *scopes, **kwargs):
        token = self._client.get_cached_token(scopes)
        if not token:
            resource = scopes[0]
            if resource.endswith("/.default"):
                resource = resource[: -len("/.default")]
            params = {"api-version": "2018-02-01", "resource": resource}
            if self._client_id:
                params["client_id"] = self._client_id
            token = self._client.request_token(scopes, method="GET", params=params)
        return token
And using that in place of the ManagedIdentityCredential:
credential = MyCredentialWrapper(ManagedIdentityCredential())
secret_client = SecretClient(key_vault_uri, credential)
I am hoping this can be fixed. Thanks!
Update: it seems like this may be a problem directly related to working from Azure Container Instances. We are not running into the same issue when doing the same thing in Azure Functions, for example.
Also, the above workaround only appears to work some of the time when deploying on ACI (maybe around 75% success).
Thanks for reporting this. It looks similar to an issue with AKS pod identity (Azure/aad-pod-identity#340), whose controller sends JSON responses as text/plain. I don't know whether ACI uses the same implementation, but that would explain the DecodeError. I'm working on test infrastructure for container scenarios now, and I'll add ACI to my list.
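To illustrate the failure mode described above: a deserializer that picks a parser strictly from the content-type header will reject a JSON body labeled text/plain. The following is only a sketch of a more lenient approach, not the code azure-core uses. The function name and the set of accepted media types are assumptions made for illustration.

```python
import json


def deserialize_token_response(body, content_type):
    """Parse a token response, tolerating JSON that is labeled text/plain."""
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type not in ("application/json", "text/plain", ""):
        raise ValueError("Unexpected content-type: " + media_type)
    text = body.decode("utf-8") if isinstance(body, bytes) else body
    try:
        return json.loads(text)
    except ValueError as ex:
        # json.JSONDecodeError is a subclass of ValueError.
        raise ValueError("Response body is not valid JSON") from ex
```

The design choice here is to trust the body over the header for a small allow-list of media types while still failing loudly on anything else, such as an HTML error page.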
Any updates on this? Is this on the roadmap? Or is there any reliable workaround for this?
Pod Identity's text/plain issue was fixed in version 1.5.4 and I haven't seen it since (1.5.5 is the latest). Are you still seeing "DecodeError: Cannot deserialize content-type: text/plain"?
I should have been clearer: We got an error message while using an ACI with managed identity:
ImdsCredential: IMDS endpoint unavailable
We did not dig further into this and did not see the decode error; we only suspected that this issue was affecting us. However, we get a similar error when trying to access a key vault with the managed identity of an ACI:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/azure/identity/_credentials/default.py", line 105, in get_token
return super(DefaultAzureCredential, self).get_token(*scopes, **kwargs)
File "/usr/local/lib/python3.8/site-packages/azure/identity/_credentials/chained.py", line 71, in get_token
raise ClientAuthenticationError(message=error_message)
azure.core.exceptions.ClientAuthenticationError: No credential in this chain provided a token.
Attempted credentials:
EnvironmentCredential: Incomplete environment configuration. See https://aka.ms/python-sdk-identity#environment-variables for expected environment variables
ImdsCredential: IMDS endpoint unavailable
We simply tried to adapt
https://github.com/Azure-Samples/azure-sdk-for-python-keyvault-secrets-get-set-managedid
to ACI. The example works fine if we deploy the container to a web app.
Workaround (do not use, see below)
We found a rather simple workaround:
1. We installed the Azure CLI in our Python slim base image:
# install Azure CLI to use the managed identity of the ACI
# https://docs.microsoft.com/de-de/cli/azure/install-azure-cli-apt?view=azure-cli-latest
RUN curl -sL https://aka.ms/InstallAzureCLIDeb | bash
2. We run
CMD az login --identity && python XXX
on startup. az login sets the environment variables so that we can simply use DefaultAzureCredential() (or ManagedIdentityCredential()) in Python as expected.
There's no interaction between azure-identity and the Azure CLI (there will be in the future). Your workaround works because it takes time. ACI containers commonly start before the managed identity endpoint is ready. An application can therefore fail to authenticate when it tries to do so immediately upon starting but succeed when its first authentication attempt comes some seconds after starting.
This is an issue on AKS as well, where I've worked around it with an init container. I think you could do something similar for ACI with a liveness probe (see my comment here).
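Since the underlying problem is timing, another option is to retry authentication in the application itself instead of delaying startup. Below is a minimal sketch, not part of azure-identity. `RetryingCredential` and its parameters are hypothetical names. It takes a factory rather than a credential instance on the assumption that a credential may remember an earlier "endpoint unavailable" result, so each attempt starts with a fresh credential.

```python
import time


class RetryingCredential:
    """Retry get_token with a fresh credential until one attempt succeeds."""

    def __init__(self, credential_factory, attempts=6, delay=5.0,
                 retry_on=(Exception,), sleep=time.sleep):
        if attempts < 1:
            raise ValueError("attempts must be at least 1")
        self._factory = credential_factory
        self._attempts = attempts
        self._delay = delay
        self._retry_on = retry_on
        self._sleep = sleep
        self._credential = None  # set once an attempt has succeeded

    def get_token(self, *scopes, **kwargs):
        # After the first success, delegate directly to the working credential.
        if self._credential is not None:
            return self._credential.get_token(*scopes, **kwargs)
        last_error = None
        for attempt in range(self._attempts):
            candidate = self._factory()
            try:
                token = candidate.get_token(*scopes, **kwargs)
            except self._retry_on as ex:
                last_error = ex
                # Wait before the next attempt, but not after the last one.
                if attempt < self._attempts - 1:
                    self._sleep(self._delay)
            else:
                self._credential = candidate
                return token
        raise last_error
```

Usage would look something like `SecretClient(key_vault_uri, RetryingCredential(ManagedIdentityCredential))`, passing the class itself as the factory. In practice you would probably narrow `retry_on` to the authentication errors you expect rather than retrying on every exception.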
Thank you, @chlowell, you are completely right! Could we have read this somewhere in the docs?
Workaround (better)
If it is okay for you that your container sleeps a bit before it starts to do its work, you can simply use:
CMD sleep 30 && python XXX
Off-topic
Is this delay also the reason why one can't use an ACI's managed identity to deploy images from Azure container registries?
Could we have read this somewhere in the docs?
Not in the azure-identity docs; possibly in docs for the AKS/ACI managed identity implementation (Pod Identity).
Is this delay also the reason why one can't use an ACI's managed identity to deploy images from Azure container registries?
I think that's due to other details of the ACI implementation I can only speculate about. I'd guess the cluster deploying the container instance can't use a user-assigned managed identity to pull an image.
I'm closing this issue now because the initial problem, a deserialization error when using pod identity, has been fixed upstream. Feel free to open another issue if you encounter other problems.
I am facing the same issue while trying to run a container as an executable. It is deployed in a virtual network and needs to access the vault using a system-assigned identity.
My code usually runs for 3-5 minutes and works fine if I use a service principal by passing AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, and AZURE_TENANT_ID as environment variables.
I want to use managed identities to avoid passing environment variables.
I tried the arguments below when creating the container, but it still didn't work.
--command-line "/bin/bash -c 'sleep 90; /usr/local/bin/python xxxx.py'"
2020-05-21:02:09:37,349 INFO [_universal.py:412] Request URL: 'http://169.254.169.254/metadata/identity/oauth2/token'
2020-05-21:02:09:37,349 INFO [_universal.py:413] Request method: 'GET'
2020-05-21:02:09:37,349 INFO [_universal.py:414] Request headers:
2020-05-21:02:09:37,349 INFO [_universal.py:417] 'Metadata': 'REDACTED'
2020-05-21:02:09:37,349 INFO [_universal.py:417] 'User-Agent': 'azsdk-python-identity/1.3.1 Python/3.8.3 (Linux-4.15.0-1082-azure-x86_64-with-glibc2.2.5)'
2020-05-21:02:09:37,352 DEBUG [connectionpool.py:226] Starting new HTTP connection (1): 169.254.169.254:80
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/azure/identity/_credentials/default.py", line 105, in get_token
return super(DefaultAzureCredential, self).get_token(*scopes, **kwargs)
File "/usr/local/lib/python3.8/site-packages/azure/identity/_credentials/chained.py", line 71, in get_token
raise ClientAuthenticationError(message=error_message)
azure.core.exceptions.ClientAuthenticationError: No credential in this chain provided a token.
Attempted credentials:
EnvironmentCredential: Incomplete environment configuration. See https://aka.ms/python-sdk-identity#environment-variables for expected environment variables
ImdsCredential: IMDS endpoint unavailable
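The "Incomplete environment configuration" line in the output above is expected when the service principal variables are deliberately left unset. As a quick diagnostic, a small sketch like the following can confirm which of the three variables named earlier are missing. The function name is hypothetical, and the check covers only the client-secret configuration mentioned in this thread.

```python
import os

# The service principal variables mentioned in this thread.
REQUIRED_VARIABLES = ("AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "AZURE_TENANT_ID")


def missing_service_principal_variables(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_service_principal_variables()
    print("Missing variables:", ", ".join(missing) if missing else "none")
```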
@praaadip: You are facing a different (additional) issue, which is why the sleep alone does not help.
As the "other limitations" section of the docs states:
You can't use a managed identity in a container group deployed to a virtual network.
I was told this is about to change, but I cannot provide further information.
Thank you. I had assumed that 90 seconds was not enough and completely missed checking the limitations.