I am currently running docker containers that execute azure cli commands. Some of the azure cli commands involve "az group create" / "az group deploy" / "az vm create" / "az vm extension set" etc. It is not all the time, but these commands hang at times.
Delving deeper into the processes that are being run, I found many such following processes by running htop.
113 jenkins 20 0 93708 18324 3152 S 0.0 0.1 0:00.38 /usr/lib/python2.7/site-packages/azure/cli/core/telemetry_upload.pyc
These processes run for hours at times. When I kill these processes, my scripts and commands continue so I believe there is some connection to my scripts hanging due to these telemetry_upload calls.
I saw that there used to be issues with the telemetry_upload in the past, but I thought that the issues were patched. Can I get some eyes on this possible issue?
This situation has been happening for the past week and hitting a stumping point.
Install Method (e.g. pip, interactive script, apt-get, Docker, MSI, edge build) / CLI version (az --version) / OS version / Shell Type (e.g. bash, cmd.exe, Bash on Windows)
Installation Method:
pip install azure-cli
Environment that uses the azure-cli:
Docker Containers
CLI version
```bash-4.3$ az --version
azure-cli (2.0.31)
acr (2.0.23)
acs (2.0.31)
advisor (0.5.1)
appservice (0.1.31)
backup (1.1.1)
batch (3.2.0)
batchai (0.2.0)
billing (0.1.8)
cdn (0.0.14)
cloud (2.0.13)
cognitiveservices (0.1.12)
command-modules-nspkg (2.0.1)
configure (2.0.15)
consumption (0.3.0)
container (0.1.22)
core (2.0.31)
cosmosdb (0.1.20)
dla (0.0.19)
dls (0.0.21)
eventgrid (0.1.12)
eventhubs (0.1.2)
extension (0.0.12)
feedback (2.1.1)
find (0.2.9)
interactive (0.3.19)
iot (0.1.19)
keyvault (2.0.21)
lab (0.0.21)
monitor (0.1.5)
network (2.0.28)
nspkg (3.0.2)
profile (2.0.22)
rdbms (0.2.1)
redis (0.2.12)
reservations (0.1.2)
resource (2.0.27)
role (2.0.22)
servicebus (0.1.2)
servicefabric (0.0.12)
sql (2.0.25)
storage (2.0.31)
vm (2.0.30)
Python location '/usr/bin/python'
Extensions directory '/jenkins/.azure/cliextensions'
Python (Linux) 2.7.14 (default, Feb 22 2018, 22:05:16)
[GCC 6.2.1 20160822]
Legal docs and information: aka.ms/AzureCliLegal
Linux 5df116efa954 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 12:24:47 UTC 2017 x86_64 GNU/Linux```
This comment may help until the underlying issue is resolved
https://github.com/Azure/azure-cli/issues/6248#issuecomment-385115522
You can disable telemetry with an environment variable.
Thanks @derekbekoe! We turned off telemetry prior to this issue being posted using configure but it seemed like it didn't work. Using the environment variable helped, so thank you for the link!
I would still like to know what is exactly going on with the telemetry_upload.py in order for it to stall so much.
@steveyang95 when you cancel/interrupt/kill the process do you manage to get a stack? that will be helpful
@steveyang95 do you happen to have installed any azure cli extesion? If so do you mind provide the list of the extension you installed?
@troydai I used strace on the process id and got the following:
bash-4.3$ strace -p 2159
strace: Process 2159 attached
read(3, <unfinished ...>) = ?
+++ killed by SIGKILL +++
But if there is something else that you would like, please let me know the commands to run.
As for the az cli extensions, there is none that I am aware of. I do use az vm extension set to use custom scripts.
thanks for the information @steveyang95 . the blocking is at the client.flush() where the application insight is blocked trying to transmit data. I'll see what I can do.
Follow, to see the result.
And update has been made to the telemetry_upload.py to ensure a 10-second timeout: https://github.com/Azure/azure-cli/pull/6290#event-1608892477
The change will ensure a telemetry upload process will not live longer than 10 seconds when the request is failed. This will avoid further accumulating processes.
Thanks @troydai!!
You're welcome @steveyang95 . We're still tracking the Applicaiton Insight service issue internally, which is the root cause of the problem.
@troydai
Has #6290 been pushed out ?
Reason being that I've just completed a large upload to azure of many files and I've been left with a bunch of telemetry_upload processes.
For example, you can see this process started at 09:25 which is many hours ago now...
root 95822 0.0 0.3 222856 15036 pts/1 S 09:25 0:00 /usr/lib64/az/bin/python /usr/lib64/az/lib/python2.7/site-packages/azure/cli/core/telemetry_upload.pyc {"c4395b75-49cc-422c-bc95-c7d51aef5d46": [{"name": "azurecli/command", "properties": {"Reserved.DataModel.ProductName": "azurecli", "Context.Default.VS.Core.TelemetryApi.ProductVersion": "[email protected]", "Context.Default.AzureCLI.Params": "--validate-content,--no-progress,--content-md5,-c,-f,-n", "Reserved.DataModel.FeatureName": "command", "Context.Default.AzureCLI.DefaultOutputType": "unknown", "Reserved.DataModel.EntitySchemaVersion": 4, "Context.Default.AzureCLI.ClientRequestId": "26902c28-5687-11e8-8cf3-000c299b467e", "Reserved.DataModel.Source": "DataModelAPI", "Context.Default.VS.Core.User.IsOptedIn": "True", "Reserved.SequenceNumber": 1, "Context.Default.AzureCLI.Source": "az", "Reserved.DataModel.CorrelationId": "151b55f3-2db0-413b-b3ad-95d2345449ff", "Context.Default.VS.Core.User.Id": "7505b598-55cf-11e8-96a5-000c299b467e", "Context.Default.VS.Core.OS.Version": "#1 smp wed may 9 18:05:47 utc 2018", "Context.Default.AzureCLI.StartTime": "2018-05-13 08:25:10.232124", "Context.Default.VS.Core.User.IsMicrosoftInternal": "False", "Context.Default.AzureCLI.ShellType": "/bin/bash", "Context.Default.AzureCLI.OutputType": "json", "Reserved.DataModel.EntityType": "UserTask", "Reserved.ChannelUsed": "AI", "Context.Default.AzureCLI.RawCommand": "storage blob upload", "Context.Default.AzureCLI.EndTime": "2018-05-13 08:25:10.784942", "Reserved.DataModel.EntityName": "storage-blob-upload", "Context.Default.VS.Core.MacAddressHash": "6c27f38b9bce933d73763d377e5be0fd1a5fdc46db8a48a4f3555aad260b0c7a", "Context.Default.VS.Core.OS.Type": "linux", "Context.Default.AzureCLI.PythonVersion": "2.7.5", "Context.Default.AzureCLI.Locale": "en_GB,UTF-8", "Context.Default.VS.Core.ExeName": "azurecli", "Reserved.DataModel.Severity": 0, "Context.Default.VS.Core.Machine.Id": "6c27f38b-9bce-933d-7376-3d377e5be0fd", "Context.Default.AzureCLI.InstallationId": "7505b598-55cf-11e8-96a5-000c299b467e", "Reserved.EventId": "3f619f13-3d4a-412a-9432-9be622b0e8f4", "Reserved.DataModel.Action.Result": "Success", "Context.Default.AzureCLI.EnvironmentVariables": "[]", "Reserved.TimeSinceSessionStart": 0, "Reserved.SessionId": "5291997d-a516-4c5b-bfad-0258beb2db62", "Context.Default.AzureCLI.CoreVersion": "2.0.32", "Context.Default.VS.Core.ExeVersion": "2.0.32@none", "Reserved.DataModel.Action.Type": "Atomic"}}]}
It will not be released until 5/22. For the time being, you can disable the telemetry through az configure command.
Having the same problem (well, a loop hanging after telemetry_upload.py stops responding and doesn't return).
However, i wonder why should the default be to upload telemetry data (what the hell is that for???) when i am just trying to download a blob. This doesn't feel good in "open source" software... guys, why are you uploading OS version, Python version and machine id strings to Azure?? Bad MS habits?
Changelog for 2.0.34 claims to have some improvements that might be relevant to this issue
Improved telemetry upload reliability
source: https://docs.microsoft.com/en-us/cli/azure/release-notes-azure-cli?view=azure-cli-latest#june-5-2018
@stevenyang95 is this still a problem for you in recent releases?
@troydai I have disabled the telemetry overall and have not seen any issues.
Most helpful comment
thanks for the information @steveyang95 . the blocking is at the
client.flush()where the application insight is blocked trying to transmit data. I'll see what I can do.