Every command takes seconds to complete, and the first time you run the Azure CLI it's really, really slow to get going. It's especially annoying that even the built-in help takes more than a couple of seconds to display.
Install Method: How did you install the CLI? (e.g. pip, interactive script, apt-get, Docker, MSI, nightly)
MSI
CLI Version: What version of the CLI and modules are installed? (Use az --version)
azure-cli (2.0.16)
acr (2.0.11)
acs (2.0.14)
appservice (0.1.15)
batch (3.1.2)
billing (0.1.4)
cdn (0.0.7)
cloud (2.0.7)
cognitiveservices (0.1.7)
command-modules-nspkg (2.0.1)
component (2.0.7)
configure (2.0.10)
consumption (0.1.4)
container (0.1.9)
core (2.0.15)
cosmosdb (0.1.12)
dla (0.0.11)
dls (0.0.13)
eventgrid (0.1.3)
feedback (2.0.6)
find (0.2.6)
interactive (0.3.9)
iot (0.1.11)
keyvault (2.0.9)
lab (0.0.10)
monitor (0.0.9)
network (2.0.13)
nspkg (3.0.1)
profile (2.0.11)
rdbms (0.0.6)
redis (0.2.8)
resource (2.0.13)
role (2.0.11)
servicefabric (0.0.3)
sf (1.0.8)
sql (2.0.10)
storage (2.0.14)
vm (2.0.13)
Python (Windows) 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 17:54:52) [MSC v.1900 32 bit (Intel)]
Python location 'C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\python.exe'
OS Version: What OS and version are you using?
Windows 10 Pro 64-bit
Shell Type: What shell are you using? (e.g. bash, cmd.exe, Bash on Windows)
cmd.exe
On Windows, Python startup may be slowed down by Windows Defender (or other AV programs). You can verify whether it is indeed the root cause by adding python.exe to your Windows Defender (or other AV) whitelist.
However, we do NOT recommend this as a solution. It is important to keep your environment secure.
Okay, so I did some digging, here's what I found.
Real-time system protection had basically no impact; we're talking sub-100 ms, if any.
Then I used procmon to try to figure out what was taking so long. There's a lot of file scanning going on in the C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\Lib directory.
I noticed the occasional ACCESS DENIED error when it was trying to write back a precompiled Python script; given that it's installed under Program Files, that's expected.
I then ran it in an elevated command prompt and the performance doubled.
Afterwards, running in a normal command prompt kept the same improved performance. So there's a 2x perf hit right there just because you aren't running as an admin all the time!?
But even with a 2x perf gain it's IMO too slow: about 1.5 seconds for each run, even if it's not doing anything.
I'm assuming you can trace this back to all the file scanning that is still happening. I think you need an index file so that you don't have to resort to scanning every file each time you run a command. As it stands, you've made the most common case progressively slower as more commands are added to the CLI tools.
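The index-file idea could look something like this minimal sketch (the file names and module layout here are hypothetical, not the CLI's actual internals): build the command-to-module map once, persist it, and then look up a single module per invocation instead of scanning everything.

```python
import json

def build_index(module_names):
    """Map each command group to its implementing module, e.g.
    'cloud' -> 'azure.cli.command_modules.cloud' (names illustrative)."""
    return {name.rsplit(".", 1)[-1]: name for name in module_names}

def save_index(index, path):
    """Persist the index so later runs skip the full scan."""
    with open(path, "w") as f:
        json.dump(index, f)

def resolve(group, path):
    """Look up the owning module from the cached index instead of
    scanning every installed command module on startup."""
    with open(path) as f:
        return json.load(f).get(group)
```

The cache would need to be invalidated whenever a command module is installed or updated, which is presumably why it isn't entirely trivial to add.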
You can see my measurements in the Google spreadsheet I linked below.
https://docs.google.com/spreadsheets/d/1K-gdsBQ9fKKwYuzz39Kvc7INzY_RWsIJdF6o3hQE6M0/edit?usp=sharing
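The samples in this thread look like per-invocation wall-clock seconds. A minimal harness along these lines reproduces the methodology (it assumes az is on your PATH; any command works):

```python
import statistics
import subprocess
import time

def time_command(args, runs=10):
    """Return per-run wall-clock seconds for launching `args` to completion."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(args, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples

def median_seconds(args, runs=10):
    """Median is less noisy than the mean for a handful of runs."""
    return statistics.median(time_command(args, runs))

# e.g. median_seconds(["az", "cloud", "list"])  # assumes az is on PATH
```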
Okay, so I went back and checked two more things.
If you run
az configure
and say yes, you get to answer a few questions; I chose to disable logging and telemetry.
That's another 2x perf gain.
Is there any reason logging and telemetry cannot be async?
edit: I updated the spreadsheet with the additional timing information.
Thank you for the measuring. And I assume you're running az for all your testing. Can you try running az cloud list on your machine and measure the time?
Telemetry is async. After the command process finishes, a new process is started to upload telemetry in the background (it runs telemetry_upload.py). It is not expected to affect the perceived command performance.
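As a sketch of what "a new process started in the background" can mean: for the parent to exit immediately, the child has to be fully detached from the console/session. The script path below is illustrative (the comment above names telemetry_upload.py), and this is not the CLI's actual code, just the standard subprocess options involved.

```python
import os
import subprocess
import sys

DETACHED_PROCESS = 0x00000008          # Windows CreateProcess flags
CREATE_NEW_PROCESS_GROUP = 0x00000200

def spawn_uploader(script):
    """Launch `script` in a fully detached child so the parent can exit
    immediately without waiting for the upload."""
    kwargs = dict(stdin=subprocess.DEVNULL,
                  stdout=subprocess.DEVNULL,
                  stderr=subprocess.DEVNULL)
    if os.name == "nt":
        # Detach from the console so the cmd.exe prompt returns at once.
        kwargs["creationflags"] = DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP
    else:
        # On POSIX, a new session detaches the child from our terminal group.
        kwargs["start_new_session"] = True
    return subprocess.Popen([sys.executable, script], **kwargs)
```

If the spawn itself is slow on Windows (CreateProcess for a fresh Python interpreter is not cheap), the command would still feel sluggish even though the upload is nominally in the background, which matches the measurements in this thread.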
Can you turn on the logging but keep telemetry off to test the logging's impact?
yes, lol, just running az cloud list the timings are:
~~~
0.2596116
0.210088
0.2070234
0.2459465
0.2079574
0.230976
0.2270889
0.2093336
0.216172
0.2167034
~~~
...that's another 3x.
So, yes. Here's the deal. When you run a bare az, we collect all the modules to print the full sub-command list, so it reads more files. When you run a subcommand directly, for example az cloud, we can pinpoint the individual command module and reduce the number of imported modules.
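A sketch of that pinpointing behavior (module names illustrative, not the CLI's real registry): bare az needs every command module to render the full command tree, while az cloud only needs one.

```python
import importlib

# Illustrative mapping; the real CLI has dozens of command modules.
COMMAND_MODULES = {
    "cloud": "azure.cli.command_modules.cloud",
    "storage": "azure.cli.command_modules.storage",
}

def modules_to_import(argv, registry=None):
    """Return the module names an invocation needs: all of them for bare
    `az` (to print the full command tree), one for `az <group> ...`."""
    registry = COMMAND_MODULES if registry is None else registry
    if not argv:
        return list(registry.values())
    return [registry[argv[0]]]

def dispatch(argv, registry=None):
    """Import only what the invocation needs."""
    return [importlib.import_module(m) for m in modules_to_import(argv, registry)]
```

This is why az cloud list is several times faster than bare az in the numbers above: the import cost scales with the number of modules touched.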
The az cloud list command performance here is in line with our tests. Do you think it is an acceptable performance?
timings with logging enabled
~~~
az
0.7710953
0.7403197
0.7441196
0.7853553
0.7542274
0.7374869
0.7466202
0.7708723
0.7846359
0.7925548
~~~
~~~
az cloud list
0.2445698
0.2627537
0.2784114
0.2815754
0.2404466
0.2456656
0.2328953
0.2678627
0.2267679
0.2247044
~~~
~~~
az with telemetry enabled
1.6212685
1.6762963
1.5584562
1.7061276
1.5949917
1.5022682
1.7164641
1.7227445
1.5963353
1.915089
~~~
~~~
az cloud list with telemetry enabled
1.3457957
1.4373729
1.0507627
0.9569793
1.3228079
1.2569795
1.2466929
1.0923828
1.17191
1.1996368
~~~
@troydai the 250 ms for az cloud list is acceptable, but it doesn't take much more to make it feel sluggish, which I think is sad.
More timings. I think these are without telemetry; I wanted to compare the execution time of a command that doesn't touch the network with one that does. I think it's still a bit too high. And I think all the file scanning is too slow on Windows in particular... maybe it's less of an issue elsewhere.
~~~
az resource list --help
0.6549282
0.6460973
0.6551568
0.6934735
0.6563829
0.6648705
0.6467416
0.6695768
0.6433284
0.6766644
~~~
~~~
az resource list -g ...
2.7330824
1.2484462
1.2020567
1.1738106
1.1390877
1.2295843
1.5307826
1.1470495
1.2544168
1.2158291
~~~
Ideally, no command that doesn't traverse the network should have a latency/overhead greater than 250 ms. That is my firmly held opinion, given what I've learned so far.
Thank you for the measurement.
I notice that telemetry seems to take a toll on performance on the Windows platform (as far as I can tell, we didn't observe the delay on Linux/Unix). I think spawning a new process blocks the command from exiting and causes the performance issue here. I will do more testing on the Windows platform.
I'm glad that the 250 ms response time works for you. However, in many cases the command communicates with the Azure service, and whenever there is network communication, the network latency will dwarf any other factor.
I do agree with you that we can and should do much better when the command is run locally. And thank you for bringing our attention to this.
I filed the following two issues to track the specific topics:
@troydai super!
It's just really annoying having to wait for the CLI to respond if you make a typo. For example, I just started using the CLI and I use the --help command to explore commands a lot. But I had to wait for 3 seconds each time. 750 ms is better, a lot better but it's noticeable and a bit annoying. 250 ms feels great.
I agree. -h serves help information and it should be fast, since it will be hit repeatedly, just like in your scenario.
I logged another issue regarding the performance difference between admin and user privilege. In all, I think these three issues cover all the discoveries here.
So I will close this issue now. Please track the three GitHub issues above for progress, and feel free to open new GitHub issues for any problem you encounter with the Azure CLI. Again, thank you for reporting the issue and for the detailed measurements.
Let me ask here again what I already asked in #6265 :
why should the default be to upload "telemetry data" at all?? In my case that seems to report my OS version, Python version, and machine ID strings to Azure... What the hell?
Please turn that default off.
Telemetry does not seem to be done asynchronously, or at least it blocks the main command from finishing if it hangs (see #6265 and #6248).
Disabling it using az configure gave me a perceived factor-of-2 speedup running a bash for-loop:
~~~
for BLOB in ... ; do az storage blob download --container-name .... ; done
~~~
This is on Linux, not Windows.
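Each az invocation in a loop like that pays the full interpreter-startup overhead. One illustrative workaround (not from the thread) is to run the invocations concurrently so the startup costs overlap; the command and names below are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

def run_many(commands, workers=8):
    """Run each command (a list of argv strings) concurrently and return
    the return codes in order; overlaps the per-process startup cost."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda cmd: subprocess.run(cmd, capture_output=True).returncode,
            commands,
        ))

# blobs = [...]  # placeholder blob names
# run_many([["az", "storage", "blob", "download",
#            "--container-name", "mycontainer",  # placeholder
#            "--name", b, "--file", b] for b in blobs])
```

Threads suffice here because each worker just waits on a child process; the parallelism is between the az processes themselves.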
A year and a half later and this is still a problem, on Linux anyway. Running az configure and turning off logging and telemetry makes performance bearable.