Azure-storage-azcopy: azcopy command hangs

Created on 26 Jul 2019  ·  53 comments  ·  Source: Azure/azure-storage-azcopy

Which version of the AzCopy was used?

AzCopy 10.2.1

Which platform are you using? (ex: Windows, Mac, Linux)

Linux Server release 6.10

What command did you run?

azcopy (without any arguments). The same thing happens with any other command that succeeds; a command that fails errors out and does not hang.

What problem was encountered?

azcopy commands hang.

How can we reproduce the problem in the simplest way?

Download the mentioned version and run the command in Linux 6 or Linux 7

Have you found a mitigation/solution?

No.

I have to press Ctrl-C to cancel the azcopy command and return to the shell prompt.

# azcopy
AzCopy 10.2.1
Project URL: github.com/Azure/azure-storage-azcopy

AzCopy is a command line tool that moves data into/out of Azure Storage.
To report issues or to learn more about the tool, go to github.com/Azure/azure-storage-azcopy

The general format of the commands is: 'azcopy [command] [arguments] --[flag-name]=[flag-value]'.

Usage:
  azcopy [command]

Available Commands:
  copy        Copies source data to a destination location
  env         Shows the environment variables that can configure AzCopy's behavior
  help        Help about any command
  jobs        Sub-commands related to managing jobs
  list        List the entities in a given resource
  login       Log in to Azure Active Directory to access Azure Storage resources.
  logout      Log out to terminate access to Azure Storage resources.
  make        Create a container/share/filesystem
  remove      Delete entities from Azure Storage Blob/File/ADLS Gen2
  sync        Replicate source to the destination location

Flags:
      --cap-mbps uint32      caps the transfer rate, in Mega bits per second. Moment-by-moment throughput may vary slightly from the cap. If zero or omitted, throughput is not capped.
  -h, --help                 help for azcopy
      --output-type string   format of the command's output, the choices include: text, json. (default "text")
      --version              version for azcopy

Use "azcopy [command] --help" for more information about a command.
^C

I am attaching the strace output of this command (I only removed the domain names from the output).
(thanks - I've downloaded and saved the strace - JohnRusk)

All 53 comments

There is nothing in the logs:

[root@dev-mado21 ~]# ls -al .azcopy/
total 12
drwxr-xr-x   3 root root 4096 Jul 25 10:15 .
dr-x------. 23 root root 4096 Jul 26 10:03 ..
drwxr-xr-x   2 root root 4096 Jul 25 10:15 plans
[root@dev-mado21 ~]# ls -al .azcopy/plans/
total 8
drwxr-xr-x 2 root root 4096 Jul 25 10:15 .
drwxr-xr-x 3 root root 4096 Jul 25 10:15 ..
[root@dev-mado21 ~]# ls -al .azcopy/plans/
total 8
drwxr-xr-x 2 root root 4096 Jul 25 10:15 .
drwxr-xr-x 3 root root 4096 Jul 25 10:15 ..

Very weird. Thanks for logging it. It's not immediately clear what's going on! Can you tell me a little about the specs of the machine/VM where it's running?

Oh, and just checking, this is Red Hat Linux that we're talking about, right?

yes, RHEL 6 machine.

The azcopy command hangs. I have to press Ctrl-C to get out of the command and get the prompt back.

How many CPUs on the machine? Just one by any chance?

BTW, I've logged this for investigation. I don't have a timeframe for that yet, sorry.

No, this machine has 2 vCPUs, and I have also tried azcopy on another machine that has more vCPUs.
Same story.

I have tried any number of times, always with the same result. Are you able to see this issue in your environment?

tar -xvzf /stage/redhat/tmp/azcopy_linux_amd64_10.2.1.tar.gz
azcopy_linux_amd64_10.2.1/
azcopy_linux_amd64_10.2.1/azcopy
azcopy_linux_amd64_10.2.1/ThirdPartyNotice.txt
[root@dev-dragon21 ~]# cd azcopy_linux_amd64_10.2.1/
[root@dev-dragon21 azcopy_linux_amd64_10.2.1]# ./azcopy
AzCopy 10.2.1
Project URL: github.com/Azure/azure-storage-azcopy

AzCopy is a command line tool that moves data into/out of Azure Storage.
To report issues or to learn more about the tool, go to github.com/Azure/azure-storage-azcopy

The general format of the commands is: 'azcopy [command] [arguments] --[flag-name]=[flag-value]'.

Usage:
  azcopy [command]

Available Commands:
  copy        Copies source data to a destination location
  env         Shows the environment variables that can configure AzCopy's behavior
  help        Help about any command
  jobs        Sub-commands related to managing jobs
  list        List the entities in a given resource
  login       Log in to Azure Active Directory to access Azure Storage resources.
  logout      Log out to terminate access to Azure Storage resources.
  make        Create a container/share/filesystem
  remove      Delete entities from Azure Storage Blob/File/ADLS Gen2
  sync        Replicate source to the destination location

Flags:
      --cap-mbps uint32      caps the transfer rate, in Mega bits per second. Moment-by-moment throughput may vary slightly from the cap. If zero or omitted, throughput is not capped.
  -h, --help                 help for azcopy
      --output-type string   format of the command's output, the choices include: text, json. (default "text")
      --version              version for azcopy

Use "azcopy [command] --help" for more information about a command.
^C

We have never seen it happen, and no-one else has reported it to us. I have not personally tested on RHEL.

This could be my environment, but I have an older version working fine. It is just this version that is giving me trouble.

When you say older version, do you mean AzCopy 8, or an older version of 10?

it is azcopy 7.2.0

------------------------------------------------------------------------------
azcopy 7.2.0-netcore Copyright (c) 2018 Microsoft Corp. All Rights Reserved.
------------------------------------------------------------------------------
# azcopy is designed for high-performance uploading, downloading, and copying
data to and from Microsoft Azure Blob, and File storage.

# Command Line Usage:
    azcopy --source <source> --destination <destination> [options]

# Options:
    [--source-key] [--dest-key] [--source-sas] [--dest-sas] [--verbose] [--resume]
    [--config-file] [--quiet] [--parallel-level] [--source-type] [--dest-type]
    [--recursive] [--include] [--check-md5] [--dry-run] [--preserve-last-modified-time]
    [--exclude-newer] [--exclude-older] [--sync-copy] [--set-content-type] [--blob-type]
    [--delimiter] [--include-snapshot]

------------------------------------------------------------------------------
For azcopy command-line help, type one of the following commands:
# Detailed command-line help for azcopy      ---   azcopy --help
# Detailed help for any azcopy option        ---   azcopy --help source-key
# Command line samples                       ---   azcopy --help sample
You can learn more about azcopy at http://aka.ms/azcopy.
------------------------------------------------------------------------------

I am interested in the latest version because it supports authentication with a service principal, which is not available in older versions.

OK, I've run some tests and found the cause of the problem. The problem only happens on help commands (i.e. AzCopy with no parameters or with --help on the command line). It happens when the domain name aka.ms cannot be resolved - because, when running a help command, AzCopy checks a specific aka.ms URL to see if a newer version has been released. Unfortunately, with help commands, that check is done synchronously - i.e. AzCopy waits until the URL is loaded. And that never happens if the domain name can't be resolved (e.g. your DNS doesn't support it, or your machine can only see Azure and not the public internet).

Fortunately, the commands that actually do stuff, e.g. copy data, all use an asynchronous version of that check (i.e. they don't wait for the answer). So all of those commands will run fine. It is only the help commands that are affected.

I'll make sure we fix this in a future release, but for now, you can safely use CTRL-C to exit the help commands, as you have been doing, and you won't see the problem on the copy and sync commands.
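
To illustrate the distinction described above, here is a minimal shell sketch. It is not AzCopy's code (AzCopy is written in Go); `version_check`, `run_blocking` and `run_non_blocking` are hypothetical names, and `sleep` stands in for a request that is slow or never completes.

```shell
#!/usr/bin/env bash
# Stand-in for the version check. The real check is an HTTP GET inside AzCopy;
# here a slow or unreachable server is simulated with sleep (hypothetical helper).
version_check() {
    sleep "${1:-30}"
}

# Synchronous style: the caller cannot finish until the check returns.
run_blocking() {
    version_check "$1"
    echo "command finished"
}

# Asynchronous style: the check runs in the background and nobody waits for it.
run_non_blocking() {
    version_check "$1" >/dev/null 2>&1 &
    echo "command finished"
}
```

With a 30-second stand-in, `run_blocking 30` returns to the prompt only after 30 seconds, while `run_non_blocking 30` returns at once.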

@JohnRusk This happens with all subcommands of azcopy.
As I said, if the command I enter errors out, I get the prompt back immediately, but when I enter a valid command, it hangs.

It happens even if you successfully transfer a file?

I am not able to log in to do the transfer. (Login succeeds, but it just stays there and doesn't return the prompt.)
Our main objective is to use azcopy to automate our file transfers in a script.

Do you have any interim fix for that? I reckon it must be similar to what you pointed out earlier. It might be trying to resolve some hostnames on the internet that the host does not have access to.

Thanks for that info. I don't know the cause of the login issue. Which type of login are you using: MSI, Service Principal or just the default one, where it sends you to a website to complete the login?

I am using a service principal. Authentication is successful, but it doesn't give me the prompt back; it hangs there.

Thanks for that info. I'll have a look for the cause on Monday

I've had a look. It turns out that the login code path invokes the same synchronous call to aka.ms. (The documentation in the code says it's only the help commands that do that, but it looks like it's help _and_ login.)

You can safely use CTRL-C to exit at that point (and your login will still work). But I'm not sure whether that will be a workable workaround for you. We'll fix the bug in our next release, which is likely to be at around the end of August.

That's no good. People use tools like azcopy to automate file movement to and from Azure storage; any manual interruption defeats the purpose.

Do you have any workaround for this?

Maybe you could programmatically kill the AzCopy process from a bash script. E.g. something like the approach discussed here: https://stackoverflow.com/questions/5789642/how-to-send-controlc-from-a-bash-script Just be careful to choose an approach that guarantees you kill the right PID.

If you want to be sure that the login has indeed completed before you kill it, you could probably grep the output of keyctl show to look for the AzCopy key that gets put in there when the login succeeds. (You can see it if you run one login manually, press CTRL-C, and then run keyctl show.)
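
A minimal sketch of that workaround, assuming GNU coreutils `timeout` is available. `run_with_deadline` and `login_key_present` are hypothetical helper names, and the `azcopy` grep pattern is a guess at how the key appears in `keyctl show`, so check it against a manual login first.

```shell
#!/usr/bin/env bash
# Run a command, but interrupt it (SIGINT, like Ctrl-C) if it is still
# running after the given number of seconds. Uses GNU coreutils `timeout`,
# which signals only the process it started, so no PID lookup is needed.
run_with_deadline() {
    local seconds="$1"
    shift
    timeout -s INT "$seconds" "$@"
    local status=$?
    if [ "$status" -eq 124 ]; then
        # 124 is timeout's exit status for "the command timed out".
        echo "interrupted after ${seconds}s: $*" >&2
    fi
    return "$status"
}

# Assumption: the key that AzCopy stores on login can be recognised by the
# string "azcopy" in `keyctl show` output. Verify the pattern on your machine.
login_key_present() {
    keyctl show 2>/dev/null | grep -qi 'azcopy'
}
```

For example, `run_with_deadline 60 azcopy login` followed by your usual login arguments, and then `login_key_present && echo "login stored"`. The 60-second limit is arbitrary; pick one comfortably longer than a normal login takes.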

BTW, the fix for this issue (login in cases where aka.ms cannot be resolved) is currently in code-review, and due for release in our next release, as noted above.

One last thing, we recently did some minor updates to the guidance here, on authentication for unattended scenarios: https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10 . Basically the guidance is to use Managed Service Identity (MSI) if running on an Azure VM, and Service Principal Authentication (with a client secret or, even better, a certificate) if running on-premises. The plain 'AzCopy login' (with no parameters) is not suitable or intended for scripted use.

@imlight sorry for the inconvenience. In order to help us establish a repro of the issue, could you please elaborate a bit on the network environment of your VM? Is it in Azure? Does it have a proxy?

Good point Ze. I have assumed that the machine does have some network connectivity (eg at least to the storage account in question), and just can't reach aka.ms. But I could be wrong. Maybe there's a proxy that's the only way out, and AzCopy is not using the proxy for anything, and so getting all traffic blocked.

This host is part of our on-premises infrastructure; it is not a cloud VM.

We do have proxy settings in place to access the Azure storage account from our internal network.

Can you point out which domain names azcopy needs access to in order to run?
(I'm just wondering why these domains were added in the latest azcopy version, when the older one didn't have this issue.)

Just

  1. access to your storage account and
  2. to https://aka.ms/azcopyv10-version-metadata, which currently redirects to https://azcopyvnext.azureedge.net/releasemetadata/latest_version.txt

The links in 2 were added in v10 so we can automatically notify users of updates. However, the update check should not block usage of the app if it fails. As discussed, there's currently a bug there, where it does block usage of the app in some cases, as you have reported.

Re the proxy, to make AzCopy use it, you need to set the right environment variable, as discussed here: https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-configure
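
As a small illustration of that setting, the linked documentation describes the `HTTPS_PROXY` environment variable. The address below is a placeholder, not a real proxy.

```shell
# Placeholder proxy address; replace with your own proxy host and port.
export HTTPS_PROXY="http://proxy.example.internal:3128"

# Lowercase form, which tools such as wget and curl also read.
export https_proxy="$HTTPS_PROXY"
```

Running `azcopy env` afterwards lists the environment variables that AzCopy itself recognises.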

From my host machine

  1. I am able to log in to my Azure account with AD credentials (which shows I have internet access to Azure).
  2. I am able to download https://aka.ms/azcopyv10-version-metadata and https://azcopyvnext.azureedge.net/releasemetadata/latest_version.txt from my server without any error.

But the azcopy command still hangs.

Were both 1 and 2 done through some tool that is _not_ AzCopy? If so, I think that confirms Ze's proxy theory and (as you suggest) disproves my aka.ms resolution theory. (Although that bug does also exist in the code. It's just probably not causing _your_ problem).

Can you check that the necessary environment variable is set, to allow AzCopy to use the proxy: https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10

Hopefully, that will solve the issue. I'm sorry it has taken us so long to figure out what's happening!

1 and 2 were tested with curl and wget.

When I use azcopy to log in with a service principal, it succeeds and I get the message below. The command then gets stuck there, which suggests azcopy is using the http_proxy and https_proxy environment variables.

INFO: SPN Auth via secret succeeded.

Yes, that does indeed suggest it can use the proxy. I wonder if possibly the version check is _not_ using the proxy. That would explain what you are seeing. Let me check....

Nope, that does not look like the cause, since the version check code is (correctly) invoking the exact same proxy usage code as the rest. I wonder if it's something to do with the fact that aka.ms does a redirect, and somehow that's not being handled properly, but I can't think how that could be...

I'll need to do some more testing. Thanks for all the info you've provided so far.

PS Our proxy server test environment is booked today for other work, but I'll get onto the testing as soon as it becomes available.

Hello Gents, Any update on this ?

Sorry, not yet. Am caught up in some high-priority stuff. This is next on my list.

@imlight I've had a look at it now.

I don't have an exact cause, but I do have a way you can get more information, and hopefully that will be enough to diagnose the problem.

The process is as follows:

  1. Run ./azcopy (don't need any parameters)
  2. Wait for well over 30 seconds. E.g. wait for 1 minute.
  3. CTRL-C to kill it.
  4. There should now be a system log message. On my RHEL 7.x test environment, I can view the logged message with sudo tail /var/log/messages

That should display a message from AzCopy saying why it failed to reach the aka.ms URL. My guess is that it's probably either something related to the proxy, or something related to SSL. E.g. Maybe Go's supported TLS protocols don't intersect with those supported by your OS and aka.ms or maybe your OS doesn't trust the signing cert at aka.ms. Or maybe it will be something totally different.

In any case, if there is an error message there, and if it relates to either of the version check URLs, as noted above, that will tell us two things:
(a) We'll know what's causing the problem and
(b) We'll know that the changes in #519 will indeed fix it.

On the other hand, if there is no error in the log... then it will still be a mystery.

You were right, I see error message in /var/log/messages

Aug  6 15:10:43 dev-mado21 /usr/local/bin/azcopy[11714]: 2019/08/06 15:10:43 ==> REQUEST/RESPONSE (Try=6/197.259391ms, OpTime=56.606956225s) -- REQUEST ERROR#012   GET https://aka.ms/azcopyv10-version-metadata?timeout=901#012   User-Agent: [AzCopy/10.2.1 Azure-Storage/0.7 (go1.12; linux)]#012   X-Ms-Client-Request-Id: [6a58d472-662e-4eb2-7a15-1d5596521702]#012   X-Ms-Version: [2018-11-09]#012   --------------------------------------------------------------------------------#012   ERROR:#012-> github.com/Azure/azure-pipeline-go/pipeline.NewError, /home/vsts/go/pkg/mod/github.com/!azure/[email protected]/pipeline/error.go:154#012HTTP request failed#012#012Get https://aka.ms/azcopyv10-version-metadata?timeout=901: x509: certificate signed by unknown authority#012#012goroutine 1 [running]:#012github.com/Azure/azure-storage-azcopy/ste.stack(0xc0001fcf90, 0xc0001f2900, 0x0)#012#011/home/vsts/work/1/s/ste/xferLogPolicy.go:139 +0x9d#012github.com/Azure/azure-storage-azcopy/ste.NewRequestLogPolicyFactory.func1.1(0xc732e0, 0xc00006aba0, 0xc0001f2900, 0xa74420, 0x1176788, 0x0, 0x0)#012#011/home/vsts/work/1/s/ste/xferLogPolicy.go:106 +0x662#012github.com/Azure/azure-pipeline-go/pipeline.PolicyFunc.Do(0xc0000a2dc0, 0xc732e0, 0xc00006aba0, 0xc0001f2900, 0x0, 0xc00009c130, 0xc65100, 0xc0001a4450)#012#011/home/vsts/go/pkg/mod/github.com/!azure/[email protected]/pipeline/core.go:43 +0x44#012github.com/Azure/azure-storage-azcopy/ste.NewVersionPolicyFactory.func1.1(0xc732e0, 0xc00006aba0, 0xc0001f2900, 0x115a980, 0x8, 0x203000, 0x203000)#012#011/home/vsts/work/1/s/ste/mgr-JobPartMgr.go:69 +0x1b2#012github.com/Azure/azure-pipeline-go/pipeline.PolicyFunc.Do(0xc0001b52c0, 0xc732e0, 0xc00006aba0, 0xc0001f2900, 0xc0001bccc0, 0x457970, 0xc000078c00, 0x0)#012#011/home/vsts/go/pkg/mod/github.com/!azure/[email protected]/pipeline/core.go:43 +0x44#012github.com/Azure/azure-storage-blob-go/azblob.responderPolicy.Do(0xc66ac0, 0xc0001b52c0, 0xc0001a81e0, 0xc732e0, 0xc00006aba0, 0xc0001f2900, 
0x115a980, 0x10, 0x10, 0xc000479250)

But let me confirm I am able to download

[root@dev-mado21 ~]# wget https://aka.ms/azcopyv10-version-metadata --no-check-certificate
--2019-08-06 15:13:15--  https://aka.ms/azcopyv10-version-metadata
Resolving PROXY...PROXY_IP
Connecting to PROXY...PROXY_IP|:80... connected.
WARNING: cannot verify aka.ms’s certificate, issued by “/C=US/ST=California/O=Zscaler Inc./OU=Zscaler Inc./CN=Zscaler Intermediate Root CA (zscalertwo.net) (t) ”:
  Unable to locally verify the issuer’s authority.
Proxy request sent, awaiting response... 301 Moved Permanently
Location: https://azcopyvnext.azureedge.net/releasemetadata/latest_version.txt [following]
--2019-08-06 15:13:16--  https://azcopyvnext.azureedge.net/releasemetadata/latest_version.txt
Connecting to PROXY...PROXY_IP|:80... connected.
WARNING: cannot verify azcopyvnext.azureedge.net’s certificate, issued by “/C=US/ST=California/O=Zscaler Inc./OU=Zscaler Inc./CN=Zscaler Intermediate Root CA (zscalertwo.net) (t) ”:
  Unable to locally verify the issuer’s authority.
Proxy request sent, awaiting response... 200 OK
Length: 7 [text/plain]
Saving to: “azcopyv10-version-metadata.3”

100%[=====================================================================================================================================>] 7           --.-K/s   in 0s

2019-08-06 15:13:16 (691 KB/s) - “azcopyv10-version-metadata.3” saved [7/7]

I have added aka.ms to our proxy list and, as you can see, I am able to download the version file (but only with the --no-check-certificate argument). How do I make this work in my current setup?

Great, thanks. Those two pieces of information are enough to confirm what's going on. AzCopy's error message contains this text

certificate signed by unknown authority

I think that's the real error when AzCopy tries to access the aka.ms link. (I don't know why that gets wrapped in a "timeout" error, maybe because the cert error causes it to never complete).
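
For anyone facing a similar wall of syslog text, the sketch below pulls out just the line that names the failed request and its cause. It assumes the format seen in the log entry above, where syslog has escaped newlines as `#012` and tabs as `#011`; `azcopy_request_failures` is a hypothetical helper name, not part of AzCopy.

```shell
#!/usr/bin/env bash
# Print the "Get <url>: <cause>" lines from AzCopy entries in a syslog file.
# Defaults to /var/log/messages, the location used earlier in this thread.
azcopy_request_failures() {
    local log_file="${1:-/var/log/messages}"
    # Keep only entries written by an azcopy process.
    grep 'azcopy\[' "$log_file" |
        # Undo syslog's escaping of newlines (#012) and tabs (#011).
        awk '{ gsub("#012", "\n"); gsub("#011", "\t"); print }' |
        # Go reports a failed request as: <Method> <url>: <cause>
        grep -E '^(Get|Post|Put|Head|Delete) "?https?://'
}
```

Reading `/var/log/messages` usually needs root, so a typical call would be `sudo bash -c` around the function, or a copy of the log passed as the first argument.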

Also, when you do wget locally, as you mentioned, you are suppressing the cert error with --no-check-certificate, which confirms that your local machine does not trust the cert that is being provided for the two version check URLs (the aka.ms one, and the link it redirects to).

Why doesn't it trust that cert? Because the traffic is being intercepted and re-signed by a ZScaler device/service. (We can see that in the cert name). So the cert it sees is not the original one provided by Azure (which it would trust).

Why isn't this a problem for the main traffic, to the login URL when you log in? I don't know for sure, but I'm guessing that there may be an exception in ZScaler to let the login URL through without intercepting it.

But what about AzCopy traffic to your Storage account? (I.e. after you log in, and you want to transfer some data). Will it be re-signed by ZScaler? I'm not sure. That depends on how the exceptions are set up in ZScaler.

How can you make everything work?

  1. Login. For now, the only way to get login to work is to either

    1. install on your AzCopy machine a root cert that will allow that machine to verify the ZScaler cert. The ZScaler doc on this is brief, and I think you'll probably need to get your company's ZScaler admin person to help, if you want to install that certificate on your machine. I don't know how to install a cert on your OS. You'll need to look that up, or get your ZScaler admin person to help. OR
    2. Get your company's ZScaler admin person to add the two version check URLs as exceptions, that ZScaler will not intercept.
  2. What about after you log in, how can you make real traffic to Azure storage work? As noted above, we don't know yet whether you'll also have the same cert problem with your actual Storage traffic. One way to test this is to do a login manually, then hit CTRL-C when it freezes (after it displays the message INFO: SPN Auth via secret succeeded.). Then you'll be logged in. Try doing a test AzCopy transfer - e.g. move just one file, to a test container in Blob Storage. If that works, then you know that there's no certificate problem with the main traffic, and the problem only affects the two version check URLs. But, if that fails, then you know there's a problem for the main traffic too, and you need to either install the ZScaler cert (as above) or get your Storage URL added as an exception (as above) so that ZScaler will not intercept it. If you have the choice, getting it added as an exception is probably best for performance. When you've finished this test, do AzCopy logout and hit CTRL-C after it hangs, to log back out.

Hope this helps.

Thanks for detailed answer.

When I do this, it fails:

wget https://aka.ms/azcopyv10-version-metadata
--2019-08-06 16:58:13--  https://aka.ms/azcopyv10-version-metadata
Resolving PROXY
Connecting to PROXY|:80... connected.
ERROR: cannot verify aka.ms’s certificate, issued by “/C=US/ST=California/O=Zscaler Inc./OU=Zscaler Inc./CN=Zscaler Intermediate Root CA (zscalertwo.net) (t) ”:
  Unable to locally verify the issuer’s authority.
To connect to aka.ms insecurely, use ‘--no-check-certificate’.

But when I do this, it works fine (I got help from our Zscaler admin to add the URL below to the allow list):

wget https://azcopyvnext.azureedge.net/releasemetadata/latest_version.txt
--2019-08-06 17:01:51--  https://azcopyvnext.azureedge.net/releasemetadata/latest_version.txt
Resolving PROXY
Connecting to PROXY|:80... connected.
Proxy request sent, awaiting response... 200 OK
Length: 7 [text/plain]
Saving to: “latest_version.txt.2”

100%[=====================================================================================================================================>] 7           --.-K/s   in 0s

2019-08-06 17:01:52 (545 KB/s) - “latest_version.txt.2” saved [7/7]

But when our Zscaler admin tried to add aka.ms to the allow list, he reported that it doesn't have any certificate.

As I noticed, when I use --no-check-certificate on aka.ms, it redirects the URL to *.azureedge.net.

@JohnRusk We have tried option 2 of your first solution, and because aka.ms redirects and (according to our admin) doesn't have any certificate, it is failing. See my previous update. With the --no-check-certificate option it follows the redirect, but without it the redirect fails and wget errors out.
I am wondering whether there is a better way to address this.
Meanwhile I will keep working with my team.

Regarding point 2: we have been using azcopy and the az CLI (older versions) in our infrastructure, so we didn't have any issues connecting to Azure from on-premises.

Looks like this is what you need to get the cert of https://aka.ms
https://serverfault.com/questions/700812/view-the-ssl-certificate-of-a-page-that-immediately-redirects-to-another (I think it does have a cert and that process should get it for you)
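
As a rough alternative to the approach in that link, curl's verbose output can show who issued the certificate a URL presents, which makes an intercepting proxy easy to spot. This is only a sketch: the helper names are hypothetical, and the parsing assumes curl prints a line of the form `*  issuer: ...`, which can vary with the curl version and TLS backend.

```shell
#!/usr/bin/env bash
# Extract the first "issuer:" value from curl's verbose output on stdin.
issuer_from_curl_verbose() {
    sed -n 's/^\*[[:space:]]*issuer:[[:space:]]*//p' | head -n 1
}

# Print the issuer of the certificate presented for a URL.
# -k skips verification so the handshake completes even for an untrusted
# cert; -v prints the certificate details; the body is discarded.
# The second argument is an optional time limit in seconds (default 15).
cert_issuer() {
    curl -skv --max-time "${2:-15}" -o /dev/null "$1" 2>&1 |
        issuer_from_curl_verbose
}
```

For example, `cert_issuer https://aka.ms/azcopyv10-version-metadata` would be expected to print a Zscaler issuer in the environment described here, and the site's own issuer where nothing intercepts the traffic.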

This issue is fixed now, thanks for your relentless help.

For everyone's benefit: I had to install the Zscaler root cert on my Linux host by adding the cert to files like
/etc/pki/tls/certs/ca-bundle.trust.crt
/etc/pki/tls/certs/ca-bundle.crt

And azcopy is working fine now.
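
For readers on RHEL-family systems, another route to the same result is the `update-ca-trust` mechanism, which regenerates the bundle files from a directory of trusted anchors, so the added cert need not be pasted into the bundles by hand. The sketch below is an assumption-laden illustration, not what was done in this thread: `install_root_ca` is a hypothetical helper, the anchors path is the usual default, and on RHEL 6 the mechanism may first need enabling with `update-ca-trust enable`.

```shell
#!/usr/bin/env bash
# Copy a PEM root certificate into the trust anchors directory and refresh
# the system bundles. Arguments:
#   $1  path to the PEM certificate (e.g. exported by your proxy admin)
#   $2  anchors directory (default: /etc/pki/ca-trust/source/anchors)
#   $3  refresh command  (default: update-ca-trust extract)
install_root_ca() {
    local cert="$1"
    local anchors="${2:-/etc/pki/ca-trust/source/anchors}"
    local refresh="${3:-update-ca-trust extract}"

    # Refuse anything that does not look like a PEM certificate.
    if ! grep -q -- '-----BEGIN CERTIFICATE-----' "$cert"; then
        echo "not a PEM certificate: $cert" >&2
        return 1
    fi

    cp "$cert" "$anchors/" || return 1
    # Intentionally unquoted so the command and its arguments are split.
    $refresh
}
```

It would typically be run as root, for example `install_root_ca ./zscaler-root.pem`, where the file name is a placeholder for whatever certificate your Zscaler admin provides.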

Thanks for the update. I will close the issue now.

@JohnRusk I am having the same issue, but the logs are not being written in Windows 10. Any tips?

EDIT: I was running it with no internet connection, so that was the issue

Hi @jonathonwpowell, the tool was retrying the failed requests, not hanging. It fails eventually.

@jonathonwpowell I see you closed your other issue. Are you still awaiting a reply to the question about tips? Also, are you asking about the hanging (which is fixed in the upcoming 10.3 release) or certificate issues with ZScaler?

On Windows 10, take a look at the Event Log. Serious errors in AzCopy generally get logged there.

@JohnRusk Thanks for the quick response! I was trying to use AzCopy from a VM that wasn't connected to the internet, once I realized that and connected it to the internet I no longer had any issue.

Yeah, that'll do it ;-)

Thanks for your reply.

Note: there are multiple issues in this thread. The one still requiring code change is now tracked separately in #599

I got the same issue on Ubuntu 18.04 (Docker) with azcopy 10.6.0: azcopy cp just hangs at INFO: Scanning...

I solved this problem by running wget https://azcopyvnext.azureedge.net/releasemetadata/latest_version.txt beforehand.
It looks like wget did some work that fixed things.

Thanks for letting us know @cdpath. Do you know if there is a proxy or firewall in your environment?
