Azure-storage-azcopy: azcopy breaks while read line pipe

Created on 22 Apr 2020 · 23 Comments · Source: Azure/azure-storage-azcopy

Which version of the AzCopy was used?

Note: The version is visible when running AzCopy without any argument

10.3.4

Which platform are you using? (ex: Windows, Mac, Linux)

rhel 6.10

What command did you run?

Note: Please remove the SAS to avoid exposing your credentials. If you cannot remember the exact command, please retrieve it from the beginning of the log file.

grep "string" file | while read line; do azcopy copy "${line}"; done

What problem was encountered?

Even if the file contains multiple lines (for example, complete URLs), only the first line is processed. The next read line just returns, as if the pipe were empty. There is no error whatsoever, and I haven't been able to figure out why. The pipe:[x] file descriptor is still available.

Example from /proc/pid/fd of the script containing the azcopy:

dr-x------ 2 root root 0 Apr 22 18:33 .
dr-xr-xr-x 8 root root 0 Apr 22 18:33 ..
lr-x------ 1 root root 64 Apr 22 18:35 0 -> pipe:[165068451]
l-wx------ 1 root root 64 Apr 22 18:35 1 -> file.log
l-wx------ 1 root root 64 Apr 22 18:35 2 -> file.log

lsof lists the pipe (a sleep 5000 was added right afterwards to allow debugging). The script runs as a cron job.

bash 27276 root 0r FIFO 0,8 0t0 165068451 pipe
sleep 27329 root 0r FIFO 0,8 0t0 165068451 pipe

My guess is that it's related to azcopy being able to write content to blob storage from a pipe (it must be doing something with stdin).

How can we reproduce the problem in the simplest way?

create a file named file with multiple lines like:
testfile1 https://.....

and run the above command

Have you found a mitigation/solution?

force an empty stdin to azcopy:

: | azcopy whatever

(thanks @stewartadam for the obvious workaround)
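For anyone who wants to see the effect without azcopy installed, here is a minimal sketch. It assumes the guess above is right, i.e. that the command inside the loop drains whatever is on stdin. The `greedy` function is a hypothetical stand-in, not azcopy itself. Redirecting from /dev/null should have the same effect as the `: |` workaround.

```shell
# Hypothetical stand-in for a command that drains its stdin
# (mimics the behaviour reported for azcopy; not azcopy itself).
greedy() { cat >/dev/null; echo "processed $1"; }

# Unprotected: the first call swallows the rest of the pipe,
# so the next "read" hits EOF and the loop ends early.
unprotected=$(printf 'a\nb\nc\n' | while read -r line; do greedy "$line"; done)

# Protected: the command gets an empty stdin, so the pipe
# feeding "read" stays intact and every line is handled.
protected=$(printf 'a\nb\nc\n' | while read -r line; do greedy "$line" </dev/null; done)

echo "unprotected: $unprotected"
echo "protected:"
echo "$protected"
```

In a real script the same idea would be `azcopy copy ... </dev/null` inside the loop.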

All 23 comments

Looks like it may be due to the second usage below (azcopy cp --help excerpt), as I guess it doesn't check whether a source file parameter was passed to azcopy. It doesn't seem to corrupt the first file, but it behaves as if it consumes the content of the pipe (stdin) while processing it. The pipe is therefore drained and no other files are copied, because while read line exits on EOF.

Upload a single file by using a SAS token:

  • azcopy cp "/path/to/file.txt" "https://[account].blob.core.windows.net/[container]/[path/to/blob]?[SAS]"

Upload a single file by using a SAS token and piping (block blobs only):

  • cat "/path/to/file.txt" | azcopy cp "https://[account].blob.core.windows.net/[container]/[path/to/blob]?[SAS]"

I haven't quite understood what you're aiming to do here, sorry. What is in your text file, is it a set of pairs of (local filename, remote URL)?

It doesn't matter what is in the text file (except for context and syntax purposes). Let's say something like this (simplified), one entry per line:

string2 sometext
string1 /tmp/file1 https://storageaccount1.blob.core.windows.net/container1/file1?[SAS]
string1 /tmp/file1 https://storageaccount2.blob.core.windows.net/container1/file1?[SAS]
string1 /tmp/file1 https://storageaccount3.blob.core.windows.net/container1/file1?[SAS]
...
string1 /tmp/file1 https://storageaccountn.blob.core.windows.net/container1/file1?[SAS]
string2 someother text

Basically, copy the same file to multiple storage accounts using a while read loop, something like the following (the actual logic is a bit more complex, but it's the same idea):

grep "string1" file | awk '{$1=""; print $0}' | while read line; do azcopy copy "${line}"; done

So you have essentially a set of lines, where each line is a set of parameters to AzCopy? In that case, the first thing I'd suggest you check is whether azcopy thinks it's getting one parameter or many. (You want many, right? At least 2, source and dest?) To check that, I'd add a bogus switch to the end of the lines in your file, like --this-does-not-exist. If AzCopy fails and says that's not a valid command line option, then it is getting separate parameters. If it doesn't, then I'd suspect that it's getting all the params in such a way that it thinks there's only one parameter (which happens to have spaces in it).

Oh, one thing that might be important: what you're trying to do looks a bit similar to something that we already provide native support for. Have a look at this, and see if it suits: https://github.com/Azure/azure-storage-azcopy/wiki/Listing-specific-files-to-transfer

It can't think that it's getting only one parameter, as it copies the first file (disregard the " as it was placed ad hoc as an example).

From what I saw, the functionality mentioned does not allow copying the same file to multiple blob stores.

It should be easily reproducible ;)

We don't have time to reproduce this this week, sorry @andrei-muresanu.

BTW, it doesn't yet sound like an actual AzCopy bug. It still sounds like something about the way your script is invoking it (although I don't know exactly what the issue might be).

I believe this is an azcopy bug; I can easily reproduce it. The guess above is correct: azcopy appears to be manipulating stdin and exhausting pipes that don't belong to it, causing bash to terminate its loop prematurely.

Workaround is to force an empty stdin to azcopy:

: | azcopy whatever

Example repro: since azcopy doesn't support more than one source argument, one must call azcopy multiple times if the files are in different directories (or if those directories contain more than the specific files to be uploaded):

find repo_path -name '*.pdf' | while read line;do
  azcopy copy "$line" https://dst-uri-with-sas
done

Only a single file is uploaded. If you swap to any other command, e.g. echo, the loop proceeds as usual:

find repo_path -name '*.pdf' | while read line;do
  echo azcopy copy "$line" https://dst-uri-with-sas
done

IIRC, it is only supposed to read stdin if it thinks it's in pipe mode (i.e. if it thinks you did not pass it a source and a destination, but only passed it one). I'm wondering if we need to change it so that pipe mode can only be invoked if you supply some kind of parameter to explicitly opt in to that mode.

Linux convention for some commands is to use a single dash in place of the source parameter to signal reading from stdin, which could be adopted, but I'm curious why azcopy would think it needs to be in stdin mode here - it definitely was passed a src and dst argument...
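To make the single-dash idea concrete, here is a rough sketch of what explicit opt-in could look like. `copy_tool` is a purely hypothetical shell function standing in for a CLI; it is not how azcopy is implemented and does not use any azcopy flag. The point is only that stdin is read when, and only when, the source is a literal `-`.

```shell
# Hypothetical tool, not azcopy: stdin is read only when the source is a literal "-".
copy_tool() {
  if [ "$1" = "-" ]; then
    cat >"$2"          # explicit opt-in: stream stdin to the destination
  else
    cp "$1" "$2"       # ordinary mode: stdin is left untouched
  fi
}

work=$(mktemp -d)
printf 'one\n' >"$work/a"
printf 'two\n' >"$work/b"

# Ordinary mode inside a while-read loop: both names are processed,
# because copy_tool never touches the pipe that feeds "read".
printf '%s\n' a b | while read -r name; do
  copy_tool "$work/$name" "$work/$name.copy"
done

# Opt-in mode: the dash asks for stdin explicitly.
printf 'piped\n' | copy_tool - "$work/c.copy"

copies=$(ls "$work" | grep -c 'copy$')
piped=$(cat "$work/c.copy")
rm -rf "$work"
echo "copies made: $copies, piped content: $piped"
```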

I'm curious about the same thing, and I don't know the answer.

I am facing the same issue with the azcopy command in a while loop that reads a text file, where I am trying to copy files to multiple blob stores. It reads the first line, copies to the blob, and terminates. With just an echo in the loop it works fine. Is there any workaround for this problem?

I'm also encountering this issue. With a while read line loop, the second iteration is never reached. (This occurs only if the command in the loop is azcopy.) With a for loop, on the other hand, the second and later iterations succeed.

e.g.) cat /tmp/uploadfile.txt
file1
file2

Fail case: (only run for "file1")

while read line  
do  
  azcopy copy "/tmp/$line" "blob_path"  
done < /tmp/uploadfile.txt

Success case: (run for "file1" and "file2")

for line in `cat /tmp/uploadfile.txt`
do  
  azcopy copy "/tmp/$line" "blob_path"  
done

This issue is still not resolved in the latest v10.5.0. Has the design changed so that we cannot use a "while read line" loop after v10.3.4?
@JohnRusk, do you know when this issue is planned to be resolved?
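One caveat with the for loop above is that it splits on any whitespace, so file names containing spaces would break. A possible alternative that keeps while read is to open the list on a separate file descriptor, so the command in the loop body cannot drain it. This is only a sketch: `upload` is a hypothetical stand-in that drains stdin the way azcopy is suspected to, and the list file is a temporary file created for the demo.

```shell
# Hypothetical stand-in for a command that drains its stdin
# (swap in the real azcopy call; this is not azcopy itself).
upload() { cat >/dev/null; echo "uploaded $1"; }

list=$(mktemp)
printf 'file1\nfile 2\n' >"$list"

# The list is opened on descriptor 3 and "read" takes its input from there.
# The loop's ordinary stdin is a throwaway string, so draining it is harmless.
result=$(
  echo "unrelated stdin" | while IFS= read -r line <&3; do
    upload "/tmp/$line"
  done 3<"$list"
)
rm -f "$list"
echo "$result"
```

Both entries are handled, including the one with a space in its name.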

attn @adreed-msft or @zezha-msft

@adreed-msft or @zezha-msft still an issue with 10.6.0.

If it helps narrow down the code path, it does not happen when no destination is provided:

$ echo -e "one\ntwo\nthree" | while read line;do azcopy copy $i; done
INFO: The parameters you supplied were Source: '~pipe~' of type Pipe, and Destination: '5' of type Local
... # error output repeated three times

But when a bogus http destination is provided, it kills the loop:

$ echo -e "one\ntwo\nthree" | while read line;do azcopy copy $i https://foo; done
INFO: The parameters you supplied were Source: '5' of type Local, and Destination: 'https://foo' of type Local
... # error output only happens once

@andrei-muresanu ,

We've fixed the stdin pipe-related flows in recent releases. Please take a look, and reach out if the issue persists.

Confirmed working, thanks!

I too am facing the issue. Do you know how I can resolve it ?

@sailpradh per above, have you tried with the latest azcopy version?

@stewartadam I am using the latest version of azcopy (v10.10.0) on Ubuntu (v20.04). I solved the problem by using a for loop rather than the while loop.

I am having the same problem on the newest version:
# azcopy --version
azcopy version 10.11.0

This doesn't work
find ${BACKUPS_PATH} -type f -mmin ${BACKUPS_AGE} -print0 |
while IFS= read -r -d $'\0' file; do
azcopy copy "${file}" "${AZURE_STORAGE_ACCOUNT}${BACKUPS_CONTAINER}/${file#${BACKUPS_PATH}}" --put-md5 --log-level=ERROR
done

But this works
find ${BACKUPS_PATH} -type f -mmin ${BACKUPS_AGE} > /tmp/uploadfiles.txt
for file in $(cat /tmp/uploadfiles.txt)
do
azcopy copy "${file}" "${AZURE_STORAGE_ACCOUNT}${BACKUPS_CONTAINER}/${file#${BACKUPS_PATH}}" --put-md5 --log-level=ERROR
done

Please reopen this issue!

Please reopen the issue. For now I could use a workaround, but the issue exists.

I am also facing this issue with azcopy version 10.11.0.

@siminsavani-msft could you please take a look? It seems that the issue is not fully resolved (I haven't tried a repro). Please try the examples listed here.
