Azure-storage-azcopy: Logging in v6 was very clear and v10 needs to be same as output is used for validation.

Created on 7 Dec 2018  ·  40Comments  ·  Source: Azure/azure-storage-azcopy

[Original title was: AzCopy v10 does not allow to provide verbose logging path in different location. Also logging in v6 was very clear and v10 needs to be same as output is used for validation ]

Which version of the AzCopy was used? v10

Note: The version is visible when running AzCopy without any argument

Which platform are you using? (ex: Windows, Mac, Linux)

What command did you run?

Note: Please remove the SAS to avoid exposing your credentials. If you cannot remember the exact command, please retrieve it from the beginning of the log file.

This was in v6.1 and v10 does not have option to give verbose logging path dynamically.
$result = ./AzCopy.exe /Source:$sourceFolder /Dest:$jdaVmDest /s /xo /y /v:$verboseLogFile /pattern:PJR* /NC:500 /z:$journalLogFile

What problem was encountered? Verbose logging path should be allowed with same level of logging as v6.1

How can we reproduce the problem in the simplest way?

Have you found a mitigation/solution?

feature request

Most helpful comment

@zezha-msft Thanks you! Although I love the log file path being specified in each command, the environment variable can give us an option for now.

All 40 comments

Hi @jiteshkumargouthamchand, thanks for reaching out!

Since this is a backward compatibility issue, I'll let @seguler take a look first.

Hi @artemuwka, could you please take a look at this feature request?

We have the same issue regarding the log path. In V8 we can specify the log file path with option /V whereas in V10 there's no option to specify the log file path. Is this feature request accepted and in the backlog?

@mazong1123 you can specify the path for logs using an environment variable. Please refer to ./azcopy env

@zezha-msft Thanks you! Although I love the log file path being specified in each command, the environment variable can give us an option for now.

@zezha-msft do we have a PBI for this? Want to avoid duplicates.

@artemuwka I have not. Do you think this feature should be added?

We have a workaround for now so I'd suggest we add it to the triage items for now and look into it post GA.

You can do --log-level VERBOSE, but even the standard level (INFO) has everything you need. (in fact, it has too much, and we're going to slim it down a bit!)

In AzCopy v6 the verbose logging was pretty straight forward wherein each line had which file is being transferred but in v10 when we do debug or info logging there are lot of additional detail.. In case of failures or bulk uploads this feature will help in indeitifying the file or folders it failed whereas in v10 since logging is huge and lot of extra info it becomes difficult to find the issued file. Please let me know if there is an option for verbose logging in v10 as much as was in v6 ?

V10 does not use the same logging format as 6. Your best option is to follow the tips in the link I posted above.

In addition, the log file name is not definable. So in case we want to append such as Log_yy_mm_dd.log it does not support whereas in v6 you could define the file name as well.

The reason this is important is how do one know which log refers to which run in case we have multiple jobs running at same time.

V10 names are not date-based, they are job-based. So, unlike v6, in v10, if you run 3 different jobs on the same day, you get 3 different log files, each named with the job id (which is a GUID). We have no plans to change that to match v6, sorry.

But we do have plans to allow the _location_ of the log files to be changed.

Yes, you do know which run refers to which log, because the log filename is always the job ID.

BTW, to be clear, v10 is a total re-write, with a different architecture, command line syntax and implementation. That's why there are no plans to make any specific aspects of v10 match the corresponding aspects of v6

We have scenario where we run AzCopy for 300 servers one at a time and hence in v6 we were logging in same verbose log for all the 300 AzCopy in a single file. Also the AzCopy result of v6 is simple and straight forward based on which we retry in case of failures whereas with v10 we have lot of details and it is difficult to perform the retry.

With current v10 having separate job for each run will be a difficult for logging and troubleshoot purpose.

All 300 instances were writing to the same file?

yes as we cn give the file name along with verbose logging path and then we made dynamic with timestamp . So all 300 servers are in one verbose log.

Also sample AzCopy result of v10 is : sever 1 : [2019/06/30 13:01:13] Transfer summary: ----------------- Total files transferred: 410 Transfer successfully: 410 Transfer skipped: 0 Transfer failed: 0 Elapsed time: 00.00:01:02
server 2 : [2019/06/30 13:02:27] Transfer summary: ----------------- Total files transferred: 470 Transfer successfully: 470 Transfer skipped: 0 Transfer failed: 0 Elapsed time: 00.00:01:12

Whereas v10 has much more complex return and is difficult
We then use the failure count and re-try

Can you try something: try running AzCopy with --output-type json on the command line. That will give you machine-readable output in JSON form. Maybe you could capture that and make your retry decisions based on that.

BTW, I do not think we will ever support multiple instances of AzCopy v10 logging to the same log file. Sorry. But maybe something like the JSON solution above might let you do what you need.

The AzCopy result is actually including the file scan results when using sync which is not required. The items in Italic/Bold is suffice. THis will grow as we have a bigger set of data between src and dest. : Job 8d632115-f6cb-7442-6190-666b8a8c21fc has started Log file is located at: D:PassportCopyTest2LogsVerbose/8d632115-f6cb-7442-6190-666b8a8c21fc.log 0 Files Scanned at Source, 0 Files Scanned at Destination INFO: azcopy.exe: A newer version 10.2.0 is available to download 0 Files Scanned at Source, 0 Files Scanned at Destination 3213 Files Scanned at Source, 0 Files Scanned at Destination 6515 Files Scanned at Source, 0 Files Scanned at Destination 9788 Files Scanned at Source, 0 Files Scanned at Destination 13000 Files Scanned at Source, 0 Files Scanned at Destination 16245 Files Scanned at Source, 0 Files Scanned at Destination 19544 Files Scanned at Source, 0 Files Scanned at Destination 22811 Files Scanned at Source, 0 Files Scanned at Destination 26095 Files Scanned at Source, 0 Files Scanned at Destination 29392 Files Scanned at Source, 0 Files Scanned at Destination 32961 Files Scanned at Source, 0 Files Scanned at Destination 34514 Files Scanned at Source, 0 Files Scanned at Destination 34514 Files Scanned at Source, 5000 Files Scanned at Destination 34514 Files Scanned at Source, 15000 Files Scanned at Destination 34514 Files Scanned at Source, 20000 Files Scanned at Destination 34514 Files Scanned at Source, 25000 Files Scanned at Destination 34514 Files Scanned at Source, 30000 Files Scanned at Destination 34514 Files Scanned at Source, 40000 Files Scanned at Destination 34514 Files Scanned at Source, 45000 Files Scanned at Destination 34514 Files Scanned at Source, 50000 Files Scanned at Destination 34514 Files Scanned at Source, 55000 Files Scanned at Destination 34514 Files Scanned at Source, 60000 Files Scanned at Destination 34514 Files Scanned at Source, 70000 Files Scanned at Destination 34514 Files Scanned at Source, 75000 Files Scanned at Destination 34514 Files Scanned at Source, 80000 Files Scanned at Destination 34514 Files Scanned at Source, 90000 Files Scanned at Destination 34514 Files Scanned at Source, 95000 Files Scanned at Destination 34514 Files Scanned at Source, 105000 Files Scanned at Destination 34514 Files Scanned at Source, 110000 Files Scanned at Destination 34514 Files Scanned at Source, 115000 Files Scanned at Destination 34514 Files Scanned at Source, 120000 Files Scanned at Destination 34514 Files Scanned at Source, 125000 Files Scanned at Destination 34514 Files Scanned at Source, 135000 Files Scanned at Destination 34514 Files Scanned at Source, 145000 Files Scanned at Destination 34514 Files Scanned at Source, 150000 Files Scanned at Destination 34514 Files Scanned at Source, 160000 Files Scanned at Destination 34514 Files Scanned at Source, 165000 Files Scanned at Destination 34514 Files Scanned at Source, 175000 Files Scanned at Destination 34514 Files Scanned at Source, 180000 Files Scanned at Destination 34514 Files Scanned at Source, 185000 Files Scanned at Destination 34514 Files Scanned at Source, 195000 Files Scanned at Destination 34514 Files Scanned at Source, 200000 Files Scanned at Destination 34514 Files Scanned at Source, 210000 Files Scanned at Destination 34514 Files Scanned at Source, 215000 Files Scanned at Destination 34514 Files Scanned at Source, 225000 Files Scanned at Destination 34514 Files Scanned at Source, 230000 Files Scanned at Destination 34514 Files Scanned at Source, 240000 Files Scanned at Destination 34514 Files Scanned at Source, 250000 Files Scanned at Destination 34514 Files Scanned at Source, 255000 Files Scanned at Destination 34514 Files Scanned at Source, 265000 Files Scanned at Destination 34514 Files Scanned at Source, 270000 Files Scanned at Destination 34514 Files Scanned at Source, 275000 Files Scanned at Destination 34514 Files Scanned at Source, 285000 Files Scanned at Destination 34514 Files Scanned at Source, 290000 Files Scanned at Destination 34514 Files Scanned at Source, 300000 Files Scanned at Destination 34514 Files Scanned at Source, 310000 Files Scanned at Destination 34514 Files Scanned at Source, 315000 Files Scanned at Destination 34514 Files Scanned at Source, 320000 Files Scanned at Destination 34514 Files Scanned at Source, 330000 Files Scanned at Destination 34514 Files Scanned at Source, 335000 Files Scanned at Destination 34514 Files Scanned at Source, 345000 Files Scanned at Destination 34514 Files Scanned at Source, 350000 Files Scanned at Destination 34514 Files Scanned at Source, 360000 Files Scanned at Destination 34514 Files Scanned at Source, 370000 Files Scanned at Destination 34514 Files Scanned at Source, 375000 Files Scanned at Destination 34514 Files Scanned at Source, 385000 Files Scanned at Destination 34514 Files Scanned at Source, 390000 Files Scanned at Destination 34514 Files Scanned at Source, 400000 Files Scanned at Destination 34514 Files Scanned at Source, 405000 Files Scanned at Destination 34514 Files Scanned at Source, 410000 Files Scanned at Destination 34514 Files Scanned at Source, 420000 Files Scanned at Destination 34514 Files Scanned at Source, 430000 Files Scanned at Destination 34514 Files Scanned at Source, 435000 Files Scanned at Destination 233 Done, 0 Failed, 267 Pending, 500 Total, 2-sec Throughput (Mb/s): 0.0674 _Job 8d632115-f6cb-7442-6190-666b8a8c21fc Summary Files Scanned at Source: 34514 Files Scanned at Destination: 439426 Elapsed Time (Minutes): 2.4672 Total Number Of Copy Transfers: 500 Number of Copy Transfers Completed: 500 Number of Copy Transfers Failed: 0 Number of Deletions at Destination: 0 Total Number of Bytes Transferred: 2516284 Total Number of Bytes Enumerated: 2516284 Final Job Status: Completed_

That's the on-screen output, isn't it. Not the log?

yes but we use this and do our custom summary log using this output which is send as an attachment for 300 runs of AzCopy in one log. Also our retry we do usingthis output : $failedCnt = $result[$result.Length-6].Split(“:”)[1].Trim(). Hence this request to keep it simple and remove this unnecessry info

So another key question is if this position does not change in future releases $failedCnt = $result[$result.Length-6].Split(“:”)[1].Trim(). Please confirm

I will change the title of this, to focus on the log format (not the location) and re-open it and mark it as a feature request.

Note that the specific scenario of combining 300 logs into one file is something which we are probably not likely to support.

Thanks much.Pls. confirm on "So another key question is if this position does not change in future releases $failedCnt = $result[$result.Length-6].Split(“:”)[1].Trim(). Please confirm" where $result is the output of AzCopy which is on screen and we capture in this variable. Trying to understand that the line position from bottom on failure will not change.

So another key question is if this position does not change in future releases $failedCnt = $result[$result.Length-6].Split(“:”)[1].Trim(). Please confirm

When output-type is text (the default) I don't think we provide any guarantees about the positions or layout of the output. We don't recomend that people parse the text-mode output. If you want something more reliable, for parsing programmatically, I would recommend that you should use --output-type json. I think you can get the same information there. We would consider a change to the json output to be a breaking change, and so would usually not make such changes and if we had to make such a change (for some reason) we should document it as a breaking change in the changeLog.

JSON Output is different than plain text as you see for instance FilesScannedAtSource shows as 0 in JSON whereas in Plain there is a value. So hoping the CopyTransfersFailed element value exists there if there are failure count > 0. Below is output from plain and json:
0 Files Scanned at Source, 0 Files Scanned at Destination

Job f8373d72-92b5-1e4e-7262-2ed8afd0dd65 Summary
Files Scanned at Source: 11
Files Scanned at Destination: 115
Elapsed Time (Minutes): 0.0333
Total Number Of Copy Transfers: 3
Number of Copy Transfers Completed: 3
Number of Copy Transfers Failed: 0
Number of Deletions at Destination: 0
Total Number of Bytes Transferred: 9
Total Number of Bytes Enumerated: 9
Final Job Status: Completed

PS C:UserskdlqDownloadsazcopy_windows_amd64_10.1.2azcopy_windows_amd64_10.1.2> C:Userskdlq3D ObjectsAzCopyv10.ps1
{"TimeStamp":"2019-07-02T21:23:22.3151673-07:00","MessageType":"Init","MessageContent":"{"LogFileLocation":"C:\\Users\\kdlq/.azcopy/1e6988d1-4cdd-2343-5e36-2096462
2ed2d.log","JobID":"1e6988d1-4cdd-2343-5e36-20964622ed2d"}"}
{"TimeStamp":"2019-07-02T21:23:22.3151673-07:00","MessageType":"Progress","MessageContent":"{"FilesScannedAtSource":0,"FilesScannedAtDestination":0}"}
{"TimeStamp":"2019-07-02T21:23:24.3158543-07:00","MessageType":"Exit","MessageContent":"{"ErrorMsg":"","ActiveConnections":0,"CompleteJobOrdered":true,"JobStatus
":"Completed","CopyTotalTransfers":3,"CopyTransfersCompleted":3,"CopyTransfersFailed":0,"BytesOverWire":9,"DeleteTotalTransfers":0,"DeleteTransfersCompleted
":0,"DeleteTransfersFailed":0,"FailedTransfers":[],"PerfConstraint":0,"TotalBytesTransferred":9,"TotalBytesEnumerated":9}"}

I think it will be populated in the JSON, but you should test. Maybe do a download of a few small text files, but before you start the download, lock one of the existing files using the instructions here: https://superuser.com/a/649819 Then the download of that file should fail, and you'll have a failure you can test with.

If you plan to use --overwrite=false, you should also test how skipped files are reported. (I can't remember the details, sorry).

Sure i will test failure scenario. But FYI, there is an issue with FilesScannedAtDestination and source in JSON when there are files to be moved it shows as 0.

If i use JSon result of the AzCopy i will need to use something like this to get failed count:
$json = $Result | ConvertFrom-Json
$output= $json[2].MessageContent.Split(",")[6].Split(":")[1]

and in case i use the text format i need to use as this : failedCnt = $result[$result.Length-6].Split(“:”)[1].Trim().

Let me know if this is the right approach as i do not see much difference between json an dtext output. so please confirm. If you had a better way to parse in JSon do le tme know . Thanks.

You asked for something that will probably not change: that is the JSON. I'm not recommending it because its easier, but because it's less likely to change in future releases.

Sure. Thanks. So is this the right way to parse from JSON from the resultant of AzCopy to parse the failed count :
$json = $Result | ConvertFrom-Json
$output= $json[2].MessageContent.Split(",")[6].Split(":")[1]

I don't know, sorry. PowerShell JSON parsing falls outside the boundaries of normal AzCopy support, and it's not an area we're familiar with.

In case anybody arrives here like I did wondering how best to parse the JSON output from azcopy (v 10.3.0):

It appears the output string from the command is a series of newline-separated messages that are themselves parsable. If you do a split('n') on the output, then loop through each string and call JSON.parse() on it, you'll be able to get to the message type, message content, errors, etc. This is actually how the test suite in the azcopy repo itself handles the output (see https://github.com/Azure/azure-storage-azcopy/blob/master/testSuite/scripts/utility.py).

It would be nice if the CLI actually outputted real JSON that can be parsed with a single call, not the pseudo-JSON it currently produces. I would think it would just be a matter of creating an outer object with a single array property, then putting each message object inside that array.

IIRC, the reason they are separate messages is that an AzCopy job can run for a long time (hours or days if there's enough data to move), so we can't put all the output into one JSON object. Instead, we have to emit it incrementally to stdout as we go. The intention is that tools consuming it will consume it as a series of messages. Just as you are doing with the n split.

Thanks for pointing this out and documenting it here.

Makes sense, thanks for the info.

Closing due to inactivity.

Was this page helpful?
0 / 5 - 0 ratings