Azure-storage-azcopy: AzCopy sync uploads truncated file contents

Created on 11 Feb 2019 · 9 Comments · Source: Azure/azure-storage-azcopy

Which version of the AzCopy was used?

azcopy version 10.0.7-Preview

Which platform are you using? (ex: Windows, Mac, Linux)

Same behaviour observed on the Mac and Linux versions

What command did you run?

azcopy sync "$PWD/public/" "https://storageaccount.blob.core.windows.net/\$web/$sas" --recursive

What problem was encountered?

All files in the source directory are synchronised; however, a large number of them are truncated in the destination BLOB storage.

  • The directory being synchronised contains 969 files
  • About 260 files are updated in a sync (the remainder are considered the same and are not sync'd)
  • The directory is typical contents of a static website: images, HTML, CSS, etc
  • Example files being truncated include index.html (61KB)

How can we reproduce the problem in the simplest way?

Take a copy of my site's output (not publicly available, but can be provided on request), and sync it with a BLOB storage account that has static site enabled

Have you found a mitigation/solution?

I'm using the VSCode Azure Storage extension to upload all files successfully, albeit with a full upload each time

All 9 comments

Hi @mpowney, thanks for reaching out!

How did you observe that the files were truncated? Was there any pattern in the files being affected? Were the files being modified while they were being synced?

The files are text files such as HTML and CSS. The behaviour described in this bug seems to occur only when a file already exists but its contents have changed. Any new files are uploaded in full. I'll do some more digging to find out more ...

When the truncated file is copied back from the BLOB storage to my local machine and compared to the original file, it has the same file size in bytes:

-rw-r--r--@ 1 mark  staff  59525 23 Feb 12:54 public/index-fail1.html
-rw-r--r--  1 mark  staff  59525 17 Feb 19:46 public/index.html

Although the file size is the same, the azcopy'd file seems to have extra characters embedded that push the rest of the content out. It's as if the transfer added extra data but stopped at the original byte size, truncating the last part of the file.

It may be relevant that, although it is an HTML (text) file, it does contain some binary data.
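
Since the byte count matches, a size check alone cannot flag this corruption; a byte-level comparison can. Below is a minimal sketch, assuming the round-tripped copy has already been downloaded next to the original. The function name compare_roundtrip is hypothetical; wc and cmp are the standard POSIX utilities.

```shell
# compare_roundtrip ORIGINAL DOWNLOADED   (hypothetical helper name)
# Prints both file sizes, then reports whether the contents match.
# Returns 0 when the files are byte-identical, 1 otherwise.
compare_roundtrip() {
  local original="$1" downloaded="$2"
  local size_a size_b

  # wc -c may pad its output with spaces on some platforms; strip them.
  size_a=$(wc -c < "$original" | tr -d ' ')
  size_b=$(wc -c < "$downloaded" | tr -d ' ')
  echo "sizes: $size_a vs $size_b"

  # cmp -s is silent and only sets the exit status.
  if cmp -s "$original" "$downloaded"; then
    echo "identical"
    return 0
  fi

  # Without -s, cmp prints the position of the first differing byte.
  cmp "$original" "$downloaded" || true
  echo "different"
  return 1
}
```

With the two files from the listing above, this would be called as compare_roundtrip public/index.html public/index-fail1.html. Equal sizes followed by "different" would match the symptom described here.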

While investigating this bug further, I may have found a similar problem, and I can reliably reproduce it. I've created a gist with three files:

  • first-index.html
  • second-index.html
  • third-index.html

If you take all three, then follow these steps:

  1. Copy first-index.html to a clean folder, and rename it to index.html
  2. Execute azcopy sync "$PWD/" "https://storageaccount.blob.core.windows.net/container/$sas" --recursive
  3. Copy second-index.html to the same folder, renaming it to index.html
  4. Execute azcopy sync "$PWD/" "https://storageaccount.blob.core.windows.net/container/$sas" --recursive

(these steps were performed using azcopy-preview 10.0.7 for Mac OS)
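
The steps above can be scripted so that the reproduction runs in one go. This is a sketch under stated assumptions, not a tested harness: the function name repro_sync_overwrite and its arguments are hypothetical, the destination URL (including the SAS token) and the clean working folder are supplied by the caller, and the only AzCopy invocation used is the azcopy sync ... --recursive form quoted in this thread.

```shell
# repro_sync_overwrite FIRST_FILE SECOND_FILE DEST_URL WORKDIR
# (hypothetical helper name; WORKDIR must be a clean, existing folder)
# Syncs FIRST_FILE as index.html, then replaces it locally with
# SECOND_FILE and syncs again, mirroring steps 1-4 above.
repro_sync_overwrite() {
  local first="$1" second="$2" dest="$3" workdir="$4"

  # Steps 1 and 2: place the first version and sync it.
  cp "$first" "$workdir/index.html" || return 1
  azcopy sync "$workdir/" "$dest" --recursive || return 1

  # Steps 3 and 4: overwrite with the second version and sync again.
  cp "$second" "$workdir/index.html" || return 1
  azcopy sync "$workdir/" "$dest" --recursive || return 1
}
```

For example: repro_sync_overwrite first-index.html second-index.html "https://storageaccount.blob.core.windows.net/container/$sas" "$(mktemp -d)". Passing a different pair of files from the gist exercises other overwrite orders with the same function.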

Observed behaviour:

When viewed in the Azure web interface, index.html ends with a series of unknown characters.

[Screenshot taken 2019-02-23 at 1:49:35 pm]

Following from the previous comment, the originally described behaviour can be reproduced with the three files from the gist:

  1. Copy third-index.html to a clean folder, and rename it to index.html
  2. Execute azcopy sync "$PWD/" "https://storageaccount.blob.core.windows.net/container/$sas" --recursive
  3. Copy first-index.html to the same folder, and rename it to index.html
  4. Execute azcopy sync "$PWD/" "https://storageaccount.blob.core.windows.net/container/$sas" --recursive

Observed behaviour:

The contents of index.html, when viewed via the Azure portal web interface, are truncated.

Hi @mpowney, thank you so much for the detailed description on repro steps! We'll investigate and get back to you soon.

Hi @mpowney, I'm extremely sorry for the inconvenience. I've confirmed there is indeed a corruption bug. Thank you for your awesome repro steps! I really appreciate your help!

Luckily, I rewrote the sync command recently (it's sitting on the dev branch) and I've validated that this bug is fixed there. I'll update this thread once it is released (very soon).

Hi @mpowney, sorry for the inconvenience, and I really appreciate your patience!

I've rewritten the sync command and released it in 10.0.8. This issue should be fixed. Please let us know if you still encounter any problem.

Here is a wiki explaining how the new sync command works. Please feel free to reach out if you have any questions or concerns.

Great, thanks for the quick responses and turn-around
