10.3.2
Windows
azcopy sync https://[storageacct].blob.core.windows.net/?[token] https://[destacct].blob.core.windows.net/?[token]
Failed with error service level URLs are not supported in sync
See command above.
Not in 1 line of code :)
Just checking, you know that copy supports this, right? (I know that's not necessarily a workaround if you need _sync_, but I just wanted to make sure you knew.)
cc @zezha-msft re the sync feature request.
@JohnRusk Yup, I might switch over to that in the meantime. If I'm trying to sync a storage account with 10M+ blobs on a regular basis (backing up an account), do you have a good way to estimate the cost/performance of doing so via sync and via copy? I've seen the wiki. If I use azcopy copy src dest --overwrite false, does that have similar performance/cost characteristics to sync?
@JohnRusk @zezha-msft Also, is it possible to have the destination container created if it doesn't exist for a sync, as well?
Hi @dpolivy, the copy command with --overwrite=false is much more performant by design, as it doesn't have to account for extra files at the destination that are not present at the source. Therefore, unlike sync, copy doesn't need to scan both the source and the destination in full to perform the comparison; instead, it looks up files at the destination one at a time, i.e. once for every file at the source.
That being said, sync is probably less costly, as it sends far fewer I/O calls.
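A toy call-counting model can make the trade-off above concrete. It is a sketch, not AzCopy's implementation, and it assumes that listing returns at most 5,000 blob names per call, that copy with --overwrite=false makes one existence check per source blob, that sync lists both sides once and diffs in memory, and that each transfer counts as one call. The function names are hypothetical.

```python
import math

# Assumption: one List Blobs call returns at most 5,000 names.
PAGE_SIZE = 5000


def list_calls(n_blobs: int) -> int:
    """Paged listing calls needed to enumerate n_blobs (always at least one)."""
    return max(1, math.ceil(n_blobs / PAGE_SIZE))


def estimate_calls(n_source: int, n_dest: int, n_missing: int) -> dict:
    """Toy estimate of service calls for one incremental run.

    n_missing is the number of source blobs absent at the destination,
    i.e. the ones that actually get transferred (one call each, a simplification).
    """
    # copy --overwrite=false: list the source, check each source blob at the
    # destination, then transfer the missing ones.
    copy_calls = list_calls(n_source) + n_source + n_missing
    # sync: list both sides, compare in memory, then transfer the missing ones.
    sync_calls = list_calls(n_source) + list_calls(n_dest) + n_missing
    return {"copy_overwrite_false": copy_calls, "sync": sync_calls}
```

For example, with 10M blobs on each side and 1,000 new ones, estimate_calls(10_000_000, 10_000_000, 1_000) gives about 10.0M calls for copy versus 5,000 for sync, which is the cost side of the comment above. The model says nothing about latency: sync still has to finish enumerating the destination before it can start comparing.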
The purpose of sync is to compare two containers that already exist, thus it doesn't make sense to create the destination if it doesn't exist. Copy does that though.
Please let us know if you have other questions.
Our PM will take in this feature request and evaluate its priority. Thank you!
@zezha-msft I do think there's value in having sync create the destination if it doesn't exist. In an automated scenario, the first time you run sync, the destination may not exist. In that case, it's essentially just doing a copy. But, it's a lot easier to automate a single command (azcopy sync ...) than it is to update automation to test if the destination exists, if not, create it, then copy (or just sync). Hopefully that makes sense.
Perhaps you have some better suggestions for me. The scenario is using blob storage for lots of images and other website resources. While LRS/ZRS/GRS provide redundancy, if something happens to a blob (corruption, app error, etc), that change still gets replicated to all copies. Thus, there's a desire to have a second copy of the data to help avoid any data loss issues. At the same time, we want to minimize charges for API calls and data transfer on the storage accounts, since most of the data is not changing day-to-day. A simple solution is to just automate azcopy sync from source to the backup account on a regular schedule, with a modified folder structure at the destination to give some history, e.g. azcopy sync src.blob.core.windows.net/?token dest.blob.core.windows.net/[year]/[month]?token. I'd love it if azcopy could manage the rest if I just provided this command (sync the whole service, create destination containers/folders as necessary). I'm open to other approaches, but I've already spent a lot of time on this and just want to finally get it implemented :)
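Until sync can target a whole account, the scenario above can be approximated one container at a time, since sync rejects service-level URLs. This is a Python sketch under several assumptions: the caller supplies the list of source container names (how to obtain it is out of scope), the dated layout is dest/[year]/[month]/<container>, and the year container is created with azcopy make before syncing. build_backup_commands is a hypothetical helper that only builds argument lists and runs nothing. Two points are not verified here: azcopy make may report an error when the container already exists (as it will after the first month), and some AzCopy versions may not accept an empty virtual folder as a sync destination, in which case the first run would need azcopy copy instead.

```python
from datetime import date


def build_backup_commands(src_account: str, src_sas: str,
                          dest_account: str, dest_sas: str,
                          containers: list[str], when: date) -> list[list[str]]:
    """Build azcopy argument lists for one scheduled backup run (hypothetical helper)."""
    src_root = f"https://{src_account}.blob.core.windows.net"
    year_url = f"https://{dest_account}.blob.core.windows.net/{when.year}"
    month = f"{when.month:02d}"

    # sync does not create the destination container, so create it first.
    # Assumption: a failure because it already exists is tolerated by the caller.
    commands = [["azcopy", "make", f"{year_url}?{dest_sas}"]]

    # Service-level URLs are rejected by sync, so sync one container at a time
    # into its own virtual folder under [year]/[month].
    for name in containers:
        src = f"{src_root}/{name}?{src_sas}"
        dest = f"{year_url}/{month}/{name}?{dest_sas}"
        commands.append(["azcopy", "sync", src, dest])
    return commands
```

Each returned list can be passed to subprocess.run(cmd); passing argument lists rather than a shell string avoids quoting problems with the ? and & characters in SAS tokens.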
Hi @dpolivy, thanks for the insight. I hadn't thought of sync in the context of a perpetually running script. It makes sense to me; I'm just worried that regular (one-time sync) users might get confused, but maybe it's not a big deal. @rishabpoh thoughts?
For your backup scenario, are source blobs ever deleted? Are source blobs updated in place?
@zezha-msft Yes, blobs can be deleted (and we have soft delete on to help protect against that). In general, we do not update source blobs in place; if a change is made, the filename is modified. That's the intent.
Hi @dpolivy, if the following conditions are met:
Then I believe the copy command with --overwrite=false fits your need quite well.
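The two questions above (are blobs deleted, are they updated in place) decide whether copy with --overwrite=false is enough. A toy model in Python, not AzCopy's code, shows why. Blobs are a mapping of name to last-modified time; copy with --overwrite=false transfers only names missing at the destination; sync is modeled as also transferring names whose source timestamp is newer, and as deleting destination extras only when --delete-destination=true (its default is false). The function names are hypothetical.

```python
def plan_copy_no_overwrite(src: dict[str, int], dst: dict[str, int]) -> set[str]:
    """Names copy --overwrite=false would transfer: only those absent at the destination."""
    return {name for name in src if name not in dst}


def plan_sync(src: dict[str, int], dst: dict[str, int],
              delete_destination: bool = False) -> tuple[set[str], set[str]]:
    """Toy sync plan: (names to transfer, names to delete at the destination)."""
    # Transfer names missing at the destination or newer at the source.
    transfer = {name for name, mtime in src.items()
                if name not in dst or mtime > dst[name]}
    # Destination extras are removed only when explicitly requested.
    delete = {name for name in dst if name not in src} if delete_destination else set()
    return transfer, delete
```

With src = {"a.png": 5, "b.png": 9, "c.png": 3} and dst = {"a.png": 6, "b.png": 4, "old.png": 1}, copy plans only c.png, while sync plans b.png and c.png because b.png was updated in place. If changed files are always renamed rather than overwritten, and deletes should not propagate to a backup, the two plans coincide.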
@zezha-msft I can probably get by in the short term, but long term I'd love for this to be part of sync.
While I have you, I've got this (finally) running in an Azure Container Instance to do the storage-to-storage account copy (different regions). I have 4 CPUs and 2 GB allocated to my container. In my test account, I have 6.4M blobs, all fairly small files (~2-50k). It's been running for almost 3 hours, and is only 10% done with 1M blobs copied and average throughput of 18Mbps. To me, that seems long, but is that considered normal for S2S transfers? Any tips to speed the process up?
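Rough arithmetic on the figures in this comment suggests the job is limited by per-file overhead rather than bandwidth. It assumes a constant rate and treats "almost 3 hours" as 3 hours.

```python
hours = 3.0            # "almost 3 hours", rounded (assumption)
copied = 1_000_000     # blobs copied so far
total = 6_400_000      # blobs in the account
mbps = 18              # reported average throughput, megabits per second

elapsed_s = hours * 3600
files_per_s = copied / elapsed_s                         # about 92.6 blobs per second
avg_kb = mbps * 1e6 / 8 * elapsed_s / copied / 1e3       # about 24.3 KB per blob
remaining_h = (total - copied) / files_per_s / 3600      # about 16.2 more hours

print(f"{files_per_s:.1f} blobs/s, {avg_kb:.1f} KB/blob, {remaining_h:.1f} h remaining")
```

An average of roughly 24 KB per blob fits the reported 2-50k range, and at under 100 blobs per second the remaining 5.4M blobs would take most of a day, which is why the per-file tuning suggested in the replies matters more than raw bandwidth here.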
Hi @dpolivy, glad to hear you are unblocked.
You can set AZCOPY_CONCURRENCY_VALUE (listed in the output of ./azcopy env) to a higher value, like 1000, to improve performance.
For very small files, it also helps to reduce AzCopy's logging level, e.g. with --log-level WARNING.
Oh, and by default AzCopy 10.3.x checks that the blob sizes match after each copy. You can save on IOPS (and therefore get slightly better perf) by turning that check off with --check-length=false.
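Putting these suggestions together, here is a Python sketch that builds the tuned invocation. The wrapper name tuned_copy and its dry_run switch are hypothetical, and --recursive is added on the assumption that the source is a whole container; the environment variable and the other flags come from the replies above. It only runs azcopy when dry_run is False.

```python
import os
import subprocess


def tuned_copy(src_url: str, dest_url: str, dry_run: bool = True):
    """Build (and optionally run) an S2S copy with the tuning suggested in this thread."""
    # Raise AzCopy's parallelism, which helps with many small files.
    env = dict(os.environ, AZCOPY_CONCURRENCY_VALUE="1000")
    args = [
        "azcopy", "copy", src_url, dest_url,
        "--recursive",            # copy everything under the source container
        "--overwrite=false",      # skip blobs that already exist at the destination
        "--log-level=WARNING",    # less logging work per file
        "--check-length=false",   # skip the post-copy size check to save IOPS
    ]
    if not dry_run:
        subprocess.run(args, env=env, check=True)
    return args, env
```

Keeping the flags in one place makes it easy to turn --check-length back on if you later want the extra integrity check.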
@zezha-msft @JohnRusk Awesome, thanks for the tips, I'll give those a try. Have you seen any issues with S2S copies where turning check length off might mask bad copies, or is that not as necessary on the S2S jobs?
We enabled the length check to give an extra level of integrity checking. But, before and after adding that feature, we never saw any case when AzCopy actually needed it. I.e. we've never seen or heard of any length errors in AzCopy's S2S copies.
(And in fact, we've never heard of any length errors in AzCopy's uploads or downloads either - although there was once a related issue with downloading files that had content types that showed they were compressed. They were getting automatically decompressed when they shouldn't have been, and had their lengths truncated to the compressed length. That's the closest we've ever seen to an error with lengths, and it was really a compression issue rather than any data loss or corruption in transit, and it was fixed many releases ago).
@JohnRusk apparently this has become a show-stopper for us too. We need to sync the whole storage account as part of an automated procedure that copies on the first run and syncs on later runs.
IMO AzCopy should mirror the functionality of tools such as DistCp, and Azure should provide alternative means, in the portal and in PowerShell, to sync storage accounts.
Any update on this functionality?