We are trying to migrate historical data from an NFS server (that is still in use). We have date-based directories with hundreds of thousands of files in each directory (in some cases, several million files). Because we need to migrate this data from the still-active server, we are sensitive to adding load to the server, and are seeking information on the right settings for the various tunables for AzCopy.
Specifically, we're looking for guidance on the effects (and potential interactions) of AZCOPY_CONCURRENCY_VALUE, AZCOPY_CONCURRENT_FILES, AZCOPY_BUFFER_GB, and AZCOPY_TUNE_TO_CPU. We've been using the default values, and CPU load quickly rises to 50+ on this server. If left for long, NFS clients experience problems communicating with this server.
We're not in any particular rush to copy this data, so we're happy to reduce concurrency and performance (and increase the overall time to transfer) in order to not impair the primary function of this server; but I haven't been able to discern the recommended environment variable settings to accomplish this.
azcopy version 10.3.1
CentOS Linux release 7.6.1810 (Core)
$ lscpu | grep -m 1 CPU\(s\):
CPU(s): 4
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.6G        274M        289M        369M        7.1G        6.4G
Swap:          2.0G         57M        1.9G
for i in $(ls -tr); do
./azcopy copy $i/'*' "https://${ACCOUNT}.blob.core.windows.net/${YEAR}-${MONTH}-${i}/${SAS}" --put-md5 --log-level ERROR --cap-mbps 20;
done
CPU load rises to 50+. NFS clients experience timeouts.
Just to make sure I understand this right: you’re running azcopy on the server where you’re running nfsd, right?
You can always reduce cpu usage by using one of the methods described here: https://scoutapm.com/blog/restricting-process-cpu-usage-using-nice-cpulimit-and-cgroups
However, your load might still be impacted. Have you had a look at the iotop tool? Are you possibly maxing out the performance of your disk?
Stefan
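As a rough sketch of the OS-level approach mentioned above: the wrapper below runs a command at the lowest CPU priority and in the idle I/O class. It assumes nice and ionice (util-linux) are installed; low_priority is a hypothetical helper name, not part of AzCopy, and whether the idle I/O class has any effect depends on the I/O scheduler in use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run any command at the lowest CPU priority (nice 19)
# and in the "idle" I/O scheduling class (ionice -c 3).
low_priority() {
  nice -n 19 ionice -c 3 "$@"
}

# Example use with the command from the original report (commented out so
# that the snippet itself has no side effects):
# low_priority ./azcopy copy "$i/*" \
#   "https://${ACCOUNT}.blob.core.windows.net/${YEAR}-${MONTH}-${i}/${SAS}" \
#   --put-md5 --log-level ERROR --cap-mbps 20
```

This only lowers scheduling priority; it does not reduce how much work AzCopy attempts, so the system load average may still rise.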
We have not yet tried nice or ionice, though they're on our radar. We were hoping that AzCopy could be told to slow down through the environment variables enumerated above. It seems that the default is to try to shovel as much data as possible up to Azure as quickly and as concurrently as possible. Throttling that through OS controls may be possible but that seems like a last resort, if the application has any means of reducing its throughput directly.
Is this right?
"you’re running azcopy on the server where you’re running nfsd, right?"
Sorry, yes. That is correct: we are running azcopy on the NFS server itself.
Thanks. I wonder how much of the CPU usage is inside the AzCopy codebase, and how much is in the NFS file processing. Hard to say, and it probably doesn't matter: whatever the answer, the solution is probably to slow down AzCopy.
To really slow AzCopy down, here's what I'd suggest:
AZCOPY_CONCURRENCY_VALUE=AUTO # will start at 4 and auto-seek higher if needed, but probably won't need higher given your cap-mbps
AZCOPY_CONCURRENT_FILES=1 # if too slow, double and retest until you get speed you like
AZCOPY_BUFFER_GB # will probably have no effect when concurrent-files is very low. Set to 1 if in doubt.
AZCOPY_TUNE_TO_CPU # only used in benchmark mode. Not relevant for actual movement of real data.
# and continue to use --cap-mbps on the command line, as you are doing
If you run with these settings, and everything is fine, you might like to try increasing your cap-mbps value to get a faster speed (assuming you have available network bandwidth). If you run again with the higher cap, and AzCopy displays a message "Disk may be limiting speed" that might mean you should try slightly increasing the concurrent files setting (that will add load, so don't increase it more than necessary to reach your desired throughput).
Please let us know how this goes for you.
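For illustration, the suggested settings could be applied in a small wrapper script like the sketch below. The three environment variables and the azcopy flags are the ones named in this thread; copy_dir is a hypothetical helper, and ACCOUNT, YEAR, MONTH and SAS are assumed to be set as in the original loop.

```shell
#!/usr/bin/env bash
# Conservative AzCopy settings, following the suggestions above.
export AZCOPY_CONCURRENCY_VALUE=AUTO   # starts low, seeks higher only if needed
export AZCOPY_CONCURRENT_FILES=1       # if too slow, double and retest
export AZCOPY_BUFFER_GB=1              # probably no effect at low file concurrency

# Hypothetical helper: upload the contents of one date directory ($1) to its
# own container. ACCOUNT, YEAR, MONTH and SAS are assumed to be set already.
copy_dir() {
  ./azcopy copy "${1}/*" \
    "https://${ACCOUNT}.blob.core.windows.net/${YEAR}-${MONTH}-${1}/${SAS}" \
    --put-md5 --log-level ERROR --cap-mbps 20
}

# Usage (commented out so the snippet has no side effects):
# for d in */; do copy_dir "${d%/}"; done
```

Exporting the variables once keeps every azcopy invocation in the loop under the same limits, so only one value needs to change between test runs.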
Thanks for the suggestions! This is what I was looking for. I'll give these a shot.
Can you elaborate on this, though?
"how much is in the NFS file processing"
Why would NFS be involved at all, if I'm running azcopy on the NFS server itself, and azcopy has direct access to the filesystem? I'm copying directly from the same local directory that the server exports via NFS to other systems.
Thanks!
Oh, yeah. You're right, and I'm probably wrong. I hadn't considered that you were reading directly from that directory.
Identifying the right combination of these settings has proven challenging. After resuming a previously cancelled job of 6 million files, we've played with a couple of iterations. We've settled on AZCOPY_CONCURRENT_FILES=4 and AZCOPY_CONCURRENCY_VALUE=4.
Throughput is low (Throughput (Mb/s): 0.622), but this seems to keep CPU load at about 4. On a 4 vCPU VM, we're okay with this balance. We're seeing about 100-200 files transferred per 2-second interval. While I'd like to see faster data transfer, we're not willing to take on any additional risk to CPU load since this is an active NFS server.
Thanks for the guidance! This got us moving forward with more confidence.
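As a back-of-the-envelope check on what that rate means, the sketch below estimates total transfer time from the figures reported above. It assumes all 6 million files still need to be transferred, which overstates the remaining work for a resumed job; eta_hours is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: whole hours (rounded down) needed to move <files>
# at a rate of <files_per_interval> files every <interval_seconds> seconds.
# Arguments: $1 = files, $2 = files_per_interval, $3 = interval_seconds
eta_hours() {
  echo $(( $1 * $3 / $2 / 3600 ))
}

eta_hours 6000000 100 2   # slow end of the observed range: prints 33
eta_hours 6000000 200 2   # fast end of the observed range: prints 16
```

At the observed 100-200 files per 2 seconds, a full pass over 6 million files would take roughly 16 to 33 hours.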