[Reported as comment on #243, and transcribed here by @JohnRusk
I am having this same issue using 10.2.1. I am using macOS. ulimit is unlimited. Working fine on Windows.
My CMD:
.\azcopy cp "source" "dest" --recursive=true
Error:
Job 0bce51fd-48da-4342-6d14-7c4ef2221479 has started
Log file is located at: /Users/bill/.azcopy/0bce51fd-48da-4342-6d14-7c4ef2221479.log
1496 Done, 0 Failed, 8504 Pending, 0 Skipped, 10000 Total (scanning...), 2-sec Throughput (Mb/s): 1395.1759panic: couldn't create job part plan file "0bce51fd-48da-4342-6d14-7c4ef2221479--00001.steV7": open /Users/bill/.azcopy/plans/0bce51fd-48da-4342-6d14-7c4ef2221479--00001.steV7: too many open files
goroutine 1 [running]:
github.com/Azure/azure-storage-azcopy/ste.JobPartPlanFileName.Create(0xc003a00080, 0x31, 0xbce51fd00000000, 0x4e7c146d434248da, 0x1791422f2, 0x30300000100, 0xc0000d4c90, 0xc0000d4cc0, 0x0, 0x0, ...)
/Users/vsts/agent/2.153.2/work/1/s/ste/JobPartPlanFileName.go:124 +0x1e15
github.com/Azure/azure-storage-azcopy/ste.ExecuteNewCopyJobPartOrder(0xbce51fd00000000,
@Will-Bill it would be helpful if we could see a couple of things for your log AzCopy log file. Could you please either zip it up and email it to me (my email address is in my profile) or, let us know the following info:
The "Max open files when downloading" (this will be displayed somewhere in the first 10 lines or so of the AzCopy log file)
The "PERF lines" (these will tell us how many files are actually open). Instructions on how to extract these lines from the log are here: https://github.com/Azure/azure-storage-azcopy/wiki/Disk-and-service-constraint-indicators-in-AzCopy#note-on-use-of-log-files-and-brief-tips-on-debugging-other-performance-issues-not-covered-above
Once we have that info from you, we'll see if we can figure out why this is happening.
@JohnRusk thanks. Here is the log:
2019/08/18 16:54:50 PERF: primary performance constraint is Unknown. States: W: 0, F: 0, S: 0, T: 0
2019/08/18 16:54:52 PERF: primary performance constraint is Unknown. States: W: 8248, F: 0, S: 208, T: 8456
Oh, is this copying from service to service (e.g. one blob account to another)? That's what it looks like. From that log, I would expect it to have about 200 to 300 open network connections, and about 3 (yes three) open files. Presumably MacOS counts both the network and file counts together, as open handles (just like Linux does). But that's still only a few hundred. So I'm at a loss to explain how this could be happening, except to wonder if something else your machine has lots of file handles open, and somehow that may be a factor. Let me ask around the team and see if we can come up with any ideas or tests we can run.
Yes, I am copying from a source container to a dest container (both in azure in different regions) using storage account shared access keys for both endpoints.
Thanks for confirming. That's super-weird that you're seeing a file related issue, since we hardly touch the file system in that case. I've asked our team's Mac guru, and we'll see what he thinks.
thanks.
It works with smaller data sets. It seems like when it gets passed scanning 10k objects it crashes.
E.G This is fine:
0 Done, 0 Failed, 298 Pending, 0 Skipped, 298 Total,
Job b5ec0271-6279-c14a-701d-5c070a5a29ba summary
Elapsed Time (Minutes): 0.0333
Total Number Of Transfers: 298
Number of Transfers Completed: 298
Number of Transfers Failed: 0
Number of Transfers Skipped: 0
TotalBytesTransferred: 25555826
Final Job Status: Completed
This is not:
Job 42392e26-04a4-f54b-447b-a5ea0f83a2be has started
Log file is located at: /Users/bill/.azcopy/42392e26-04a4-f54b-447b-a5ea0f83a2be.log
792 Done, 0 Failed, 9208 Pending, 0 Skipped, 10000 Total (scanning...), 2-sec Throughput (Mb/s): 241.4687panic: couldn't create job part plan file "42392e26-04a4-f54b-447b-a5ea0f83a2be--00001.steV7": open /Users/bill/.azcopy/plans/42392e26-04a4-f54b-447b-a5ea0f83a2be--00001.steV7: too many open files
Yes, I agree the 10k file count looks significant. Thanks for all the info.
Further info:
ulimit returns 'unlimited'
ulimit -a returns:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 256
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 2837
virtual memory (kbytes, -v) unlimited
Oh, I see this:
open files (-n) 256
That looks like the most likely cause of the issue to me. Unix-like OSs include open network connections in that count too, and AzCopy is using over 200 in your case. Can you try increasing that limit, to say 1000 and trying again. OR set the AZCOPY_CONCURRENCY_VALUE to, say 100, and leave your -n limit alone. That is also likely to solve it.
That did it, thanks for your help.
Great. On that basis, I'm going to close this issue.
I am having the same issue but modifying AZCOPY_CONCURRENCY_VALUE didn't help at all.
I tried setting it from 100 to 1000 but no dice. My ulimit output looks different from Will-Bill
ulimit -a
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8192
-c: core file size (blocks) 0
-v: address space (kbytes) unlimited
-l: locked-in-memory size (kbytes) unlimited
-u: processes 2837
-n: file descriptors 256
Copy output
./azcopy copy "https://s3-us-west-2.amazonaws.com/dl-ore/path/folder/" "https://account.blob.core.windows.net/container/path/" --recursive=true
INFO: Scanning...
INFO: Using OAuth token for authentication.
INFO: AWS S3 to Azure Blob copy is currently in preview. Validate the copy operation carefully before removing your data at source.
INFO: trying to copy the source as bucket/folder/list of files
Job 1ad76d25-f4b2-e14c-591e-df41bc2d452b has started
Log file is located at: /Users/me123/.azcopy/1ad76d25-f4b2-e14c-591e-df41bc2d452b.log
4699 Done, 0 Failed, 55301 Pending, 0 Skipped, 60000 Total (scanning...), 2-sec Throughput (Mb/s): 92.6014panic: couldn't create job part plan file "1ad76d25-f4b2-e14c-591e-df41bc2d452b--00006.steV7": open /Users/me123/.azcopy/plans/1ad76d25-f4b2-e14c-591e-df41bc2d452b--00006.steV7: too many open files
Any help would be much appreciated.
This will be the key issue in ulimit: -n: file descriptors 256
Out of that 256, AzCopy normally defaults to using 64 for the payload files, a few more for plan and log files, and an amount equal to AZCOPY_CONCURRENCY_VALUE (plus some extra, in some cases). Although, in your case, since you're doing AWS to Azure, it won't be using the 64 for the payload files because there are no local files. So I would have expected a concurrency value of 100 to work. You could try something like 50. Or you could see if you can increase the number of file descriptors that are allowed in your ulimit.
BTW, in version 10.3 we are making some improvements that may also help with this issue in some cases.
PS, if neither of my suggested solutions work, please open a new issue to track it.
Thanks for the quick response.. yes I will open another ticket if needed.
So far 50 seems to be keeping it plugging along.
Thanks!
Most helpful comment
Oh, I see this:
open files (-n) 256
That looks like the most likely cause of the issue to me. Unix-like OSs include open network connections in that count too, and AzCopy is using over 200 in your case. Can you try increasing that limit, to say 1000 and trying again. OR set the AZCOPY_CONCURRENCY_VALUE to, say 100, and leave your -n limit alone. That is also likely to solve it.