Azure-storage-azcopy: http request failed -> context canceled / context deadline exceeded / connection reset by peer

Created on 4 Sep 2019  路  9Comments  路  Source: Azure/azure-storage-azcopy

Environment

AzcopyVersion 10.2.1
OS-Environment linux
OS-Architecture amd64
Job-Command sync /home/yada https://redacted.blob.core.windows.net/media?se=2020-09-03t10%3A57%3A00z&sig=-REDACTED-&sp=racwdl&sr=c&st=2019-09-02t10%3A57%3A38z&sv=2018-03-28 --recursive --cap-mbps=10
Number of CPUs: 32
Max file buffer RAM 16.000 GB
Max open files when downloading: 1047774
Max concurrent transfer initiation routines: 64
Network speed is 300+ down / 25 up

Issue

95% of uploads fail. Files are usually 1 - 8 Gb. Errors in log are http request failed -> context canceled / context deadline exceeded / connection reset by peer + entries for 'perf: primary performance constraint is Unknown'

Most helpful comment

Yeah that's fixed this issue. Excellent , thanks!

All 9 comments

Hi @digiface, thanks for reaching out.

Could you please search for the keyword UPLOADFAILED instead? E.g. cat xxxx.log | grep UPLOADFAILED to see what are the actually failures.

Aha. Ok this is more like it: INFO: [P#0-T#0] Early close of chunk in singleChunkReader with context still active

'UPLOADFAILED' does not appear in most log files. I get sections like these:

2019/09/02 16:28:37 PERF: primary performance constraint is Unknown. States: R:  4, D:  0, W: 1236, F:  0, B: 300, T: 1540
2019/09/02 16:28:39 JobID=eb3235e1-e98f-7846-4407-c0cde955c58a canceled
2019/09/02 16:28:39 ==> REQUEST/RESPONSE (Try=3/3m49.575929024s[SLOW >3s], OpTime=26m31.717406729s) -- REQUEST ERROR
   PUT https://redacted.blob.core.windows.net/some.file?blockid=n2yynmrlzgqtmwqzmy04mjq3ltyynwetodawndnjzwm2ota1&comp=block&se=2020-09-03t10%3A57%3A00z&sig=-REDACTED-&sp=racwdl&sr=c&st=2019-09-02t10%3A57%3A38z&sv=2018-03-28&timeout=901
   Content-Length: [8388608]
   User-Agent: [AzCopy/10.2.1 Azure-Storage/0.7 (go1.12; linux)]
   X-Ms-Client-Request-Id: [85174742-40b3-409b-6302-52545c702cfa]
   X-Ms-Version: [2018-03-28]
   --------------------------------------------------------------------------------
   ERROR:
-> github.com/Azure/azure-pipeline-go/pipeline.NewError, /home/vsts/go/pkg/mod/github.com/!azure/[email protected]/pipeline/error.go:154
HTTP request failed

Put https://redacted.blob.core.windows.net/some.file?blockid=N2YyNmRlZGQtMWQzMy04MjQ3LTYyNWEtODAwNDNjZWM2OTA1&comp=block&se=2020-09-03T10%3A57%3A00Z&sig=-REDACTED-&sp=racwdl&sr=c&st=2019-09-02T10%3A57%3A38Z&sv=2018-03-28&timeout=901: context canceled

goroutine 104 [running]:

And

2019/09/02 16:07:16 ==> REQUEST/RESPONSE (Try=1/5m8.965240092s[SLOW >3s], OpTime=5m8.965502592s) -- REQUEST ERROR
   PUT https://redacted.blob.core.windows.net/z.mp4?blockid=nzk2zwu0ymitngflzc0znzqzltvmywutm2flmmzjzdnjzwqz&comp=block&se=2020-09-03t10%3A57%3A00z&sig=-REDACTED-&sp=racwdl&sr=c&st=2019-09-02t10%3A57%3A38z&sv=2018-03-28&timeout=901
   Content-Length: [8388608]
   User-Agent: [AzCopy/10.2.1 Azure-Storage/0.7 (go1.12; linux)]
   X-Ms-Client-Request-Id: [8eb2a167-522e-4e12-51d6-5b18adfcf16e]
   X-Ms-Version: [2018-03-28]
   --------------------------------------------------------------------------------
   ERROR:
-> github.com/Azure/azure-pipeline-go/pipeline.NewError, /home/vsts/go/pkg/mod/github.com/!azure/[email protected]/pipeline/error.go:154
HTTP request failed

Put https://redacted.blob.core.windows.net/some.file?blockid=Nzk2ZWU0YmItNGFlZC0zNzQzLTVmYWUtM2FlMmZjZDNjZWQz&comp=block&se=2020-09-03T10%3A57%3A00Z&sig=-REDACTED-&sp=racwdl&sr=c&st=2019-09-02T10%3A57%3A38Z&sv=2018-03-28&timeout=901: read tcp 172.16.15.1:43966->13.74.208.64:443: read: connection reset by peer

goroutine 274 [running]:
github.com/Azure/azure-storage-azcopy/ste.stack(0xc007ee40c0, 0xc000e3f000, 0x0)
/home/vsts/work/1/s/ste/xferLogPolicy.go:139 +0x9d

Occasionally one that says HTTP/1.x transport connection broken

@JohnRusk could you please take a look? The line that got logged originates from here.

Got a lot of them too in my problem with token refreshes . File size is sometimes only KBs

Any update/insights here would be appreciated. We now have 51Mpbs upload bandwidth and it gets degraded over about 30 minutes to 20Mpbs with AzCopy running capped at 10Mpbs. Uploads start off succeeding then everything gets chewed up and there's not enough meaningful information in the logs to know why it's happening.

Thanks for the reminder. I was travelling when the original issue came through, and have not caught up yet.

Can I make sure I've understood correctly: you have 51 Mbps in total. AzCopy is running capped at 10 Mbps, and over time the bandwidth available to _other_ traffic degrades to 20 Mbps (when you would have expected 40). Is that right?

Given your CPU count, the current version of AzCopy is going to default to using a very high connection count, even though that's unnecessary at your bandwidth. The best thing I can think of would be to do a test where you set the environment variable AZCOPY_CONCURRENCY_VALUE to about 4 or 8 (try 4 first). That will make it use only 4 connections, which is a much more reasonable number for your network. (Basically, you have a big machine - 32 CPUs - on an small network, and that's not something that the current release handles very well by default. But it should, in theory, be fine if you set that environment variable to a low value.)

Please let us know how that test works for you.

Ah I wondered where the nc flag had gone. Will give it a go now, thanks

Yeah that's fixed this issue. Excellent , thanks!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

IGx89 picture IGx89  路  4Comments

martinwoodward picture martinwoodward  路  6Comments

brettrowberry picture brettrowberry  路  4Comments

ldepaiva picture ldepaiva  路  3Comments

TiloWiklund picture TiloWiklund  路  5Comments