Describe the bug
When trying to push large amount of data (4000MB in my case) that's "readable stream" (e.g. BytesIO or file reader - anything implementing "read") then the upload speed caps at around 8Mbps.
(For the context I'm working on 4000MB block upload support for Azure Storage SDK).
To Reproduce
Execute test_put_block_stream_large with LARGE_BLOCK_SIZE bumped to some large value (i.e. 4000MB upcoming , or 100MB currently supported threshold).
OR
Use scenario from my fork as reference.
Expected behavior
Upload speed of "readable" data is not capped by httpclient and can leverage full network bandwith available.
Possible solution
The https://bugs.python.org/msg305571 suggest quite handy workaround that could be part of pipeline I guess. So far I didn't see any way to inject different blocksize to httpclient.
Screenshots
Original test:
I was uploading 4000MB of data in single request without any modifications using "readable" stream. That took over 1 hour!!


Turns out httpclient is using 8192 byte buffer when readable stream is passed:



Then I started to play with blocksize. I was editing http client's source and bumping the blocksize.
After bumpting to 8192*1024 upload speed was more than 2X faster


And after bumping it to 1081921024 I managed to upload that payload in ~4 and half minute.

Additional context
This is going to impact future users of "large block"/"large blob" (4000MB new limit for single block / 200TB limit for single blob). Users of that feature are most likely work with streams - either uploading data from network or data produced on the fly by computations. Therefore it's important to address this deficiency.
Thanks @kasobol-msft for the detailed description, @chlowell can you take a look at this
Sure. Looks like it's an issue users will encounter through the storage libraries, that may require a change in azure-core. @lmazuel, @rakshith91, @xiangyan99, your thoughts?
My first reaction is this is something we should fix ASAP. I will gather more data and investigate possible fixes.
I spent some time investigating this issue, and while I haven't fully isolated the root cause(s), I have several conclusions:
http.client or the requests package, but it does repro when using the azure-core Pipeline or the Storage SDK. But @kpajdzik has seen some regressions at the http.client layer from his personal machine.https://<account>.blob.core.windows.net/<container>/<blob>?<sas-token>.pip install -r requirements.txtpython app.py <url> <size> (10MB is a good starting size)Client: Azure VM, DS3_v2, West US 2, Windows Server 2019
Storage Account: Australia East, Premium BlockBlobStorage
[http.client, stream] Put 10,000,000 bytes in 0.39 seconds (195.21 Mbps), Response=201
[http.client, array] Put 10,000,000 bytes in 0.37 seconds (207.69 Mbps), Response=201
[requests, stream] Put 10,000,000 bytes in 0.39 seconds (195.05 Mbps), Response=201
[requests, array] Put 10,000,000 bytes in 0.37 seconds (208.01 Mbps), Response=201
[Pipeline, stream] Put 10,000,000 bytes in 12.97 seconds (5.88 Mbps), Response=201
[stage_block, stream] Put 10,000,000 bytes in 13.68 seconds (5.58 Mbps)
This shows that http.client and requests have no perf issues when uploading a stream, but the Azure SDK does have a perf issue at the Pipeline layer.
[Pipeline, stream] Put 5,000,000 bytes in 7.17 seconds (5.32 Mbps), Response=201
[Pipeline, stream] Put 5,000,000 bytes in 6.20 seconds (6.15 Mbps), Response=201
[Pipeline, stream] Put 5,000,000 bytes in 6.20 seconds (6.15 Mbps), Response=201
[Pipeline, array] Put 5,000,000 bytes in 0.65 seconds (58.37 Mbps), Response=201
[Pipeline, array] Put 5,000,000 bytes in 0.24 seconds (160.23 Mbps), Response=201
[Pipeline, array] Put 5,000,000 bytes in 0.23 seconds (163.72 Mbps), Response=201
[Pipeline, stream] Put 5,000,000 bytes in 0.23 seconds (166.16 Mbps), Response=201
[Pipeline, stream] Put 5,000,000 bytes in 0.22 seconds (173.42 Mbps), Response=201
[Pipeline, stream] Put 5,000,000 bytes in 0.22 seconds (173.34 Mbps), Response=201
[Pipeline, array] Put 5,000,000 bytes in 0.24 seconds (158.44 Mbps), Response=201
[Pipeline, array] Put 5,000,000 bytes in 0.23 seconds (165.01 Mbps), Response=201
[Pipeline, array] Put 5,000,000 bytes in 0.24 seconds (162.09 Mbps), Response=201
This shows that "Pipeline,stream" is much slower than "Pipeline,array". However, once "Pipeline,array" has been executed once, then both have the same perf.
A similar issue was reported against curl on Windows that might be related:
This should be fixed by #14442:
|Stream (Mbps)| |Array (Mbps)| |
-----|-----|-----|-----|-----
azure-core|Pipeline|stage_block|Pipeline|stage_block
1.8.2|5.93|5.88|287|302
@kasobol-msft: Would you like to verify as well?
@mikeharder works like a charm.