Azure-sdk-for-python: Customer is asserting that V12 Storage SDK has slower performance than V2.1 SDK

Created on 25 Jan 2020 · 11 Comments · Source: Azure/azure-sdk-for-python

Ask from XClient: Provide suggestions for addressing this concern, and say whether any upcoming improvements are planned for V12 along these lines.
The customer also wants to know the support lifecycle timelines for v2, in case they decide to continue using v2.

The customer is comparing the performance of the Python Storage SDK v12 with the Python Storage SDK v2.
They observe that SDK v2 performs better when concurrently uploading or downloading a 600-byte blob 1800 times using ThreadPoolExecutor.
The code and test results are present in the IcM attachments. Engineer was able to get similar results in-house too.
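
A benchmark of the shape described above can be sketched as follows. This is not the customer's code (that lives in the IcM attachments). The names `run_benchmark` and `fake_upload` are hypothetical, the worker count of 4 is an assumption, and `fake_upload` is a stand-in that would need to be replaced with a real v2 or v12 upload call to measure anything meaningful.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(operation, count=1800, workers=4):
    """Run `operation(i)` for i in range(count) on a thread pool.

    Returns (completed operations, elapsed seconds, operations per second).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every task and re-raises the first exception, if any.
        results = list(pool.map(operation, range(count)))
    elapsed = time.perf_counter() - start
    return len(results), elapsed, len(results) / elapsed


def fake_upload(index):
    # Hypothetical stand-in: builds a 600-byte payload but sends nothing.
    # Replace the body with a real SDK upload call to benchmark v2 or v12.
    payload = b"x" * 600
    return len(payload)


if __name__ == "__main__":
    completed, elapsed, ops_per_sec = run_benchmark(fake_upload)
    print(f"{completed} operations in {elapsed:.3f}s ({ops_per_sec:.0f} ops/s)")
```

Running the same harness against both SDK versions, with the same worker count and payload, keeps the comparison like for like.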


Client Service Attention Storage bug customer-reported

All 11 comments

ICM 171649877 has the samples and repro steps

I was also able to reproduce this perf regression using the customer-provided applications. We are investigating the root cause and how we can make V12 meet or exceed the performance of V2.

@mikeharder any update on this? https://github.com/dask/adlfs/issues/57 is another result of the changes introduced with V12.

@mgsnuno: Thanks for reporting the issue at dask/adlfs#57. If this is indeed a perf regression in azure-storage-blob v12, we would like to investigate and fix it.

However, it's not clear to me that dask/adlfs#57 is related to this issue. It looks like dask/adlfs#57 is about large files, while this issue is specific to small files (600 bytes).

Can you please open a new issue in this repo with details for how to repro your scenario, and the size of the perf regression you are seeing from v2 to v12?

For the original scenario of uploading and downloading 600-byte blobs, using the customer-provided repro app in the ICM, I was able to verify that v12 is 7-21% slower than v2:

Scenario | Size (bytes) | Parallel | 2.0.1 - time (ms) | 2.0.1 - ops/s | 12.1.0 - time (ms) | 12.1.0 - ops/s | 12.1.0/2.0.1 - ops/s
-- | -- | -- | -- | -- | -- | -- | --
UploadSerial | 600 | 1 | 13994 | 129 | 14992 | 120 | 93%
DownloadSerial | 600 | 1 | 9282 | 194 | 10313 | 175 | 90%
UploadParallel | 600 | 4 | 3864 | 466 | 4900 | 367 | 79%
DownloadParallel | 600 | 4 | 4324 | 416 | 5296 | 340 | 82%
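
The ops/s and ratio columns can be re-derived from the time columns. The sketch below assumes each time is the total elapsed milliseconds for 1800 operations (the count given in the issue description). That assumption is not stated in the table, but it reproduces the published figures. The names `OPERATIONS`, `results_ms`, and `ops_per_second` are illustrative.

```python
OPERATIONS = 1800  # assumed: operations per run, taken from the issue description

# scenario -> (v2.0.1 time, v12.1.0 time), assumed to be total milliseconds
results_ms = {
    "UploadSerial": (13994, 14992),
    "DownloadSerial": (9282, 10313),
    "UploadParallel": (3864, 4900),
    "DownloadParallel": (4324, 5296),
}


def ops_per_second(elapsed_ms, operations=OPERATIONS):
    """Convert a total elapsed time in milliseconds to operations per second."""
    return operations / (elapsed_ms / 1000)


if __name__ == "__main__":
    for scenario, (v2_ms, v12_ms) in results_ms.items():
        v2 = ops_per_second(v2_ms)
        v12 = ops_per_second(v12_ms)
        print(f"{scenario}: v2 {v2:.0f} ops/s, v12 {v12:.0f} ops/s, ratio {v12 / v2:.0%}")
```

The ratios range from 79% to 93%, which is where the "7-21% slower" figure comes from.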

I was also able to get similar results using the Python perf framework (https://github.com/mikeharder/azure-sdk-perf/tree/master/python/azure-storage-blob-perfstress).

I think the next step is for someone to profile the v2 and v12 code, and find areas for improvement in v12.

Here's the profiling investigation Rakshith has done on this issue. So far it has not turned up an obvious cause: https://microsoft.sharepoint.com/teams/AzureDeveloperExperience/_layouts/15/Doc.aspx?sourcedoc={690480f5-58fe-4d8e-ab3e-cb0dcb01b6f0}&action=edit&wd=target%28Services.one%7C7edf4d8f-3cf7-4200-bcd2-567d6a7e429c%2FPerformance%7Cb53f1222-0012-4b47-bcac-13f62d27a6a6%2F%29

Note: There was a perf improvement for parallel upload in version 12.3.2 https://github.com/Azure/azure-sdk-for-python/blob/azure-storage-blob_12.3.2/sdk/storage/azure-storage-blob/CHANGELOG.md#1232-2020-6-12

@mikeharder I understand your request. I haven't done it yet because the issue is already described clearly in https://github.com/dask/adlfs/issues/57, but I don't have code that reproduces it with azure V12 storage alone. The main issue is not whether the file is big or small, and it is less about speed than about memory usage: I see 2x-5x higher memory usage.

If you can replicate the issue as I described it in dask/adlfs#57, awesome; you can then maybe debug the V12 usage. If you want an example without dask, I'll work on it.

@mgsnuno: We are eager to investigate any performance regression in v12, whether it's lower throughput or higher memory usage. But to investigate we will need a repro using only azure-storage-blob.

In case it helps anyone else out there: when moving from Azure blob storage v2 to v12, we experienced a significant regression in performance when uploading many small files (-50%). However, we discovered a seemingly undocumented session argument to BlobServiceClient which allowed us to share one session per process across many BlobServiceClient instances. We were doing this in v2, but originally thought it was not possible in v12. For example:

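import requests
from azure.storage.blob import BlobServiceClient

storage_account_name = "mystorageaccount"

# One requests.Session per process, so every client reuses its connections.
session = requests.Session()

# Each BlobServiceClient is handed the shared session instead of creating its own.
clients = [
    BlobServiceClient(
        account_url=f'https://{storage_account_name}.blob.core.windows.net/',
        session=session,
    )
    for _ in range(10)
]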

Using a shared session led to fewer connection round-trips and resulted in performance matching previous levels seen on v2. The original problem was particularly pronounced in our case because we are uploading files from a region in Europe to one in America, but I imagine it can have an impact any time where there are many BlobServiceClient instances being created. If possible in your case, it should also be equivalent to use a single BlobServiceClient for all the uploads in a process.
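
The effect of connection reuse on the number of connections can be illustrated without the Azure SDK. The sketch below is a stand-in built only on the Python standard library: a local HTTP server counts incoming TCP connections while a client sends 20 small PUT requests, first with a new connection per request and then over one shared connection. All names (`CountingHandler`, `upload_with_new_connections`, `upload_with_shared_connection`, `count_connections`) are hypothetical, and the local server only approximates what happens against a real storage endpoint, where each new connection also pays for a TLS handshake.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections alive
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the uploaded body
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def upload_with_new_connections(host, port, count, payload):
    for i in range(count):
        conn = http.client.HTTPConnection(host, port)
        conn.request("PUT", f"/blob-{i}", body=payload)
        conn.getresponse().read()
        conn.close()


def upload_with_shared_connection(host, port, count, payload):
    conn = http.client.HTTPConnection(host, port)
    for i in range(count):
        conn.request("PUT", f"/blob-{i}", body=payload)
        conn.getresponse().read()
    conn.close()


def count_connections(upload, count=20):
    """Run `upload` against a throwaway local server; return connections seen."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        upload(host, port, count, b"x" * 600)
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


if __name__ == "__main__":
    print("new connection per upload:", count_connections(upload_with_new_connections))
    print("one shared connection:", count_connections(upload_with_shared_connection))
```

Each avoided connection saves at least one network round trip for setup, which is why the saving grows with the latency between client and storage account.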

@hermanschaaf: Thank you for reporting this.

The missing documentation is a known issue covered by #12122.

I opened a new issue #13797 to discuss changing Python to share HTTP connections by default, so you would not need to manually share a session.
