Azure-sdk-for-python: Page blob upload is slow

Created on 13 Dec 2019 · 6 Comments · Source: Azure/azure-sdk-for-python

Page blob upload is very CPU-intensive and slow. Uploading a 30 GB image (of which ~28 GB is empty chunks) can take over an hour and peg a CPU the entire time:

$ time az storage blob upload --account-name {name} --container-name {name} --type page --name upload-test-upstream.vhd --file image.vhdfixed
...
real    80m6.493s
user    75m11.158s
sys    25m0.718s

The issue appears to be in the _is_chunk_empty method, which consumes a lot of CPU because it checks each chunk byte by byte in a Python loop:

def _is_chunk_empty(self, chunk_data):
    # read until non-zero byte is encountered
    # if reached the end without returning, then chunk_data is all 0's
    for each_byte in chunk_data:
        if each_byte != 0 and each_byte != b'\x00':
            return False
    return True
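
The double comparison in the loop is presumably there for Python 2/3 compatibility (an assumption; the issue does not say so). A minimal sketch of the difference: on Python 3, iterating over a bytes object yields ints, while a bytearray yields ints on both major versions, so a single truthiness test per byte would be enough.

```python
chunk = b"\x00\x00\x01"

# On Python 3, iterating over bytes yields ints, so `each_byte != 0` is the
# comparison that matters; `each_byte != b'\x00'` is always True for an int.
items_from_bytes = list(chunk)
print(items_from_bytes)

# A bytearray yields ints on both Python 2 and Python 3, so one truthiness
# test per byte is sufficient to detect a non-zero byte.
items_from_bytearray = list(bytearray(chunk))
print(items_from_bytearray)
```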

I think this comparison can be done more efficiently, which would speed up the upload and reduce the load on the CPU.

Client Storage customer-reported

All 6 comments

A possible solution:

def _is_chunk_empty(self, chunk_data):
    return not any(bytearray(chunk_data))

This yields a significant performance improvement:

$ time az storage blob upload --account-name {name} --container-name {name} --type page --name upload-test-upstream.vhd --file image.vhdfixed
...
real    13m22.084s
user    6m49.520s
sys    0m22.547s
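
To compare the two checks in isolation, away from network and disk, a micro-benchmark along these lines may help. This is only a sketch: the standalone function names and the 4 MiB chunk size are assumptions made for illustration, not values taken from the SDK, and absolute timings will vary by machine.

```python
import timeit


def is_chunk_empty_loop(chunk_data):
    # Original approach: inspect every byte in a Python-level loop.
    for each_byte in chunk_data:
        if each_byte != 0 and each_byte != b'\x00':
            return False
    return True


def is_chunk_empty_any(chunk_data):
    # Proposed approach: any() walks the bytearray in C and stops at the
    # first non-zero byte.
    return not any(bytearray(chunk_data))


# Hypothetical chunk size. An all-zero chunk is the worst case because
# neither version can return early.
CHUNK_SIZE = 4 * 1024 * 1024
empty_chunk = bytes(CHUNK_SIZE)
nonempty_chunk = bytes(CHUNK_SIZE - 1) + b"\x01"

loop_seconds = timeit.timeit(lambda: is_chunk_empty_loop(empty_chunk), number=3)
any_seconds = timeit.timeit(lambda: is_chunk_empty_any(empty_chunk), number=3)

print("loop: %.3f s for 3 calls" % loop_seconds)
print("any:  %.3f s for 3 calls" % any_seconds)
```

Both functions should return the same answer for any chunk; only the time spent per call differs.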

@smarlowucf Thanks for reporting this. Someone from our team will take a look. //cc: @mayurid @rakshith91

@smarlowucf Thanks for reporting the issue. Taking a look into this now.


@smarlowucf I did some profiling and you are right about the bottleneck. Here are the slowest calls when uploading a 10 GB page blob, before and after the fix. I've created a PR. Thanks a lot for reporting this.

Before: 440 seconds
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
  2560  431.393    0.169  431.393    0.169  uploads.py:280(_is_chunk_empty)

After: 120 seconds
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
  2561    7.103    0.003    7.103    0.003  {method 'read' of '_io.BufferedReader' objects}
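
The column header in these listings matches the format printed by Python's pstats module, so the numbers were likely collected with cProfile, although the comment does not say which tool was used. A rough sketch of gathering a similar report follows; scan_chunks is a hypothetical stand-in for the upload loop, and the chunk count and size are arbitrary.

```python
import cProfile
import io
import pstats


def is_chunk_empty(chunk_data):
    # Byte-by-byte check, as in the original method.
    for each_byte in chunk_data:
        if each_byte != 0 and each_byte != b'\x00':
            return False
    return True


def scan_chunks(chunk_count, chunk_size):
    # Hypothetical stand-in for the upload loop: count the empty chunks.
    empty = bytes(chunk_size)
    return sum(1 for _ in range(chunk_count) if is_chunk_empty(empty))


profiler = cProfile.Profile()
profiler.enable()
empty_count = scan_chunks(chunk_count=8, chunk_size=256 * 1024)
profiler.disable()

# Sort by time spent inside each function itself and show the top entries.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("tottime").print_stats(5)
print(report.getvalue())
```

Sorting by tottime rather than cumtime puts the function that burns the CPU itself at the top, which is how a hot spot like _is_chunk_empty stands out.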

Thanks for the quick response! :+1:
