Is your feature request related to a problem? Please describe.
When I run the Storage SDK against a single account from a thread pool (multiprocessing.pool.ThreadPool), I see frequent warnings like urllib3.connectionpool:HttpConnectionPool is full, discarding connection. The connection pool has a default max size of 10 (the requests HTTPAdapter default that is passed down to urllib3). When my ThreadPool is larger than that, the warnings are frequent and probably correlate with an increase in round-trip time.
Describe the solution you'd like
The ideal change would mean I can easily configure the underlying connection pool size. Maybe other parameters too, like number of pools.
Describe alternatives you've considered
I can eliminate the warnings on a 32-count pool using this workaround that patches the urllib3 pool parameters.
Additional context
Repro:
import logging
from azure.storage.blob import ContainerClient
from multiprocessing.pool import ThreadPool
from os import getenv

# Warnings only show up in logs
logging.basicConfig(
    format='%(asctime)s - %(levelname)s [%(name)s] %(message)s',
    level=logging.INFO)
logging.getLogger('azure').setLevel(logging.WARNING)

if __name__ == '__main__':
    connection_string = getenv('AZURE_STORAGE_CONNECTION_STRING')
    container_name = getenv('AZURE_STORAGE_CONTAINER_NAME')
    thread_count = 32  # anything >10 should trigger a warning
    num_blobs = 1000
    client = ContainerClient.from_connection_string(
        connection_string,
        container_name)
    pool = ThreadPool(thread_count)

    def upload(i: int):
        name = str(i)
        client.upload_blob(name, data='', overwrite=True)

    pool.map(upload, range(num_blobs))
Hi @ctstone
You should be able to pass your own session:
import requests
sess = requests.Session()
adapter = requests.adapters.HTTPAdapter(pool_connections=100, pool_maxsize=100)
sess.mount('https://', adapter)
client = ContainerClient.from_connection_string(
    connection_string,
    container_name,
    session=sess
)
Could you try and let me know? I will try to get some time to try it too, but can't tell when precisely. Thanks!
Thanks @lmazuel, this works great!
adapter = HTTPAdapter(pool_maxsize=thread_count)
However, session is not a documented parameter on the ContainerClient. Guessing it gets passed down to some lower level Azure.Core handler...
Might I suggest adding it to all client doc strings?
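For anyone landing here later, the two snippets above can be combined roughly as follows. This is only a sketch: the helper names make_pooled_session and make_container_client are hypothetical (they are not part of the SDK), and it assumes, as reported in this thread, that the session keyword is forwarded to the requests-based transport.

```python
import requests
from requests.adapters import HTTPAdapter


def make_pooled_session(pool_size: int) -> requests.Session:
    """Hypothetical helper: a Session whose HTTPS adapter keeps up to pool_size connections per host."""
    sess = requests.Session()
    # pool_maxsize caps the connections saved per host pool; the requests default is 10.
    adapter = HTTPAdapter(pool_maxsize=pool_size)
    sess.mount('https://', adapter)
    return sess


def make_container_client(connection_string: str, container_name: str, thread_count: int):
    """Hypothetical helper: a ContainerClient whose connection pool matches the thread count."""
    # Imported lazily so the session helper can be used without the Azure SDK installed.
    from azure.storage.blob import ContainerClient
    return ContainerClient.from_connection_string(
        connection_string,
        container_name,
        # `session` is the undocumented kwarg discussed in this thread.
        session=make_pooled_session(thread_count))
```

In the repro, replacing the ContainerClient.from_connection_string call with make_container_client(connection_string, container_name, thread_count) should size the pool to the ThreadPool, so connections are reused instead of discarded.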
@ctstone yes, the entire set of supported kwargs is not obvious. There is a generic list here with most of them:
https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/core/azure-core/CLIENT_LIBRARY_DEVELOPER.md#available-policies
And this one was missing, so I created another issue:
https://github.com/Azure/azure-sdk-for-python/issues/12121
That makes the doc complete but not discoverable, so I created another issue:
https://github.com/Azure/azure-sdk-for-python/issues/12122
Shall we close this one? Additional thoughts?
Closing this issue since the provided solution resolves it. Thanks!