Hi all,
I am trying to download a "folder" inside a blob container while keeping the original folder tree structure. This "folder" contains millions of files.
For this reason, I first run list_blobs to obtain the list of blobs and then download each blob using get_blob_to_path.
blobs = blob_service.list_blobs('blob_container', 'data/projects/folder')
for blob in blobs:
    print(blob.name)
This loop only shows the first 5000 blobs inside data/projects/folder, but as I said, I have millions of files.
Any idea why this loop only shows the first 5000 elements?
Any other suggestions for downloading millions of files from a blob container?
Thanks in advance and best regards,
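list_blobs returns at most 5000 blobs per call, so you'll need to pass a marker for containers with more than 5k entries. Here is a code snippet I used:
marker = None
while True:
    # Fetch the next page of results, starting from the current marker
    results = blob_service.list_blobs('blob_container', marker=marker, prefix=prefix, delimiter=delimiter)
    # ...do stuff with results...
    if results.next_marker:
        marker = results.next_marker
    else:
        # No marker returned, so this was the last page
        break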
Basically you set the initial marker to None to start at the beginning, then loop until a result set does not return a pointer to a new marker.
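For the original goal (downloading everything while keeping the folder tree), the marker loop can be wrapped in a generator and combined with get_blob_to_path. This is only a minimal sketch: it assumes list_blobs returns an iterable page with a next_marker attribute, as in the snippet above, and that get_blob_to_path takes (container, blob name, file path). The helper names iter_blobs, local_path_for and download_all are made up for illustration and are not part of the SDK.

```python
import os


def iter_blobs(blob_service, container, prefix=None):
    """Yield every blob under `prefix`, following next_marker page by page."""
    marker = None
    while True:
        page = blob_service.list_blobs(container, prefix=prefix, marker=marker)
        for blob in page:
            yield blob
        # Assumption: the page object exposes next_marker (empty/None on the last page)
        marker = getattr(page, "next_marker", None)
        if not marker:
            break


def local_path_for(blob_name, dest_root):
    """Map 'a/b/c.txt' to dest_root/a/b/c.txt so the folder tree is preserved."""
    return os.path.join(dest_root, *blob_name.split("/"))


def download_all(blob_service, container, prefix, dest_root):
    """Download every blob under `prefix` into dest_root, creating folders as needed."""
    for blob in iter_blobs(blob_service, container, prefix):
        target = local_path_for(blob.name, dest_root)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        blob_service.get_blob_to_path(container, blob.name, target)
```

Something like download_all(blob_service, 'blob_container', 'data/projects/folder/', 'local_copy') would then walk all pages. Because it is a generator, the full list of millions of names is never held in memory at once.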
Hi,
Thanks a lot, it is working now. Although I am not using "prefix" and "delimiter", since I am not sure what the purpose of those two parameters is.
Best regards,
Prefix and delimiter are utility parameters for fetching file lists. For example, let's say you have a directory structure like this:
.
├── data
│   ├── archive
│   │   ├── old_file1.csv
│   │   └── old_file2.csv
│   ├── file1.csv
│   ├── file2.csv
│   └── file3.csv
└── logs
    └── log.txt
If you set the prefix to 'data', you will get:
data/file1.csv
data/file2.csv
data/file3.csv
data/archive/old_file1.csv
data/archive/old_file2.csv
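But if you also set the delimiter to '/' (and write the prefix with its trailing slash, 'data/', so that 'data/' itself is not collapsed into a single entry), you will only get: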
data/file1.csv
data/file2.csv
data/file3.csv
In this example, I used prefix to target a specific directory, and a delimiter of '/' to specify that I only want the files in that directory (and not subdirectories and their contents below it).
These parameters have many other applications, but this is what I use them for anyway.
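To make the grouping rule concrete, here is a small local simulation of how I understand prefix and delimiter to behave. It does not call the storage service at all; list_names is a hypothetical helper written only for this illustration. Names that still contain the delimiter after the prefix are rolled up into a "virtual directory" entry instead of being listed individually.

```python
def list_names(names, prefix="", delimiter=None):
    """Return (blobs, virtual_dirs) grouped the way a prefix/delimiter listing would."""
    blobs, virtual_dirs = [], []
    for name in sorted(names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            # Roll everything below the next delimiter up into one entry
            rolled = prefix + rest.split(delimiter, 1)[0] + delimiter
            if rolled not in virtual_dirs:
                virtual_dirs.append(rolled)
        else:
            blobs.append(name)
    return blobs, virtual_dirs


names = [
    "data/archive/old_file1.csv",
    "data/archive/old_file2.csv",
    "data/file1.csv",
    "data/file2.csv",
    "data/file3.csv",
    "logs/log.txt",
]

# Only the files directly under data/, plus one entry for the archive subfolder
blobs, dirs = list_names(names, prefix="data/", delimiter="/")
print(blobs)
print(dirs)
```

Note the trailing slash in the prefix: in this simulation, prefix 'data' with delimiter '/' collapses everything into the single entry 'data/', which is usually not what you want.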